[datatable-help] Curiosity with use of .SDcols

djmuseR djmuser at gmail.com
Fri Sep 23 17:01:14 CEST 2011


Hi:

I'm playing around with some baseball data and ran into an error whose cause
I don't quite understand.
A subset of the data is here, consisting of all season batting records of
five players:

tst <- 
structure(list(playerID = c("aardsda01", "aardsda01", "aardsda01", 
"aardsda01", "aardsda01", "aardsda01", "aaronha01", "aaronha01", 
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01", 
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01", 
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01", 
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01", 
"aaronha01", "aaronto01", "aaronto01", "aaronto01", "aaronto01", 
"aaronto01", "aaronto01", "aaronto01", "aasedo01", "aasedo01", 
"aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01", 
"aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01", "abadan01", 
"abadan01", "abadan01"), yearID = c(2004L, 2006L, 2007L, 2008L, 
2009L, 2010L, 1954L, 1955L, 1956L, 1957L, 1958L, 1959L, 1960L, 
1961L, 1962L, 1963L, 1964L, 1965L, 1966L, 1967L, 1968L, 1969L, 
1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L, 1962L, 1963L, 
1965L, 1968L, 1969L, 1970L, 1971L, 1977L, 1978L, 1979L, 1980L, 
1981L, 1982L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L, 
2001L, 2003L, 2006L), stint = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), teamID = c("SFN", 
"CHN", "CHA", "BOS", "SEA", "SEA", "ML1", "ML1", "ML1", "ML1", 
"ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ATL", 
"ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ML4", 
"ML4", "ML1", "ML1", "ML1", "ATL", "ATL", "ATL", "ATL", "BOS", 
"CAL", "CAL", "CAL", "CAL", "CAL", "CAL", "BAL", "BAL", "BAL", 
"BAL", "NYN", "LAN", "OAK", "BOS", "CIN"), lgID = c("NL", "NL", 
"AL", "AL", "AL", "AL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", 
"NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", 
"NL", "NL", "NL", "AL", "AL", "NL", "NL", "NL", "NL", "NL", "NL", 
"NL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
"AL", "NL", "NL", "AL", "AL", "NL"), G = c(11L, 45L, 25L, 47L, 
73L, 53L, 122L, 153L, 153L, 151L, 153L, 154L, 153L, 155L, 156L, 
161L, 145L, 150L, 158L, 155L, 160L, 147L, 150L, 139L, 129L, 120L, 
112L, 137L, 85L, 141L, 72L, 8L, 98L, 49L, 44L, 25L, 13L, 29L, 
37L, 40L, 39L, 24L, 23L, 54L, 66L, 7L, 35L, 49L, 32L, 1L, 9L, 
5L), G_batting = c(11L, 43L, 2L, 5L, 3L, 4L, 122L, 153L, 153L, 
151L, 153L, 154L, 153L, 155L, 156L, 161L, 145L, 150L, 158L, 155L, 
160L, 147L, 150L, 139L, 129L, 120L, 112L, 137L, 85L, 141L, 72L, 
8L, 98L, 49L, 44L, 25L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 49L, 32L, 1L, 9L, 5L), AB = c(0L, 2L, 0L, 1L, 0L, 0L, 468L, 
602L, 609L, 615L, 601L, 629L, 590L, 603L, 592L, 631L, 570L, 570L, 
603L, 600L, 606L, 547L, 516L, 495L, 449L, 392L, 340L, 465L, 271L, 
334L, 135L, 16L, 283L, 60L, 63L, 53L, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 5L, 0L, 1L, 17L, 3L), R = c(0L, 0L, 0L, 0L, 
0L, 0L, 58L, 105L, 106L, 118L, 109L, 116L, 102L, 115L, 127L, 
121L, 103L, 109L, 117L, 113L, 84L, 100L, 103L, 95L, 75L, 84L, 
47L, 45L, 22L, 54L, 6L, 1L, 21L, 13L, 3L, 4L, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), H = c(0L, 0L, 
0L, 0L, 0L, 0L, 131L, 189L, 200L, 198L, 196L, 223L, 172L, 197L, 
191L, 201L, 187L, 181L, 168L, 184L, 174L, 164L, 154L, 162L, 119L, 
118L, 91L, 109L, 62L, 77L, 27L, 3L, 69L, 15L, 13L, 12L, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 2L, 0L), X2B = c(0L, 
0L, 0L, 0L, 0L, 0L, 27L, 37L, 34L, 27L, 34L, 46L, 20L, 39L, 28L, 
29L, 30L, 40L, 23L, 37L, 33L, 30L, 26L, 22L, 10L, 12L, 16L, 16L, 
8L, 20L, 6L, 0L, 10L, 2L, 2L, 2L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), X3B = c(0L, 0L, 0L, 0L, 
0L, 0L, 6L, 9L, 14L, 6L, 4L, 7L, 11L, 10L, 6L, 4L, 2L, 1L, 1L, 
3L, 4L, 3L, 1L, 3L, 0L, 1L, 0L, 2L, 0L, 2L, 1L, 0L, 3L, 0L, 0L, 
0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 
0L), HR = c(0L, 0L, 0L, 0L, 0L, 0L, 13L, 27L, 26L, 44L, 30L, 
39L, 40L, 34L, 45L, 44L, 24L, 32L, 44L, 39L, 29L, 44L, 38L, 47L, 
34L, 40L, 20L, 12L, 10L, 8L, 1L, 0L, 1L, 1L, 2L, 0L, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), RBI = c(0L, 
0L, 0L, 0L, 0L, 0L, 69L, 106L, 92L, 132L, 95L, 123L, 126L, 120L, 
128L, 130L, 95L, 89L, 127L, 109L, 86L, 97L, 118L, 118L, 77L, 
96L, 69L, 60L, 35L, 38L, 15L, 1L, 25L, 5L, 7L, 3L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), SB = c(0L, 
0L, 0L, 0L, 0L, 0L, 2L, 3L, 2L, 1L, 4L, 8L, 16L, 21L, 15L, 31L, 
22L, 24L, 21L, 17L, 28L, 9L, 9L, 1L, 4L, 1L, 1L, 0L, 0L, 6L, 
0L, 0L, 3L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 0L, 0L, 0L, 0L, 0L), CS = c(0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 
4L, 1L, 1L, 0L, 7L, 9L, 7L, 5L, 4L, 4L, 3L, 6L, 5L, 10L, 0L, 
1L, 0L, 1L, 0L, 1L, 1L, 0L, 3L, 0L, 4L, 1L, 0L, 0L, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), BB = c(0L, 
0L, 0L, 0L, 0L, 0L, 28L, 49L, 37L, 57L, 59L, 51L, 60L, 56L, 66L, 
78L, 62L, 60L, 76L, 63L, 64L, 87L, 74L, 71L, 92L, 68L, 39L, 70L, 
35L, 41L, 11L, 1L, 21L, 6L, 3L, 3L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0L, 0L, 0L, 2L, 2L), SO = c(0L, 0L, 0L, 1L, 0L, 
0L, 39L, 61L, 54L, 58L, 49L, 54L, 63L, 64L, 73L, 94L, 46L, 81L, 
96L, 97L, 62L, 47L, 63L, 58L, 55L, 51L, 29L, 51L, 38L, 58L, 27L, 
2L, 37L, 6L, 10L, 5L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 3L, 0L, 0L, 5L, 0L), IBB = c(0L, 0L, 0L, 0L, 0L, 0L, NA, 
5L, 6L, 15L, 16L, 17L, 13L, 20L, 14L, 18L, 9L, 10L, 15L, 19L, 
23L, 19L, 15L, 21L, 15L, 13L, 6L, 3L, 1L, 0L, 1L, 0L, 1L, 0L, 
0L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 
0L, 0L), HBP = c(0L, 0L, 0L, 0L, 0L, 0L, 3L, 3L, 2L, 0L, 1L, 
4L, 2L, 2L, 3L, 0L, 0L, 1L, 1L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), SH = c(0L, 1L, 0L, 0L, 0L, 
0L, 6L, 7L, 5L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 4L, 3L, 0L, 2L, 0L, 0L, 0L, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), 
    SF = c(0L, 0L, 0L, 0L, 0L, 0L, 4L, 4L, 7L, 3L, 3L, 9L, 12L, 
    9L, 6L, 5L, 2L, 8L, 8L, 6L, 5L, 3L, 6L, 5L, 2L, 4L, 2L, 6L, 
    2L, 3L, 2L, 0L, 1L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), GIDP = c(0L, 0L, 0L, 
    0L, 0L, 0L, 13L, 20L, 21L, 13L, 21L, 19L, 8L, 16L, 14L, 11L, 
    22L, 15L, 14L, 11L, 21L, 14L, 13L, 9L, 17L, 7L, 6L, 15L, 
    8L, 10L, 7L, 1L, 9L, 1L, 5L, 3L, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), G_old = c(11L, 45L, 
    2L, 5L, NA, NA, 122L, 153L, 153L, 151L, 153L, 154L, 153L, 
    155L, 156L, 161L, 145L, 150L, 158L, 155L, 160L, 147L, 150L, 
    139L, 129L, 120L, 112L, 137L, 85L, 141L, 72L, 8L, 98L, 49L, 
    44L, 25L, 13L, 29L, 37L, 40L, 39L, 24L, 23L, 54L, 66L, 7L, 
    35L, 49L, 32L, 1L, 9L, 5L)), .Names = c("playerID", "yearID", 
"stint", "teamID", "lgID", "G", "G_batting", "AB", "R", "H", 
"X2B", "X3B", "HR", "RBI", "SB", "CS", "BB", "SO", "IBB", "HBP", 
"SH", "SF", "GIDP", "G_old"), row.names = c(NA, 52L), class = "data.frame")

# Variables I want to sum over each player:
vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
          'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
          'SH', 'SF', 'GIDP', 'G_old')

# library('data.table')
DTtst <- data.table(tst, key = 'playerID')

The following works as I want:
DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
              nyears = sum(stint == 1L), nteams = length(unique(teamID))),
         by = 'playerID']
DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
DT1[DT2]

# Combining the two into one call doesn't:

DTtst[, list( beginYear = min(yearID),
                                    endYear = max(yearID),
                                    nyears = sum(stint == 1L),
                                    nteams = length(unique(teamsID)),
                                    lapply(.SD, sum)),
                               by = playerID,
                               .SDcols = vars]
# Error in eval(expr, envir, enclos) : object 'yearID' not found

What am I missing? Is it the lapply() call within list()? 

Second question, more out of curiosity than anything else: is there an
analogue in data.table to within() or plyr::mutate, where one can define new
variables within a call and use them to create other variables? An example
of what I have in mind is

DT[, list(..., PA = AB + BB + HBP + SH + SF,
                  OBP = ifelse(PA > 0,
                                round((H + BB + HBP)/(PA - SH - SF), 3),
NA)),
       by = playerID]

I have a fairly strong prior on the answer to this question, but I'll let
others weigh in first.

TIA for any help,
Dennis



--
View this message in context: http://r.789695.n4.nabble.com/Curiosity-with-use-of-SDcols-tp3837112p3837112.html
Sent from the datatable-help mailing list archive at Nabble.com.


More information about the datatable-help mailing list