[datatable-help] Curiosity with use of .SDcols
djmuseR
djmuser at gmail.com
Fri Sep 23 17:01:14 CEST 2011
Hi:
I'm playing around with some baseball data and ran into an error whose cause
I don't quite understand.
A subset of the data is here, consisting of all season batting records of
five players:
tst <-
structure(list(playerID = c("aardsda01", "aardsda01", "aardsda01",
"aardsda01", "aardsda01", "aardsda01", "aaronha01", "aaronha01",
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01",
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01",
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01",
"aaronha01", "aaronha01", "aaronha01", "aaronha01", "aaronha01",
"aaronha01", "aaronto01", "aaronto01", "aaronto01", "aaronto01",
"aaronto01", "aaronto01", "aaronto01", "aasedo01", "aasedo01",
"aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01",
"aasedo01", "aasedo01", "aasedo01", "aasedo01", "aasedo01", "abadan01",
"abadan01", "abadan01"), yearID = c(2004L, 2006L, 2007L, 2008L,
2009L, 2010L, 1954L, 1955L, 1956L, 1957L, 1958L, 1959L, 1960L,
1961L, 1962L, 1963L, 1964L, 1965L, 1966L, 1967L, 1968L, 1969L,
1970L, 1971L, 1972L, 1973L, 1974L, 1975L, 1976L, 1962L, 1963L,
1965L, 1968L, 1969L, 1970L, 1971L, 1977L, 1978L, 1979L, 1980L,
1981L, 1982L, 1984L, 1985L, 1986L, 1987L, 1988L, 1989L, 1990L,
2001L, 2003L, 2006L), stint = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), teamID = c("SFN",
"CHN", "CHA", "BOS", "SEA", "SEA", "ML1", "ML1", "ML1", "ML1",
"ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ML1", "ATL",
"ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "ML4",
"ML4", "ML1", "ML1", "ML1", "ATL", "ATL", "ATL", "ATL", "BOS",
"CAL", "CAL", "CAL", "CAL", "CAL", "CAL", "BAL", "BAL", "BAL",
"BAL", "NYN", "LAN", "OAK", "BOS", "CIN"), lgID = c("NL", "NL",
"AL", "AL", "AL", "AL", "NL", "NL", "NL", "NL", "NL", "NL", "NL",
"NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL",
"NL", "NL", "NL", "AL", "AL", "NL", "NL", "NL", "NL", "NL", "NL",
"NL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",
"AL", "NL", "NL", "AL", "AL", "NL"), G = c(11L, 45L, 25L, 47L,
73L, 53L, 122L, 153L, 153L, 151L, 153L, 154L, 153L, 155L, 156L,
161L, 145L, 150L, 158L, 155L, 160L, 147L, 150L, 139L, 129L, 120L,
112L, 137L, 85L, 141L, 72L, 8L, 98L, 49L, 44L, 25L, 13L, 29L,
37L, 40L, 39L, 24L, 23L, 54L, 66L, 7L, 35L, 49L, 32L, 1L, 9L,
5L), G_batting = c(11L, 43L, 2L, 5L, 3L, 4L, 122L, 153L, 153L,
151L, 153L, 154L, 153L, 155L, 156L, 161L, 145L, 150L, 158L, 155L,
160L, 147L, 150L, 139L, 129L, 120L, 112L, 137L, 85L, 141L, 72L,
8L, 98L, 49L, 44L, 25L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 49L, 32L, 1L, 9L, 5L), AB = c(0L, 2L, 0L, 1L, 0L, 0L, 468L,
602L, 609L, 615L, 601L, 629L, 590L, 603L, 592L, 631L, 570L, 570L,
603L, 600L, 606L, 547L, 516L, 495L, 449L, 392L, 340L, 465L, 271L,
334L, 135L, 16L, 283L, 60L, 63L, 53L, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 5L, 0L, 1L, 17L, 3L), R = c(0L, 0L, 0L, 0L,
0L, 0L, 58L, 105L, 106L, 118L, 109L, 116L, 102L, 115L, 127L,
121L, 103L, 109L, 117L, 113L, 84L, 100L, 103L, 95L, 75L, 84L,
47L, 45L, 22L, 54L, 6L, 1L, 21L, 13L, 3L, 4L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), H = c(0L, 0L,
0L, 0L, 0L, 0L, 131L, 189L, 200L, 198L, 196L, 223L, 172L, 197L,
191L, 201L, 187L, 181L, 168L, 184L, 174L, 164L, 154L, 162L, 119L,
118L, 91L, 109L, 62L, 77L, 27L, 3L, 69L, 15L, 13L, 12L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 2L, 0L), X2B = c(0L,
0L, 0L, 0L, 0L, 0L, 27L, 37L, 34L, 27L, 34L, 46L, 20L, 39L, 28L,
29L, 30L, 40L, 23L, 37L, 33L, 30L, 26L, 22L, 10L, 12L, 16L, 16L,
8L, 20L, 6L, 0L, 10L, 2L, 2L, 2L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), X3B = c(0L, 0L, 0L, 0L,
0L, 0L, 6L, 9L, 14L, 6L, 4L, 7L, 11L, 10L, 6L, 4L, 2L, 1L, 1L,
3L, 4L, 3L, 1L, 3L, 0L, 1L, 0L, 2L, 0L, 2L, 1L, 0L, 3L, 0L, 0L,
0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L,
0L), HR = c(0L, 0L, 0L, 0L, 0L, 0L, 13L, 27L, 26L, 44L, 30L,
39L, 40L, 34L, 45L, 44L, 24L, 32L, 44L, 39L, 29L, 44L, 38L, 47L,
34L, 40L, 20L, 12L, 10L, 8L, 1L, 0L, 1L, 1L, 2L, 0L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), RBI = c(0L,
0L, 0L, 0L, 0L, 0L, 69L, 106L, 92L, 132L, 95L, 123L, 126L, 120L,
128L, 130L, 95L, 89L, 127L, 109L, 86L, 97L, 118L, 118L, 77L,
96L, 69L, 60L, 35L, 38L, 15L, 1L, 25L, 5L, 7L, 3L, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), SB = c(0L,
0L, 0L, 0L, 0L, 0L, 2L, 3L, 2L, 1L, 4L, 8L, 16L, 21L, 15L, 31L,
22L, 24L, 21L, 17L, 28L, 9L, 9L, 1L, 4L, 1L, 1L, 0L, 0L, 6L,
0L, 0L, 3L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 0L, 0L, 0L, 0L, 0L), CS = c(0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
4L, 1L, 1L, 0L, 7L, 9L, 7L, 5L, 4L, 4L, 3L, 6L, 5L, 10L, 0L,
1L, 0L, 1L, 0L, 1L, 1L, 0L, 3L, 0L, 4L, 1L, 0L, 0L, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), BB = c(0L,
0L, 0L, 0L, 0L, 0L, 28L, 49L, 37L, 57L, 59L, 51L, 60L, 56L, 66L,
78L, 62L, 60L, 76L, 63L, 64L, 87L, 74L, 71L, 92L, 68L, 39L, 70L,
35L, 41L, 11L, 1L, 21L, 6L, 3L, 3L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0L, 0L, 0L, 2L, 2L), SO = c(0L, 0L, 0L, 1L, 0L,
0L, 39L, 61L, 54L, 58L, 49L, 54L, 63L, 64L, 73L, 94L, 46L, 81L,
96L, 97L, 62L, 47L, 63L, 58L, 55L, 51L, 29L, 51L, 38L, 58L, 27L,
2L, 37L, 6L, 10L, 5L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 3L, 0L, 0L, 5L, 0L), IBB = c(0L, 0L, 0L, 0L, 0L, 0L, NA,
5L, 6L, 15L, 16L, 17L, 13L, 20L, 14L, 18L, 9L, 10L, 15L, 19L,
23L, 19L, 15L, 21L, 15L, 13L, 6L, 3L, 1L, 0L, 1L, 0L, 1L, 0L,
0L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L,
0L, 0L), HBP = c(0L, 0L, 0L, 0L, 0L, 0L, 3L, 3L, 2L, 0L, 1L,
4L, 2L, 2L, 3L, 0L, 0L, 1L, 1L, 0L, 1L, 2L, 2L, 2L, 1L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), SH = c(0L, 1L, 0L, 0L, 0L,
0L, 6L, 7L, 5L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 4L, 3L, 0L, 2L, 0L, 0L, 0L, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L),
SF = c(0L, 0L, 0L, 0L, 0L, 0L, 4L, 4L, 7L, 3L, 3L, 9L, 12L,
9L, 6L, 5L, 2L, 8L, 8L, 6L, 5L, 3L, 6L, 5L, 2L, 4L, 2L, 6L,
2L, 3L, 2L, 0L, 1L, 0L, 0L, 0L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, 0L, 0L, 0L, 0L, 0L), GIDP = c(0L, 0L, 0L,
0L, 0L, 0L, 13L, 20L, 21L, 13L, 21L, 19L, 8L, 16L, 14L, 11L,
22L, 15L, 14L, 11L, 21L, 14L, 13L, 9L, 17L, 7L, 6L, 15L,
8L, 10L, 7L, 1L, 9L, 1L, 5L, 3L, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 0L, 0L, 0L, 1L, 0L), G_old = c(11L, 45L,
2L, 5L, NA, NA, 122L, 153L, 153L, 151L, 153L, 154L, 153L,
155L, 156L, 161L, 145L, 150L, 158L, 155L, 160L, 147L, 150L,
139L, 129L, 120L, 112L, 137L, 85L, 141L, 72L, 8L, 98L, 49L,
44L, 25L, 13L, 29L, 37L, 40L, 39L, 24L, 23L, 54L, 66L, 7L,
35L, 49L, 32L, 1L, 9L, 5L)), .Names = c("playerID", "yearID",
"stint", "teamID", "lgID", "G", "G_batting", "AB", "R", "H",
"X2B", "X3B", "HR", "RBI", "SB", "CS", "BB", "SO", "IBB", "HBP",
"SH", "SF", "GIDP", "G_old"), row.names = c(NA, 52L), class = "data.frame")
# Variables I want to sum over each player:
vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
'SH', 'SF', 'GIDP', 'G_old')
# library('data.table')
DTtst <- data.table(tst, key = 'playerID')
The following works as I want:
DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
nyears = sum(stint == 1L), nteams = length(unique(teamID))),
by = 'playerID']
DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
DT1[DT2]
# Combining the two into one call doesn't:
DTtst[, list( beginYear = min(yearID),
endYear = max(yearID),
nyears = sum(stint == 1L),
nteams = length(unique(teamsID)),
lapply(.SD, sum)),
by = playerID,
.SDcols = vars]
# Error in eval(expr, envir, enclos) : object 'yearID' not found
What am I missing? Is it the lapply() call within list()?
Second question, more out of curiosity than anything else: is there an
analogue in data.table to within() or plyr::mutate, where one can define new
variables within a call and use them to create other variables? An example
of what I have in mind is
DT[, list(..., PA = AB + BB + HBP + SH + SF,
OBP = ifelse(PA > 0,
round((H + BB + HBP)/(PA - SH - SF), 3),
NA)),
by = playerID]
I have a fairly strong prior on the answer to this question, but I'll let
others weigh in first.
TIA for any help,
Dennis
--
View this message in context: http://r.789695.n4.nabble.com/Curiosity-with-use-of-SDcols-tp3837112p3837112.html
Sent from the datatable-help mailing list archive at Nabble.com.
More information about the datatable-help
mailing list