[datatable-help] Curiosity with use of .SDcols
Dennis Murphy
djmuser at gmail.com
Sun Sep 25 04:32:32 CEST 2011
Thanks, Steve.
The clarification about .SDcols was sufficient. In that context,
everything else was clear.
Best regards,
Dennis
On Fri, Sep 23, 2011 at 8:59 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> Comments in line
>
> On Fri, Sep 23, 2011 at 11:01 AM, djmuseR <djmuser at gmail.com> wrote:
>> Hi:
>>
>> I'm playing around with some baseball data and ran into an error whose cause
>> I don't quite understand.
>> A subset of the data is here, consisting of all season batting records of
>> five players:
>
> [cut out data]
>
>> # Variables I want to sum over each player:
>> vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
>> 'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
>> 'SH', 'SF', 'GIDP', 'G_old')
>>
>> library('data.table')
>> DTtst <- data.table(tst, key = 'playerID')
>>
>> The following works as I want:
>> DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
>> nyears = sum(stint == 1L), nteams = length(unique(teamID))),
>> by = 'playerID']
>> DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
>> DT1[DT2]
>>
>> # Combining the two into one call doesn't:
>>
>> DTtst[, list( beginYear = min(yearID),
>> endYear = max(yearID),
>> nyears = sum(stint == 1L),
>> nteams = length(unique(teamID)),
>> lapply(.SD, sum)),
>> by = playerID,
>> .SDcols = vars]
>> # Error in eval(expr, envir, enclos) : object 'yearID' not found
>>
>> What am I missing? Is it the lapply() call within list()?
>
> Using .SDcols restricts which columns are visible inside your
> j-expression (where your `list(...)` is): only the columns of .SD,
> i.e. those named in .SDcols, are injected into its scope.
>
> yearID isn't in `vars`, and therefore isn't in .SD. To convince
> yourself, consider this:
>
> R> DTtst[, {
> xx <- .SD
> browser()
> }, by='playerID', .SDcols=vars]
>
> Called from: eval(expr, envir, enclos)
> Browse[1]> xx
> G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old
> [1,] 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11
> [2,] 45 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 45
> [3,] 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
> [4,] 47 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 5
> [5,] 73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
> [6,] 53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 NA
>
> See? No yearID.
>
> Just make sure all the vars you reference in your j-expression are in
> your .SDcols.
>
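A minimal sketch of the combined call, for readers on a current data.table release. It assumes that columns outside .SDcols remain visible in j, which differs from the 2011 behaviour described above, so treat it as version dependent. The toy table is hypothetical and only stands in for the batting data cut from this thread. Note the use of c() rather than nesting lapply() inside list(): c() concatenates the named summaries with the per-column sums into one flat list, whereas list(..., lapply(.SD, sum)) would keep the sums as a single nested element.

```r
library(data.table)

# Hypothetical toy data standing in for the batting records cut from the thread
tst <- data.frame(
  playerID = c("a", "a", "b"),
  yearID   = c(2001L, 2002L, 2005L),
  stint    = c(1L, 1L, 1L),
  teamID   = c("X", "Y", "Z"),
  G        = c(10L, 20L, 30L),
  AB       = c(5L, 15L, 25L),
  stringsAsFactors = FALSE
)
vars  <- c("G", "AB")
DTtst <- data.table(tst, key = "playerID")

# One call: per-player scalar summaries followed by sums over the .SDcols columns.
# Assumption: yearID, stint and teamID are reachable in j even though they are
# not listed in .SDcols (true for recent data.table versions).
res <- DTtst[, c(list(beginYear = min(yearID),
                      endYear   = max(yearID),
                      nyears    = sum(stint == 1L),
                      nteams    = length(unique(teamID))),
                 lapply(.SD, sum)),
             by = playerID, .SDcols = vars]
res
```

If the installed version still raises "object 'yearID' not found", the two-step DT1[DT2] join from the original post gives the same table.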
>
>> Second question, more out of curiosity than anything else: is there an
>> analogue in data.table to within() or plyr::mutate, where one can define new
>> variables within a call and use them to create other variables? An example
>> of what I have in mind is
>>
>> DT[, list(..., PA = AB + BB + HBP + SH + SF,
>> OBP = ifelse(PA > 0,
>> round((H + BB + HBP)/(PA - SH - SF), 3),
>> NA)),
>> by = playerID]
>>
>> I have a fairly strong prior on the answer to this question, but I'll let
>> others weigh in first.
>
> Matthew is fixing `within` in the development version (SVN from
> r-forge). There is also the recently introduced `:=`, but this will
> add these columns to the data.table you are iterating over, which
> doesn't sound like what you want.
>
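To make the `:=` remark concrete, here is a small sketch under current data.table semantics (an assumption; the operator was brand new when this thread was written). The table DT and its values are hypothetical. It shows that `:=` adds columns to the existing table by reference, one row per input row, rather than returning a separate summary, and that a column created in one call can be used in the next.

```r
library(data.table)

# Hypothetical two-row table; column names follow the question above
DT <- data.table(playerID = c("a", "b"),
                 AB  = c(10L, 0L), BB = c(2L, 0L), HBP = c(1L, 0L),
                 SH  = c(0L, 0L),  SF = c(1L, 0L), H   = c(4L, 0L))

# First call adds PA to DT itself (no copy, no new object)
DT[, PA := AB + BB + HBP + SH + SF]

# Second call can refer to PA because it is now a real column of DT
DT[, OBP := ifelse(PA > 0,
                   round((H + BB + HBP) / (PA - SH - SF), 3),
                   NA)]
DT
```

Because DT is modified in place, this suits adding derived columns to the full table; for a per-player summary the braces form of j is the closer fit.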
> Note that your `j-expression` isn't restricted to being a list. Look
> at the example I gave above for starters, but also you can do:
>
> DTtst[, {
> PA <- AB + BB + HBP + SH + SF
> list(PA=PA, OBP=ifelse(PA > 0, round((H + BB + HBP)/(PA - SH - SF), 3), NA))
> }, by='playerID']
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>