[datatable-help] Curiosity with use of .SDcols

Dennis Murphy djmuser at gmail.com
Sun Sep 25 04:32:32 CEST 2011


Thanks, Steve.

The clarification about .SDcols was sufficient. In that context,
everything else was clear.

Best regards,
Dennis

On Fri, Sep 23, 2011 at 8:59 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> Comments inline.
>
> On Fri, Sep 23, 2011 at 11:01 AM, djmuseR <djmuser at gmail.com> wrote:
>> Hi:
>>
>> I'm playing around with some baseball data and ran into an error whose cause
>> I don't quite understand.
>> A subset of the data is here, consisting of all season batting records of
>> five players:
>
> [cut out data]
>
>> # Variables I want to sum over each player:
>> vars <- c('G', 'AB', 'R', 'H', 'X2B', 'X3B',
>>          'HR', 'RBI', 'SB', 'CS', 'BB', 'SO', 'IBB', 'HBP',
>>          'SH', 'SF', 'GIDP', 'G_old')
>>
>> # library('data.table')
>> DTtst <- data.table(tst, key = 'playerID')
>>
>> The following works as I want:
>> DT1 <- DTtst[, list(beginYear = min(yearID), endYear = max(yearID),
>>              nyears = sum(stint == 1L), nteams = length(unique(teamID))),
>>         by = 'playerID']
>> DT2 <- DTtst[, lapply(.SD, sum), by = playerID, .SDcols = vars]
>> DT1[DT2]
>>
>> # Combining the two into one call doesn't:
>>
>> DTtst[, list(beginYear = min(yearID),
>>              endYear = max(yearID),
>>              nyears = sum(stint == 1L),
>>              nteams = length(unique(teamID)),
>>              lapply(.SD, sum)),
>>       by = playerID,
>>       .SDcols = vars]
>> # Error in eval(expr, envir, enclos) : object 'yearID' not found
>>
>> What am I missing? Is it the lapply() call within list()?
>
> Using .SDcols restricts the columns/vars that are injected into the
> scope of your j-expression (where your `list(...)` is) to the columns
> of .SD.
>
> yearID isn't in `vars`, and therefore isn't in .SD. To convince
> yourself, consider this:
>
> R> DTtst[, {
>  xx <- .SD
>  browser()
> }, by='playerID', .SDcols=vars]
>
> Called from: eval(expr, envir, enclos)
> Browse[1]> xx
>      G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old
> [1,] 11  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    11
> [2,] 45  2 0 0   0   0  0   0  0  0  0  0   0   0  1  0    0    45
> [3,] 25  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0     2
> [4,] 47  1 0 0   0   0  0   0  0  0  0  1   0   0  0  0    0     5
> [5,] 73  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA
> [6,] 53  0 0 0   0   0  0   0  0  0  0  0   0   0  0  0    0    NA
>
> See? No yearID.
>
> Just make sure all the vars you reference in your j-expression are in
> your .SDcols.
>
>
>> Second question, more out of curiosity than anything else: is there an
>> analogue in data.table to within() or plyr::mutate, where one can define new
>> variables within a call and use them to create other variables? An example
>> of what I have in mind is
>>
>> DT[, list(..., PA = AB + BB + HBP + SH + SF,
>>                OBP = ifelse(PA > 0,
>>                             round((H + BB + HBP)/(PA - SH - SF), 3),
>>                             NA)),
>>    by = playerID]
>>
>> I have a fairly strong prior on the answer to this question, but I'll let
>> others weigh in first.
>
> Matthew is fixing `within` in the development version (SVN from
> r-forge), but there is the recently introduced `:=` -- but this will
> add these columns to the data.table you are iterating over, which
> doesn't sound like what you want.
>
> Note that your `j-expression` isn't restricted to being a list. Look
> at the example I gave above for starters, but also you can do:
>
> DTtst[, {
>  PA <- AB + BB + HBP + SH + SF
>  list(PA=PA, OBP=ifelse(PA > 0, round((H + BB + HBP)/(PA - SH - SF), 3), NA))
> }, by='playerID']
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
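
For anyone finding this thread later: the original question also asked whether the lapply() call inside list() was a problem. As far as I can tell it would be, separately from the scoping issue, because list(a = ..., lapply(.SD, sum)) nests the sums as a single list element instead of spreading them out as columns. Below is a minimal sketch of one way to get everything in a single call, by joining the two lists with c(). It skips .SDcols and subsets .SD inside j instead, so it should not depend on how a given data.table version scopes columns. The toy data is made up for illustration and is not the poster's data.

```r
library(data.table)

# Hypothetical stand-in for the batting data: two players, two summed columns.
DTtst <- data.table(
  playerID = c("a", "a", "b"),
  yearID   = c(2001L, 2002L, 2005L),
  stint    = c(1L, 1L, 1L),
  teamID   = c("X", "Y", "X"),
  G        = c(10L, 20L, 30L),
  AB       = c(5L, 7L, 9L),
  key      = "playerID"
)

# Columns to sum per player (shortened from the list in the thread).
vars <- c("G", "AB")

# c() joins the named summary list with the list of per-column sums,
# so each sum becomes its own column instead of one nested list element.
# No .SDcols here, so .SD holds every non-grouping column and yearID,
# stint and teamID stay visible in j.
res <- DTtst[, c(list(beginYear = min(yearID),
                      endYear   = max(yearID),
                      nyears    = sum(stint == 1L),
                      nteams    = length(unique(teamID))),
                 lapply(.SD[, vars, with = FALSE], sum)),
             by = playerID]

print(res)
```

The result should have one row per player, with the four summary columns followed by one column per entry in vars, much like the DT1[DT2] join earlier in the thread.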

