[datatable-help] Summing over many variables

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Dec 23 20:51:46 CET 2010


Hi,

On Thu, Dec 23, 2010 at 2:35 PM, Joseph Voelkel <jgvcqa at rit.edu> wrote:
> Consider this set of code:
>
> DT1<-data.table(A1=1:100,A2=1:100,A3=1:100,B1=101:200,B2=101:200,B3=101:200,C1=301:400,D1=301:400,grp=rep(1:5,each=20))
>
> setkey(DT1,grp)
>
> (DT2<-DT1[,lapply(.SD,sum),by=grp]) # from data.table FAQ
>
> I have two questions:
>
> 1. I have many columns like C1 and D1 that I don't want to include in the
> new data.table (nor do I want grp.1 in it). How can I nicely have these not
> be part of my result? (If it helps, I know the indices for the A and the B
> columns)

Since you already know the indices, it might be faster to just pull
the columns you want out of DT2 once the aggregation is done. Barring
that, one way you could do it "inline" is this:

R> # .SD excludes the 'by' column, so look up positions without grp
R> skip <- match(c("C1", "D1"), setdiff(colnames(DT1), "grp"))
R> dt2 <- DT1[, {
  lapply(.SD[, -skip, with=FALSE], sum)
}, by='grp']

Perhaps there are more elegant ways ...
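
For comparison, here is a rough sketch of two other routes, using the
example table from the question. The first sums everything and subsets
the result afterwards. The second assumes your data.table version
supports the .SDcols argument to [.data.table, which limits .SD to the
named columns before anything is summed. The helper names (keep,
all_sums, dt2_post, dt2_sdcols) and the "^[AB]" pattern are only for
illustration; any way of building the vector of wanted column names
should do.

```r
library(data.table)

DT1 <- data.table(A1=1:100, A2=1:100, A3=1:100,
                  B1=101:200, B2=101:200, B3=101:200,
                  C1=301:400, D1=301:400,
                  grp=rep(1:5, each=20))
setkey(DT1, grp)

# Names of the columns to sum (illustrative: everything starting with A or B)
keep <- names(DT1)[grepl("^[AB]", names(DT1))]

# Option 1: sum every column per group, then keep only the wanted ones
all_sums <- DT1[, lapply(.SD, sum), by=grp]
dt2_post <- all_sums[, c("grp", keep), with=FALSE]

# Option 2: restrict .SD up front, so C1 and D1 are never summed
# (assumes .SDcols is available in the installed data.table)
dt2_sdcols <- DT1[, lapply(.SD, sum), by=grp, .SDcols=keep]
```

Both give one row per grp with only the A and B sums. Option 2 avoids
computing sums that are thrown away, which may matter if there are many
columns like C1 and D1.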

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

