[datatable-help] Summing over many variables

Joseph Voelkel jgvcqa at rit.edu
Thu Dec 23 20:35:10 CET 2010


Consider this code:

library(data.table)
DT1 <- data.table(A1 = 1:100, A2 = 1:100, A3 = 1:100,
                  B1 = 101:200, B2 = 101:200, B3 = 101:200,
                  C1 = 301:400, D1 = 301:400, grp = rep(1:5, each = 20))
setkey(DT1, grp)
(DT2 <- DT1[, lapply(.SD, sum), by = grp])  # from data.table FAQ


I have two questions:
1. I have many columns like C1 and D1 that I don't want to include in the new data.table (nor do I want grp.1 in it). How can I cleanly exclude them from the result? (If it helps, I know the column indices of the A and the B columns.)

2. However, in addition to (and sometimes instead of) DT2, what I want is this result:
DT2[,list(sum(A1+A2+A3),sum(B1+B2+B3)),by=grp]

Now, in the actual data set, it's more like A1 to A30 and B1 to B20, and I will be doing this for many subsets of the A's and B's.
So I would like an easier way to compute the sum (or sd, or ...) than referencing each column name; using column indices would be ideal for the actual problem. I know that columns can be referenced by number using with=FALSE, but I don't really see how to apply that to this problem.

Any ideas? Thanks.
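
For illustration only, here is a rough sketch of one possible way to approach both questions by column index. It assumes a data.table version that supports the .SDcols argument (not mentioned above), and the names a_idx, b_idx, totals and DT3 are hypothetical helpers made up for this example. It is not necessarily the recommended idiom.

```r
library(data.table)

DT1 <- data.table(A1 = 1:100, A2 = 1:100, A3 = 1:100,
                  B1 = 101:200, B2 = 101:200, B3 = 101:200,
                  C1 = 301:400, D1 = 301:400,
                  grp = rep(1:5, each = 20))

# Hypothetical index vectors: positions of the A and B columns,
# assumed to be known in advance as described in question 1.
a_idx <- 1:3
b_idx <- 4:6

# Question 1: per-group sums of the A and B columns only.
# .SDcols restricts .SD to the chosen columns, so C1 and D1 are left out.
DT2 <- DT1[, lapply(.SD, sum), by = grp, .SDcols = c(a_idx, b_idx)]

# Question 2: one total per block of columns.
# First add up each block row by row, selecting columns by position
# with with = FALSE, then aggregate the row totals by group.
totals <- data.table(grp = DT1$grp,
                     A = rowSums(DT1[, a_idx, with = FALSE]),
                     B = rowSums(DT1[, b_idx, with = FALSE]))
DT3 <- totals[, list(sumA = sum(A), sumB = sum(B)), by = grp]

print(DT2)
print(DT3)
```

Swapping sum for another function in the last step (sd, for example) would then summarise the per-row block totals, which may or may not be the statistic wanted; the index vectors can be changed to cover A1 to A30, B1 to B20, or any subset.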