[datatable-help] Subsetting columns in data.table

Berto cubalibro at gmail.com
Fri Nov 16 15:15:12 CET 2012


Dear all,

I am naive in data.table usage, so I write to you to get help on a perhaps
trivial question, that it's not clear to me after reading the "Introduction
to data.table" or the "FAQ" vignettes.

I'd like to subset the rows with contain a value above a certain threshold
for a certain column, but below another threshold for all the rest and then
do some operations (e.g. sum):

#Example data
DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9, w=3:11,
z=LETTERS[1:9])
setkey(DT,z)
DT

   x y v  w z
1: a 1 1  3 A
2: a 3 2  4 B
3: a 6 3  5 C
4: b 1 4  6 D
5: b 3 5  7 E
6: b 6 6  8 F
7: c 1 7  9 G
8: c 3 8 10 H
9: c 6 9 11 I

#Easy case, 1 variable
DT[y>=3&v<=7,sum(y), by=x]

#Particular solution (low ncols)
DT[y>=3&v<=7&w<=7,sum(y), by=x]

x V1
1: a  9
2: b  3

#Attempt subgroup for general solution
DT[y>=3, .SD<=7,]

         x    y     v     w     z
[1,] FALSE TRUE  TRUE  TRUE FALSE
[2,] FALSE TRUE  TRUE  TRUE FALSE
[3,] FALSE TRUE  TRUE  TRUE FALSE
[4,] FALSE TRUE  TRUE FALSE FALSE
[5,] FALSE TRUE FALSE FALSE FALSE
[6,] FALSE TRUE FALSE FALSE FALSE

##gives a matrix with logical for <=7.

#Second attempt based on FAQ 2.1
DT[y>=3,lapply(.SD<=7,sum),by=x]

   x V1 V2 V3 V4 V5 V6 V7 V8
1: a  1  1  1  1  1  1  0  0
2: b  1  1  1  1  1  0  0  0
3: c  1  1  0  0  0  0  0  0

I would appreciate any suggestions, maybe this has been covered already in
some post of SO that I've missed.

Thanks in advance for your support!!



--
View this message in context: http://r.789695.n4.nabble.com/Subsetting-columns-in-data-table-tp4649736.html
Sent from the datatable-help mailing list archive at Nabble.com.


More information about the datatable-help mailing list