[datatable-help] assignment by reference in subset

Sun Nov 18 20:19:58 CET 2012

Dear all,

data.table is great! Thanks for this life(time)saving package.

I have run into a tough nut to crack using ':='.
I'd like to do a calculation on one column, conditional on the value of another column.

First, some dummy data:

library(data.table)
DT <- data.table(
 id  = 1:50,               # unique row identifier
 grp = rep(1:5, each=10),  # five groups of ten rows each
 p   = runif(50, 0, 1)     # random values standing in for p-values
)

The id values are unique.
grp speaks for itself.
Think of the p values as, e.g., p-values.

Next, if I want the number of p-values in the whole set that are at least as extreme as (i.e. less than or equal to) the p of each row, this seems to work well:

DT[,c1 := sum(DT[,p] <= p), by=id]

But when I try to get the same count restricted to the rows with the same grp, I have a hard time. These attempts fail:

DT[,c2 := sum(DT[grp,p] <= p),by=id]
DT[,c3 := sum(DT[DT[,grp]==grp,p] <= p), by=id]

What I am after is the following output, shown here for the first 20 rows.
For example, p=0.286 in the first row is compared with the p's of grp=1; the result is 3 because three of these p's are less than or equal to 0.286 (i.e. 0.286, 0.079, and 0.211).

    id grp          p c1 c2
 1:  1   1 0.28619666 12  3
 2:  2   1 0.31725169 14  4
 3:  3   1 0.82172331 41  8
 4:  4   1 0.07867874  4  1
 5:  5   1 0.69134289 35  7
 6:  6   1 0.21123476  7  2
 7:  7   1 0.39156432 20  5
 8:  8   1 0.98862365 50 10
 9:  9   1 0.55943136 25  6
10: 10   1 0.93543842 47  9
11: 11   2 0.48740254 21  5
12: 12   2 0.02974435  1  1
13: 13   2 0.14566443  6  2
14: 14   2 0.23408044 10  3
15: 15   2 0.63503196 32  7
16: 16   2 0.34114088 16  4
17: 17   2 0.56849053 26  6
18: 18   2 0.71877039 36  8
19: 19   2 0.84100007 42 10
20: 20   2 0.81776585 40  9
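
To make the target concrete, here is a rough sketch of the count I mean, written as a slow row-by-row loop in plain base R on a tiny made-up data set. The names ex and c2_ref and the values in ex are only for illustration and are not my real data. What I cannot work out is how to express the same thing with ':=' on DT.

```r
# Tiny hypothetical example (made-up values, not the data shown above)
ex <- data.frame(
  id  = 1:6,
  grp = c(1, 1, 1, 2, 2, 2),
  p   = c(0.30, 0.10, 0.20, 0.50, 0.90, 0.70)
)

# For each row i: take the p values of all rows sharing row i's grp,
# and count how many of them are <= the p of row i.
c2_ref <- sapply(seq_len(nrow(ex)), function(i) {
  same_grp <- ex$grp == ex$grp[i]
  sum(ex$p[same_grp] <= ex$p[i])
})

ex$c2 <- c2_ref
print(ex)
```

Each row is compared only against the rows of its own grp, so the counts restart at 1 within every group.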

Any help to solve this issue is greatly appreciated.

Philip