[datatable-help] assignment by reference in subset
Philip de Witt Hamer
pcvdwh at gmail.com
Sun Nov 18 20:19:58 CET 2012
Dear all,
data.table is great! Thanks for this life(time)-saving package.
Now I have run into a tough nut to crack using ':='.
I'd like to do a calculation on one column, conditional on the value of another column.
First, some dummy data:
library(data.table)
DT <- data.table(
1:50,
rep(1:5,each=10),
runif(50,0,1)
)
setnames(DT, 1:3, c("id","grp","p"))
id values are unique
grp speaks for itself
think of p as, e.g., p-values
Next, if I want the number of p-values in the whole set that are at least as extreme as (i.e. less than or equal to) the p of each row, this seems to work well:
DT[,c1 := sum(DT[,p] <= p), by=id]
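As a cross-check on what c1 holds: counting the values less than or equal to p should be the same as ranking p with ties given the maximum rank. A minimal sketch, assuming p has no NA values; the seed and the column name c1_check are only for illustration and are not part of my real data:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
DT <- data.table(id = 1:50, grp = rep(1:5, each = 10), p = runif(50, 0, 1))

# Row-by-row count over the whole table, as above
DT[, c1 := sum(DT[, p] <= p), by = id]

# Vectorised equivalent: rank of p, with ties taking the maximum rank
# (c1_check is a hypothetical column name used only for comparison)
DT[, c1_check := rank(p, ties.method = "max")]

all(DT$c1 == DT$c1_check)
```

If the two columns agree, the by=id loop could probably be dropped for the whole-set count.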
But when I want the number of p-values at least as extreme as the p of each row, counted only within the subset that has the same grp, I am having a hard time. These attempts fail:
DT[,c2 := sum(DT[grp,p] <= p),by=id]
DT[,c3 := sum(DT[DT[,grp]==grp,p] <= p), by=id]
What I am after is the following output, exemplified by the first 20 rows.
For example, p=0.286 in the first row is compared with the p's of grp=1; the result is 3 because three of these p's are less than or equal to 0.286 (i.e. 0.286, 0.078, and 0.211).
    id grp          p c1 c2
 1:  1   1 0.28619666 12  3
 2:  2   1 0.31725169 14  4
 3:  3   1 0.82172331 41  8
 4:  4   1 0.07867874  4  1
 5:  5   1 0.69134289 35  7
 6:  6   1 0.21123476  7  2
 7:  7   1 0.39156432 20  5
 8:  8   1 0.98862365 50 10
 9:  9   1 0.55943136 25  6
10: 10   1 0.93543842 47  9
11: 11   2 0.48740254 21  5
12: 12   2 0.02974435  1  1
13: 13   2 0.14566443  6  2
14: 14   2 0.23408044 10  3
15: 15   2 0.63503196 32  7
16: 16   2 0.34114088 16  4
17: 17   2 0.56849053 26  6
18: 18   2 0.71877039 36  8
19: 19   2 0.84100007 42 10
20: 20   2 0.81776585 40  9
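One way the c2 column might be expressed is as a within-group rank, grouping by grp instead of by id. This is only a sketch under assumptions (no NA values in p; the seed and the column name c2_check are hypothetical and just for checking), not necessarily the recommended data.table idiom:

```r
library(data.table)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
DT <- data.table(id = 1:50, grp = rep(1:5, each = 10), p = runif(50, 0, 1))

# Within each grp, p refers to that group's values only, so the rank
# (ties taking the maximum) counts the group's p values <= each row's p
DT[, c2 := rank(p, ties.method = "max"), by = grp]

# Explicit count of the same quantity, for comparison
# (c2_check is a hypothetical column name)
DT[, c2_check := vapply(p, function(x) sum(p <= x), integer(1)), by = grp]

all(DT$c2 == DT$c2_check)
```

With by=grp, every value of c2 should fall between 1 and 10 here, since each group has ten rows.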
Any help to solve this issue is greatly appreciated.
Philip