[datatable-help] Question about by statements and subsetting

John Kerpel john.kerpel2 at gmail.com
Fri Aug 2 19:26:59 CEST 2013


I'm a noob to data.table and I've got a couple of questions:

1).  Why do I get different answers in the following example:

> DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),c(4,5,6))$y)} ,by=z]
   z        V1
1: 1 2.1000000
2: 1 2.5000000
3: 1 2.9000000
4: 2 0.9998959
5: 2 2.0453352
6: 2 2.9093247

Versus:

> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a[1:3])$y)} ,by=z]
   z       V1
1: 1 2.100000
2: 1 2.500000
3: 1 2.900000
4: 2 2.999995
5: 2 2.954664
6: 2 2.909333

Is some sort of recycling going on here?
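
One way to check this is to look at what a[1:3] actually contains in each group. The sketch below is a guess at the cause, not a confirmed answer: it assumes that j is evaluated separately for every by group, so `a` refers only to that group's rows. The column names `fixed` and `per_group` are made up for illustration.

```r
library(data.table)

# Same data as in the example above (zz is not needed here).
DT <- data.table(a = 4:13,
                 y = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                 x = 1:10,
                 z = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3))

# Return the prediction points themselves instead of the predictions.
# c(4, 5, 6) is the same in every group; a[1:3] is taken from the
# rows of the current group only.
pts <- DT[, if (.N >= 4) list(fixed = c(4, 5, 6), per_group = a[1:3]), by = z]
print(pts)
```

If that assumption holds, per_group is 4, 5, 6 for z = 1 but 8, 9, 10 for z = 2, so the second call predicts at different points than the first. For z = 2 the x values run from 5 to 8, which would make the predictions at 9 and 10 extrapolations. That would explain the difference without any recycling.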


2).  How can I do some sort of nested "by" statement?
Let's say I want to set by=zz, but run the spline statement within
each z subset.  Do I use .SD somehow?
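
A minimal sketch of one possible reading of this, assuming "nested" means one spline fit per combination of zz and z: pass both columns to by. The names `counts` and `res` are made up for illustration, and this may not be the grouping that is actually wanted.

```r
library(data.table)

# Same data as in the example above.
DT <- data.table(a = 4:13,
                 y = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                 x = 1:10,
                 z = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
                 zz = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# How many rows fall into each (zz, z) combination.
counts <- DT[, .N, by = list(zz, z)]
print(counts)

# Fit the spline once per (zz, z) combination, skipping groups with
# fewer than 4 rows, and predict at the group's first three a values.
res <- DT[, if (.N >= 4) list(V1 = predict(smooth.spline(x, y), a[1:3])$y),
          by = list(zz, z)]
print(res)
```

With this data only the zz = 1, z = 1 combination has 4 rows, so it is the only one that gets a fit. Calling .SD[..., by = z] inside j with by = zz is another way to nest, but grouping on both columns at once is the simpler thing to try first.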

This is a great package - it's just taking me some time to get the
syntax right.  I've found this to be faster than clusterMap on 2
cores...
I hope I've used the correct terminology!

Best,

John