[datatable-help] Question about by statements and subsetting
John Kerpel
john.kerpel2 at gmail.com
Fri Aug 2 19:26:59 CEST 2013
I'm a noob to data.table and I've got a couple of questions:
1). Why do I get different answers in the following example:
> DT = data.table(a=c(4:13),y=c(1,1,2,2,2,3,3,3,4,4),x=1:10,z=c(1,1,1,1,2,2,2,2,3,3),zz=c(1,1,1,1,1,2,2,2,2,2))
> setkeyv(DT,cols=c("a","x","y","z","zz"))
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),c(4,5,6))$y)} ,by=z]
   z        V1
1: 1 2.1000000
2: 1 2.5000000
3: 1 2.9000000
4: 2 0.9998959
5: 2 2.0453352
6: 2 2.9093247
Versus:
> DT[,if(.N>=4) {list(predict(smooth.spline(x,y),a[1:3])$y)} ,by=z]
   z       V1
1: 1 2.100000
2: 1 2.500000
3: 1 2.900000
4: 2 2.999995
5: 2 2.954664
6: 2 2.909333
Is some sort of recycling going on here?
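One way to check whether recycling is involved is to look at what a[1:3] evaluates to inside each group. The following is only a minimal sketch, assuming the same table as in the example; the column name pt is made up for illustration.

```r
library(data.table)

# Same table as in the example.
DT <- data.table(a  = 4:13,
                 y  = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                 x  = 1:10,
                 z  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
                 zz = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# Inside j, a column name refers to that column's values for the current
# group only, so a[1:3] is re-evaluated for every value of z.
# "pt" is a hypothetical column name used only for this check.
pts <- DT[, if (.N >= 4) list(pt = a[1:3]), by = z]
print(pts)
# z = 1 gives 4, 5, 6; z = 2 gives 8, 9, 10;
# z = 3 has fewer than 4 rows and is dropped.
```

If that is what is happening, the second call predicts at x = 8, 9, 10 for z = 2 rather than at 4, 5, 6, which would account for the different values without any recycling.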
2). How can I do some sort of nested "by" statement?
Let's say I want to set by=zz, but run the spline statement within
each z subset. Do I use .SD somehow?
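One possible reading of the nested "by" is to group on both columns at once, so that the spline runs once per z subset within each zz. This is a rough sketch under that assumption, not necessarily the recommended idiom; the column name pred is made up for illustration.

```r
library(data.table)

# Same table as in the example.
DT <- data.table(a  = 4:13,
                 y  = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
                 x  = 1:10,
                 z  = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
                 zz = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# Grouping on list(zz, z) evaluates j once per (zz, z) combination,
# i.e. once for each z subset inside each zz.
# "pred" is a hypothetical column name for the predicted values.
res <- DT[, if (.N >= 4) list(pred = predict(smooth.spline(x, y), c(4, 5, 6))$y),
          by = list(zz, z)]
print(res)
```

With this particular table only the zz = 1, z = 1 combination has at least 4 rows, so that is the only group that produces predictions.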
This is a great package - it's just taking me some time to get the
syntax right. I've found this to be faster than clusterMap on 2
cores...
I hope I've used the correct terminology!
Best,
John