[datatable-help] data.table by versus apply
Matthew Dowle
mdowle at mdowle.plus.com
Sun Feb 27 18:14:06 CET 2011
Hi,
How about this:
fns=c(max,min)
test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
SCORE_3=rnorm(10), fn=c(rep(1, 5), rep(2, 5)))
test.dt[,fns[[fn]](SCORE_1,SCORE_2,SCORE_3),by=ID] # bug #1301 raised
test.dt[,{fn;fns[[fn]](SCORE_1,SCORE_2,SCORE_3)},by=ID] # workaround
      ID         V1
 [1,]  1 -1.6788065
 [2,]  2 -1.4021021
 [3,]  3 -1.0469943
 [4,]  4 -1.2663419
 [5,]  5 -0.2765518
 [6,]  6  0.3511581
 [7,]  7  1.1809315
 [8,]  8  0.3570631
 [9,]  9  0.9680948
[10,] 10  1.3025652
The bug is that the variable 'fn' isn't being detected as used by j, so
it isn't being subset by group. Maybe that's because it appears inside
the [[]]. Using fn explicitly in the workaround gets around that. I've
raised bug #1301 to fix it.
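Since the original question was about speed, here is a rough sketch of a different route that avoids by=ID altogether: pmax() and pmin() work across columns in one vectorised call, and ifelse() picks between them per row. This is not the lookup approach above, just an alternative for comparison. A plain data.frame stands in for test.dt so the sketch runs in base R, and set.seed() is only there to make it reproducible. The same ifelse() expression should also be usable as j in a data.table, though I haven't timed it.

```r
# Sketch only: a plain data.frame stands in for test.dt so this runs in base R.
set.seed(1)  # hypothetical seed, only to make the sketch reproducible
test.df <- data.frame(ID = 1:10,
                      SCORE_1 = rnorm(10), SCORE_2 = rnorm(10),
                      SCORE_3 = rnorm(10), fn = c(rep(1, 5), rep(2, 5)))

# pmax/pmin take the row-wise max/min across the three columns at once;
# ifelse then keeps the max where fn == 1 and the min where fn == 2.
V1 <- with(test.df, ifelse(fn == 1,
                           pmax(SCORE_1, SCORE_2, SCORE_3),
                           pmin(SCORE_1, SCORE_2, SCORE_3)))

# Cross-check against a row-by-row lookup in the style of the workaround.
fns <- c(max, min)
byrow <- sapply(seq_len(nrow(test.df)), function(i)
  with(test.df[i, ], fns[[fn]](SCORE_1, SCORE_2, SCORE_3)))
stopifnot(isTRUE(all.equal(V1, byrow)))
```

With only two possible functions this is simple enough. With many different functions the lookup in fns is probably easier to maintain.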
Also, data.table could be enhanced to allow a column to contain a list
of functions directly, rather than a lookup. That should be ok provided
the column holds pointers to the functions rather than the functions
themselves repeated over and over. Might be quite useful. FR#1302 raised
to do that. You can probably already create a data.frame or data.table
with a list column containing functions, but I doubt that operations on
those columns work. Might not be very difficult to do though.
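To make the list-column idea concrete, here is a minimal sketch using a base R data.frame. The column name FUN and the small hand-picked scores are made up for illustration. It only shows that such a column can be created and applied row by row with mapply(); it says nothing about how data.table handles such a column, which is what FR#1302 is about.

```r
# Sketch only: base R data.frame with hypothetical, hand-picked scores.
test.df <- data.frame(ID = 1:4,
                      SCORE_1 = c(1, 5, 2, 8),
                      SCORE_2 = c(4, 3, 9, 7),
                      SCORE_3 = c(6, 2, 0, 5))

# Hypothetical column name FUN: one function per row, held in a list column.
test.df$FUN <- list(max, max, min, min)

# Call each row's own function on that row's three scores.
V1 <- mapply(function(f, a, b, c) f(a, b, c),
             test.df$FUN, test.df$SCORE_1, test.df$SCORE_2, test.df$SCORE_3)
# V1 is c(6, 5, 0, 5)
```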
Thanks for helping to discover a new bug and a new feature request!
Matthew
On Sat, 2011-02-26 at 16:55 -0600, Damian Betebenner wrote:
> All,
>
> I’m curious from a speed perspective what the analog of apply is in
> data.table as I have a problem where, for each row, I want to take
> either the min or the max of several columns depending upon the value
> of a third column:
>
> For example:
>
> test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
> SCORE_3=rnorm(10), MAX_OR_MIN=c(rep("Max", 5), rep("Min", 5)))
>
> For each row I’d like to get the max of SCORE_1, SCORE_2, and SCORE_3
> if the MAX_OR_MIN value is MAX and the min of SCORE_1, SCORE_2, and
> SCORE_3 if the MAX_OR_MIN value is MIN.
>
> It isn’t too difficult to come up with a “bulky” and slow solution,
> but I’m wondering if I’m missing a way in which data.table would make
> such an effort elegant and quick.
>
> Any help greatly appreciated.
>
> Damian Betebenner
>
> Center for Assessment
>
> PO Box 351
>
> Dover, NH 03821-0351
>
>
>
> Phone (office): (603) 516-7900
>
> Phone (cell): (857) 234-2474
>
> Fax: (603) 516-7910
>
>
>
> dbetebenner at nciea.org
>
> www.nciea.org
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help