[datatable-help] data.table by versus apply

Matthew Dowle mdowle at mdowle.plus.com
Sun Feb 27 18:14:06 CET 2011


Hi,

How about this:

fns = c(max, min)  # list of two functions, looked up via the fn column
test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
SCORE_3=rnorm(10), fn=c(rep(1, 5), rep(2, 5)))

test.dt[,fns[[fn]](SCORE_1,SCORE_2,SCORE_3),by=ID]  # bug #1301 raised

test.dt[,{fn;fns[[fn]](SCORE_1,SCORE_2,SCORE_3)},by=ID]  # workaround
      ID         V1
 [1,]  1 -1.6788065
 [2,]  2 -1.4021021
 [3,]  3 -1.0469943
 [4,]  4 -1.2663419
 [5,]  5 -0.2765518
 [6,]  6  0.3511581
 [7,]  7  1.1809315
 [8,]  8  0.3570631
 [9,]  9  0.9680948
[10,] 10  1.3025652

The bug is that the variable 'fn' is (incorrectly) not detected as being
used by j, so it isn't subset by group. This may be because it appears
inside the [[ ]]. Using fn explicitly in the workaround gets around that.
I've raised bug #1301 to fix it.
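
A minimal base R sketch of the same lookup idea, on plain vectors rather
than a data.table. The score values and the names s1, s2, s3, row_wise and
vectorised are made up for illustration. It also compares the row-by-row
lookup with a vectorised pmax/pmin alternative, which is not what the code
above does but may be worth trying if speed matters:

```r
# Lookup list of functions: index 1 is max, index 2 is min
fns <- c(max, min)

# Hypothetical scores for four rows, and the function index per row
s1 <- c(0.5, -1.2, 2.0, 0.1)
s2 <- c(1.5, 0.3, -0.7, 0.1)
s3 <- c(-0.2, 0.9, 1.1, -3.0)
fn <- c(1, 1, 2, 2)

# Row by row: pick the function by index, then apply it to that row's scores
row_wise <- mapply(function(i, a, b, c) fns[[i]](a, b, c), fn, s1, s2, s3)

# Vectorised: compute parallel max and min once, then choose per row
vectorised <- ifelse(fn == 1, pmax(s1, s2, s3), pmin(s1, s2, s3))

print(row_wise)
print(vectorised)
```

The vectorised expression should also be usable as j without by, something
like test.dt[, ifelse(fn == 1, pmax(SCORE_1, SCORE_2, SCORE_3),
pmin(SCORE_1, SCORE_2, SCORE_3))], though I haven't timed it here.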

Also, data.table could be enhanced to allow a column to contain a list
of functions directly, rather than an index into a lookup. That should
be ok provided the column holds pointers to the functions rather than
the functions themselves repeated over and over. It might be quite
useful. FR#1302 raised to do that. You can probably already create a
data.frame or data.table with a list column containing functions, but I
doubt that operations on those columns work. It might not be very
difficult to do though.
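
To make the idea concrete, here is a rough base R sketch using a plain list
in place of a list column. It says nothing about what data.table itself
supports; the names fn_col, s1, s2 and res are hypothetical:

```r
# One function per row, stored directly instead of an index into a lookup
fn_col <- list(max, max, min)

# Hypothetical scores for three rows
s1 <- c(1, 5, 3)
s2 <- c(4, 2, 6)

# Apply each row's own function to that row's scores
res <- mapply(function(f, a, b) f(a, b), fn_col, s1, s2)
print(res)

# Rows 1 and 2 hold the very same function object
print(identical(fn_col[[1]], fn_col[[2]]))
```

Whether a real list column of functions behaves this way inside
data.table's j and by is exactly what FR#1302 would need to establish.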

Thanks for helping to discover a new bug and a new feature request!

Matthew


On Sat, 2011-02-26 at 16:55 -0600, Damian Betebenner wrote:
> All,
>
> I’m curious from a speed perspective what the analog of apply is in
> data.table as I have a problem where, for each row,  I want to take
> either the min or the max of several columns depending upon the value
> of a third column:
> 
> For example:
> 
> test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
> SCORE_3=rnorm(10), MAX_OR_MIN=c(rep("Max", 5), rep("Min", 5)))
> 
> For each row I’d like to get the max of SCORE_1, SCORE_2, and SCORE_3
> if the MAX_OR_MIN value is "Max" and the min of SCORE_1, SCORE_2, and
> SCORE_3 if the MAX_OR_MIN value is "Min".
> 
> It isn’t too difficult to come up with a “bulky” and slow solution,
> but I’m wondering if I’m missing a way in which data.table would make
> such an effort elegant and quick.
>
> Any help greatly appreciated.  
> 
> Damian Betebenner
> Center for Assessment
> PO Box 351
> Dover, NH   03821-0351
>
> Phone (office): (603) 516-7900
> Phone (cell): (857) 234-2474
> Fax: (603) 516-7910
>
> dbetebenner at nciea.org
> www.nciea.org



