[datatable-help] data.table by versus apply
Matthew Dowle
mdowle at mdowle.plus.com
Tue Apr 19 00:00:51 CEST 2011
Damian,
Bug #1301 fixed. Workaround below no longer needed.
Matthew
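The question quoted below can also be answered without grouping by ID at all. What follows is a minimal base-R sketch, offered as an alternative to (not a description of) the data.table approach in this thread. pmax() and pmin() compute the row-wise maximum and minimum across several columns in one vectorized call each, and ifelse() then picks between them using MAX_OR_MIN. It assumes, as in the question, that MAX_OR_MIN holds the strings "Max" and "Min". The data frame name test.df and the seed are hypothetical. For comparison, the sketch also computes the same result row by row with a fns[[fn]] lookup like the one quoted below.

```r
# Hypothetical seed, only so the random example is reproducible.
set.seed(1)
# Assumes MAX_OR_MIN holds "Max" or "Min", as in the question.
test.df <- data.frame(ID = 1:10,
                      SCORE_1 = rnorm(10), SCORE_2 = rnorm(10),
                      SCORE_3 = rnorm(10),
                      MAX_OR_MIN = c(rep("Max", 5), rep("Min", 5)),
                      stringsAsFactors = FALSE)

# Vectorized: pmax()/pmin() work element-wise across the three columns,
# giving the row max and row min for every row in a single call each;
# ifelse() then keeps the one requested by MAX_OR_MIN.
row_max <- with(test.df, pmax(SCORE_1, SCORE_2, SCORE_3))
row_min <- with(test.df, pmin(SCORE_1, SCORE_2, SCORE_3))
vectorized <- ifelse(test.df$MAX_OR_MIN == "Max", row_max, row_min)

# Row-by-row lookup, mirroring the fns[[fn]] idea from the thread:
# turn each label into an index into a list of functions, then call
# the selected function on that row's three scores.
fns <- c(max, min)
fn <- match(test.df$MAX_OR_MIN, c("Max", "Min"))  # 1 = max, 2 = min
row_by_row <- vapply(seq_len(nrow(test.df)), function(i) {
  fns[[fn[i]]](test.df$SCORE_1[i], test.df$SCORE_2[i], test.df$SCORE_3[i])
}, numeric(1))

all.equal(vectorized, row_by_row)  # TRUE
```

In a data.table, the same expression should work directly as j, with no by at all: test.dt[, ifelse(MAX_OR_MIN == "Max", pmax(SCORE_1, SCORE_2, SCORE_3), pmin(SCORE_1, SCORE_2, SCORE_3))]. The trade-off is that both pmax() and pmin() are evaluated for every row, which is typically still cheaper than calling a function once per row.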
On Sun, 2011-02-27 at 17:14 +0000, Matthew Dowle wrote:
> Hi,
>
> How about this:
>
> library(data.table)
> fns=c(max,min)
> test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
> SCORE_3=rnorm(10), fn=c(rep(1, 5), rep(2, 5)))
>
> test.dt[,fns[[fn]](SCORE_1,SCORE_2,SCORE_3),by=ID] # bug #1301 raised
>
> test.dt[,{fn;fns[[fn]](SCORE_1,SCORE_2,SCORE_3)},by=ID] # workaround
> ID V1
> [1,] 1 -1.6788065
> [2,] 2 -1.4021021
> [3,] 3 -1.0469943
> [4,] 4 -1.2663419
> [5,] 5 -0.2765518
> [6,] 6 0.3511581
> [7,] 7 1.1809315
> [8,] 8 0.3570631
> [9,] 9 0.9680948
> [10,] 10 1.3025652
>
> The bug is that the column 'fn' is (incorrectly) not being detected as
> used by j, so it isn't being subset for each group. Maybe that's because
> it appears inside the [[ ]]. Using fn explicitly, as in the workaround,
> gets around that. Raised bug #1301 to fix it.
>
> Also, data.table could be enhanced to allow a column to contain a list
> of functions directly, rather than an index into a lookup. That should
> be fine provided the column holds pointers to functions rather than
> copies of the functions themselves repeated over and over. Might be
> quite useful. FR#1302 raised to do that. You can probably already
> create a data.frame or data.table with a list column containing
> functions, but I doubt operations on such columns work yet. Might not
> be very difficult to do, though.
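A small base-R sketch of the idea behind FR#1302, using a plain data.frame rather than data.table, so it says nothing about what data.table itself supports. A list column holds the functions themselves, and mapply() calls each row's function on that row's scores. The data frame scores and its columns are hypothetical.

```r
# Hypothetical example data with two score columns.
scores <- data.frame(ID = 1:4,
                     SCORE_1 = c(1, 5, 2, 8),
                     SCORE_2 = c(3, 4, 9, 6))

# Assigning a list whose length equals nrow() creates a list column.
# Under R's copy-on-modify semantics, repeated entries refer to the same
# function object rather than separate copies.
scores$fn <- list(max, min, max, min)

# Call each row's function on that row's two scores.
scores$RESULT <- mapply(function(f, a, b) f(a, b),
                        scores$fn, scores$SCORE_1, scores$SCORE_2)

# Print only the result column.
scores$RESULT  # 3 4 9 6
```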
>
> Thanks for helping to discover a new bug and a new FR!
>
> Matthew
>
>
> On Sat, 2011-02-26 at 16:55 -0600, Damian Betebenner wrote:
> > All,
> >
> > I’m curious, from a speed perspective, what the analog of apply is in
> > data.table, as I have a problem where, for each row, I want to take
> > either the min or the max of several columns depending upon the value
> > of another column:
> >
> > For example:
> >
> > test.dt <- data.table(ID=1:10, SCORE_1=rnorm(10), SCORE_2=rnorm(10),
> > SCORE_3=rnorm(10), MAX_OR_MIN=c(rep("Max", 5), rep("Min", 5)))
> >
> > For each row I’d like to get the max of SCORE_1, SCORE_2, and SCORE_3
> > if the MAX_OR_MIN value is "Max", and the min of SCORE_1, SCORE_2, and
> > SCORE_3 if the MAX_OR_MIN value is "Min".
> >
> > It isn’t too difficult to come up with a “bulky” and slow solution,
> > but I’m wondering if I’m missing a way in which data.table would make
> > such an effort elegant and quick.
> >
> > Any help greatly appreciated.
> >
> > Damian Betebenner
> > Center for Assessment
> > PO Box 351
> > Dover, NH 03821-0351
> > Phone (office): (603) 516-7900
> > Phone (cell): (857) 234-2474
> > Fax: (603) 516-7910
> > dbetebenner at nciea.org
> > www.nciea.org
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>