[datatable-help] datatable roll="next" takes 150 times longer than findInterval

Arunkumar Srinivasan aragorn168b at gmail.com
Thu Feb 6 14:23:31 CET 2014


In this case? Nothing will be different.

I'm not sure what you mean, because the point here is that this *doesn't*
require *by-without-by*: the j-operations need not be performed *during*
the join. So we can just perform the join and then take "abs" once at the
end, rather than calling it about 1e5+ times (once per group).
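
To make the two steps concrete, here is a minimal sketch of "join first,
then abs once". The X and Y below are small random stand-ins (an assumption;
the real vectors come from earlier in the thread), and the base R loop at
the end is only a sanity check, not anyone's proposed method:

```r
library(data.table)

# Hypothetical input: the thread's X and Y are not shown in this message,
# so two small random logical vectors stand in for them.
set.seed(42)
X <- sample(c(TRUE, FALSE), 2000, replace = TRUE)
Y <- sample(c(TRUE, FALSE), 2000, replace = TRUE)

dtx <- data.table(x = which(X), key = "x")
dty <- data.table(y = which(Y), key = "y")
dtx[, x1 := x]   # keep copies: after the join, column y holds dtx's x values
dty[, y1 := y]

# Step 1: rolling join only, no j. One result row per row of dtx.
joined <- dty[dtx, roll = "nearest"]

# Step 2: a single vectorised abs() over the whole joined result.
ans <- joined[, abs(x1 - y1)]

# Base R cross-check: distance from each x to its nearest y.
x <- which(X)
y <- which(Y)
check <- vapply(x, function(v) min(abs(v - y)), integer(1))
stopifnot(all(ans == check))
```

The only point of the split is that abs() runs once on a full column instead
of once per row of dtx.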

So, if your question is: "apart from this case, what would an explicit
by-without-by look like?", then I guess it'd be the same as a normal
aggregation, except that "by" would also accept a data.table. This has not
yet been discussed or conceptualised, but this is how I imagine it:

DT1[, list(...), by=DT2]

where DT1's key columns have to be set as usual.
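
For comparison, the by=DT2 form above is only a proposal. As far as I know,
later data.table releases (1.9.4 onward) spell explicit per-row grouping in
a join as by=.EACHI instead. A small sketch under that assumption, with
made-up tables DT1 and DT2:

```r
library(data.table)

# Hypothetical toy tables, not taken from the thread.
DT1 <- data.table(id = c(1L, 1L, 2L, 3L, 3L, 3L), v = 1:6, key = "id")
DT2 <- data.table(id = c(1L, 3L))

# j is evaluated once for each row of DT2, i.e. explicit "by-without-by".
per_row <- DT1[DT2, list(total = sum(v)), by = .EACHI]

# Without by=.EACHI, j is evaluated once over all the matching rows.
overall <- DT1[DT2, sum(v)]
```

Here per_row has one row per row of DT2 (totals 3 and 15), while overall is
the single value 18.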


On Thu, Feb 6, 2014 at 12:55 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:

> On Wed, Feb 5, 2014 at 10:42 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > Seems like the "by-without-by" is what's slowing things down:
> >
> > require(data.table)
> > dtx <- data.table(x=which(X), key="x")
> > dty <- data.table(y=which(Y), key="y")
> > dtx[, x1 := x]
> > dty[, y1 := y]
> > system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])
> >    user  system elapsed
> >   1.321   0.076   1.396
> > system.time(ans2 <- flodel(x,y))
> >    user  system elapsed
> >   0.936   0.044   0.977
> >
> > identical(ans, ans2) # [1] TRUE
>
> What will the code look like after the explicit by-without-by feature is
> added?
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>