[datatable-help] datatable roll="next" takes 150 times longer than findInterval

Thu Feb 6 14:53:18 CET 2014

Not really, because it's still doing a "by". That is, abs(x-y) is evaluated
once for every group in "by": if there are 1e5 groups, there'll be 1e5
calls. That can be expensive, depending on the function and on the cost of
calling eval from within C.
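A minimal sketch of that per-group cost, assuming the data.table package is installed. The table DT, its columns g and v, and the counter calls are hypothetical toy names. j is wrapped so that each evaluation bumps a counter, which makes it visible that j (and the abs() call inside it) runs once per group rather than once overall.

```r
library(data.table)

# Hypothetical toy table: 5 groups ("g") of 2 rows each
DT <- data.table(g = rep(1:5, each = 2L),
                 v = c(3, -1, 4, -1, 5, -9, 2, -6, 5, -3))

calls <- 0L
res <- DT[, {
  calls <<- calls + 1L  # side effect: counts each evaluation of j
  sum(abs(v))           # abs() is called inside j, i.e. once per group
}, by = g]

print(res)    # one row per group
print(calls)  # number of times j was evaluated
```

With 1e5 groups, the same j would be evaluated 1e5 times, and each evaluation pays the eval-from-C overhead.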

However, since a by-without-by isn't necessary here, we can perform the
join and then compute the difference between the columns once. There's no
grouping, no eval from C, and a single call to abs. Hope this clears it
up?
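A minimal sketch of the join-then-compute approach, assuming data.table is installed. The logical vectors x and y are made-up toy data, and yy is a hypothetical helper column. As far as I know, the join column in the result of dty[dtx, roll = "nearest"] carries dtx's values rather than the matched y, so a copy of y is kept to recover the nearest match. The rows of the result follow dtx's row order, so the distances are computed with a single vectorised abs() after the join.

```r
library(data.table)

# Hypothetical inputs: logical vectors marking positions of interest
x <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

dtx <- data.table(x = which(x))
# yy is a copy of the key column, so the matched y survives the join
dty <- data.table(y = which(y), yy = which(y), key = "y")

# 1) Plain rolling join: no j, so nothing is evaluated per group
joined <- dty[dtx, roll = "nearest"]

# 2) One vectorised abs() over all rows (rows follow dtx's order)
d <- abs(dtx$x - joined$yy)
print(d)
```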

On Thu, Feb 6, 2014 at 2:45 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:

> On Thu, Feb 6, 2014 at 8:23 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > In this case? Then nothing'll be different.
> >
> > I'm not sure what you mean, because the problem here is that this
> > *doesn't* require *by-without-by*: the j-operations don't need to be
> > performed *during* the join. So we can just perform the join and then
> > take the "abs" once at the end, rather than calling it 1e5+ times (once
> > per group).
> >
> > So, if your question is "apart from this question, what would an
> > explicit by-without-by look like?", then I guess it'd be the same as
> > normal aggregation, but "by" would take a data.table as well. This has
> > not yet been discussed or conceptualised, but this is how I imagine it:
> >
> > DT1[, list(...), by=DT2]
> >
> > where DT1's key columns have to be set as usual.
>
> My original code was this:
>
> library(data.table)
> # x, y: logical vectors; which() returns the positions of their TRUEs
> dtx <- data.table(x = which(x))
> dty <- data.table(y = which(y), key = "y")
> dty[dtx, abs(x - y), roll = "nearest"]
>
> With that feature, would this code not use by-without-by and therefore
> become fast?
>
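findInterval() is the baseline named in the subject line, but the benchmark code itself isn't in this message. The following is only a sketch of a nearest-distance computation with it in base R, on made-up data; nearest_dist is a hypothetical helper name. It assumes py is sorted ascending (which() output is) and non-empty: findInterval() gives the last py at or below each px, and the next one up is the only other candidate.

```r
# Hypothetical inputs: logical vectors marking positions of interest
x <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)

# Distance from each px to its nearest py (py sorted, length >= 1)
nearest_dist <- function(px, py) {
  i  <- findInterval(px, py)           # index of last py <= px (0 if none)
  lo <- py[pmax(i, 1L)]                # candidate at or below px (clamped)
  hi <- py[pmin(i + 1L, length(py))]   # candidate above px (clamped)
  pmin(abs(px - lo), abs(px - hi))     # distance to the closer candidate
}

d <- nearest_dist(which(x), which(y))
print(d)
```

Everything here is vectorised, so there is no per-element R-level call, which is the same property the join-then-compute version relies on.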