[datatable-help] datatable roll="next" takes 150 times longer than findInterval

Gabor Grothendieck ggrothendieck at gmail.com
Thu Feb 6 14:45:10 CET 2014


On Thu, Feb 6, 2014 at 8:23 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> In this case? Then nothing'll be different.
>
> I'm not sure what you mean, because the problem here is that this *doesn't*
> require *by-without-by*: the j-operations do not need to be performed
> *during* the join. So we can just perform the join and then take the "abs"
> once at the end, rather than calling it about 1e5+ times (the number of
> groups).
>
> So, if your question is: "apart from this question, what would an explicit
> by-without-by look like?", then I guess it'd be the same as the normal
> aggregation, but "by" would take a data.table as well. This has not yet been
> discussed or conceptualised. But this is how I imagine it to be:
>
> DT1[, list(...), by=DT2]
>
> where DT1's key columns have to be set as usual.

My original code was this:

dtx <- data.table(x = which(x))
dty <- data.table(y = which(y), key = "y")
dty[dtx, abs(x - y), roll = "nearest"]

With that feature, would this code not avoid by-without-by and therefore
become fast?
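
For reference, the "join first, take abs once at the end" approach described in the quoted text can be sketched as below. This is only an illustration under stated assumptions, not the package's internal behaviour: the logical vectors x and y are made-up toy data, and the extra column yval is a hypothetical helper that keeps the matched value of y available after the join (the key column y in the result holds the looked-up values from dtx).

```r
library(data.table)

# Toy inputs (assumed for illustration): TRUE at positions 2, 6, 12 and 3, 8
x <- seq_len(12) %in% c(2, 6, 12)
y <- seq_len(12) %in% c(3, 8)

dtx <- data.table(x = which(x))

# yval is a hypothetical copy of the key column, so the matched y survives the join
dty <- data.table(y = which(y), yval = which(y), key = "y")

# Plain rolling join with no j expression: nothing is evaluated per group
res <- dty[dtx, roll = "nearest"]

# One vectorised abs() over the whole result.
# In res, column y holds the values looked up from dtx; yval is the nearest y.
res[, dist := abs(y - yval)]

print(res)  # dist is 1, 2, 4 for this toy input
```

The point of the sketch is that abs() is called once on a full column rather than once per row of dtx.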

