[datatable-help] datatable roll="next" takes 150 times longer than findInterval

Arunkumar Srinivasan aragorn168b at gmail.com
Thu Feb 6 15:58:28 CET 2014


Gabor,

I think I now understand what your earlier post was about. You mean that
once the external by-without-by is in place, a plain DT1[DT2, ..., ] will be
faster, as it will no longer do a by-without-by implicitly. Yes, that's true.
So basically, the statement:

dty[dtx, abs(x - y), roll = "nearest"]

once external by-without-by is implemented, will/should first do the join
and then do the "j" operation once, on the joined result. It'll therefore be
as fast as the solution I wrote. If one wants to perform the j-operation for
each group, then they'll have to do something like

DT1[, j, by=DT2] (or whatever syntax we end up settling on)
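
To make the "join first, then evaluate j once" idea concrete, here is a minimal sketch on made-up data. The toy values and the extra column yval are assumptions for illustration only, not taken from the earlier posts. yval is needed because, after the join, the key column y carries the values that came from dtx.

```r
library(data.table)

# Hypothetical toy data; the names dtx, dty, x and y follow the statement above.
dtx <- data.table(x = c(1.2, 4.9, 7.5))
dty <- data.table(y = c(1, 5, 9))

# Keep a copy of y (yval is an illustrative name): in the join result the
# key column y holds the looked-up values from dtx, not the matched y.
dty[, yval := y]
setkey(dty, y)

# Step 1: the rolling join alone, with no j expression.
joined <- dty[dtx, roll = "nearest"]

# Step 2: a single vectorised call to abs() over the whole joined result,
# instead of one evaluation of j per row of dtx.
d <- abs(joined$y - joined$yval)
```

Here abs() is called exactly once, whatever the number of rows in dtx, which is where the saving over a per-group evaluation comes from.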

Sorry for the misunderstanding.


On Thu, Feb 6, 2014 at 3:20 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:

> On Thu, Feb 6, 2014 at 8:53 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > Not really. Because it's still doing a "by". Meaning, for every group in
> > "by", abs(x-y) will be evaluated. If there are 1e5 groups, there'll be 1e5
> > calls. And that can be expensive depending on the function + the time to
> > call eval from within C.
> >
> > However, since it's not necessary to do a by-without-by, we can perform the
> > join and then compute the difference between columns once. There's no
> > grouping, no eval from C, and no multiple calls to abs. Hope this clears it
> > up?
> >
> >
>
> In that case what is the proposed user interface?
>
> I thought that the idea was that one would have to explicitly specify
> the by= clause for by-without-by to occur.  In the code I had just
> posted there is a join = "nearest" but no by= clause is specified.
>