[datatable-help] datatable roll="next" takes 150 times longer than findInterval

Arunkumar Srinivasan aragorn168b at gmail.com
Wed Feb 5 16:32:03 CET 2014


Just tested. It works fine on 1.8.11, but takes 16 seconds as opposed to
Flodel's, which takes 1.4 seconds on my laptop. identical() also returned TRUE.
Will see where the delay is coming from.


On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:

> There was another benchmark posted with larger data and longer times,
> but this time data.table stopped with an error.  See:
>
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>
> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
> > Gabor,
> >
> > With that said about it being a micro benchmark,  by-without-by might
> > be at play in GG2(X,Y) here; i.e. running j for each row of i, where it
> > could run once.  I remember you and others quite rightly said
> > by-without-by should be explicit ... still got to make that change.  A
> > similar speed issue came up recently somewhere else as well which the
> > change in default should help.
> >
> > Matt
> >
> >
> > On 02/02/14 18:57, Matt Dowle wrote:
> >
> >
> > But this is at the *micro* second level ?!!
> >
> > I confirm those results on my slow netbook but remember these are
> > **micro** seconds i.e. 71,000 here is less than 0.1 of a second.
> >
> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> > Unit: microseconds
> >          expr       min        lq      median          uq       max neval
> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
> >
> > To put it in some perspective :
> >
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.080   0.000   0.079
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >
> > Where those times are in seconds.   So the task in question here,  takes
> > 0.07 seconds ?!
> >
> > The 150x longer figure is actually (using figures from the S.O. answer)
> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
> > (0.000168 seconds).  0.024 seconds / 0.000168 = "150 times".   If you
> > rounded to milliseconds you could say data.table is infinitely slower
> > (24ms / 0ms = Inf).
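
The arithmetic in the paragraph above can be checked directly. This is only a
sketch using the two figures quoted from the S.O. answer; the variable names
are made up for illustration, and floor() is used so the millisecond values
match the "24ms / 0ms" quoted above.

```r
# Figures quoted above from the Stack Overflow answer, in microseconds.
dt_us <- 24695   # data.table solution
fi_us <- 168     # findInterval solution

# Relative difference: roughly 147, i.e. the "150 times" headline.
ratio <- dt_us / fi_us

# Absolute difference in seconds: well under a tenth of a second.
diff_sec <- (dt_us - fi_us) / 1e6

# Truncating both timings to whole milliseconds gives 24 / 0,
# so the ratio degenerates to Inf.
ratio_ms <- floor(dt_us / 1000) / floor(fi_us / 1000)
```

The ratio sounds dramatic while the absolute gap is about 0.025 seconds,
which is the point being made about micro benchmarks.
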
> >
> > I can believe there's scope for improvement, sure,  but not from this
> > benchmark. The vectors need to be *much* bigger and replications need
> > to be *much* smaller, say 3.   The task being timed needs to take a
> > meaningful amount of time (say 5 seconds) *for a single run*.
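
A minimal sketch of the benchmark shape described above: large vectors, about
3 replications, each timed as a whole run with system.time(). The size n, the
seed and the use of findInterval() as the task being timed are assumptions
for illustration only; this is not the benchmark from the linked answer, and
n would need to be raised until a single run takes several seconds.

```r
# Hypothetical input size; increase n until one run takes a meaningful time.
set.seed(1)
n <- 1e6
x <- sort(runif(n))
y <- sort(runif(n))

# Time three complete runs instead of many microsecond-level repetitions.
elapsed <- vapply(1:3, function(i) {
  system.time(findInterval(x, y))[["elapsed"]]
}, numeric(1))

# Elapsed seconds for each of the three runs.
elapsed
```

Swapping the findInterval() call for the data.table version under test gives
timings in seconds that can be compared directly.
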
> >
> > Matt
> >
> >
> > On 02/02/14 12:27, Gabor Grothendieck wrote:
> >
> > The benchmark at the bottom of this post shows a problem where a
> > data.table roll="next" took nearly 150x longer than a base
> > findInterval() solution. (The data.table solution is easier to write
> > though.) This suggests an area for possible speed improvement.
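
For readers unfamiliar with the base approach mentioned above, here is a
rough sketch of a "next value" lookup built on findInterval(). It is not the
code from the linked answer; next_value() is a hypothetical helper name, and
it assumes y is sorted in increasing order.

```r
# For each x, return the smallest element of sorted y that is >= x,
# or NA when no such element exists.
next_value <- function(x, y) {
  stopifnot(!is.unsorted(y))
  # Count the elements of y that are >= each x by searching the
  # negated, reversed vector (which is again sorted increasingly).
  n_ge <- findInterval(-x, -rev(y))
  idx <- length(y) - n_ge + 1L
  idx[n_ge == 0L] <- NA_integer_
  y[idx]
}

res <- next_value(c(1, 5, 6, 10), c(2, 5, 9))
res   # 2 5 9 NA
```

findInterval() does a binary search per element of x, which is why a base
solution of this kind can be very fast on sorted input.
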
> >
> >
> > http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> >
> >