[datatable-help] datatable roll="next" takes 150 times longer than findInterval
Arunkumar Srinivasan
aragorn168b at gmail.com
Wed Feb 5 16:32:03 CET 2014
Just tested. It works fine (on 1.8.11), but takes 16 seconds as opposed to
Flodel's, which takes 1.4 seconds on my laptop. identical() also returned TRUE.
I'll look into where the delay is coming from.
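
As a rough illustration of this kind of check (time one run of each approach on large vectors, then confirm with identical() that the results agree), here is a base R sketch. It does not use data.table or the functions from the Stack Overflow answer: next_a() and next_b() are hypothetical stand-ins, and the vector size is an arbitrary assumption.

```r
# Sketch only: next_a() and next_b() are hypothetical stand-ins, not flodel(),
# GG1() or GG2() from the thread. The vector size is an arbitrary assumption.
set.seed(1)
n <- 1e6
x <- sort(runif(n))   # values to look up
y <- sort(runif(n))   # sorted values to match against

# Index of the first element of y that is >= each x (length(y) + 1 if none).
next_a <- function(x, y) findInterval(x, y, left.open = TRUE) + 1L

# Same result computed from the other end: count the y values that are >= x.
next_b <- function(x, y) length(y) - findInterval(-x, rev(-y)) + 1L

# Time a single run of each, in seconds, rather than many microsecond runs.
t_a <- system.time(res_a <- next_a(x, y))
t_b <- system.time(res_b <- next_b(x, y))
cat("next_a elapsed (s):", t_a[["elapsed"]], "\n")
cat("next_b elapsed (s):", t_b[["elapsed"]], "\n")

# The two approaches should agree exactly before their timings are compared.
print(identical(res_a, res_b))
```

The left.open argument of findInterval() needs R 3.3.0 or later. If a single run still finishes in a few milliseconds, increasing n is more informative than adding replications.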
On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> There was another benchmark posted with larger data and longer times
> but this time data.table stopped with an error. See:
>
>
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>
> On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <mdowle at mdowle.plus.com> wrote:
> > Gabor,
> >
> > With that said about it being a micro benchmark, by-without-by might be
> > at play in GG2(X,Y) here; i.e. running j for each row of i, where it
> > could run once. I remember you and others quite rightly said
> > by-without-by should be explicit ... still got to make that change. A
> > similar speed issue came up recently somewhere else as well which the
> > change in default should help.
> >
> > Matt
> >
> >
> > On 02/02/14 18:57, Matt Dowle wrote:
> >
> >
> > But this is at the *micro* second level ?!!
> >
> > I confirm those results on my slow netbook but remember these are
> > **micro** seconds i.e. 71,000 here is less than 0.1 of a second.
> >
> >> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> > Unit: microseconds
> >          expr       min        lq      median          uq       max neval
> >  flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
> >     GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
> >     GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
> >
> > To put it in some perspective :
> >
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.080   0.000   0.079
> >> system.time(GG2(X,Y))
> >    user  system elapsed
> >   0.072   0.000   0.072
> >
> > Where those times are in seconds. So the task in question here, takes
> > 0.07 seconds ?!
> >
> > The 150x longer figure is actually (using figures from the S.O. answer)
> > 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds
> > (0.000168 seconds). 0.024 seconds / 0.000168 = "150 times". If you
> > rounded to milliseconds you could say data.table is infinitely slower
> > (24ms / 0ms = Inf).
> >
> > I can believe there's scope for improvement, sure, but not from this
> > benchmark. The vectors need to be *much* bigger and the number of
> > replications needs to be *much* smaller, say 3. The task being timed
> > needs to take a meaningful amount of time (say 5 seconds) *for a single
> > run*.
> >
> > Matt
> >
> >
> > On 02/02/14 12:27, Gabor Grothendieck wrote:
> >
> > The benchmark at the bottom of this post shows a problem where a
> > data.table roll="next" took nearly 150x longer than a base
> > findInterval() solution. (The data.table solution is easier to write
> > though.) This suggests an area for possible speed improvement.
> >
> >
> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> >
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
> >
> >