[datatable-help] datatable roll="next" takes 150 times longer than findInterval
Matt Dowle
mdowle at mdowle.plus.com
Mon Feb 3 12:46:23 CET 2014
Gabor,
Having said that about it being a micro benchmark, by-without-by might be
at play in GG2(X,Y) here; i.e. j is run once for each row of i, where it
could run just once. I remember you and others quite rightly said
by-without-by should be explicit ... I've still got to make that change. A
similar speed issue came up recently somewhere else as well, which the
change in default should also help with.
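
A minimal sketch of the kind of timing suggested in the quoted message
below: bigger vectors, only 3 runs, each timed with system.time(). The
size n and the random data are assumptions made up for illustration (the
real X and Y are in the S.O. answer), and only base findInterval() is
timed here; the data.table call would presumably be timed the same way.

```r
# Hypothetical inputs: n, breaks and x are illustrative, not the X and Y
# from the S.O. answer.
set.seed(1)
n <- 1e6                       # raise this until one run takes seconds
breaks <- sort(runif(n))       # findInterval() needs sorted break points
x <- runif(n)

# Time 3 single runs rather than 100 microsecond-level replications.
timings <- numeric(3)
for (i in seq_along(timings)) {
  timings[i] <- system.time(idx <- findInterval(x, breaks))[["elapsed"]]
}
print(timings)                 # elapsed seconds for each run

# Each result is the count of break points <= x[i], so it lies in 0..n.
stopifnot(length(idx) == n, all(idx >= 0L), all(idx <= length(breaks)))
```

Increase n until a single run takes a meaningful amount of time (seconds
rather than microseconds) before comparing the two approaches.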
Matt
On 02/02/14 18:57, Matt Dowle wrote:
>
> But this is at the *micro* second level ?!!
>
> I confirm those results on my slow netbook but remember these are
> **micro** seconds i.e. 71,000 here is less than 0.1 of a second.
>
> > microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))
> Unit: microseconds
>         expr       min        lq      median          uq       max neval
> flodel(X, Y)   330.798   369.369    402.7935    455.3225  17996.26   100
>    GG1(X, Y) 14287.380 14370.038  14466.5990  16010.5440 121082.77   100
>    GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62   100
>
> To put it in some perspective :
>
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.080   0.000   0.079
> > system.time(GG2(X,Y))
>    user  system elapsed
>   0.072   0.000   0.072
>
> Those times are in seconds. So the task in question here takes
> about 0.07 seconds ?!
>
> The 150x longer figure is actually (using figures from the S.O.
> answer) 24695 microseconds (i.e. 0.024 seconds) divided by 168
> microseconds (0.000168 seconds). 0.024 seconds / 0.000168 = "150
> times". If you rounded to milliseconds you could say data.table is
> infinitely slower (24ms / 0ms = Inf).
>
> I can believe there's scope for improvement, sure, but not from this
> benchmark. The vectors need to be *much* bigger and the number of
> replications needs to be *much* smaller, say 3. The task being timed
> needs to take a meaningful amount of time (say 5 seconds) *for a
> single run*.
>
> Matt
>
>
> On 02/02/14 12:27, Gabor Grothendieck wrote:
>> The benchmark at the bottom of this post shows a problem where a
>> data.table roll="next" took nearly 150x longer than a base
>> findInterval() solution. (The data.table solution is easier to write
>> though.) This suggests an area for possible speed improvement.
>>
>> http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com