<div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">Have edited here now:</span><div style="font-family:arial,sans-serif;font-size:13px"><a href="http://stackoverflow.com/a/21500855/559784" target="_blank">http://stackoverflow.com/a/21500855/559784</a></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Feb 5, 2014 at 4:42 PM, Arunkumar Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Seems like the "by-without-by" is what's slowing things down:<div><br></div><div>require(data.table)</div>
<div>dtx <- data.table(x=which(X), key="x")</div><div>dty <- data.table(y=which(Y), key="y")</div>
<div><div>dtx[, x1 := x]</div><div>dty[, y1 := y]</div></div><div>system.time(ans <- dty[dtx, roll="nearest"][, abs(x1-y1)])<br></div><div><div><div> user system elapsed</div><div> 1.321 0.076 1.396</div>
</div></div><div>system.time(ans2 <- flodel(x,y))<br></div><div><div> user system elapsed</div><div> 0.936 0.044 0.977</div>
</div><div><br></div><div>identical(ans, ans2) # [1] TRUE</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra">
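<div>The rolling join above can be tried on toy data. This is only a minimal sketch, not the benchmark itself: the short X and Y vectors are made up for illustration, and it assumes the data.table package is installed.</div>
<pre>
```r
library(data.table)

# Made-up toy input (the thread's real X and Y are much longer logical vectors).
X <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)   # TRUE at 1, 4, 8
Y <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)  # TRUE at 2, 7

dtx <- data.table(x = which(X), key = "x")
dty <- data.table(y = which(Y), key = "y")
dtx[, x1 := x]   # keep copies: after the join, column y holds the values of x
dty[, y1 := y]

# For each x, roll to the nearest y, then take the distance.
ans <- dty[dtx, roll = "nearest"][, abs(x1 - y1)]
print(ans)   # 1 2 1
```
</pre>
<div>The copies x1 and y1 are what make the distance computable afterwards, because the join column itself is overwritten with the lookup values.</div>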
<br><br><div class="gmail_quote">On Wed, Feb 5, 2014 at 4:32 PM, Arunkumar Srinivasan <span dir="ltr"><<a href="mailto:aragorn168b@gmail.com" target="_blank">aragorn168b@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Just tested. It works fine (on 1.8.11), but takes 16 seconds on my laptop, as opposed to Flodel's, which takes 1.4 seconds. Also, identical() returned TRUE.<div>Will see where the delay is coming from.</div>
</div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Feb 5, 2014 at 4:22 PM, Gabor Grothendieck <span dir="ltr"><<a href="mailto:ggrothendieck@gmail.com" target="_blank">ggrothendieck@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">There was another benchmark posted with larger data and longer times,<br>
but this time data.table stopped with an error. See:<br>
<br>
<a href="http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855" target="_blank">http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855</a><br>
<div><div><br>
On Mon, Feb 3, 2014 at 6:46 AM, Matt Dowle <<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>> wrote:<br>
> Gabor,<br>
><br>
> With that said about it being a micro benchmark, by-without-by might be at<br>
> play in GG2(X,Y) here; i.e. running j for each row of i, where it could run<br>
> once. I remember you and others quite rightly said by-without-by should be<br>
> explicit ... still got to make that change. A similar speed issue came up<br>
> recently somewhere else as well, which the change in default should help with.<br>
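<div>To make the "j for each row of i" cost concrete, here is a small sketch on made-up data. It assumes a data.table release from 1.9.4 onward (newer than the 1.8.11 tested in this thread), where, as far as I know, the per-row behaviour has to be requested explicitly with by = .EACHI. The table DT and its columns are hypothetical.</div>
<pre>
```r
library(data.table)

# Hypothetical keyed table, not taken from the thread.
DT <- data.table(id = c("a", "a", "b", "b"), v = 1:4, key = "id")

# j evaluated once, over all rows matched by i.
once <- DT[c("a", "b"), sum(v)]

# j evaluated separately for each row of i (the old by-without-by behaviour).
each <- DT[c("a", "b"), sum(v), by = .EACHI]

print(once)   # 10
print(each)   # one row per id: a 3, b 7
```
</pre>
<div>With many rows in i, the second form calls j once per row, which is the overhead being discussed here.</div>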
><br>
> Matt<br>
><br>
><br>
> On 02/02/14 18:57, Matt Dowle wrote:<br>
><br>
><br>
> But this is at the *micro* second level ?!!<br>
><br>
> I confirm those results on my slow netbook but remember these are **micro**<br>
> seconds i.e. 71,000 here is less than 0.1 of a second.<br>
><br>
>> microbenchmark(flodel(X,Y), GG1(X,Y), GG2(X,Y))<br>
> Unit: microseconds<br>
> expr min lq median uq max neval<br>
> flodel(X, Y) 330.798 369.369 402.7935 455.3225 17996.26 100<br>
> GG1(X, Y) 14287.380 14370.038 14466.5990 16010.5440 121082.77 100<br>
> GG2(X, Y) 71164.270 85751.437 107951.3415 161676.5720 366003.62 100<br>
><br>
> To put it in some perspective :<br>
><br>
>> system.time(GG2(X,Y))<br>
> user system elapsed<br>
> 0.072 0.000 0.072<br>
>> system.time(GG2(X,Y))<br>
> user system elapsed<br>
> 0.080 0.000 0.079<br>
>> system.time(GG2(X,Y))<br>
> user system elapsed<br>
> 0.072 0.000 0.072<br>
><br>
> Where those times are in seconds. So the task in question here, takes<br>
> 0.07 seconds ?!<br>
><br>
> The 150x longer figure is actually (using figures from the S.O. answer)<br>
> 24695 microseconds (i.e. 0.024 seconds) divided by 168 microseconds<br>
> (0.000168 seconds). 0.024 seconds / 0.000168 = "150 times". If you<br>
> rounded to milliseconds you could say data.table is infinitely slower (24ms<br>
> / 0ms = Inf).<br>
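<div>The arithmetic in this paragraph can be checked directly. A minimal sketch, using only the two figures quoted above; the variable names are made up, and truncating to whole milliseconds with floor() is an assumption about what "rounded to milliseconds" means here.</div>
<pre>
```r
dt_us   <- 24695   # data.table timing quoted above, in microseconds
base_us <- 168     # the faster timing quoted above, in microseconds

ratio_us <- dt_us / base_us                              # roughly 147
ratio_ms <- floor(dt_us / 1000) / floor(base_us / 1000)  # 24 / 0

print(ratio_us)
print(ratio_ms)   # Inf
```
</pre>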
><br>
> I can believe there's scope for improvement, sure, but not from this<br>
> benchmark. The vectors need to be *much* bigger and the number of replications<br>
> needs to be *much* smaller, say 3. The task being timed needs to take a meaningful<br>
> amount of time (say 5 seconds) *for a single run*.<br>
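<div>A benchmark along these lines might look like the sketch below. It uses base R only. nearest_dist() is a hypothetical stand-in written for this illustration, not the flodel() or GG functions from the S.O. answer, and the vector size n is an assumption to be raised until a single run takes seconds.</div>
<pre>
```r
# Hypothetical helper: distance from each TRUE in X to the nearest TRUE in Y.
# Assumes Y contains at least one TRUE.
nearest_dist <- function(X, Y) {
  x <- which(X)
  y <- which(Y)
  i <- findInterval(x, y)   # index of the largest y that is not above x (0 if none)
  below <- ifelse(i > 0, x - y[pmax(i, 1L)], Inf)
  above <- ifelse(i < length(y), y[pmin(i + 1L, length(y))] - x, Inf)
  pmin(below, above)
}

set.seed(1)
n <- 1e6                    # assumed size; increase until one run takes seconds
X <- runif(n) < 0.5
Y <- runif(n) < 0.5

# Three single runs, each timed on its own, instead of 100 tiny replications.
for (k in 1:3) print(system.time(nearest_dist(X, Y)))
```
</pre>
<div>Timing each run separately with system.time() keeps the figures in seconds, so differences between methods are not dominated by fixed per-call overhead.</div>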
><br>
> Matt<br>
><br>
><br>
> On 02/02/14 12:27, Gabor Grothendieck wrote:<br>
><br>
> The benchmark at the bottom of this post shows a problem where a data.table<br>
> roll="nearest" took nearly 150x longer than a base findInterval() solution.<br>
> (The data.table solution is easier to write though.) This suggests an area<br>
> for possible speed improvement.<br>
><br>
> <a href="http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855" target="_blank">http://stackoverflow.com/questions/21499742/fast-minimum-distance-interval-between-elements-of-2-logical-vectors-take-2/21500855#21500855</a><br>
><br>
> --<br>
> Statistics & Software Consulting<br>
> GKX Group, GKX Associates Inc.<br>
> tel: 1-877-GKX-GROUP<br>
> email: ggrothendieck at <a href="http://gmail.com" target="_blank">gmail.com</a><br>
><br>
><br>
> _______________________________________________<br>
> datatable-help mailing list<br>
> <a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
><br>
><br>
><br>
<br>
<br>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>