<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p>Interesting. Well asked.</p>
<p>On my netbook:</p>
<pre>&gt; Rprof()
&gt; system.time(do.call(cbind, lst.USArrests.dt))
   user  system elapsed 
  4.008   0.000   4.012 
&gt; Rprof(NULL)
&gt; summaryRprof()
$by.self
                self.time self.pct total.time total.pct
"make.names"         1.82    44.39       1.82     44.39
"data.table"         1.74    42.44       4.00     97.56
"[[.data.frame"      0.12     2.93       0.26      6.34
"gc"                 0.10     2.44       0.10      2.44
"match"              0.08     1.95       0.10      2.44
"length"             0.06     1.46       0.06      1.46
"[["                 0.04     0.98       0.30      7.32
"%in%"               0.04     0.98       0.14      3.41
"NROW"               0.02     0.49       0.12      2.93
"is.data.frame"      0.02     0.49       0.02      0.49
"names"              0.02     0.49       0.02      0.49
"paste"              0.02     0.49       0.02      0.49
"sys.call"           0.02     0.49       0.02      0.49
</pre>
<pre>So almost half of the time is spent in make.names() (note that cbind.data.frame calls data.frame with check.names=FALSE), and most of the rest is in data.table() itself, though I'm not sure exactly where. So we can do better, or maybe we need a cbindlist (analogous to the existing rbindlist). But as you allude, we've spent most of our effort on := and set(), which add columns by reference, rather than on cbind(), which copies.</pre>
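<p>A minimal sketch of adding columns by reference with := and set(), for comparison with cbind(). The data and the column names (Violent, Urban, flag1, flag2) are made up for illustration; this is not the list from the original post:</p>
<pre>library(data.table)

# Hypothetical example data: the built-in USArrests as a data.table
dt &lt;- as.data.table(USArrests)

# Hypothetical extra columns, held in a named list
extra &lt;- list(
  Violent = USArrests$Murder + USArrests$Assault + USArrests$Rape,
  Urban   = USArrests$UrbanPop &gt; 50
)

# := adds the columns to dt in place, without copying the existing columns
cols &lt;- names(extra)
dt[, (cols) := extra]

# set() also assigns by reference and avoids the overhead of [.data.table,
# which can matter when adding columns one at a time in a loop
for (nm in c("flag1", "flag2")) set(dt, j = nm, value = 0L)
</pre>
<p>Both forms modify dt itself, so there is no need to assign the result back to dt.</p>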
<pre> </pre>
<pre>I've added a feature request to tackle this anyway. Thanks for highlighting, great test.</pre>
<pre> </pre>
<pre><a href="https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2636">https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2636</a></pre>
<pre> </pre>
<pre>Matthew</pre>
<pre> </pre>
<pre> </pre>
<p>On 22.03.2013 22:23, Sadao Milberg wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<div dir="ltr">I've recently discovered the dramatic performance improvements data.table provides over ddply() and merge(), and I'm looking forward to integrating it into my work. While messing around with benchmarks, I ran into an unexpected outcome with cbind(), where operations are actually much faster with data frames than data tables. Don't ask me why I'd ever do the following, but I am curious as to why it is so much slower:<br /><br /><span style="font-family: 'Courier New';"><span style="font-family: 'Courier New';">USArrests.dt <span style="font-family: 'Courier New';"><br /></span><span style="font-family: 'Courier New';">lst.USArrests <span style="font-family: 'Courier New';"><br /></span><span style="font-family: 'Courier New';">lst.USArrests.dt <span style="font-family: 'Courier New';"><br /><br />microbenchmark(do.call(cbind, lst.USArrests),<br /> do.call(cbind, lst.USArrests.dt),<br /> times=10)</span><br /></span></span></span></span>
<pre class="ecxGJWPQFQDK4" style="font-family: Consolas,; font-size: 14px; border: none; white-space: pre-wrap; line-height: 15px; color: #ffffff; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: -webkit-left; text-indent: 0px; text-transform: none; widows: auto; word-spacing: 0px; background-color: #323232;">Unit: milliseconds
                             expr       min        lq    median        uq       max neval
    do.call(cbind, lst.USArrests)  42.26891  47.70086  48.71271  49.88542  51.25453    10
 do.call(cbind, lst.USArrests.dt) 750.70469 761.70511 773.91232 816.85707 880.45896    10</pre>
<span style="font-family: 'Courier New';"><span style="font-family: 'Courier New';"><span style="font-family: 'Courier New';"><br />This is run on an Ubuntu system. </span></span></span></div>
</blockquote>
</body></html>