[datatable-help] data.table and cbind()

Matthew Dowle mdowle at mdowle.plus.com
Sat Mar 23 02:39:28 CET 2013

Interesting. Well asked. 

On my netbook:

> Rprof()
> system.time(do.call(cbind, lst.USArrests.dt))
   user  system elapsed 
  4.008   0.000   4.012 
> Rprof(NULL)
> summaryRprof()
$by.self
                self.time self.pct total.time total.pct
"make.names"         1.82    44.39       1.82     44.39
"data.table"         1.74    42.44       4.00     97.56
"[[.data.frame"      0.12     2.93       0.26      6.34
"gc"                 0.10     2.44       0.10      2.44
"match"              0.08     1.95       0.10      2.44
"length"             0.06     1.46       0.06      1.46
"[["                 0.04     0.98       0.30      7.32
"%in%"               0.04     0.98       0.14      3.41
"NROW"               0.02     0.49       0.12      2.93
"is.data.frame"      0.02     0.49       0.02      0.49
"names"              0.02     0.49       0.02      0.49
"paste"              0.02     0.49       0.02      0.49
"sys.call"           0.02     0.49       0.02      0.49

So almost half of the time is spent in make.names() (notice that
cbind.data.frame calls data.frame() with check.names=FALSE, so it skips
that step) and most of the other half in data.table() itself, though I'm
not sure exactly where. So we can do better, or perhaps we need a
cbindlist() (analogous to the existing rbindlist()). But, as you allude,
we've spent most of our effort on := and set(), which add columns by
reference rather than copying the whole table as cbind() does.
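For comparison, here is a minimal sketch of the by-reference approach, written against a recent data.table release (details of set() in the 2013 version may differ). It assumes USArrests.dt is built from R's built-in USArrests data set, since the original setup lines did not survive in the archive; the new column names (Murder2 and so on) are purely illustrative.

```r
library(data.table)

# Assumption: USArrests.dt is a data.table built from the built-in USArrests
# data set (the setup code from the original question was lost).
USArrests.dt <- as.data.table(USArrests)

# Copying: cbind() on a data.table goes through data.table() and returns a
# new object; USArrests.dt itself is left unchanged.
wide <- cbind(USArrests.dt, Murder2 = USArrests.dt$Murder * 2)

# By reference: := adds the column to USArrests.dt in place, with no
# new table being built.
USArrests.dt[, Murder2 := Murder * 2]

# set() also adds columns by reference and has very low per-call overhead,
# which suits adding many columns inside a loop. New columns must be
# given by name (character), not by number.
for (col in c("Assault", "UrbanPop")) {
  set(USArrests.dt, j = paste0(col, "2"), value = USArrests.dt[[col]] * 2)
}
```

The loop with set() avoids the per-call overhead of `[.data.table`, which is why it is usually preferred over repeated := calls when adding many columns.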

I've added a feature request to tackle this anyway. Thanks for
highlighting it; great test.

https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2636

Matthew

On 22.03.2013 22:23, Sadao Milberg wrote:

> I've recently discovered the dramatic performance improvements
> data.table provides over ddply() and merge(), and I'm looking forward
> to integrating it into my work. While messing around with benchmarks,
> I ran into an unexpected outcome with cbind(), where operations are
> actually much faster with data frames than with data tables. Don't ask
> me why I'd ever do the following, but I am curious as to why it is so
> much slower:
> 
> USArrests.dt 
> lst.USArrests 
> lst.USArrests.dt 
> 
> microbenchmark(do.call(cbind, lst.USArrests),
>                do.call(cbind, lst.USArrests.dt),
>                times=10)
> 
> Unit: milliseconds
>                              expr       min        lq    median        uq       max neval
>     do.call(cbind, lst.USArrests)  42.26891  47.70086  48.71271  49.88542  51.25453    10
>  do.call(cbind, lst.USArrests.dt) 750.70469 761.70511 773.91232 816.85707 880.45896    10
> 
> This is run on an Ubuntu system.
