[datatable-help] adding names to j columns is costly

Wed Sep 11 23:35:35 CEST 2013

I find myself calling setnames(...,"V1","...") very often, because naming
the result columns inside the aggregation (in j) is expensive:

--8<---------------cut here---------------start------------->8---
> delays.short <- delays.dt[,sum(count),by="delay"]
Finding groups (bysameorder=TRUE) ... done in 1.262secs. bysameorder=TRUE and o__ is length 0
Detected that j uses these columns: count 
Optimization is on but j left unchanged as 'sum(count)'
Starting dogroups ... done dogroups in 8.612 secs
> delays.short <- delays.dt[,list(count=sum(count)),by="delay"]
Finding groups (bysameorder=TRUE) ... done in 1.051secs. bysameorder=TRUE and o__ is length 0
Detected that j uses these columns: count 
Optimization is on but j left unchanged as 'list(sum(count))'
Starting dogroups ... done dogroups in 11.918 secs
--8<---------------cut here---------------end--------------->8---

A 38% difference in dogroups time (8.6 vs. 11.9 secs) is a lot; 3 seconds
is not a big deal, but this is just a toy dataset.
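
For reference, the setnames() workaround mentioned above can be sketched
as follows. This is only an illustration: the table is synthetic (made-up
sizes and values standing in for the real delays.dt, which is not shown),
and it assumes the unnamed j result gets the default name V1.

```r
library(data.table)

# Synthetic stand-in for delays.dt (assumption: the real data is not shown).
set.seed(1)
delays.dt <- data.table(delay = sample(1:1000, 1e5, replace = TRUE),
                        count = sample(1:10, 1e5, replace = TRUE))

# Aggregate without naming the j column; the result column is called V1.
delays.short <- delays.dt[, sum(count), by = "delay"]

# Rename afterwards, by reference, instead of naming inside j.
setnames(delays.short, "V1", "count")

# For comparison: the same aggregation with the name set inside j.
delays.named <- delays.dt[, list(count = sum(count)), by = "delay"]
```

Both tables should hold the same columns and values; only the place where
the name is assigned differs.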

I seem to recall asking this question before. Is this still the state of
the art as of data.table 1.8.10, or am I doing something stupid?
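
A rough way to repeat the comparison on whatever data.table version is
installed is sketched below. The table dt and its sizes are hypothetical
stand-ins, not the original data, and the timings will vary with machine
and version, so no expected numbers are given.

```r
library(data.table)

# Report the installed version so the timings can be tied to it.
print(packageVersion("data.table"))

# Hypothetical data: 1e6 rows, about 1e4 groups (sizes are assumptions).
set.seed(1)
n <- 1e6
dt <- data.table(delay = sample.int(1e4, n, replace = TRUE),
                 count = sample.int(10L, n, replace = TRUE))

# Time the unnamed and the named form of the same aggregation.
t.unnamed <- system.time(a <- dt[, sum(count), by = "delay"])[["elapsed"]]
t.named   <- system.time(b <- dt[, list(count = sum(count)), by = "delay"])[["elapsed"]]

cat("unnamed j:", t.unnamed, "secs; named j:", t.named, "secs\n")
```

Wrapping each call in dt[..., verbose = TRUE] would additionally show the
"Finding groups" and "dogroups" lines quoted in the transcript above.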

Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000