[datatable-help] Sharing a win!

Thell Fowler tbfowler4 at gmail.com
Sat Nov 22 23:46:08 CET 2014


Thank you for the great work you do on data.table and for being so active
in publicly (SO) answering users' questions and giving tips and advice.

Every so often I need to go back to research scripts and update them to a
newer version of data.table, which is a major PITA. Updating hardly ever
goes smoothly (e.g., going from 1.8.8 to 1.8.11 was no fun), and updating
from 1.8.11 to the most recent version wasn't either, but it can sure be
worth it!

One of the tasks I do is propagating calculated values from unique entries
to the remaining group members. The propagation routine broke after
updating `data.table` and forced me (again) to look at the routine.
Thankfully it wasn't too hard to get a working command. Then, as you know,
it was time for a little performance tweaking.

This is where the WIN comes in.

Altering the main propagation command

from:
dt[,(vcols):=cbind(dtG,dtV)[dt,roll=TRUE][,vcols,with=FALSE]][,ugid:=NULL]

to:
dt[,(vcols):=dtG[,(vcols):=dtV][dt[,.(id)],roll=TRUE][,id:=NULL]][,ugid:=NULL]

yielded a mean time (in seconds) change
from: 30053.231
to: <drum roll> 1367.989 !!!!!

Notice the elimination of the cbind, the use of dt[,.(id)] instead of dt
for the rolling join, and no more selecting the columns using 'with=FALSE'.
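
For anyone who wants to try the pattern, here is a minimal sketch of the
faster form on toy data. The table contents and the column names v1 and v2
are made up for illustration, and it assumes the values were calculated only
for the first row of each group and that id is sorted; my real tables differ.

```r
library(data.table)

# Hypothetical data: values exist only for the unique entries (ids 1, 4, 6).
dt    <- data.table(id = 1:8)
dtG   <- data.table(id = c(1L, 4L, 6L), key = "id")            # unique entries
dtV   <- data.table(v1 = c(10, 20, 30), v2 = c("a", "b", "c")) # their values
vcols <- names(dtV)

# Attach the values to the keyed lookup table by reference (no cbind copy).
dtG[, (vcols) := dtV]

# Rolling join: every id in dt picks up the values of the last unique entry
# at or before it. Only the id column of dt is passed in as i.
filled <- dtG[dt[, .(id)], roll = TRUE]

# Drop the join column, then assign the value columns into dt by reference.
filled[, id := NULL]
dt[, (vcols) := filled]

print(dt)
```

With this toy data, ids 2 and 3 inherit the values of id 1, id 5 inherits
those of id 4, and ids 7 and 8 inherit those of id 6.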

Very cool! Thank you again for what you do!

--
Sincerely,
Thell