[datatable-help] Speeding up column references with roll

Stavros Macrakis (Σταῦρος Μακράκης) macrakis at alum.mit.edu
Mon Jun 30 17:37:56 CEST 2014


In the following example, renaming the result column with setnames is about
15-25% faster than naming it in j with j=list(name=var). Is there a better
way to reference the other table's joined column when using roll?

# Use j=list(name=var)
calc1 <- function(d) {
  d[ hit==1
   ][ d,list(hittime=time),roll=-20
   ][ !is.na(hittime)
   ]
}

# Use setnames
calc2 <- function(d) {
  temp <- d[ hit==1
           ][ d,time,roll=-20
           ]
  setnames(temp,3,"hittime")
  temp[!is.na(hittime)]
}

# Generate sample data
set.seed(12312391)
data <- data.table(
          group = sample(1e3,1e7,replace=TRUE),
          time = ceiling(runif(1e7, 0, 1e5)),
          hit = rbinom(1e7, 1, prob = 0.1),
  key=c("group","time"))

# Timing

system.time(replicate(10,{gc();calc1(data)}))  # => 69 sec
system.time(replicate(10,{gc();calc2(data)}))  # => 52 sec
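
For anyone reading this on a later data.table release, one possible alternative is sketched below. It assumes a version that supports the on= argument and the x./i. column prefixes in j, which were added after this post, so it would not have run on the version timed above. The function name calc3 and the toy table are hypothetical, and no timing is claimed for it.

```r
library(data.table)

# Hypothetical variant: name the matched hit time directly in j.
# x.time is the time column of the hit==1 table (x in the join);
# i.group and i.time are the columns of the full table (i in the join).
calc3 <- function(d) {
  hits <- d[hit == 1]
  res <- hits[d,
              list(group = i.group, time = i.time, hittime = x.time),
              on = c("group", "time"),
              roll = -20]
  # Rows with no hit in the following 20 time units have hittime == NA
  res[!is.na(hittime)]
}

# Tiny hand-made table, small enough to check by eye
toy <- data.table(
  group = c(1, 1, 1, 1, 2, 2),
  time  = c(5, 25, 30, 70, 5, 50),
  hit   = c(0, 0, 1, 1, 0, 1),
  key   = c("group", "time"))

res <- calc3(toy)
print(res)
```

Each surviving row pairs a row of the full table with the next hit in the same group, at most 20 time units ahead.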