[datatable-help] data.table vs matrix speed
Steve Bellan
steve.bellan at gmail.com
Wed Jul 9 17:30:02 CEST 2014
I'm trying to optimize a script that iteratively updates state variables for several thousand individuals over time, although only some individuals are active at each time step. I had been doing this with matrices but wondered how data.table compares, since it seems more readable. I'm finding that my data.table implementation is about 2-3 times faster, which surprises me, since I expected matrices to be faster. It makes me wonder whether either implementation could be sped up further. Any help is much appreciated! Here's an example of the code:
library(data.table)
n <- 10^5
k <- 9
serostates <- matrix(0,n,k)
serostates <- as.data.table(serostates)
setnames(serostates, 1:k, c('s..', 'mb.a1', 'mb.a2', 'mb.', 'f.ba1', 'f.ba2', 'f.b', 'hb1b2', 'hb2b1'))
serostates[, `:=`(s.. = 1)]
serostates
serostatesMat <- as.matrix(serostates)
pre.coupleDT <- function(serostates, sexually.active) {
  ## All right-hand sides inside `:=`(...) are evaluated from the values
  ## before any column is assigned; the active rows are then updated by
  ## reference, so the caller's table changes even without reassignment.
  serostates[sexually.active, `:=`(
    s..   = s.. * (1-p.m.bef) * (1-p.f.bef),
    mb.a1 = s.. * p.m.bef * (1-p.f.bef),
    mb.a2 = mb.a1 * (1 - p.f.bef),
    mb.   = mb.a2 * (1 - p.f.bef) + mb. * (1 - p.f.bef),
    f.ba1 = s.. * p.f.bef * (1-p.m.bef),
    f.ba2 = f.ba1 * (1 - p.m.bef),
    f.b   = f.ba2 * (1 - p.m.bef) + f.b * (1 - p.m.bef),
    hb1b2 = hb1b2 + .5 * s.. * p.m.bef * p.f.bef + (mb.a1 + mb.a2 + mb.) * p.f.bef,
    hb2b1 = hb2b1 + .5 * s.. * p.m.bef * p.f.bef + (f.ba1 + f.ba2 + f.b) * p.m.bef
  )]
  return(serostates)
}
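Because every right-hand side in a single `:=`(...) call is evaluated before any column is assigned, `mb.a1` above is computed from the previous `s..`, not from the `s..` updated in the same call. A toy table (the values are purely illustrative) makes this visible:

```r
library(data.table)

## Toy table with illustrative values
dt <- data.table(a = c(1, 2, 3), b = c(0, 0, 0))

## Update rows 1 and 3 only. Both right-hand sides are evaluated first,
## so b is computed from the old a (1 and 3), not from the doubled a.
dt[c(TRUE, FALSE, TRUE), `:=`(a = a * 2, b = a + 10)]

print(dt)
## a is now 2, 2, 6 and b is 11, 0, 13; row 2 is untouched
```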
pre.coupleMat <- function(serostates, sexually.active) {
  ## Snapshot of the active rows: every right-hand side reads from `old`,
  ## so each state is updated from pre-update values, as `:=` does above.
  old  <- serostates[sexually.active, , drop = FALSE]
  temp <- old
  temp[,'s..']   = old[,'s..'] * (1-p.m.bef) * (1-p.f.bef)
  temp[,'mb.a1'] = old[,'s..'] * p.m.bef * (1-p.f.bef)
  temp[,'mb.a2'] = old[,'mb.a1'] * (1 - p.f.bef)
  temp[,'mb.']   = old[,'mb.a2'] * (1 - p.f.bef) + old[,'mb.'] * (1 - p.f.bef)
  temp[,'f.ba1'] = old[,'s..'] * p.f.bef * (1-p.m.bef)
  temp[,'f.ba2'] = old[,'f.ba1'] * (1 - p.m.bef)
  temp[,'f.b']   = old[,'f.ba2'] * (1 - p.m.bef) + old[,'f.b'] * (1 - p.m.bef)
  temp[,'hb1b2'] = old[,'hb1b2'] + .5 * old[,'s..'] * p.m.bef * p.f.bef + (old[,'mb.a1'] + old[,'mb.a2'] + old[,'mb.']) * p.f.bef
  temp[,'hb2b1'] = old[,'hb2b1'] + .5 * old[,'s..'] * p.m.bef * p.f.bef + (old[,'f.ba1'] + old[,'f.ba2'] + old[,'f.b']) * p.m.bef
  serostates[sexually.active,] <- temp
  return(serostates)
}
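Reading every right-hand side from the pre-update state is also what keeps an individual's nine state probabilities summing to one: the `s..` terms split into (1-pm)(1-pf) + pm(1-pf) + pf(1-pm) + pm*pf = 1, and each `mb.*` / `f.b*` group either stays put or moves into `hb1b2` / `hb2b1`. The base-R sketch below checks this for one individual starting in `s..`. The helpers `step_simultaneous` and `step_sequential` are hypothetical and written only for this illustration; the second one mimics overwriting matrix columns one after another.

```r
## Hypothetical helpers operating on one individual's named state vector x
states <- c("s..", "mb.a1", "mb.a2", "mb.", "f.ba1", "f.ba2", "f.b",
            "hb1b2", "hb2b1")

## All right-hand sides read the pre-update values (data.table `:=` semantics)
step_simultaneous <- function(x, pm, pf) {
  with(as.list(x), c(
    s..   = s.. * (1 - pm) * (1 - pf),
    mb.a1 = s.. * pm * (1 - pf),
    mb.a2 = mb.a1 * (1 - pf),
    mb.   = (mb.a2 + mb.) * (1 - pf),
    f.ba1 = s.. * pf * (1 - pm),
    f.ba2 = f.ba1 * (1 - pm),
    f.b   = (f.ba2 + f.b) * (1 - pm),
    hb1b2 = hb1b2 + 0.5 * s.. * pm * pf + (mb.a1 + mb.a2 + mb.) * pf,
    hb2b1 = hb2b1 + 0.5 * s.. * pm * pf + (f.ba1 + f.ba2 + f.b) * pm
  ))
}

## Each assignment reads values already overwritten earlier in the function
step_sequential <- function(x, pm, pf) {
  x["s.."]   <- x["s.."] * (1 - pm) * (1 - pf)
  x["mb.a1"] <- x["s.."] * pm * (1 - pf)
  x["mb.a2"] <- x["mb.a1"] * (1 - pf)
  x["mb."]   <- (x["mb.a2"] + x["mb."]) * (1 - pf)
  x["f.ba1"] <- x["s.."] * pf * (1 - pm)
  x["f.ba2"] <- x["f.ba1"] * (1 - pm)
  x["f.b"]   <- (x["f.ba2"] + x["f.b"]) * (1 - pm)
  x["hb1b2"] <- x["hb1b2"] + 0.5 * x["s.."] * pm * pf +
    sum(x[c("mb.a1", "mb.a2", "mb.")]) * pf
  x["hb2b1"] <- x["hb2b1"] + 0.5 * x["s.."] * pm * pf +
    sum(x[c("f.ba1", "f.ba2", "f.b")]) * pm
  x
}

x0 <- setNames(c(1, rep(0, 8)), states)   # individual starts in s..
pm <- 0.5
pf <- 0.8

sim  <- step_simultaneous(x0, pm, pf)
seq1 <- step_sequential(x0, pm, pf)
sum(sim)   # 1, up to floating-point rounding
sum(seq1)  # well below 1: probability mass is lost
```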
sexually.active <- rbinom(n, 1,.5)==1
p.m.bef <- .5
p.f.bef <- .8
system.time(
  for(ii in 1:100) {
    serostates <- pre.coupleDT(serostates, sexually.active)
  }
) ## about 2.25 seconds
system.time(
  for(ii in 1:100) {
    serostatesMat <- pre.coupleMat(serostatesMat, sexually.active)
  }
) ## about 6 seconds
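On the speed question, one plausible source of overhead in the matrix version is the repeated extraction and assignment of named columns on `temp`, plus the final write-back, which has to copy `serostates` because the caller still holds a reference to it (data.table's `:=` instead modifies the existing columns by reference). The sketch below is a hypothetical, unbenchmarked variant, `pre.coupleVec`, which takes the probabilities as arguments instead of reading globals, reads each column once, computes all nine new columns from those pre-update vectors, and writes the active rows back in a single assignment. Whether it actually beats the data.table version would have to be measured with `system.time()` on the real data.

```r
## Hypothetical column-vector variant of pre.coupleMat (not benchmarked).
## Columns are assumed to be in the order
## s.., mb.a1, mb.a2, mb., f.ba1, f.ba2, f.b, hb1b2, hb2b1.
pre.coupleVec <- function(m, active, pm, pf) {
  idx <- which(active)                 # integer indices of active rows
  old <- m[idx, , drop = FALSE]        # single copy of the active rows
  s   <- old[, 1]; ma1 <- old[, 2]; ma2 <- old[, 3]; mb <- old[, 4]
  fa1 <- old[, 5]; fa2 <- old[, 6]; fb  <- old[, 7]
  ## Every new column is computed from the pre-update vectors above
  m[idx, ] <- cbind(
    s * (1 - pm) * (1 - pf),
    s * pm * (1 - pf),
    ma1 * (1 - pf),
    (ma2 + mb) * (1 - pf),
    s * pf * (1 - pm),
    fa1 * (1 - pm),
    (fa2 + fb) * (1 - pm),
    old[, 8] + 0.5 * s * pm * pf + (ma1 + ma2 + mb) * pf,
    old[, 9] + 0.5 * s * pm * pf + (fa1 + fa2 + fb) * pm
  )
  m
}

## Small illustrative run: every individual starts in s..
set.seed(1)
n.toy <- 1000
m <- matrix(0, n.toy, 9, dimnames = list(NULL,
  c("s..", "mb.a1", "mb.a2", "mb.", "f.ba1", "f.ba2", "f.b", "hb1b2", "hb2b1")))
m[, "s.."] <- 1
active <- rbinom(n.toy, 1, 0.5) == 1
for (ii in 1:100) m <- pre.coupleVec(m, active, pm = 0.5, pf = 0.8)

## Each row should still sum to one (total probability is conserved)
all.equal(rowSums(m), rep(1, n.toy))
```

In the original loop it would be called as `serostatesMat <- pre.coupleVec(serostatesMat, sexually.active, p.m.bef, p.f.bef)`.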