[datatable-help] data.table vs matrix speed

Steve Bellan steve.bellan at gmail.com
Wed Jul 9 17:30:02 CEST 2014


I'm trying to speed up a script that iteratively updates state variables for several thousand individuals through time, where only some individuals are active at each time point. I had been doing this with matrices, but wondered how data.table would compare, since it seems more readable. My data.table implementation turns out to be about 2-3 times faster, which surprised me because I expected matrices to be faster. That makes me wonder whether either implementation can be sped up. Any help is much appreciated! Here's an example of the code:


library(data.table)

n <- 10^5
k <- 9
serostates <- matrix(0,n,k)
serostates <- as.data.table(serostates)
setnames(serostates, 1:k, c('s..', 'mb.a1', 'mb.a2', 'mb.', 'f.ba1', 'f.ba2', 'f.b', 'hb1b2', 'hb2b1'))
serostates[, `:=`(s.. = 1)]
serostates
serostatesMat <- as.matrix(serostates)

pre.coupleDT <- function(serostates, sexually.active) {
    serostates[sexually.active , `:=`(
        s..   = s.. * (1-p.m.bef) * (1-p.f.bef),
        mb.a1 = s.. * p.m.bef * (1-p.f.bef),
        mb.a2 = mb.a1 * (1 - p.f.bef),
        mb.   = mb.a2 * (1 - p.f.bef) + mb. * (1 - p.f.bef),
        f.ba1 = s.. * p.f.bef * (1-p.m.bef),
        f.ba2 = f.ba1 * (1 - p.m.bef),
        f.b   = f.ba2 * (1 - p.m.bef) + f.b * (1 - p.m.bef),
        hb1b2 = hb1b2 + .5  *  s.. * p.m.bef * p.f.bef + (mb.a1 + mb.a2 + mb.)  *  p.f.bef,
        hb2b1 = hb2b1 + .5  *  s.. * p.m.bef * p.f.bef + (f.ba1 + f.ba2 + f.b)  *  p.m.bef)
           ]
    return(serostates)
}


pre.coupleMat <- function(serostates, sexually.active) {
    temp <- serostates[sexually.active,]
    temp[,'s..']   = temp[,'s..'] * (1-p.m.bef) * (1-p.f.bef)
    temp[,'mb.a1'] = temp[,'s..'] * p.m.bef * (1-p.f.bef)
    temp[,'mb.a2'] = temp[,'mb.a1'] * (1 - p.f.bef)
    temp[,'mb.'] = temp[,'mb.a2'] * (1 - p.f.bef) + temp[,'mb.'] * (1 - p.f.bef)
    temp[,'f.ba1'] = temp[,'s..'] * p.f.bef * (1-p.m.bef)
    temp[,'f.ba2'] = temp[,'f.ba1'] * (1 - p.m.bef)
    temp[,'f.b'] = temp[,'f.ba2'] * (1 - p.m.bef) + temp[,'f.b'] * (1 - p.m.bef)
    temp[,'hb1b2'] = temp[,'hb1b2'] + .5  *  temp[,'s..'] * p.m.bef * p.f.bef + (temp[,'mb.a1'] + temp[,'mb.a2'] + temp[,'mb.'])  *  p.f.bef
    temp[,'hb2b1'] = temp[,'hb2b1'] + .5  *  temp[,'s..'] * p.m.bef * p.f.bef + (temp[,'f.ba1'] + temp[,'f.ba2'] + temp[,'f.b'])  *  p.m.bef
    serostates[sexually.active,] <- temp
    return(serostates)
}
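
One thing worth checking before comparing timings is whether the two functions compute the same update. When several columns are assigned in a single `:=` call, every right-hand side is evaluated against the column values from before the call, whereas the matrix version reads `temp[,'s..']` after that column has already been overwritten. The toy example below is only a minimal sketch of that difference; the names `dt`, `m`, `a` and `b` are made up for illustration and are not part of the script above.

```r
library(data.table)

# Toy data with hypothetical names: 'a' is updated, 'b' is derived from 'a'
dt <- data.table(a = c(1, 2), b = c(0, 0))
m  <- as.matrix(dt)

# data.table: both right-hand sides are evaluated on the original 'a'
dt[, `:=`(a = a * 10, b = a)]

# matrix: the second assignment reads the already-updated 'a'
m[, 'a'] <- m[, 'a'] * 10
m[, 'b'] <- m[, 'a']

print(dt$b)      # 1 2   (values of 'a' before the update)
print(m[, 'b'])  # 10 20 (values of 'a' after the update)
```

If the same happens in `pre.coupleDT` and `pre.coupleMat`, the two loops are timing slightly different calculations, so it may be worth comparing their results on a small input first.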

sexually.active <- rbinom(n, 1,.5)==1
p.m.bef <- .5
p.f.bef <- .8

system.time(
    for(ii in 1:100) {
        serostates <- pre.coupleDT(serostates, sexually.active)
    }
    ) ## about 2.25 seconds


system.time(
    for(ii in 1:100) {
        serostatesMat <- pre.coupleMat(serostatesMat, sexually.active)
    }
    ) ## about 6 seconds
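
A possible variation on the matrix version, offered as an untimed sketch rather than a known fix: pull each column of the active rows into a plain vector once, do the arithmetic on vectors, and write the whole block back in a single assignment. The function name `pre.coupleVec`, the short local variable names, and passing `p.m.bef` and `p.f.bef` as arguments are assumptions made for this illustration. It follows the sequential order of updates used in `pre.coupleMat`. Whether it is actually faster would need to be measured with `system.time`.

```r
# Hypothetical variant of pre.coupleMat: vectors in, one block assignment out.
pre.coupleVec <- function(serostates, sexually.active, p.m.bef, p.f.bef) {
    act  <- which(sexually.active)   # integer row indices of the active individuals
    cols <- c('s..', 'mb.a1', 'mb.a2', 'mb.', 'f.ba1', 'f.ba2', 'f.b', 'hb1b2', 'hb2b1')

    # Same order of updates as pre.coupleMat: each line uses values updated above it
    s    <- serostates[act, 's..'] * (1 - p.m.bef) * (1 - p.f.bef)
    mba1 <- s * p.m.bef * (1 - p.f.bef)
    mba2 <- mba1 * (1 - p.f.bef)
    mb   <- mba2 * (1 - p.f.bef) + serostates[act, 'mb.'] * (1 - p.f.bef)
    fba1 <- s * p.f.bef * (1 - p.m.bef)
    fba2 <- fba1 * (1 - p.m.bef)
    fb   <- fba2 * (1 - p.m.bef) + serostates[act, 'f.b'] * (1 - p.m.bef)
    hb1b2 <- serostates[act, 'hb1b2'] + .5 * s * p.m.bef * p.f.bef + (mba1 + mba2 + mb) * p.f.bef
    hb2b1 <- serostates[act, 'hb2b1'] + .5 * s * p.m.bef * p.f.bef + (fba1 + fba2 + fb) * p.m.bef

    # Write all nine columns of the active rows back in one assignment
    serostates[act, cols] <- cbind(s, mba1, mba2, mb, fba1, fba2, fb, hb1b2, hb2b1)
    serostates
}

# Small demonstration on 4 individuals, 2 of them active
demoMat <- matrix(0, 4, 9, dimnames = list(NULL,
    c('s..', 'mb.a1', 'mb.a2', 'mb.', 'f.ba1', 'f.ba2', 'f.b', 'hb1b2', 'hb2b1')))
demoMat[, 's..'] <- 1
demoActive <- c(TRUE, FALSE, TRUE, FALSE)
demoOut <- pre.coupleVec(demoMat, demoActive, p.m.bef = .5, p.f.bef = .8)
print(demoOut)
```

Inactive rows are left untouched, so the result can be compared directly against `pre.coupleMat` on the same input before timing it.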


