[Rcpp-devel] performance profile from Rcpp or RcppArmadillo code? Matrix subviews, etc

Paul Johnson pauljohn32 at gmail.com
Sun Jan 6 03:50:23 CET 2013


I received a big pile of Rcpp code from some folks and am using it as
a way to learn about Armadillo and Rcpp.

From gazing at the code, and guessing about style and usage, I see
some ways to make a program run faster.  But it is all guessing and
testing.

I wish I could get a profile (via gcc) to find out where the Rcpp part
is spending its time.
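For the gcc route, something like the following is a plausible starting point (a sketch only: the Makevars variable and the tools are standard, but the paths and package/script names are made-up examples, and I have not verified this on every platform):

```shell
## ~/.R/Makevars -- instrument compiled code with gcc's profiler hooks
## PKG_CXXFLAGS = -pg -g -O2

## Rebuild the package so the shared library is instrumented, run the
## script, then read the gmon.out that -pg makes the process write:
# R CMD INSTALL mypackage
# Rscript myscript.R
# gprof /usr/lib/R/bin/exec/R gmon.out > profile.txt

## Caveat: gprof is known to miss time spent inside shared libraries
## unless everything is built with -pg.  A sampling profiler needs no
## recompile at all (Linux only):
# perf record Rscript myscript.R
# perf report
```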

Has anybody made a list of "dos and don'ts" for making faster
RcppArmadillo code?

So far, I'm sure the usual rules about fast code apply to
RcppArmadillo. Don't allocate huge matrices over and over for no good
reason, for example. Armadillo also has a delayed-evaluation
(expression template) feature that automatically accelerates some
calculations. That's quite amazing to me.

But some of the practical details of writing code are uncertain. In
particular, I wonder about the slowdown from doing calculations on
submatrices, and the relative penalty you pay for re-calculating
things versus allocating space to store them. (I once saw a
demonstration that, in some cases, it is faster to recalculate a
multiplication than to store and re-read the result....)

Here are some of the things I would like to read about. I have noticed
in R that accessing subsections of a matrix is slow, compared to
analyzing the whole thing. For example, it is SLOWER to repeatedly
work on a subsection of a matrix x than it is to extract that piece
one time and work on the copy. The R code demonstrating this is pasted
at the end of this message.

Does Armadillo follow that same logic, so that repeatedly accessing a
subview of a matrix is slower than creating a new copy of the
submatrix and then working on it?

Along those lines, has the performance of Armadillo's non-contiguous
matrix access been assessed? Are we well advised to allocate new
matrices, extracting the non-contiguous subviews one time, rather than
accessing them over and over?

Given a matrix g and index vectors k1 and k2 that contain row numbers,
we want to do math on these submatrices:

g(k1, k1)
g(k1, k2)
g(k2, k2)

I am *guessing* that repeated access to a non-contiguous matrix
subview is slower than accessing a contiguous block out of the matrix.
And accessing a whole matrix would be faster still. So maybe a person
ought to allocate.

arma::mat g11 = g(k1, k1);

rather than grabbing g(k1, k1) over and over. Know what I mean?

Oh well, here's the example of R submatrix access.

## fasterR-matrix-1.R
## Paul Johnson
## 2012-12-23

nReps <- 5000
nVars <- 100
nRows <- 200


## fn1 works on the whole matrix: the best-case scenario
##
fn1 <- function(x){
    solve(x)
}


## extract 2:(nVars+1) rows and columns, returns "anonymously"
fn2 <- function(x){
    y <- x[2:(nVars+1), 2:(nVars+1)]
    solve(y)
}


## names new matrix to return
fn3 <- function(x){
    y <- x[2:(nVars+1), 2:(nVars+1)]
    yinv <- solve(y)
}

## manufacture new matrix, copy results to it
fn4 <- function(x){
    y <- x[2:(nVars+1), 2:(nVars+1)]
    yinv <- solve(y)
    z <- matrix(0, nVars, nVars)
    z <- yinv
}

## manufacture a bigger matrix, copy results into part of it
fn5 <- function(x){
    y <- x[2:(nVars+1), 2:(nVars+1)]
    yinv <- solve(y)
    z <- matrix(0, (nVars+1), (nVars+1))
    z[2:(nVars+1), 2:(nVars+1)] <- yinv
}


## Now make up test data.
## Manufacture symmetric matrices
Xlist1 <- vector("list", nReps)
for(i in 1:nReps){
    x <- matrix(rnorm(nRows * nVars), nRows, nVars)
    Xlist1[[i]] <- crossprod(x)
}

## create padded theta-like matrix
Xlist2 <- vector("list", nReps)
for(i in 1:nReps){
    y <- Xlist1[[i]]
    y <- rbind(c(rep(1, nVars)), y)
    y <- cbind(c(-1, rep(1,nVars)), y)
    Xlist2[[i]] <- y
}


system.time(
    fn1l <- lapply(Xlist1, fn1)
    )

system.time(
    fn2l <- lapply(Xlist2, fn2)
    )

system.time(
    fn3l <- lapply(Xlist2, fn3)
    )


system.time(
    fn4l <- lapply(Xlist2, fn4)
    )


system.time(
    fn5l <- lapply(Xlist2, fn5)
    )

## Conclusion. It IS slower to repeatedly extract rows & columns 2:(nVars+1)
## than to just work with a whole matrix of nVars x nVars values.


-- 
Paul E. Johnson
Professor, Political Science
Assoc. Director, Center for Research Methods
1541 Lilac Lane, Room 504
University of Kansas
http://pj.freefaculty.org
http://quant.ku.edu
