[Rcppdevel] Best way to compute M'M if M is triangular using RcppArmadillo
Dirk Eddelbuettel
edd at debian.org
Wed Apr 17 15:16:57 CEST 2013
On 17 April 2013 at 14:24, F.Tusell wrote:
 Not a question strictly about Rcpp but hope this is a right
 place to ask.

 I am trying to find out what is the fastest possible way to compute
 M'M for M upper triangular. I have coded,

 // [[Rcpp::export]]
 SEXP Prod1(SEXP M_) {
 const mat M = as<mat>(M_) ;
 mat prodM = M ;
 prodM = trans(M) * M ;
 return(wrap(prodM)) ;
 }

 // [[Rcpp::export]]
 SEXP Prod2(SEXP M_) {
 const mat M = as<mat>(M_) ;
 mat prodM = M ;
 prodM = trimatu(M).t() * trimatu(M) ;
 return(wrap(prodM)) ;
 }


 // [[Rcpp::export]]
 SEXP Prod3(SEXP M_) {
 mat M = as<mat>(M_) ;
 int d = M.n_rows ;
 mat prodM = M ;
 double * vM = M.memptr() ;
 double * vprodM = prodM.memptr() ;
 const double one = 1 ;
 const double zero = 0 ;
 F77_CALL(dsyrk)("L","T",&d,&d,&one,vM,&d,&zero,vprodM,&d) ;
 return(wrap(prodM)) ;
 }

 and then tested with:

 require(RcppArmadillo)
 require(microbenchmark)
 sourceCpp("prods.cpp")
 d < 50
 a < chol(crossprod(matrix(rnorm(d*d),d,d)))

 m < microbenchmark(
 r1 < Prod1(a),
 r2 < Prod2(a),
 r3 < Prod3(a),
 times=100
 )

 This is what I get:

 > m
 Unit: microseconds
 expr min lq median uq max neval
 r1 < Prod1(a) 138.749 144.2260 148.4815 159.0830 2456.146 100
 r2 < Prod2(a) 296.193 320.0275 329.4770 342.4185 2763.041 100
 r3 < Prod3(a) 132.150 138.2590 140.9270 152.3675 218.719 100

 Prod3 using BLAS dsyrk is about as fast as Prod1, using Armadillo.
 I expected that telling Armadillo that M is upper triangular would
 make for a fastest product, and the contrary seems true; Prod2 takes
 about twice as much time as Prod1 and Prod3.

 Is there a faster way?
You are going about this the right way by starting with empirics.
Now, d=50 is not big so your (relative) savings will be rather small. Plus,
the way you instantiate the Arma object DOES create extra copies on the arma
object instantiation [ see discussion this list just last week ] as well as
on exit  which may dominate the timing. Try d=100, 200, ... 500 for
comparison.
Lastly, you could write prod3 without going to arma.
Dirk

Dirk Eddelbuettel  edd at debian.org  http://dirk.eddelbuettel.com
More information about the Rcppdevel
mailing list