[Rcpp-devel] Best way to compute M'M if M is triangular using RcppArmadillo

Wed Apr 17 16:01:35 CEST 2013

I agree with Dirk. Take a look at the arma::mat constructor in fastLm. If you are not modifying M, use the raw memory constructor.

http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/

I don't see any other way to avoid using Rcpp::wrap in the return statement. Note there may be other places you copy memory, for instance, in Prod1 you write

|   mat prodM = M ;
|   prodM = trans(M) * M  ;

I think the first statement is not necessary.

Dale Smith, Ph.D.
Senior Financial Quantitative Analyst
Risk & Compliance
Fiserv
Office: 678-375-5315
www.fiserv.com

-----Original Message-----
From: rcpp-devel-bounces at r-forge.wu-wien.ac.at [mailto:rcpp-devel-bounces at r-forge.wu-wien.ac.at] On Behalf Of Dirk Eddelbuettel
Sent: Wednesday, April 17, 2013 9:17 AM
To: fernando.tusell at ehu.es
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] Best way to compute M'M if M is triangular using RcppArmadillo

On 17 April 2013 at 14:24, F.Tusell wrote:
| Not a question strictly about Rcpp but hope this is a right place to 
| ask.
| 
| I am trying to find out what is the fastest possible way to compute 
| M'M for M upper triangular. I have coded,
| 
| // [[Rcpp::export]]
| SEXP Prod1(SEXP M_) {
|   const mat M = as<mat>(M_) ;
|   mat prodM = M ;
|   prodM = trans(M) * M  ;
|   return(wrap(prodM)) ;
| }
| 
| // [[Rcpp::export]]
| SEXP Prod2(SEXP M_) {
|   const mat M = as<mat>(M_) ;
|   mat prodM = M ;
|   prodM = trimatu(M).t() * trimatu(M)  ;
|   return(wrap(prodM)) ;
| }
| 
| 
| // [[Rcpp::export]]
| SEXP Prod3(SEXP M_) {
|   mat M = as<mat>(M_) ;
|   int d = M.n_rows ;
|   mat prodM = M ;
|   double * vM = M.memptr() ;
|   double * vprodM = prodM.memptr() ;
|   const double one = 1 ;
|   const double zero = 0 ;
|   F77_CALL(dsyrk)("L","T",&d,&d,&one,vM,&d,&zero,vprodM,&d)  ;
|   return(wrap(prodM)) ;
| }
| 
| and then tested with:
| 
| require(RcppArmadillo)
| require(microbenchmark)
| sourceCpp("prods.cpp")
| d <- 50
| a <- chol(crossprod(matrix(rnorm(d*d),d,d)))
| 
| m <- microbenchmark(
|        r1 <- Prod1(a),
|        r2 <- Prod2(a),
|        r3 <- Prod3(a),
|        times=100
| 	       )
| 
| This is what I get:
| 
| > m
| Unit: microseconds
|            expr     min       lq   median       uq      max neval
|  r1 <- Prod1(a) 138.749 144.2260 148.4815 159.0830 2456.146   100
|  r2 <- Prod2(a) 296.193 320.0275 329.4770 342.4185 2763.041   100
|  r3 <- Prod3(a) 132.150 138.2590 140.9270 152.3675  218.719   100
| 
| Prod3 using BLAS dsyrk is about as fast as Prod1, using Armadillo.
| I expected that telling Armadillo that M is upper triangular would 
| make for a fastest product, and the contrary seems true; Prod2 takes 
| about twice as much time as Prod1 and Prod3.
| 
| Is there a faster way?

You are going about this the right way by starting with empirics. 

Now, d=50 is not big so your (relative) savings will be rather small.  Plus, the way you instantiate the Arma object DOES create extra copies on the arma object instantiation [ see discussion this list just last week ] as well as on exit -- which may dominate the timing. Try d=100, 200, ... 500 for comparison. 

Lastly, you could write prod3 without going to arma.

Dirk

--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com _______________________________________________
Rcpp-devel mailing list
Rcpp-devel at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel