[Rcpp-devel] Best way to compute M'M if M is triangular using RcppArmadillo

F.Tusell fernando.tusell at ehu.es
Wed Apr 17 17:24:19 CEST 2013


El mié, 17-04-2013 a las 14:01 +0000, Smith, Dale (Norcross) escribió:
> I agree with Dirk. Take a look at the arma::mat constructor in fastLm. If you are not modifying M, use the raw memory constructor.
> 
> http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/
> 
> I don't see any other way to avoid using Rcpp::wrap in the return statement. Note there may be other places you copy memory, for instance, in Prod1 you write
> 
> |   mat prodM = M ;
> |   prodM = trans(M) * M  ;
> 

True. But in Prod3 I needed a pre-existent target array to store the
result, so I defined it also in Prod1 and Prod2 to make things alike.
Taking a copy of M is indeed a very clumsy way to initialize an array
of the right size.

> I think the first statement is not necessary.
> 
> Dale Smith, Ph.D.
> Senior Financial Quantitative Analyst
> Risk & Compliance
> Fiserv
> Office: 678-375-5315
> www.fiserv.com
> 
> 
> -----Original Message-----
> From: rcpp-devel-bounces at r-forge.wu-wien.ac.at [mailto:rcpp-devel-bounces at r-forge.wu-wien.ac.at] On Behalf Of Dirk Eddelbuettel
> Sent: Wednesday, April 17, 2013 9:17 AM
> To: fernando.tusell at ehu.es
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] Best way to compute M'M if M is triangular using RcppArmadillo
> 
> 
> On 17 April 2013 at 14:24, F.Tusell wrote:
> | Not a question strictly about Rcpp but hope this is a right place to 
> | ask.
> | 
> | I am trying to find out what is the fastest possible way to compute 
> | M'M for M upper triangular. I have coded,
> | 
> | // [[Rcpp::export]]
> | SEXP Prod1(SEXP M_) {
> |   const mat M = as<mat>(M_) ;
> |   mat prodM = M ;
> |   prodM = trans(M) * M  ;
> |   return(wrap(prodM)) ;
> | }
> | 
> | // [[Rcpp::export]]
> | SEXP Prod2(SEXP M_) {
> |   const mat M = as<mat>(M_) ;
> |   mat prodM = M ;
> |   prodM = trimatu(M).t() * trimatu(M)  ;
> |   return(wrap(prodM)) ;
> | }
> | 
> | 
> | // [[Rcpp::export]]
> | SEXP Prod3(SEXP M_) {
> |   mat M = as<mat>(M_) ;
> |   int d = M.n_rows ;
> |   mat prodM = M ;
> |   double * vM = M.memptr() ;
> |   double * vprodM = prodM.memptr() ;
> |   const double one = 1 ;
> |   const double zero = 0 ;
> |   F77_CALL(dsyrk)("L","T",&d,&d,&one,vM,&d,&zero,vprodM,&d)  ;
> |   return(wrap(prodM)) ;
> | }
> | 
> | and then tested with:
> | 
> | require(RcppArmadillo)
> | require(microbenchmark)
> | sourceCpp("prods.cpp")
> | d <- 50
> | a <- chol(crossprod(matrix(rnorm(d*d),d,d)))
> | 
> | m <- microbenchmark(
> |        r1 <- Prod1(a),
> |        r2 <- Prod2(a),
> |        r3 <- Prod3(a),
> |        times=100
> | 	       )
> | 
> | This is what I get:
> | 
> | > m
> | Unit: microseconds
> |            expr     min       lq   median       uq      max neval
> |  r1 <- Prod1(a) 138.749 144.2260 148.4815 159.0830 2456.146   100
> |  r2 <- Prod2(a) 296.193 320.0275 329.4770 342.4185 2763.041   100
> |  r3 <- Prod3(a) 132.150 138.2590 140.9270 152.3675  218.719   100
> | 
> | Prod3 using BLAS dsyrk is about as fast as Prod1, using Armadillo.
> | I expected that telling Armadillo that M is upper triangular would 
> | make for a fastest product, and the contrary seems true; Prod2 takes 
> | about twice as much time as Prod1 and Prod3.
> | 
> | Is there a faster way?
> 
> You are going about this the right way by starting with empirics. 
> 
> Now, d=50 is not big so your (relative) savings will be rather small.  Plus, the way you instantiate the Arma object DOES create extra copies on the arma object instantiation [ see discussion this list just last week ] as well as on exit -- which may dominate the timing. Try d=100, 200, ... 500 for comparison. 
> 
> Lastly, you could write prod3 without going to arma.
> 
> Dirk

I did try larger d, and to my surprise the relative timings stay about
the same. Conrad S provides one piece of information I was missing, that
trimatu() and trimatl() only have (as yet) effect with inv() and
solve(), hence their use in Prod3 is just a waste of time. This explains
the results.

For the time being, I will stick with trans(M) * M. A function like R's
crossprod() in Armadillo would be great. Thank you all for your help.

ft.



More information about the Rcpp-devel mailing list