[Rcpp-devel] Significant difference in performance when computing Ax

Sun Apr 27 20:34:10 CEST 2014

I feel foolish now. I forgot to note that I build R using a Win64
version of OpenBLAS (see
<http://www.avrahamadler.com/2014/04/20/r-3-1-0-openblas-speed-comparisons/>
for more). I'm certain that the speed in tcrossprod is coming from an
optimized compile of R and BLAS. My apologies for any confusion.

Avi

On Sun, Apr 27, 2014 at 12:44 PM, Søren Højsgaard <sorenh at math.aau.dk> wrote:
> I also tried the tcrossprod but with results very different from what you
> report. I wonder why???
>
> I also used RcppArmadillo (mvprod3 below) which performs similar to mvprod2.
>
>
>
> Cheers
>
> Søren
>
>
>
>> all.equal(mvprod1(A,x), mvprod2(A,x), mvprod3(A,x),
>
> + A%*%x, tcrossprod(x, A))
>
> [1] TRUE
>
>
>
>> library(microbenchmark)
>
>
>
>> microbenchmark(mvprod1(A,x), mvprod2(A,x), mvprod3(A,x), A%*%x,
>
> + tcrossprod(x,A))
>
> Unit: milliseconds
>
>              expr       min        lq    median        uq       max neval
>
>     mvprod1(A, x) 35.854102 36.151495 36.362352 36.658112 38.680147   100
>
>     mvprod2(A, x)  4.409340  4.429632  4.451791  4.522232  4.975201   100
>
>     mvprod3(A, x)  2.965528  2.977657  2.988386  3.043900  3.270617   100
>
>           A %*% x  8.506591  8.566303  8.637910  8.752902  9.233395   100
>
> tcrossprod(x, A) 36.787099 37.232138 37.695370 41.378139 43.208443   100
>
>
>
>
>
> // [[Rcpp::export]]
>
> arma::vec mvprod3(arma::mat& A, arma::vec& x ){
>
>    return A*x;
>
> }
>
>
>
>> sessionInfo()
>
> R version 3.1.0 Patched (2014-04-15 r65398)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
> LC_MONETARY=Danish_Denmark.1252
>
> [4] LC_NUMERIC=C                    LC_TIME=Danish_Denmark.1252
>
>
>
> attached base packages:
>
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> other attached packages:
>
> [1] microbenchmark_1.3-0    RcppArmadillo_0.4.200.0 Rcpp_0.11.1
> devtools_1.5
>
> [5] shTools_1.0             markdown_0.6.5          knitr_1.5
>
>
>
> loaded via a namespace (and not attached):
>
> [1] compiler_3.1.0 digest_0.6.4   evaluate_0.5.3 formatR_0.10   httr_0.3
> memoise_0.1    parallel_3.1.0
>
> [8] RCurl_1.95-4.1 stringr_0.6.2  tools_3.1.0    whisker_0.3-2
>
>
>
>
>
>
>
>
>
>
>
> From: Avraham Adler [mailto:avraham.adler at gmail.com]
> Sent: 25. april 2014 19:43
> To: Søren Højsgaard
> Cc: rcpp-devel at lists.r-forge.r-project.org
> Subject: Re: [Rcpp-devel] Significant difference in performance when
> computing Ax
>
>
>
> There is actually one other command to test, `crossprod`, or in this case
> `tcrossprod(x, A)`.
>
> Following your code, first run and ensure TRUE is returned.
>
> d <- 2000
> A <- matrix(1.0*(1:d^2),nrow=d)
> x <- 1.0*(1:d)
> all.equal(A%*%x, tcrossprod(x, A), mvprod1(A,x), mvprod2(A,x))
>
>
>
> Now, using an i7-3740QM 2.7Ghz CPU with 8GB of RAM as a testbed on Windows 7
> (with R & Rcpp compiled from source throwing "-march=core-avx-i -O3
> -std=gnu++0x --param l1-cache-line-size=64 --param l1-cache-size=32 --param
> l2-cache-size=256" FWIW) I get the following timings:
>
> microbenchmark(A%*%x, tcrossprod(x, A), mvprod1(A, x), mvprod2(A, x), times
> = 1000L, control = list(order = 'block'))
>
>
>
>
>
>
> Unit: milliseconds
>
>              expr       min        lq    median        uq       max neval
>
>           A %*% x 13.183992 13.242965 13.269027 13.317917 16.680117  1000
>
>  tcrossprod(x, A)  6.967139  6.991869  7.004424  7.020214  9.795921  1000
>
>     mvprod1(A, x) 43.089952 43.134848 43.163953 43.489824 45.129644  1000
>
>     mvprod2(A, x)  4.564480  4.883693  4.911848  4.928208  7.435495  1000
>
>
>
> So the crossprod function is decently optimized as it is, but the prod2 is
> still faster.
>
> Thanks for idea and post!
>
> Avi
>
>
>
>
>
> On Fri, Apr 25, 2014 at 12:47 PM, Søren Højsgaard <sorenh at math.aau.dk>
> wrote:
>
> Dear all,
>
> When forming the matrix-vector-product one can form the inner products
> between As rows and x (mvprod1 below) or form a linear combination of As
> columns (mvprod2 below). The difference in computing time is striking (at
> least for me) for large matrices and it really illustrates the “caching
> issue” (I assume):
>
>
>
>> microbenchmark(mvprod1(A,x), mvprod2(A,x), A%*%x)
>
> Unit: milliseconds
>
>           expr               min            lq                  median
> uq          max              neval
>
> mvprod1(A, x) 35.783812 36.310956 36.600186 36.991113 38.922889   100
>
> mvprod2(A, x)   4.408892    4.436882   4.475368   4.568668   5.477176   100
>
>        A %*% x         8.514091    8.592230   8.734978   8.811951   9.738653
> 100
>
>
>
> Just wanted to share this!
>
> Cheers
>
> Søren
>
>
>
>
>
>
>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
>
>
> //[[Rcpp::export]]
>
> NumericVector mvprod1 (NumericMatrix A, NumericVector x){
>
>   int nrA=A.nrow(), ncA=A.ncol(), i, j;
>
>   NumericVector y(nrA);
>
>   for (i=0; i<nrA; ++i){
>
>     for (j=0; j<ncA; ++j){
>
>       y[i] += A(i,j)*x[j];
>
>     }
>
>   }
>
>   return y;
>
> }
>
>
>
> //[[Rcpp::export]]
>
> NumericVector mvprod2 (NumericMatrix A, NumericVector x){
>
>   int nrA=A.nrow(), ncA=A.ncol(), i, j;
>
>   NumericVector y(nrA);
>
>   for (j=0; j<ncA; ++j){
>
>     for (i=0; i<nrA; ++i){
>
>       y[i] += A(i,j)*x[j];
>
>     }
>
>   }
>
>   return y;
>
> }
>
>
>
>
>
>
>
> /*** R
>
> d <- 5
>
> A <- matrix(1.0*(1:d^2),nrow=d)
>
> x <- 1.0*(1:d)
>
>
>
> A%*%x
>
> mvprod1(A,x)
>
> mvprod2(A,x)
>
>
>
> d <- 2000
>
> A <- matrix(1.0*(1:d^2),nrow=d)
>
> x <- 1.0*(1:d)
>
>
>
> library(microbenchmark)
>
> microbenchmark(mvprod1(A,x), mvprod2(A,x), A%*%x)
>
> */
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
>