[Rcpp-devel] [Rd] Performance of .C and .Call functions vs. native R code

Tue Jul 19 17:00:47 CEST 2011

On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani
<alireza.s.mahani at gmail.com> wrote:
> (I am using a LINUX machine)
>
> Jeff,
>
> In creating reproducible results, I 'partially' answered my question. I have
> attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both
> files into your chosen directory, then run 'Rscript mvMultiply.r' in that
> directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and
> 'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to
> verify that the two methods produce the same output vector.)
>
> Below are the results that I get, along with discussion (tR and tCall are in
> sec):
>
> INCLUDE_DATAPREP,ROWMAJOR,tR,tCall
> F,F,13.536,13.875
> F,T,13.824,14.299
> T,F,13.688,18.167
> T,T,13.982,30.730
>
> Interpretation: The execution time for the .Call line is nearly identical to
> the call to R operator '%*%'. Two data preparation lines for the .Call
> method create the overhead:
>
> A <- t(A) (~12sec, or 12usec per call)
> dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)
>
> While the first line can be avoided by providing options in c++ function (as
> is done in the BLAS API), I wonder if the second line can be avoided, aside
> from the obvious option of rewriting the R scripts to use vectors instead of
> matrices. But this defies one of the primary advantages of using R, which is
> succinct, high-level coding. In particular, if one has several matrices as
> input into a .Call function, then the overhead from matrix-to-vector
> transformations can add up. To summarize, my questions are:
>
> 1- Do the above results seem reasonable to you? Is there a similar penalty
> in R's '%*%' operator for transforming matrices to vectors before calling
> BLAS functions?
> 2- Are there techniques for reducing the above overhead for developers
> looking to augment their R code with compiled code?
>
> Regards,
> Alireza
>
> ---------------------------------------
> # mvMultiply.r
> # comparing performance of matrix multiplication in R (using '%*%' operator)
> vs. calling compiled code (using .Call function)
> # y [m x 1] = A [m x n] %*% x [n x 1]
>
> rm(list = ls())
> system("R CMD SHLIB mvMultiply.cc")
> dyn.load("mvMultiply.so")
>
> INCLUDE_DATAPREP <- F
> ROWMAJOR <- F #indicates whether the c++ function treats A in a row-major or
> column-major fashion
>
> m <- 100
> n <- 10
> N <- 1000000
>
> diffVec <- array(0, dim = N)
> tR <- 0.0
> tCall <- 0.0
> for (i in 1:N) {
>
>        A <- runif(m*n); dim(A) <- c(m,n)
>        x <- runif(n)
>
>        t1 <- proc.time()[3]
>        y1 <- A %*% x
>        tR <- tR + proc.time()[3] - t1
>
>        if (INCLUDE_DATAPREP) {
>                t1 <- proc.time()[3]
>        }
>        if (ROWMAJOR) { #since R will convert matrix to vector in a column-major
> fashion, if the c++ function expects a row-major format, we need to
> transpose A before converting it to a vector
>                A <- t(A)
>        }
>        dim(A) <- dim(A)[1] * dim(A)[2]
>        if (!INCLUDE_DATAPREP) {
>                t1 <- proc.time()[3]
>        }
>        y2 <- .Call("matvecMultiply", as.double(A), as.double(x),
> as.logical(c(ROWMAJOR)))
>        tCall <- tCall + proc.time()[3] - t1
>
>        diffVec[i] <- max(abs(y2 - y1))
> }
> cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")
> cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")
> cat("Time - Using '%*%' operator in R: ", tR, "sec\n")
> cat("Time - Using '.Call' function: ", tCall, "sec\n")
> cat("Maximum difference between methods: ", max(diffVec), "\n")
>
> dyn.unload("mvMultiply.so")
> ---------------------------------------
> # mvMultiply.cc
> #include <Rinternals.h>
> #include <R.h>
>
> extern "C" {
>
> SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {
>        double *rA = REAL(A), *rx = REAL(x), *ry;
>        int *rrm = LOGICAL(rowmajor);
>        int n = length(x);
>        int m = length(A) / n;
>        SEXP y;
>        PROTECT(y = allocVector(REALSXP, m));
>        ry = REAL(y);
>        for (int i = 0; i < m; i++) {
>                ry[i] = 0.0;
>                for (int j = 0; j < n; j++) {
>                        if (rrm[0] == 1) {
>                                ry[i] += rA[i * n + j] * rx[j];
>                        } else {
>                                ry[i] += rA[j * m + i] * rx[j];
>                        }
>                }
>        }
>        UNPROTECT(1);
>        return(y);
> }
>
> }
>

I realize that you are just beginning to use the .C and .Call
interfaces and your example is therefore a simple one.  However, if
you plan to continue with such development it is worthwhile learning
of some of the tools available.  I think one of the most important is
the "inline" package that can take a C or C++ code segment as a text
string and go through all the steps of creating and loading a
.Call'able compiled function.

Second, if you are going to use C++ (the code you show could be C code
as it doesn't use any C++ extensions) then you should look at the Rcpp
package written by Dirk Eddelbuettel and Romain Francois which allows
for comparatively painless interfacing of R objects and C++ objects.
The Rcpp-devel list, which I have copied on this reply, is for
questions related to that system.  The inline package allows for
various "plugin" constructions to wrap your code in the appropriate
headers and point the compiler to the locations of header files and
libraries.  There are two extensions to Rcpp for numerical linear
algebra in C++, RcppArmadillo and RcppEigen.  I show the use of
RcppEigen here.

Third there are several packages in R that do the busy work of
benchmarking expressions and neatly formulating the results.  I use
the rbenchmark package.

Putting all these together yields the enclosed script and results.

In Eigen, a MatrixXd object is the equivalent of R's numeric matrix
(similarly MatrixXi for integer and MatrixXcd for complex) and a
VectorXd object is the equivalent of a numeric vector.  A "mapped"
matrix or vector is one that uses the storage allocated by R, thereby
avoiding a copy operation (similar to your accessing elements of the
arrays through the pointer returned by REAL()).  To adhere to R's
functional programming semantics it is a good idea to declare such
objects as const.  The 'as' and 'wrap' functions are provided by Rcpp
with extensions in RcppEigen to the Eigen classes in the development
version.  In the released versions of Rcpp and RcppEigen we use
intermediate Rcpp objects. These functions have the advantage of
checking the types of R objects being passed.  The Eigen code for
matrix multiplication will check the consistency of dimensions in the
operation.

Given the size of the matrix you are working with it is not surprising
that interpretation overhead and checking will be a large part of the
elapsed time, hence the relative differences between different methods
of doing the numerical calculation will be small.  The operation of
multiplying a 100 x 10 matrix by a 10-vector involves "only" 1000
floating point operations.  Furthermore, each element of the matrix is
used only once so sophisticated methods of manipulating cache contents
won't buy you much.  These benchmark results are from a system that
uses Atlas BLAS (basic linear algebra subroutines); other systems will
provide different results.  Interestingly, I found on some systems
using R's BLAS, which are not accelerated, the R code is closer in
speed to the code using Eigen.  An example is given in the second
version of the output.
-------------- next part --------------

R Under development (unstable) (2011-07-19 r56429)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rbenchmark)
> library(inline)
> library(RcppEigen)
Loading required package: Rcpp
> ## create an R function from Eigen-based C++ code
> ff <- if (packageVersion("RcppEigen") > "0.1.1") { #  development version
+     cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+     typedef  Eigen::Map<Eigen::MatrixXd>  MMatrixXd;
+     typedef  Eigen::Map<Eigen::VectorXd>  MVectorXd;
+ 
+     const MMatrixXd      Xe(as<MMatrixXd>(Xs));
+     const MVectorXd      ye(as<MVectorXd>(ys));
+     return   wrap(Xe * ye);
+ ', plugin = "RcppEigen")
+ } else {
+     cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+     typedef  Eigen::Map<Eigen::MatrixXd>  MMatrixXd;
+     typedef  Eigen::Map<Eigen::VectorXd>  MVectorXd;
+ 
+     const NumericMatrix   X(Xs);
+     const NumericVector   y(ys);
+     const MMatrixXd      Xe(X.rows(), X.cols(), X.begin());
+     const MVectorXd      ye(y.size(), y.begin());
+     Eigen::VectorXd     res = Xe * ye;
+     return    NumericVector(res.data(), res.data() + res.size());
+ ', plugin = "RcppEigen")
+ }
> 
> set.seed(1)                             # for reproducible results
> m <- 100
> n <- 10
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications=100000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen       100000   1.198 1.000000      1.19     0.00          0         0
1 Rcode       100000   1.890 1.577629      1.88     0.01          0         0
> m <- 1000
> n <- 1000
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications=1000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen         1000   3.259 1.000000      3.25     0.00          0         0
1 Rcode         1000  15.648 4.801473     17.27     0.39          0         0
> 
> proc.time()
   user  system elapsed 
 31.070   1.160  30.495 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: matvec.R
Type: application/octet-stream
Size: 1417 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20110719/d950bac0/attachment.obj>
-------------- next part --------------

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rbenchmark)
> library(inline)
> library(RcppEigen)
Loading required package: Rcpp
> ## create an R function from Eigen-based C++ code
> ff <- if (packageVersion("RcppEigen") > "0.1.1") { #  development version
+     cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+     typedef  Eigen::Map<Eigen::MatrixXd>  MMatrixXd;
+     typedef  Eigen::Map<Eigen::VectorXd>  MVectorXd;
+ 
+     const MMatrixXd      Xe(as<MMatrixXd>(Xs));
+     const MVectorXd      ye(as<MVectorXd>(ys));
+     return   wrap(Xe * ye);
+ ', plugin = "RcppEigen")
+ } else {
+     cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+     typedef  Eigen::Map<Eigen::MatrixXd>  MMatrixXd;
+     typedef  Eigen::Map<Eigen::VectorXd>  MVectorXd;
+ 
+     const NumericMatrix   X(Xs);
+     const NumericVector   y(ys);
+     const MMatrixXd      Xe(X.begin(), X.rows(), X.cols());
+     const MVectorXd      ye(y.begin(), y.size());
+     Eigen::VectorXd     res = Xe * ye;
+     return    NumericVector(res.data(), res.data() + res.size());
+ ', plugin = "RcppEigen")
+ }
> 
> set.seed(1)                             # for reproducible results
> m <- 100
> n <- 10
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications=100000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen       100000   0.808 1.000000     0.808    0.001          0         0
1 Rcode       100000   1.037 1.283416     1.031    0.006          0         0
> m <- 1000
> n <- 1000
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications=1000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen         1000   1.957 1.000000     1.956        0          0         0
1 Rcode         1000   6.837 3.493613     6.835        0          0         0
> 
> proc.time()
   user  system elapsed 
 14.178   0.388  14.527