[Rcpp-devel] RcppEigen now on CRAN

Douglas Bates bates at stat.wisc.edu
Tue Jun 28 19:27:50 CEST 2011


On Mon, Jun 27, 2011 at 8:16 PM, Douglas Bates <bates at stat.wisc.edu> wrote:

By the way, my apologies, Baptiste, for misstating your last name in
my earlier message.

> On Mon, Jun 27, 2011 at 3:49 PM, baptiste auguie
> <baptiste.auguie at googlemail.com> wrote:
>> Hi,
>>
>> Another august piece of work, thanks.
>> As a happy user of Armadillo (guess what, this week my packages even
>> passed win-builder! Yeah!), I have a general question: with Rcpp,
>> RcppGSL, RcppArmadillo and RcppEigen, we now have so much choice that
>> it may be difficult to decide which framework to use for a particular
>> project. Are there comparative charts somewhere, in terms of function
>> coverage and speed? (Those I found did not cover these three libraries.)
>
> There is a benchmark comparing different methods of obtaining the
> coefficients and the standard errors for linear regression models.
> You can run the benchmark as
>
> library(RcppEigen)
> source(file.path(system.file("examples", package="RcppEigen"),
> "lmBenchmark.R"), echo=TRUE)

Actually that can be shortened to

library(RcppEigen)
source(system.file("examples", "lmBenchmark.R", package="RcppEigen"), echo=TRUE)
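
The script builds its timings on the rbenchmark package.  As a
stripped-down sketch of the kind of comparison it runs (this is not
the script itself, and the integer method code passed to fastLmPure
is an assumption on my part):

library(RcppEigen)
library(rbenchmark)
n <- 100000L; p <- 40L; nrep <- 20L
X <- cbind(1, matrix(rnorm(n * (p - 1L)), n, p - 1L)) # model matrix
y <- rnorm(n)
benchmark(lm.fit = lm.fit(X, y),
          LLt    = fastLmPure(X, y, 2L),  # 2L assumed to select LLt
          columns = c("test", "relative", "elapsed",
                      "user.self", "sys.self"),
          order = "relative", replications = nrep)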

I have another timing of the benchmark, this time from my desktop, on
which I have a locally compiled version of Atlas installed, thus
giving RcppArmadillo a boost.

lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
7     LLt   1.000000   1.132      1.14     0.00
3    LDLt   1.019435   1.154      1.16     0.00
5 SymmEig   2.932862   3.320      2.56     0.75
2   PivQR   7.891343   8.933      8.26     0.67
8    arma   7.897527   8.940      8.64     0.02
6      QR   8.679329   9.825      9.07     0.74
1  lm.fit  13.082155  14.809     13.38     1.40
4     SVD  64.440813  72.947     71.86     1.01
9     GSL 156.208481 176.828    175.86     0.80

You can see what I had stated before: the performance of the RcppGSL
fastLm function is terrible, more than twice as slow as creating a
singular value decomposition and over 150 times slower than using a
rank-revealing Cholesky decomposition.  It turns out that it uses the
gsl_multifit_linear function, which is based on an older singular
value decomposition algorithm.
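
For the curious, the heart of that GSL fit looks something like this
inline sketch (illustrative only, not the RcppGSL fastLm source; the
gslLm name is made up and I assume the RcppGSL inline plugin is
available):

library(inline)
library(RcppGSL)
src <- '
Rcpp::NumericMatrix Xr(XX);
Rcpp::NumericVector yr(yy);
int n = Xr.nrow(), p = Xr.ncol();
gsl_matrix *X = gsl_matrix_alloc(n, p);   // copy the R data into GSL
gsl_vector *y = gsl_vector_alloc(n);
for (int i = 0; i < n; i++) {
    gsl_vector_set(y, i, yr[i]);
    for (int j = 0; j < p; j++) gsl_matrix_set(X, i, j, Xr(i, j));
}
gsl_vector *coef = gsl_vector_alloc(p);
gsl_matrix *cov  = gsl_matrix_alloc(p, p);
double chisq;
gsl_multifit_linear_workspace *wk = gsl_multifit_linear_alloc(n, p);
gsl_multifit_linear(X, y, coef, cov, &chisq, wk); // full SVD of X here
Rcpp::NumericVector beta(p);
for (int j = 0; j < p; j++) beta[j] = gsl_vector_get(coef, j);
gsl_multifit_linear_free(wk); gsl_matrix_free(cov);
gsl_matrix_free(X); gsl_vector_free(y); gsl_vector_free(coef);
return beta;
'
gslLm <- cxxfunction(signature(XX = "matrix", yy = "numeric"), src,
                     plugin = "RcppGSL",
                     includes = "#include <gsl/gsl_multifit.h>")

The SVD buried inside gsl_multifit_linear is what makes this route so
much slower than the Cholesky-based methods.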

On another machine, with the BLAS that ship with R (i.e., the
reference Fortran BLAS), I get

lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
3    LDLt   1.000000   1.593     1.591    0.000
7     LLt   1.001255   1.595     1.592    0.001
5 SymmEig   2.787822   4.441     3.674    0.758
6      QR   6.760829  10.770    10.010    0.752
2   PivQR   6.914626  11.015    10.225    0.777
1  lm.fit   9.943503  15.840    14.288    1.537
8    arma  13.971751  22.257    22.202    0.010
4     SVD  41.958569  66.840    65.609    1.158
9     GSL 162.653484 259.107   257.944    0.932

Bumping the size up to 1,000,000 by 100 on the machine with Atlas, I get

> do_bench(1000000, 100, 10)
lm benchmark for n = 1e+06 and p = 100: nrep = 10
     test   relative  elapsed user.self sys.self
3    LDLt   1.000000   25.605     25.49     0.10
7     LLt   1.047491   26.821     26.70     0.10
5 SymmEig   1.882054   48.190     66.61    10.05
6      QR   7.443937  190.602    181.92     8.56
8    arma   8.933138  228.733    224.29     4.22
2   PivQR   8.981644  229.975    221.70     8.11
1  lm.fit  12.003163  307.341    272.16    34.91
4     SVD  78.411873 2007.736   2012.31    21.72
9     GSL 210.777465 5396.957   5382.61    10.36

The do_bench function has been modified in the development version on
R-forge to take a fourth argument, suppressSVD, which, if TRUE,
suppresses the two slowest (SVD and GSL) methods.  On the machine
with the reference BLAS I get

> do_bench(1000000L, 100L, 20L, TRUE)
lm benchmark for n = 1000000 and p = 100: nrep = 20
     test  relative  elapsed user.self sys.self
3    LDLt  1.000000   69.297    69.096    0.174
6     LLt  1.039295   72.020    71.750    0.203
4 SymmEig  2.828102  195.979   176.954   18.961
5      QR  8.990534  623.017   603.931   18.530
2   PivQR 12.731807  882.276   863.076   18.474
1  lm.fit 17.585884 1218.649  1133.541   84.174
7    arma 30.593186 2120.016  2108.481   10.087

It is interesting that the RcppArmadillo code is slower than R's
lm.fit.  Part of the reason may be that most of the effort of solving
the least squares problem with dgels is repeated in calculating the
inverse of trans(X) * X in order to obtain the standard errors.

This gets to one of the questions that Baptiste asked about the
interface and how it would translate between GSL, Armadillo and
Eigen.  The Eigen interface is based on a wide range of templated
classes with corresponding methods, so when one computes a
decomposition the results can be used in further calculations.  The
Armadillo interface for decompositions and the like is based on
stand-alone functions with names like those in MATLAB.  For something
like the QR decomposition, you can use it to solve a least squares
problem, but it is not easy to then get hold of the components.  I
know that Conrad reads this list and I am not trying to disparage his
efforts, which are much appreciated.  It is just that, as someone who
started doing numerical linear algebra a long time ago, when we had
to know the number of flops involved in each kind of operation, I
find it more natural to be able to decompose once and then operate
with the components of the decomposition.
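
To make that concrete, here is a rough inline sketch (my own
illustration with a made-up name, not the package's fastLm code) of
reusing a single LLt decomposition of crossprod(X) for both the
coefficients and the standard errors:

library(inline)
library(RcppEigen)
src <- '
using Eigen::Map; using Eigen::MatrixXd; using Eigen::VectorXd;
const Map<MatrixXd> X(Rcpp::as<Map<MatrixXd> >(XX));  // no copy made
const Map<VectorXd> y(Rcpp::as<Map<VectorXd> >(yy));
const Eigen::LLT<MatrixXd> llt(MatrixXd(X.adjoint() * X)); // decompose once
const VectorXd beta(llt.solve(X.adjoint() * y));           // coefficients
double s2 = (y - X * beta).squaredNorm() / (X.rows() - X.cols());
// reuse the same factor for the inverse of crossprod(X) and hence
// the standard errors
const MatrixXd XtXinv(llt.solve(MatrixXd::Identity(X.cols(), X.cols())));
const VectorXd se((s2 * XtXinv.diagonal().array()).sqrt().matrix());
return Rcpp::List::create(Rcpp::Named("coefficients") = beta,
                          Rcpp::Named("stderr")       = se);
'
lltLm <- cxxfunction(signature(XX = "matrix", yy = "numeric"), src,
                     plugin = "RcppEigen")

The corresponding stand-alone solve() in Armadillo does not hand back
a factor that can be reused in this way.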

If others would be willing to run the benchmark and send me the
results, along with the output from sessionInfo() and a description of
the processor and memory, I would appreciate it.

> Be aware that it takes a while to run.  On my laptop the results,
> without RcppGSL, which performs terribly on this benchmark, are
>
> lm benchmark for n = 100000 and p = 40: nrep = 20
>     test  relative elapsed user.self sys.self
> 3    LDLt  1.000000   3.381      3.31     0.00
> 7     LLt  1.007394   3.406      3.31     0.00
> 5 SymmEig  2.159716   7.302      7.08     0.08
> 8    arma  5.733215  19.384     19.13     0.01
> 6      QR  6.087548  20.582     20.24     0.09
> 2   PivQR  6.506655  21.999     21.65     0.07
> 1  lm.fit 10.447797  35.324     32.44     2.68
> 4     SVD 43.231588 146.166    142.41     1.77
>
> I probably should skip the SVD method on large examples.  Also, I
> forgot a print() in the source code, which is why you need to use
> echo=TRUE.
>
> Sourcing that file defines a function called do_bench with three
> arguments: n, p and nrep.  The script requires the package rbenchmark
> and will use RcppArmadillo and/or RcppGSL if they are available.
>
> On this example Armadillo is slightly faster than the QR and
> column-pivoted QR decompositions as implemented in RcppEigen.  However,
> the RcppArmadillo and RcppGSL fastLm functions do not use a
> rank-revealing decomposition, whereas R's lm.fit and the LDLt, SymmEig,
> PivQR and SVD methods from RcppEigen do.  The problem with using a
> non-rank-revealing decomposition is that you can produce nonsense
> answers and not know it.
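>
> A tiny illustration of what rank-revealing buys (a sketch with
> made-up names and data, compiled with inline):
>
> library(inline)
> library(RcppEigen)
> src <- '
> using Eigen::Map; using Eigen::MatrixXd;
> const Map<MatrixXd> X(Rcpp::as<Map<MatrixXd> >(XX));
> Eigen::ColPivHouseholderQR<MatrixXd> qr(X);  // rank-revealing QR
> return Rcpp::wrap((int) qr.rank());
> '
> qrRank <- cxxfunction(signature(XX = "matrix"), src, plugin = "RcppEigen")
> X <- cbind(1, 1:5, 2 * (1:5))  # third column is twice the second
> qrRank(X)                      # reports 2, flagging the deficiency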
>
>> It could also be very useful to have a kind of Rosetta stone, akin to
>> http://arma.sourceforge.net/docs.html#syntax but comparing (R) | Rcpp
>> |  RcppGSL | RcppArmadillo | RcppEigen, ideally with wise comments
>> from advanced users. Where could we build such meta-documentation?
>
> Sounds like a good idea.  I haven't tried any of the Wikis associated
> with R but would be willing to contribute if someone had a
> recommendation on which one to use.
>> Just a thought,
>>
>> Best regards,
>>
>> baptiste
>>
>>
>>
>>
>> On 28 June 2011 07:55, Douglas Bates <bates at stat.wisc.edu> wrote:
>>> A new package, RcppEigen, which provides linkage from R to the Eigen
>>> C++ template library for linear algebra (http://eigen.tuxfamily.org)
>>> via Rcpp, is now available on CRAN.    The Eigen library is a pure
>>> template library that does not depend on the BLAS or Lapack, thus
>>> avoiding some difficulties such as Baptiste Auguste encountered
>>> regarding inconsistent versions of the Lapack routines being available
>>> on different platforms.
>>>
>>> A great deal of care has gone into the development of the Eigen
>>> library to ensure good performance through the use of vectorization
>>> instructions such as SSE2, SSE3 and ARM_NEON, when available, and by
>>> using block-structured algorithms.  The files RcppEigen/src/fastLm.h
>>> and RcppEigen/src/fastLm.cpp in the source package provide examples of
>>> the use of some of the decompositions available in Eigen.  Eigen
>>> supports both dense and sparse matrices, with various types of
>>> triangular, symmetric (or selfAdjoint) and diagonal adaptors.  To
>>> learn about the facilities available in Eigen I would recommend
>>> starting with the tutorial at
>>> http://eigen.tuxfamily.org/dox/index.html
>>>
>>> Like the Rcpp and RcppArmadillo packages, RcppEigen has a plugin for
>>> cxxfunction from the inline package, to allow for rapid prototyping.
>>> There is also a function RcppEigen.package.skeleton to aid in creating
>>> a package that uses the RcppEigen headers.
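>>>
>>> For instance, a tiny sketch (ftrace is a made-up name, not part of
>>> the package):
>>>
>>> library(inline)
>>> ftrace <- cxxfunction(signature(AA = "matrix"), '
>>> const Eigen::Map<Eigen::MatrixXd> A(Rcpp::as<Eigen::Map<Eigen::MatrixXd> >(AA));
>>> return Rcpp::wrap(A.trace());  // computed in Eigen, no copy of A
>>> ', plugin = "RcppEigen")
>>> ftrace(diag(3))  # should return 3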
>>>
>>> Because Eigen essentially creates its own BLAS functions, the size of
>>> the libs directory for packages using RcppEigen can be very large, and
>>> compiling source files that include the RcppEigen.h header can be
>>> relatively slow.  As with most template libraries, the idea is to
>>> off-load the complexity of the code onto the compiler, with the
>>> penalty that instantiating the templates can slow down compilation.
>>>
>>> I enclose an example of a short C++ function using RcppEigen and a
>>> slightly longer and more complex C++ function, compiled with inline.
>>> Some things to notice:
>>>
>>> - in keeping with the functional programming semantics of R, it is a
>>> good idea to declare Rcpp objects created from the input SEXP
>>> arguments as const.
>>>
>>> - an Eigen matrix of double precision values has type Eigen::MatrixXd;
>>> similarly Eigen::MatrixXi for integer matrices and Eigen::MatrixXcd
>>> for matrices of std::complex<double> (see the tutorial for the
>>> explanation of the X in MatrixXd)
>>>
>>> - the Eigen::VectorXd class is a column vector of doubles, similarly
>>> Eigen::VectorXi and Eigen::VectorXcd.  A row vector is
>>> Eigen::RowVectorXd, etc.  These are all just specializations of the
>>> Matrix template (Vectors have 1 column, RowVectors have 1 row, fixed
>>> at compile time).
>>>
>>> - to obtain an Eigen matrix or vector from an Rcpp object, without
>>> copying the contents, use the Eigen::Map adaptor class (see the
>>> sketch after this list).
>>>
>>> - the Eigen classes ArrayXd and ArrayXXd are similar representations
>>> to VectorXd and MatrixXd but allow for element-wise operations.  The
>>> .array() method creates a view of a Vector or Matrix object as an
>>> Array.  The .matrix() method maps the other way.
>>>
>>> - the Eigen::Matrix and Eigen::Array classes have methods for
>>> Rcpp::wrap defined in the RcppEigen headers.
>>>
>>> - Adaptor methods like .triangularView<type>(),
>>> .selfadjointView<type>() and .asDiagonal() (applied to a Vector) are
>>> the common ways of restricting to specialized forms of matrices.
>>>
>>> - most methods for Eigen classes return *this as a reference so that
>>> methods can be chained.
>>>
>>> - like Armadillo, Eigen performs lazy evaluation of expressions to
>>> reduce the number of intermediate results that must be stored.
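>>>
>>> A short sketch pulling several of these points together (fdemo is a
>>> made-up name; the enclosed files are the real examples):
>>>
>>> library(inline)
>>> src <- '
>>> using Eigen::Map; using Eigen::MatrixXd; using Eigen::Lower;
>>> const Map<MatrixXd> A(Rcpp::as<Map<MatrixXd> >(AA)); // Map => const
>>> const int p(A.cols());
>>> // crossprod(A) via a selfadjoint rank update of a zero matrix
>>> const MatrixXd AtA(MatrixXd(p, p).setZero().
>>>                    selfadjointView<Lower>().rankUpdate(A.adjoint()));
>>> // element-wise operations via the array() view, then back to matrix
>>> const MatrixXd rootAbs(A.array().abs().sqrt().matrix());
>>> return Rcpp::List::create(Rcpp::Named("crossprod") = AtA,
>>>                           Rcpp::Named("rootAbs")   = rootAbs);
>>> '
>>> fdemo <- cxxfunction(signature(AA = "matrix"), src, plugin = "RcppEigen")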
>>>
>>> As with any new library, Eigen takes some getting used to, but
>>> overall I have found it very good to work with and recommend its use.
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel