[Rcpp-devel] RcppEigen now on CRAN
Douglas Bates
bates at stat.wisc.edu
Tue Jun 28 19:27:50 CEST 2011
By the way, my apologies, Baptiste, for misstating your last name in my
earlier message.

On Mon, Jun 27, 2011 at 8:16 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Mon, Jun 27, 2011 at 3:49 PM, baptiste auguie
> <baptiste.auguie at googlemail.com> wrote:
>> Hi,
>>
>> Another august piece of work, thanks.
>> As a happy user of Armadillo (guess what, this week my packages even
>> passed win-builder! Yeah!), I have a general question: between (Rcpp),
>> RcppGSL, RcppArmadillo and RcppEigen we now have so much choice that it
>> may be difficult to decide which framework to use for a particular
>> project. Are there comparative charts somewhere, in terms of function
>> coverage and speed? (Those I found did not cover these three libraries.)
>
> There is a benchmark comparing different methods of obtaining the
> coefficients and the standard errors for linear regression models.
> You can run the benchmark as
>
> library(RcppEigen)
> source(file.path(system.file("examples", package="RcppEigen"),
> "lmBenchmark.R"), echo=TRUE)
Actually that can be shortened to
library(RcppEigen)
source(system.file("examples", "lmBenchmark.R", package="RcppEigen"), echo=TRUE)
I have another set of timings for the benchmark, this time from my
desktop, on which I have a locally compiled version of ATLAS installed,
thus giving RcppArmadillo a boost.
lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
7     LLt   1.000000   1.132      1.14     0.00
3    LDLt   1.019435   1.154      1.16     0.00
5 SymmEig   2.932862   3.320      2.56     0.75
2   PivQR   7.891343   8.933      8.26     0.67
8    arma   7.897527   8.940      8.64     0.02
6      QR   8.679329   9.825      9.07     0.74
1  lm.fit  13.082155  14.809     13.38     1.40
4     SVD  64.440813  72.947     71.86     1.01
9     GSL 156.208481 176.828    175.86     0.80
You can see what I had stated before: the performance of the RcppGSL
fastLm function is terrible, more than twice as slow as creating a
singular value decomposition and 150 times slower than using a
rank-revealing Cholesky decomposition. It turns out that it uses the
gsl_multifit_linear function, which is based on an older singular
value decomposition algorithm.
On another machine, using the BLAS shipped with R (i.e. the reference
Fortran BLAS), I get
lm benchmark for n = 100000 and p = 40: nrep = 20
     test   relative elapsed user.self sys.self
3    LDLt   1.000000   1.593     1.591    0.000
7     LLt   1.001255   1.595     1.592    0.001
5 SymmEig   2.787822   4.441     3.674    0.758
6      QR   6.760829  10.770    10.010    0.752
2   PivQR   6.914626  11.015    10.225    0.777
1  lm.fit   9.943503  15.840    14.288    1.537
8    arma  13.971751  22.257    22.202    0.010
4     SVD  41.958569  66.840    65.609    1.158
9     GSL 162.653484 259.107   257.944    0.932
Bumping the size up to 1,000,000 by 100 on the machine with ATLAS I get
> do_bench(1000000, 100, 10)
lm benchmark for n = 1e+06 and p = 100: nrep = 10
     test   relative  elapsed user.self sys.self
3    LDLt   1.000000   25.605     25.49     0.10
7     LLt   1.047491   26.821     26.70     0.10
5 SymmEig   1.882054   48.190     66.61    10.05
6      QR   7.443937  190.602    181.92     8.56
8    arma   8.933138  228.733    224.29     4.22
2   PivQR   8.981644  229.975    221.70     8.11
1  lm.fit  12.003163  307.341    272.16    34.91
4     SVD  78.411873 2007.736   2012.31    21.72
9     GSL 210.777465 5396.957   5382.61    10.36
The do_bench function has been modified in the development version on
R-forge to have a fourth argument suppressSVD which, if TRUE,
suppresses the two slowest (SVD and GSL) versions. On the machine
with reference BLAS I get
> do_bench(1000000L, 100L, 20L, TRUE)
lm benchmark for n = 1000000 and p = 100: nrep = 20
     test  relative  elapsed user.self sys.self
3    LDLt  1.000000   69.297    69.096    0.174
6     LLt  1.039295   72.020    71.750    0.203
4 SymmEig  2.828102  195.979   176.954   18.961
5      QR  8.990534  623.017   603.931   18.530
2   PivQR 12.731807  882.276   863.076   18.474
1  lm.fit 17.585884 1218.649  1133.541   84.174
7    arma 30.593186 2120.016  2108.481   10.087
It is interesting that the RcppArmadillo code is slower than R's
lm.fit. Part of the reason may be that most of the effort in solving
the least squares problem using dgels is repeated in calculating the
inverse of trans(X) * X in order to obtain the standard errors. This
gets to one of the questions that Baptiste asked about the interface
and how it would translate between GSL, Armadillo and Eigen. The
Eigen interface is based on a wide range of templated classes with
corresponding methods, so when you compute a decomposition you can use
the results of that decomposition in further calculations. The
Armadillo interface for decompositions, etc. is based on stand-alone
functions with names like those in MATLAB. For something like the QR
decomposition, you can use it to solve a least squares problem but it
is not easy to then get hold of the components. I know that Conrad
reads this list and I am not trying to disparage his efforts, which
are well appreciated. It is just that to me, as a person who started
doing numerical linear algebra a long time ago, when we had to know
the number of flops involved in each kind of operation, it is more
natural to be able to decompose and then operate with the components
of the decomposition.
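
To make that concrete, here is a minimal sketch (my illustration, not
the code in fastLm.cpp) of the Eigen style, in which a single Cholesky
decomposition of X'X is reused both for the coefficients and for the
(X'X)^-1 needed for the standard errors:

    #include <Eigen/Dense>

    using Eigen::LLT;
    using Eigen::MatrixXd;
    using Eigen::VectorXd;

    // Sketch only: decompose X'X once, then reuse the decomposition object.
    void lltLeastSquares(const MatrixXd& X, const VectorXd& y,
                         VectorXd& betahat, VectorXd& se) {
        const int n(X.rows()), p(X.cols());
        const LLT<MatrixXd> llt(X.adjoint() * X);      // one factorization
        betahat = llt.solve(X.adjoint() * y);          // reused: coefficients
        const MatrixXd XtXinv(llt.solve(MatrixXd::Identity(p, p))); // reused: (X'X)^-1
        const double s2((y - X * betahat).squaredNorm() / double(n - p));
        se = (s2 * XtXinv.diagonal()).array().sqrt().matrix();  // std. errors
    }

With stand-alone functions the factorization is hidden inside each
call, so obtaining both quantities means factoring twice.
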
If others would be willing to run the benchmark and send me the
results, along with the output from sessionInfo() and a description of
the processor and memory, I would appreciate it.
> Be aware that it takes a while to run. On my laptop the results,
> without RcppGSL, which performs terribly on this benchmark, are
>
> lm benchmark for n = 100000 and p = 40: nrep = 20
>      test  relative elapsed user.self sys.self
> 3    LDLt  1.000000   3.381      3.31     0.00
> 7     LLt  1.007394   3.406      3.31     0.00
> 5 SymmEig  2.159716   7.302      7.08     0.08
> 8    arma  5.733215  19.384     19.13     0.01
> 6      QR  6.087548  20.582     20.24     0.09
> 2   PivQR  6.506655  21.999     21.65     0.07
> 1  lm.fit 10.447797  35.324     32.44     2.68
> 4     SVD 43.231588 146.166    142.41     1.77
>
> I probably should skip the SVD method on large examples. Also, I
> forgot a print() in the source code, which is why you need to use
> echo=TRUE.
>
> Sourcing that file defines a function called do_bench with three
> arguments: n, p and nrep. The script requires the package rbenchmark
> and will use RcppArmadillo and/or RcppGSL if they are available.
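>
> For example (with the argument names above):
>
>     do_bench(n = 100000L, p = 40L, nrep = 20L)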
>
> On this example Armadillo is slightly faster than the QR and
> column-pivoted QR decompositions as implemented in RcppEigen. However,
> the RcppArmadillo and RcppGSL fastLm functions do not use a
> rank-revealing decomposition, whereas R's lm.fit and the LDLt, SymmEig,
> PivQR and SVD methods from RcppEigen do. The problem with using a
> non-rank-revealing decomposition is that you can produce nonsense
> answers and not know it.
>
>> It could also be very useful to have a kind of Rosetta stone, akin to
>> http://arma.sourceforge.net/docs.html#syntax but comparing (R) | Rcpp
>> | RcppGSL | RcppArmadillo | RcppEigen, ideally with wise comments
>> from advanced users. Where could we build such meta-documentation?
>
> Sounds like a good idea. I haven't tried any of the Wikis associated
> with R but would be willing to contribute if someone had a
> recommendation on which one to use.
>> Just a thought,
>>
>> Best regards,
>>
>> baptiste
>>
>> On 28 June 2011 07:55, Douglas Bates <bates at stat.wisc.edu> wrote:
>>> A new package, RcppEigen, which provides linkage from R to the Eigen
>>> C++ template library for linear algebra (http://eigen.tuxfamily.org)
>>> via Rcpp, is now available on CRAN. The Eigen library is a pure
>>> template library that does not depend on the BLAS or Lapack, thus
>>> avoiding some difficulties, such as those Baptiste Auguste
>>> encountered with inconsistent versions of the Lapack routines being
>>> available on different platforms.
>>>
>>> A great deal of care has gone into the development of the Eigen
>>> library to ensure good performance through the use of vectorization
>>> instructions such as SSE2, SSE3 and ARM_NEON, when available, and by
>>> using block-structured algorithms. The files RcppEigen/src/fastLm.h
>>> and RcppEigen/src/fastLm.cpp in the source package provide examples of
>>> the use of some of the decompositions available in Eigen. Eigen
>>> supports both dense and sparse matrices, with various types of
>>> triangular, symmetric (or selfAdjoint) and diagonal adaptors. To
>>> learn about the facilities available in Eigen I would recommend
>>> starting with the tutorial at
>>> http://eigen.tuxfamily.org/dox/index.html
>>>
>>> Like the Rcpp and RcppArmadillo packages, RcppEigen has a plugin for
>>> cxxfunction from the inline package, to allow for rapid prototyping.
>>> There is also a function RcppEigen.package.skeleton to aid in creating
>>> a package that uses the RcppEigen headers.
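>>>
>>> For instance, a minimal use of the plugin (the transpose kernel here
>>> is just an illustration):
>>>
>>>     library(inline)
>>>     transCpp <- '
>>>     const Eigen::Map<Eigen::MatrixXd> A(Rcpp::as<Eigen::Map<Eigen::MatrixXd> >(A_));
>>>     return Rcpp::wrap(Eigen::MatrixXd(A.adjoint()));
>>>     '
>>>     ftrans <- cxxfunction(signature(A_ = "matrix"), transCpp,
>>>                           plugin = "RcppEigen")
>>>     m <- matrix(as.numeric(1:6), 2, 3)
>>>     stopifnot(all.equal(ftrans(m), t(m)))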
>>>
>>> Because Eigen essentially creates its own BLAS functions, the size of
>>> the libs directory for packages using RcppEigen can be very large and
>>> compiling source files that include the RcppEigen.h file can be
>>> relatively slow. As with most template libraries, the idea is to
>>> off-load the complexity of the code onto the compiler, with the
>>> penalty that instantiating the templates can slow down compilation.
>>>
>>> I enclose an example of a short C++ function using RcppEigen and a
>>> slightly longer and more complex C++ function, compiled with inline.
>>> Some things to notice (a sketch illustrating several of these points
>>> follows the list):
>>>
>>> - in keeping with the functional programming semantics of R, it is a
>>> good idea to declare Rcpp objects created from the input SEXP
>>> arguments as const.
>>>
>>> - an Eigen matrix of double precision values has type Eigen::MatrixXd;
>>> similarly Eigen::MatrixXi for integer matrices and Eigen::MatrixXcd
>>> for matrices of std::complex<double> (see the tutorial for the
>>> explanation of the X in MatrixXd)
>>>
>>> - the Eigen::VectorXd class is a column vector of doubles, similarly
>>> Eigen::VectorXi and Eigen::VectorXcd. A row vector is
>>> Eigen::RowVectorXd, etc. These are all just specializations of the
>>> Matrix template (Vectors have 1 column, RowVectors have 1 row, fixed
>>> at compile time).
>>>
>>> - to obtain an Eigen matrix or vector from an Rcpp object, without
>>> copying the contents, use the Eigen::Map adaptor class.
>>>
>>> - the Eigen classes ArrayXd and ArrayXXd are similar representations
>>> to VectorXd and MatrixXd but allow for element-wise operations. The
>>> .array() method creates a view of a Vector or Matrix object as an
>>> Array. The .matrix() method maps the other way.
>>>
>>> - the Eigen::Matrix and Eigen::Array classes have methods for
>>> Rcpp::wrap defined in the RcppEigen headers.
>>>
>>> - Adaptor methods like .triangularView<type>(),
>>> .selfadjointView<type>() and .asDiagonal() (applied to a Vector) are
>>> the common ways of restricting to specialized forms of matrices.
>>>
>>> - most methods for Eigen classes return *this as a reference so that
>>> methods can be chained.
>>>
>>> - like Armadillo, Eigen performs lazy evaluation of expressions to
>>> reduce the number of intermediate results that must be stored.
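>>>
>>> To illustrate several of these points at once, a small sketch (the
>>> body of a cxxfunction call with plugin = "RcppEigen"; the quantities
>>> computed are merely examples):
>>>
>>>     using Eigen::Lower;
>>>     using Eigen::Map;
>>>     using Eigen::MatrixXd;
>>>     using Eigen::VectorXd;
>>>
>>>     // const Map: use R's storage without copying, and don't modify it
>>>     const Map<MatrixXd> X(Rcpp::as<Map<MatrixXd> >(X_));
>>>     const int n(X.rows());
>>>
>>>     // selfadjointView and rankUpdate, chained: form X'X from the
>>>     // lower triangle only
>>>     const MatrixXd XtX(MatrixXd(X.cols(), X.cols()).setZero()
>>>                            .selfadjointView<Lower>().rankUpdate(X.adjoint()));
>>>
>>>     // .array()/.matrix(): element-wise sqrt of the column mean squares
>>>     const VectorXd rms((X.colwise().squaredNorm().transpose() / double(n))
>>>                            .array().sqrt().matrix());
>>>
>>>     // wrap() methods for Matrix and Array classes come from RcppEigen
>>>     return Rcpp::List::create(Rcpp::Named("XtX") = XtX,
>>>                               Rcpp::Named("rms") = rms);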
>>>
>>> As with any new library, Eigen takes some getting used to, but
>>> overall I have found it very good to work with and recommend its use.
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel