[Rcpp-devel] Differences between RcppEigen and RcppArmadillo
edd at debian.org
Thu Jun 14 21:36:10 CEST 2012
On 15 June 2012 at 02:56, c s wrote:
| Simply installing ATLAS (which provides speed-ups for several Lapack
| functions) on Debian/Ubuntu systems can already make a big difference.
| (Debian & Ubuntu use a trick to redirect Lapack and Blas calls to
| ATLAS). Under Mac OS X, the Accelerate framework provides fast
| implementations of Lapack and Blas functions (eg. using
I found OpenBLAS to be faster than ATLAS (with both coming from the
distribution), and tend to install that now. It uses a multithreaded approach
by default, but I have to double check one thing about CPU affinity which,
when used from R, has been seen to interfere. That is a possible reason for
the single-threaded performance here. I'll report back.
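One way to see whether an affinity mask is confining a process to fewer cores
than the machine has is to compare the number of CPUs the process may run on
with the number of CPUs online. The sketch below assumes Linux with glibc
(sched_getaffinity and CPU_COUNT are not portable); the helper names
allowed_cpus and online_cpus are made up for illustration, and this is not
necessarily the check referred to above.

```cpp
#include <sched.h>    // sched_getaffinity, cpu_set_t, CPU_COUNT (Linux/glibc only)
#include <unistd.h>   // sysconf

// Number of CPUs the calling process is allowed to run on, or -1 on error.
// Hypothetical helper name; assumes Linux with glibc.
int allowed_cpus() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;  // the call failed; errno holds the reason
    }
    return CPU_COUNT(&mask);
}

// Number of CPUs currently online, as reported by the C library.
long online_cpus() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

If allowed_cpus() returns 1 on a multi-core machine once the BLAS library has
been loaded, a restrictive affinity mask would be one candidate explanation
for single-threaded timings.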
| I've taken the modular approach to Armadillo (ie. using Lapack rather
| than reimplementing decompositions), as it specifically allows other
| specialist parties (such as Intel) to provide Lapack that is highly
| optimised for particular architectures. I myself would not be able to
| keep up with the specific optimisations required for each CPU. This
| also "future-proofs" Armadillo for each new CPU generation.
| More importantly, numerically stable implementation of computational
| decompositions/factorisations is notoriously difficult to get right.
| The core algorithms in Lapack have been evolving for the past 20+
| years, being exposed to a bazillion corner-cases. Lapack itself is
| related to Linpack and Eispack, which are even older. I've been
| exposed to software development long enough to know that in the end
| only time can shake out all the bugs. As such, using Lapack is far
| less risky than reimplementing decompositions from scratch. A
| "home-made" matrix decomposition might be a bit faster on a particular
| CPU, but you have far less knowledge as to when it's going to blow up
| in your face.
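A textbook illustration of how a "home-made" solver can blow up is Gaussian
elimination without pivoting. The sketch below is a hypothetical 2x2 solver
(solve2x2 is a made-up name, not taken from Lapack, Armadillo or Eigen) that
can run with or without the row swap that partial pivoting performs.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<Vec2, 2>;

// Solve the 2x2 system A x = b by Gaussian elimination.
// With pivot == true, the rows are swapped first so that the larger entry
// (in absolute value) of column 0 becomes the pivot.
// Illustrative only: no handling of singular matrices beyond the assert.
Vec2 solve2x2(Mat2 A, Vec2 b, bool pivot) {
    if (pivot && std::fabs(A[1][0]) > std::fabs(A[0][0])) {
        std::swap(A[0], A[1]);
        std::swap(b[0], b[1]);
    }
    assert(A[0][0] != 0.0);               // a zero pivot cannot be divided by

    const double m = A[1][0] / A[0][0];   // elimination multiplier
    A[1][1] -= m * A[0][1];               // eliminate column 0 from row 1
    b[1]    -= m * b[0];

    Vec2 x;
    x[1] = b[1] / A[1][1];                      // back substitution
    x[0] = (b[0] - A[0][1] * x[1]) / A[0][0];
    return x;
}
```

For A = [[1e-20, 1], [1, 1]] and b = [1, 2] the exact solution is very close
to (1, 1). In IEEE double precision, solve2x2 without pivoting returns
x[0] = 0, while the pivoted call returns (1, 1). The matrix itself is well
conditioned; only the elimination order is at fault.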
| High-performance variants of Lapack, such as MKL, take an existing
| proven implementation of a decomposition algorithm and recode parts of
| it in assembly, and/or parallelise other parts.
All excellent points which nobody disputes. It just so happens that Eigen
still looks better in some benchmarks, but as you say, we have to ensure we
really compare apples to apples.
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com