[Rcpp-devel] Results on a linear models benchmark
Douglas Bates
bates at stat.wisc.edu
Thu Nov 10 21:53:50 CET 2011
I have added another method to the fastLm function in the RcppEigen
package and Dirk and I have written an introductory vignette. We have
not yet released a new version so these are currently only available
through the SVN archive at R-forge.
If you install the version on R-forge you can run the benchmark of
linear model fitting methods (calculating both the coefficient
estimates and their standard errors) by starting R and entering
source(system.file(package="RcppEigen", "examples", "lmBenchmark.R"))
I have tried this with both a single-threaded BLAS (atlas) and a
multi-threaded BLAS (a locally compiled version of the Ubuntu
libopenblas-base package, which is based on Goto's BLAS). The
processor is an AMD Athlon || X4 635 with 4 cores.
The results with multi-threaded BLAS are
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
3 LDLt 1.000000 1.176 1.172 0.000
8 LLt 1.010204 1.188 1.172 0.000
6 SymmEig 2.762755 3.249 2.704 0.516
7 QR 6.350340 7.468 6.932 0.528
9 arma 6.601190 7.763 25.686 4.473
2 PivQR 7.154762 8.414 7.777 0.608
1 lm.fit 11.683673 13.740 21.561 16.769
4 GESDD 12.576531 14.790 44.011 10.960
5 SVD 44.475340 52.303 51.379 0.804
10 GSL 150.456633 176.937 210.517 149.857
Only the arma, lm.fit, GESDD and GSL methods use the multithreading.
You can see that in each of these methods the user time exceeds the
elapsed time. Notice, though, how large the system time is. arma
achieves a reasonable trade-off between system time and computation
time, GESDD is sort-of okay but lm.fit and GSL are not - probably
because they use only level-1 BLAS.
However, in terms of elapsed time, which is what really counts, the
single-threaded methods LDLt, LLt, and SymmEig all using Eigen methods
based on the crossproduct matrix (i.e. X'X) are still the fastest.
The QR, arma and PivQR methods are comparable. Of these only the
PivQR handles rank-deficient cases properly. lm.fit and GESDD are
comparable. GESDD, SVD and GSL are all based on a singular value
decomposition. GSL from the GNU Scientific Library uses an older
algorithm and is out of contention. The SVD method uses the JacobiSVD
class from Eigen and the GESDD method uses Eigen classes but calls the
Lapack subroutine dgesdd to calculate the SVD itself.
Timings with a single-threaded BLAS on the same machine are
lm benchmark for n = 100000 and p = 40: nrep = 20
test relative elapsed user.self sys.self
8 LLt 1.000000 1.199 1.196 0.000
3 LDLt 1.053378 1.263 1.261 0.000
6 SymmEig 2.795663 3.352 2.760 0.584
7 QR 6.708924 8.044 7.393 0.640
9 arma 7.525438 9.023 8.992 0.004
2 PivQR 8.038365 9.638 9.004 0.624
1 lm.fit 14.638866 17.552 16.701 0.828
4 GESDD 15.276063 18.316 17.681 0.620
5 SVD 45.872394 55.001 54.091 0.876
10 GSL 149.428691 179.165 178.391 0.668
Given the variation in the timings of the methods that don't use BLAS
(LDLt, LLt, SymmEig, QR, PivQR and SVD), the gains from the
multi-threaded BLAS are not impressive. I was allowing 4 threads on a
machine with 4 cores so perhaps it would be better to use fewer
threads.
More information about the Rcpp-devel
mailing list