[Rcpp-devel] examples of using cula matrix multiplication in Rcpp

Dirk Eddelbuettel edd at debian.org
Sat May 16 17:56:59 CEST 2015


On 16 May 2015 at 11:46, Yue Li wrote:
| I wonder if anyone worked on incorporating CULA tools library functionality into Rcpp. How much speed gain on top of Rcpp do we expect on basic operations like matrix multiplication?
| 
| In particular, I’m currently using RcppArmadillo to seamlessly perform matrix multiplication. But the speed gain over my R implementation is only 5-10 times, if not less.
| 
| I’m wondering if there is an equivalent easy-to-use library for doing matrix multiplication with GPU enabled. A complete simple example would be greatly appreciated.
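For reference, the RcppArmadillo route mentioned in the question looks roughly like this. This is a minimal sketch (the function name is illustrative); it is compiled from an R session via Rcpp::sourceCpp(), so it is not a standalone program:

```cpp
// Minimal RcppArmadillo matrix product. Compile from R with
// Rcpp::sourceCpp("matmul.cpp"); the product A * B is forwarded
// to whatever BLAS library R is linked against.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat arma_matmul(const arma::mat& A, const arma::mat& B) {
    return A * B;   // dispatches to BLAS dgemm for dense matrices
}
```

From R, arma_matmul(A, B) then behaves like A %*% B, with the heavy lifting done by the linked BLAS rather than R's interpreter.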

A few years ago I did some work on the 'gcbd' package to time and benchmark
precisely these types of operations. Because the results depend on the hardware
used for the gpu, the hardware used for the cpu, the compiler, the BLAS/LAPACK
library, the operating system, and so on, I worked out a framework to
benchmark these things and compare them.

So have a look at this package and its vignette: it at least times several
BLAS libraries against the gpu card I had (have).

In general, I think its conclusion stands. You "waste" so much time copying
data over to the gpu that any computation gain is dwarfed until you get to
truly enormous (and unusual) matrix sizes.  So gpus are still good for things
with limited (maybe one-time) transfer followed by a lot of iterations: some
finance applications with Monte Carlo pricing come to mind, anything MCMC and
of course the whole 'deep learning' complex.

And with that: no, as far as I know nobody has tightly integrated Rcpp and
gpu computing, as the two are simply not that clear a match.

That's my $0.02. More comments welcome, particularly with benchmarks.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

