<div dir="ltr"><div>Some students I have been working with managed to get Rcpp working with CUDA for a simple use case, calculating a large log-likelihood for MCMC, and they saw a modest speedup over plain Rcpp, but it needs more work. They promised to write up a note for the gallery once their exams are over in a couple of weeks.<br><br></div>Sean<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">On 16 May 2015 at 16:56, Dirk Eddelbuettel <span dir="ltr"><<a href="mailto:edd@debian.org" target="_blank">edd@debian.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

On 16 May 2015 at 11:46, Yue Li wrote:<br>

| I wonder if anyone has worked on incorporating CULA tools library functionality into Rcpp. How much speed gain on top of Rcpp should we expect on basic operations like matrix multiplication?<br>

|<br>

| In particular, I’m currently using RcppArmadillo to seamlessly perform matrix multiplication, but the speed gain over my R implementation is 5-10 times, if not less.<br>

|<br>

| I’m wondering if there is an equivalent easy-to-use library for doing matrix multiplication with GPU enabled. A complete simple example would be greatly appreciated.<br>

<br>

</span>A few years ago I did some work on the 'gcbd' package to time and benchmark<br>

precisely these types of things. Because the results depend on the hardware<br>

used for the gpu, the hardware used for the cpu, the compiler, the<br>

BLAS/LAPACK library, the OS and so on, I worked out a framework to<br>

benchmark these things and compare them.<br>

<br>

So have a look at this package and its vignette: it at least times several<br>

BLAS libraries against the gpu card I had (have).<br>

<br>

In general, I think its conclusion stands. You "waste" so much time copying<br>

data over to the gpu that any computation gain is dwarfed until you get to<br>

truly enormous (and unusual) matrix sizes.  So gpus are still good for things<br>

like limited (maybe one-time) transfer and then a lot of iterations: some finance<br>

applications with Monte Carlo pricing come to mind, anything MCMC and of<br>

course the whole 'deep learning' complex.<br>
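
The amortization argument above can be sketched as a toy cost model. This is only a minimal illustration, not something taken from 'gcbd': the struct Costs, its fields and the helper break_even_iterations are hypothetical names, and the timings are placeholders you would have to measure on your own hardware.<br>

```cpp
#include <cmath>

// Hypothetical cost model; all times are in the same unit (e.g. milliseconds)
// and must come from your own measurements, not from this sketch.
struct Costs {
    double transfer;      // one-time cost of copying data to and from the gpu
    double cpu_per_iter;  // cost of one iteration on the cpu
    double gpu_per_iter;  // cost of one iteration on the gpu
};

// Total cpu time: no transfer, just the iterations.
double cpu_total(const Costs& c, long iters) {
    return c.cpu_per_iter * static_cast<double>(iters);
}

// Total gpu time: pay the transfer once, then iterate on the device.
double gpu_total(const Costs& c, long iters) {
    return c.transfer + c.gpu_per_iter * static_cast<double>(iters);
}

// Smallest iteration count at which the gpu is strictly faster overall.
// Returns -1 if the gpu is not faster per iteration, so it never catches up.
long break_even_iterations(const Costs& c) {
    const double saving = c.cpu_per_iter - c.gpu_per_iter;
    if (saving <= 0.0) return -1;
    return static_cast<long>(std::floor(c.transfer / saving)) + 1;
}
```

Under this model a single matrix product (one iteration) rarely pays back the copy, while an MCMC or Monte Carlo loop that keeps its data on the device can, provided the per-iteration saving is real.<br>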

<br>

And with that: no, as far as I know nobody has tightly integrated Rcpp and<br>

gpu computing as it simply is not that clear a match.<br>

<br>

That's my $0.02. More comments welcome, particularly with benchmarks.<br>

<span class="HOEnZb"><font color="#888888"><br>

Dirk<br>

<br>

--<br>

<a href="http://dirk.eddelbuettel.com" target="_blank">http://dirk.eddelbuettel.com</a> | @eddelbuettel | <a href="mailto:edd@debian.org">edd@debian.org</a><br>

</font></span><div class="HOEnZb"><div class="h5">_______________________________________________<br>

Rcpp-devel mailing list<br>

<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org">Rcpp-devel@lists.r-forge.r-project.org</a><br>

<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a></div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature">Kind regards,<div><br></div><div>Sean O'Riordain</div><div><a href="mailto:seoriord@tcd.ie" target="_blank">seoriord@tcd.ie</a></div><div><br></div><div><br></div></div>

</div>