[Rcpp-devel] examples of using cula matrix multiplication in Rcpp
Matt D.
matdzb at gmail.com
Mon May 18 16:37:21 CEST 2015
On 5/18/2015 15:12, Dale Smith wrote:
>
> I'm not a big fan of GPU computing for many of the reasons Dirk
> mentions below and something else I discovered while taking a Coursera
> class last winter.
>
> CUDA requires significant effort to keep up your skills unless you do
> it semi-regularly or more often. It's a very steep learning curve,
> and I can't climb it at this point in my working life. An occasional
> user may want to skip CUDA and investigate OpenACC or something
> related. Do what works best for you. I'll investigate rCUDA, PyCUDA,
> OpenACC, etc., and leave the lower-level stuff to others.
>
I also think the focus on the high-level approach is often the right
choice, at least initially.
Using either CUDA or OpenCL directly adds a lot of repetitive (and
redundant) boilerplate code -- and unless you actually make active use
of the fine-tuning this exposes, it brings no performance benefit over
the higher-level solutions. (This really shouldn't need (re)stating,
but I still occasionally encounter folks expecting "lower level" --
read: longer -- code to be somehow automagically faster.) At the same
time, having to deal with the lower-level details makes the whole
experience more error-prone, e.g., due to manual resource management
-- which, again, will not make your code perform any faster unless
you're explicitly tuning it yourself.
Personally, I've had a good experience with C++AMP (hardware-vendor
independent; note that the last time I used it, it was more polished on
MSFT platforms, although an open-source Linux implementation is
available) and with Thrust (CUDA / NVIDIA hardware):
http://thrust.github.io/
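To give a taste of how little boilerplate Thrust requires, here's a
minimal SAXPY-style sketch (the functor name is mine; it assumes
compilation with nvcc):

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <iostream>

// y <- a*x + y, computed entirely on the GPU
struct saxpy
{
    float a;
    saxpy(float a_) : a(a_) {}
    __host__ __device__
    float operator()(float x, float y) const { return a * x + y; }
};

int main()
{
    // the data lives on the device; the fills happen there, too
    thrust::device_vector<float> x(1 << 20, 1.0f);
    thrust::device_vector<float> y(1 << 20, 2.0f);
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      saxpy(2.0f));
    std::cout << y[0] << std::endl;  // prints 4
}

Note the complete absence of the kernel-launch syntax,
cudaMalloc/cudaFree calls, and error-checking boilerplate that the
equivalent raw CUDA version would need.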
SYCL looks (I have yet to try it out) like an OpenCL equivalent of
Thrust -- and its parallel STL implementation looks quite promising:
https://github.com/KhronosGroup/SyclParallelSTL
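Going by the SyclParallelSTL README (again, I haven't run this myself,
so treat the exact names -- sycl::sycl_execution_policy and friends --
as whatever the project ships at the moment), usage looks roughly like:

#include <vector>
#include <experimental/algorithm>
#include <sycl/execution_policy>

int main()
{
    std::vector<int> v = { 3, 1, 5, 6 };
    // the named policy selects a SYCL device queue under the hood
    sycl::sycl_execution_policy<class SortKernel> snp;
    std::experimental::parallel::sort(snp, v.begin(), v.end());
}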
// OpenCL-based Boost.Compute has recently been accepted into Boost:
https://github.com/boostorg/compute
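Boost.Compute is similarly terse; a minimal sketch (square roots of a
host vector, computed on the default OpenCL device):

#include <vector>
#include <iostream>
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/functional/math.hpp>

namespace compute = boost::compute;

int main()
{
    // pick the default OpenCL device and set up a queue on it
    compute::device gpu = compute::system::default_device();
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);

    std::vector<float> host(1024, 4.0f);
    compute::vector<float> dev(host.size(), ctx);

    // note the explicit host <-> device transfers
    compute::copy(host.begin(), host.end(), dev.begin(), queue);
    compute::transform(dev.begin(), dev.end(), dev.begin(),
                       compute::sqrt<float>(), queue);
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);

    std::cout << host[0] << std::endl;  // prints 2
}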
(The flip side being that NVIDIA hasn't historically kept the OpenCL
drivers for its cards very much up to date... perhaps this will change
with the driver improvements shipping alongside CUDA 7, as well as the
requirements of implementing the Vulkan API.)
In other words, instead of starting directly with CUDA, I'd suggest
starting with Thrust -- analogously, instead of jumping straight to raw
OpenCL, I'd probably start with SYCL Parallel STL (or Boost.Compute?).
There are plenty of high-level GPGPU solutions available for C++; here
are some good overviews:
http://www.soa-world.de/echelon/2014/04/c-accelerator-libraries.html //
multiple reviews: http://www.soa-world.de/echelon/
http://arxiv.org/abs/1212.6326
What I haven't seen is any study of integrating these with R (I've only
used standalone C++ code for GPGPU) -- that could be interesting.
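As a starting point, calling Thrust from Rcpp could look something like
the sketch below (untested; the function name is mine). Since the
Thrust headers require nvcc, plain Rcpp::sourceCpp() won't compile
this -- you'd need a package whose Makevars routes the .cu file
through nvcc:

#include <Rcpp.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

// [[Rcpp::export]]
double gpu_sum(Rcpp::NumericVector x)
{
    // one host -> device copy; the reduction itself runs on the GPU
    thrust::device_vector<double> d(x.begin(), x.end());
    return thrust::reduce(d.begin(), d.end(), 0.0);
}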
> I'd like to reiterate that by far the most difficult thing about
> working with GPU technology is efficiently moving data on and off the
> card. Do you have a rigorously established use case for using GPU
> technology?
>
In my experience, the "best" use case (in terms of being the
lowest-hanging fruit) would be an embarrassingly parallel problem; for
examples, see:
http://en.wikipedia.org/wiki/Embarrassingly_parallel
Naturally, the larger the workload -- more precisely, the more
computation per byte transferred -- the higher the chance of the
speed-up exceeding the data-transfer costs.
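To make that concrete with a back-of-envelope estimate (the figures
below are purely illustrative): moving $N$ doubles over a bus with
bandwidth $B$ costs $t_{\text{transfer}} \approx 8N / B$, while a
kernel doing $c$ flops per element on a device delivering $F$ flops/s
costs $t_{\text{kernel}} \approx cN / F$. With, say, $B \approx 12$
GB/s (PCIe 3.0) and $F \approx 1$ Tflop/s, an $O(N)$ kernel with
$c \approx 1$ spends roughly 600-700x longer on the transfer than on
the compute -- so the GPU only wins when $c$ is large, or when the
data stays resident on the device across many kernel calls.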
Best,
Matt