<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 5/18/2015 15:12, Dale Smith wrote:<br>
</div>
<blockquote
cite="mid:E7226FC0F5F75642AA1F294CE3C551722FBFD3@MBX029-E1-VA-2.EXCH029.DOMAIN.LOCAL"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="Generator" content="Microsoft Word 14 (filtered
medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]-->
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri","sans-serif";}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="MsoPlainText">I'm not a big fan of GPU computing for
many of the reasons Dirk mentions below and something else I
discovered while taking a Coursera class last winter.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">CUDA requires significant effort to keep
up your skills unless you do it semi-regularly or more often.
It's a very hard learning curve. I can't climb that curve at
this point in my working life. An occasional user may want to
skip CUDA and investigate OpenACC or something related. Do
what works best for you. I’ll investigate rCUDA, PyCUDA,
OpenACC, etc, and leave the lower-level stuff to others.</p>
</div>
</blockquote>
I also think that focusing on the high-level approach is often the right
choice, at least initially.<br>
<br>
Using either CUDA or OpenCL directly adds a lot of repetitive (and
redundant) boilerplate code -- and unless you actually make active use of
the fine-tuning this gives you access to, it often brings no performance
benefit over the higher-level solutions. (This really shouldn't need
(re)stating, but I still occasionally run into folks expecting "lower
level" -- read: longer -- code to be somehow automagically faster.) At the
same time, having to deal with the lower-level details, manual resource
management in particular, makes the whole experience more error-prone --
and, again, unless you're explicitly tuning things yourself, it won't make
your code perform any faster.<br>
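<br>
For concreteness, here is roughly what that plumbing looks like with the
raw CUDA runtime API for something as trivial as squaring a vector (just a
sketch -- error checking omitted, nothing tuned):<br>
<pre>
// Square every element of a vector using the raw CUDA runtime API.
// Most of the code below is allocation / transfer / cleanup rather than
// the actual computation. Error checking is omitted for brevity.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void square(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> h_in(n, 2.0f), h_out(n);

    // manual resource management starts here
    float *d_in = 0, *d_out = 0;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256;
    square<<<(n + block - 1) / block, block>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // forget one of these and you leak device memory
    cudaFree(d_in);
    cudaFree(d_out);

    std::printf("%f\n", h_out[0]);
    return 0;
}
</pre>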
<br>
Personally, I've had a good experience with C++ AMP (hardware-vendor
independent; note that the last time I used it, it was more polished on
MSFT platforms, although an open-source Linux implementation is available)
and with Thrust (CUDA / NVIDIA hardware):
<a class="moz-txt-link-freetext" href="http://thrust.github.io/">http://thrust.github.io/</a><br>
SYCL looks (I have yet to try it out) like an OpenCL counterpart of
Thrust, and its parallel STL implementation looks quite promising:
<a class="moz-txt-link-freetext" href="https://github.com/KhronosGroup/SyclParallelSTL">https://github.com/KhronosGroup/SyclParallelSTL</a><br>
// The OpenCL-based Boost.Compute has recently been accepted into Boost
(a minimal example is sketched after this paragraph):
<a class="moz-txt-link-freetext" href="https://github.com/boostorg/compute">https://github.com/boostorg/compute</a><br>
(The flip side is that NVIDIA hasn't historically kept the OpenCL drivers
for its cards very up to date... perhaps this will change with the
improvements needed for CUDA 7, as well as the requirements for
implementing the Vulkan API.)<br>
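<br>
To give a taste of what the high-level OpenCL route looks like, here is a
minimal Boost.Compute sketch along the lines of its introductory examples
(square-rooting a vector on the default OpenCL device; untuned, no error
handling):<br>
<pre>
// Minimal Boost.Compute example: copy a vector to the device, take the
// square root of each element there, and copy the results back.
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/functional/math.hpp>

namespace compute = boost::compute;

int main()
{
    // pick the default OpenCL device and set up a context and a queue
    compute::device device = compute::system::default_device();
    compute::context context(device);
    compute::command_queue queue(context, device);

    // some data on the host
    std::vector<float> host(10000);
    std::generate(host.begin(), host.end(), std::rand);

    // a vector on the device, and a host -> device copy
    compute::vector<float> dev(host.size(), context);
    compute::copy(host.begin(), host.end(), dev.begin(), queue);

    // sqrt of each element, computed on the device
    compute::transform(dev.begin(), dev.end(), dev.begin(),
                       compute::sqrt<float>(), queue);

    // device -> host copy of the results
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);
    return 0;
}
</pre>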
<br>
In other words, instead of starting directly with CUDA, I'd suggest
starting with Thrust (see the short example below) -- and, analogously,
instead of jumping straight to raw OpenCL, I'd probably start with the
SYCL Parallel STL (or Boost.Compute?).<br>
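<br>
For comparison with the raw CUDA snippet above, the same element-wise
squaring in Thrust collapses to a few lines (again just a sketch; Thrust
ships with the CUDA toolkit and this would typically be compiled with
nvcc):<br>
<pre>
// Same computation as the raw CUDA example, but with Thrust: the
// containers own their memory, the transfers are implicit in the
// assignments, and there is nothing to cudaFree by hand.
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

struct square_op
{
    __host__ __device__ float operator()(float x) const { return x * x; }
};

int main()
{
    thrust::host_vector<float> h_in(1 << 20, 2.0f);

    thrust::device_vector<float> d_in = h_in;        // host -> device copy
    thrust::device_vector<float> d_out(d_in.size());

    thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), square_op());

    thrust::host_vector<float> h_out = d_out;        // device -> host copy
    float first = h_out[0];
    std::printf("%f\n", first);
    return 0;
}
</pre>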
<br>
There are plenty of high-level GPGPU solutions available for C++; here
are some good overviews:<br>
<a class="moz-txt-link-freetext" href="http://www.soa-world.de/echelon/2014/04/c-accelerator-libraries.html">http://www.soa-world.de/echelon/2014/04/c-accelerator-libraries.html</a>
// multiple reviews: <a class="moz-txt-link-freetext" href="http://www.soa-world.de/echelon/">http://www.soa-world.de/echelon/</a><br>
<a class="moz-txt-link-freetext" href="http://arxiv.org/abs/1212.6326">http://arxiv.org/abs/1212.6326</a><br>
<br>
What I haven't seen is any study of integrating these with R (I've only
used standalone C++ code for GPGPU) -- that could be interesting.<br>
<br>
<blockquote
cite="mid:E7226FC0F5F75642AA1F294CE3C551722FBFD3@MBX029-E1-VA-2.EXCH029.DOMAIN.LOCAL"
type="cite">
<div class="WordSection1">
<p class="MsoPlainText"><o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">I’d like to reiterate that by far the
most difficult think about working with GPU technology is
efficiently moving data on and off the card. Do you have a
rigorously established use case for using GPU technology?</p>
</div>
</blockquote>
In my experience, the "best" use case (in terms of being the
lowest-hanging-fruit) would be an embarrassingly parallel problem;
for examples, see:<br>
<a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Embarrassingly_parallel">http://en.wikipedia.org/wiki/Embarrassingly_parallel</a><br>
Naturally, the larger the workload, the higher the chance of the
speed-up exceeding the data transfer costs.<br>
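<br>
As a toy illustration of that kind of workload, here is a Monte Carlo
estimate of pi with Thrust: each functor invocation draws its own samples
independently on the device, and only a single double travels back to the
host (a sketch -- the per-thread seeding below is naive, so don't use it
anywhere the quality of the random numbers matters):<br>
<pre>
// Embarrassingly parallel Monte Carlo estimate of pi with Thrust.
// Each "trial" runs independently on the device; the only data transfer
// is the final reduced sum coming back to the host.
#include <cstdio>
#include <thrust/transform_reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/functional.h>
#include <thrust/random.h>

struct estimate_pi
{
    __host__ __device__ double operator()(unsigned int seed) const
    {
        const int samples = 1000;
        // naive per-thread seeding, fine for illustration only
        thrust::default_random_engine rng(seed);
        thrust::uniform_real_distribution<double> dist(0.0, 1.0);
        int inside = 0;
        for (int i = 0; i < samples; ++i)
        {
            double x = dist(rng);
            double y = dist(rng);
            if (x * x + y * y <= 1.0) ++inside;
        }
        return 4.0 * inside / samples;
    }
};

int main()
{
    const int trials = 50000;
    double pi = thrust::transform_reduce(
                    thrust::counting_iterator<unsigned int>(0),
                    thrust::counting_iterator<unsigned int>(trials),
                    estimate_pi(), 0.0, thrust::plus<double>()) / trials;
    std::printf("pi is approximately %f\n", pi);
    return 0;
}
</pre>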
<br>
Best,<br>
<br>
Matt<br>
<br>
</body>
</html>