On Mon, Nov 19, 2012 at 9:56 AM, Dirk Eddelbuettel <span dir="ltr"><<a href="mailto:edd@debian.org" target="_blank">edd@debian.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class="im"><br>

On 19 November 2012 at 09:31, Hadley Wickham wrote:<br>

| Hi all,<br>

|<br>

| Inspired by "Rcpp is smoking fast for agent-based models in data<br>

| frames" (<a href="http://www.babelgraph.org/wp/?p=358" target="_blank">http://www.babelgraph.org/wp/?p=358</a>), I've been doing some<br>

<br>

</div>[ I liked that post, but we got flak afterwards as his example was not well<br>

chosen. The illustration of the language speed difference does of course<br>

hold. ]<br>

<div><div class="h5"><br>

| exploration of vectorisation in R vs C++ at<br>

| <a href="https://gist.github.com/4111256" target="_blank">https://gist.github.com/4111256</a><br>

|<br>

| I have five versions of the basic vaccinate function:<br>

|<br>

| * vacc1: vectorisation in R with a for loop<br>

| * vacc2: used vectorised R primitives<br>

| * vacc3: vectorised with loop in C++<br>

| * vacc4: vectorised with Rcpp sugar<br>

| * vacc5: vectorised with Rcpp sugar, explicitly labelled as containing<br>

| no missing values<br>

|<br>

| And the timings I get are as follows:<br>

|<br>

| Unit: microseconds<br>

|                     expr    min     lq median     uq     max neval<br>

|  vacc1(age, female, ily) 6816.8 7139.4 7285.7 7823.9 10055.5   100<br>

|  vacc2(age, female, ily)  194.5  202.6  212.6  227.9   260.4   100<br>

|  vacc3(age, female, ily)   21.8   22.4   23.4   24.9    35.5   100<br>

|  vacc4(age, female, ily)   36.2   38.7   41.3   44.5    55.6   100<br>

|  vacc5(age, female, ily)   29.3   31.3   34.0   36.4    52.1   100<br>

|<br>

| Unsurprisingly the R loop (vacc1) is very slow, and proper<br>

| vectorisation speeds it up immensely.  Interestingly, however, the C++<br>

| loop still does considerably better (about 10x faster) - I'm not sure<br>

| exactly why this is the case, but I suspect it may be because it<br>

| avoids the many intermediate vectors that R requires.  The sugar<br>

| version is about half as fast, but this gets quite a bit faster with<br>

| explicit no missing flags.<br>

|<br>

| I'd love any feedback on my code (<a href="https://gist.github.com/4111256" target="_blank">https://gist.github.com</a></div></div></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div><div class="h5"><a href="https://gist.github.com/4111256" target="_blank">/4111256</a>) -<br>

| please let me know if I've missed anything obvious.<br>

<br>

</div></div>I don't have a problem with sugar being a little slower that hand-rolling.<br>

The code is so much simpler and shorter. And we're still way faster than<br>

vectorised R.  I like that place.<br>

<br>

Somewhat off-topic/on-topic: I am still puzzled by how the Julia guys now<br>

revert back from vectorised code to hand-written loops because llvm does<br>

better on those.  Speed is good, but concise code with speed is better in my<br>

book.<br></blockquote><div><br></div><div>Sigh.  Speaking as one of the "Julia guys" I should point out two things (not that they will change Dirk's "cold, dead hands" attitude towards Julia :-)</div>

<div><br></div><div>1. Comprehensions provide what I feel is a clean syntax for sugar-like operations in Julia</div><div><br></div><div>2. A problem with vectorization is the issue of multiple loops, hence the number of attempts at implementing delayed evaluation in compiled code (Eigen) and in add-on's to R.</div>

<div><br></div><div>A translation of Hadley's vacc3 into Julia could be</div><div><br></div><div><div>function vacc3a(age::Float64, female::Bool, ily::Float64){</div><div>  p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily</div>

<div>  p *= female ? 1.25 : 0.75</div><div>  min(max(0., p), 1.)</div><div>}</div><div><br></div><div>out = [vacc3a(age[i], female[i], ily[i]) for i in 1:length(age)]</div></div><div><br></div><div>The comprehension collapses the</div>

<div> 1. Determine the length of the output vector</div><div> 2. Allocate the result</div><div> 3. Loop over indices populating the result</div><div> 4. Return the result</div><div><br></div><div>to a single syntactic element that, in my opinion, is quite readable.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

Hence I would prefer to invoke the 80/20 rule as I think we have better<br>

targets to chase than to narrow that gap. But that's just my $0.02...<br>

<br>

If you can't sleep til both version have 20-some microsend medians then by<br>

all means go crazy ;-)<br>

<span class=""><font color="#888888"><br>

Dirk<br>

<br>

<br>

<br>

--<br>

Dirk Eddelbuettel | <a href="mailto:edd@debian.org">edd@debian.org</a> | <a href="http://dirk.eddelbuettel.com" target="_blank">http://dirk.eddelbuettel.com</a><br>

</font></span><div class=""><div class="h5">_______________________________________________<br>

Rcpp-devel mailing list<br>

<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org">Rcpp-devel@lists.r-forge.r-project.org</a><br>

<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br>

</div></div></blockquote></div><br></div>