[Rcpp-devel] R vectorisation vs. C++ vectorisation

Romain Francois romain at r-enthusiasts.com
Wed Nov 21 20:59:27 CET 2012


Le 19/11/12 16:31, Hadley Wickham a écrit :
> Hi all,
>
> Inspired by "Rcpp is smoking fast for agent-based models in data
> frames" (http://www.babelgraph.org/wp/?p=358), I've been doing some
> exploration of vectorisation in R vs C++ at
> https://gist.github.com/4111256
>
> I have five versions of the basic vaccinate function:
>
> * vacc1: vectorisation in R with a for loop
> * vacc2: used vectorised R primitives
> * vacc3: vectorised with loop in C++
> * vacc4: vectorised with Rcpp sugar
> * vacc5: vectorised with Rcpp sugar, explicitly labelled as containing
> no missing values
>
> And the timings I get are as follows:
>
> Unit: microseconds
>                      expr    min     lq median     uq     max neval
>   vacc1(age, female, ily) 6816.8 7139.4 7285.7 7823.9 10055.5   100
>   vacc2(age, female, ily)  194.5  202.6  212.6  227.9   260.4   100
>   vacc3(age, female, ily)   21.8   22.4   23.4   24.9    35.5   100
>   vacc4(age, female, ily)   36.2   38.7   41.3   44.5    55.6   100
>   vacc5(age, female, ily)   29.3   31.3   34.0   36.4    52.1   100
>
> Unsurprisingly the R loop (vacc1) is very slow, and proper
> vectorisation speeds it up immensely.  Interestingly, however, the C++
> loop still does considerably better (about 10x faster) - I'm not sure
> exactly why this is the case, but I suspect it may be because it
> avoids the many intermediate vectors that R requires.  The sugar
> version is about half as fast, but this gets quite a bit faster with
> explicit no missing flags.
>
> I'd love any feedback on my code (https://gist.github.com/4111256) -
> please let me know if I've missed anything obvious.
>
> Hadley

I've put some code in that allows us to write mapply version. Here are 
two examples :

// [[Rcpp::export]]
NumericVector vacc6(NumericVector age, LogicalVector female, 
NumericVector ily) {
   NumericVector p = mapply( age, female, ily, vacc3a ) ;
   return p;
}


inline double minmax( double a, double x, double b) {
     return x < a ? a : ( x > b ? b : x ) ;
}

class Vaccinate {
public:
     typedef double result_type ;
     Vaccinate(){} ;

     inline double operator()(double age, bool female, double ily) const {
         double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;
         p = minmax( 0.0, p * (female ? 1.25 : 0.75), 1.0 );
         return p;
     }


} ;

// [[Rcpp::export]]
NumericVector vacc7(NumericVector age, LogicalVector female, 
NumericVector ily) {
   NumericVector p = mapply( age, female, ily, Vaccinate() ) ;
   return p;
}


(C++11 lambdas would make this even funnier)


I get:

$ Rscript /tmp/vaccinate.R
Unit: microseconds
                      expr     min       lq   median       uq      max
1 vacc2(age, female, ily) 284.728 287.0085 298.3930 331.3630 1087.739
2 vacc3(age, female, ily)  23.010  23.5175  24.4360  27.4620   55.129
3 vacc4(age, female, ily)  42.698  43.0965  44.0405  49.2485   54.516
4 vacc6(age, female, ily)  28.936  29.2380  30.0405  34.2685  425.066
5 vacc7(age, female, ily)  24.751  25.2070  26.0705  29.9605   67.308

Not too far from the loop version. I expected mapply to be better. 
Anyway, we are splitting hair.


This gives me an opportunity to give insight about why vacc7 performs 
better than vecc6. The issue about vecc6 is that the construct:

NumericVector p = mapply( age, female, ily, vacc3a ) ;

uses a function pointer, and so the pointer has to be dereferenced each 
time to call the function. which has a penalty (minor, but still).

vacc7 can properly use inlining.


I'm just trying here to show some alternatives. I like these threads 
that let us think about alternatives and opportunities. This resulted in 
some improvements in Rcpp about the assuignment operator in Vector and 
pmin and pmax which just needed a fresh look.

Romain


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible



More information about the Rcpp-devel mailing list