[Rcpp-devel] R vectorisation vs. C++ vectorisation
Romain Francois
romain at r-enthusiasts.com
Wed Nov 21 20:59:27 CET 2012
Le 19/11/12 16:31, Hadley Wickham a écrit :
> Hi all,
>
> Inspired by "Rcpp is smoking fast for agent-based models in data
> frames" (http://www.babelgraph.org/wp/?p=358), I've been doing some
> exploration of vectorisation in R vs C++ at
> https://gist.github.com/4111256
>
> I have five versions of the basic vaccinate function:
>
> * vacc1: vectorisation in R with a for loop
> * vacc2: used vectorised R primitives
> * vacc3: vectorised with loop in C++
> * vacc4: vectorised with Rcpp sugar
> * vacc5: vectorised with Rcpp sugar, explicitly labelled as containing
> no missing values
>
> And the timings I get are as follows:
>
> Unit: microseconds
> expr min lq median uq max neval
> vacc1(age, female, ily) 6816.8 7139.4 7285.7 7823.9 10055.5 100
> vacc2(age, female, ily) 194.5 202.6 212.6 227.9 260.4 100
> vacc3(age, female, ily) 21.8 22.4 23.4 24.9 35.5 100
> vacc4(age, female, ily) 36.2 38.7 41.3 44.5 55.6 100
> vacc5(age, female, ily) 29.3 31.3 34.0 36.4 52.1 100
>
> Unsurprisingly the R loop (vacc1) is very slow, and proper
> vectorisation speeds it up immensely. Interestingly, however, the C++
> loop still does considerably better (about 10x faster) - I'm not sure
> exactly why this is the case, but I suspect it may be because it
> avoids the many intermediate vectors that R requires. The sugar
> version is about half as fast, but this gets quite a bit faster with
> explicit no missing flags.
>
> I'd love any feedback on my code (https://gist.github.com/4111256) -
> please let me know if I've missed anything obvious.
>
> Hadley
I've put some code in that allows us to write mapply version. Here are
two examples :
// [[Rcpp::export]]
NumericVector vacc6(NumericVector age, LogicalVector female,
NumericVector ily) {
NumericVector p = mapply( age, female, ily, vacc3a ) ;
return p;
}
inline double minmax( double a, double x, double b) {
return x < a ? a : ( x > b ? b : x ) ;
}
class Vaccinate {
public:
typedef double result_type ;
Vaccinate(){} ;
inline double operator()(double age, bool female, double ily) const {
double p = 0.25 + 0.3 * 1 / (1 - exp(0.04 * age)) + 0.1 * ily;
p = minmax( 0.0, p * (female ? 1.25 : 0.75), 1.0 );
return p;
}
} ;
// [[Rcpp::export]]
NumericVector vacc7(NumericVector age, LogicalVector female,
NumericVector ily) {
NumericVector p = mapply( age, female, ily, Vaccinate() ) ;
return p;
}
(C++11 lambdas would make this even funnier)
I get:
$ Rscript /tmp/vaccinate.R
Unit: microseconds
expr min lq median uq max
1 vacc2(age, female, ily) 284.728 287.0085 298.3930 331.3630 1087.739
2 vacc3(age, female, ily) 23.010 23.5175 24.4360 27.4620 55.129
3 vacc4(age, female, ily) 42.698 43.0965 44.0405 49.2485 54.516
4 vacc6(age, female, ily) 28.936 29.2380 30.0405 34.2685 425.066
5 vacc7(age, female, ily) 24.751 25.2070 26.0705 29.9605 67.308
Not too far from the loop version. I expected mapply to be better.
Anyway, we are splitting hair.
This gives me an opportunity to give insight about why vacc7 performs
better than vecc6. The issue about vecc6 is that the construct:
NumericVector p = mapply( age, female, ily, vacc3a ) ;
uses a function pointer, and so the pointer has to be dereferenced each
time to call the function. which has a penalty (minor, but still).
vacc7 can properly use inlining.
I'm just trying here to show some alternatives. I like these threads
that let us think about alternatives and opportunities. This resulted in
some improvements in Rcpp about the assuignment operator in Vector and
pmin and pmax which just needed a fresh look.
Romain
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy
blog: http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible
More information about the Rcpp-devel
mailing list