[Rcpp-devel] speed

Thu Aug 5 12:11:14 CEST 2010

Le 05/08/10 11:35, Romain Francois a écrit :
> Hello,
>
> I've put my spin on Dirk's solution and added a few things in Rcpp to
> combine convenience and performance.
>
> I've added Rcpp::stats::dpois and Rcpp::stats::dbinom as sugar
> functions, and more will come (time permitting, this is actually very
> easy and mechanical, so if someone wants to add other (dnorm, etc ...) I
> can guide them)
>
> Anyway, this means we can call dbinom and dpois on integer vectors (and
> actually on any integer sugar expression).
>
> With this, here comes fx3 :
>
> fx3 <- cxxfunction(signature(Kr="integer"), '
> IntegerVector K(Kr);
> int n = K.size() ;
> NumericMatrix bpsum(n, n);
> IntegerVector cmin0 ;
> NumericVector bp ;
>
> for(int i=0; i<n; i++) {
> cmin0 = seq(0, K[i]);
>
> for(int j=0; j<n; j++) {
> bp = stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5) ;
> bpsum(i, j) = std::accumulate(bp.begin(), bp.end(), 0.0);
> }
> }
> return bpsum;
> ', plugin="Rcpp")
>
>
> which on my mac does a better job than Dirk's version :
>
> test replications elapsed relative user.self sys.self
> 1 fx(0:100) 100 68.113 4.903744 67.409 0.704
> 2 fx2(0:100) 100 15.913 1.145644 15.523 0.391
> 3 fx3(0:100) 100 13.890 1.000000 13.782 0.108
> 4 fxR(0:100) 100 27.100 1.951044 26.536 0.564
>
> This is because Dirk's wrap_dbinom and wrap_dpois are not lazy, so they
> allocate memory for the result, etc ... where stats::dbinom and
> stats::dpois are lazy so they don't require any memory allocation.
>
>
> I guess we can do even better when someone (perhaps me) implements
> Rcpp::sum because we would not even have to allocate memory for bp.

Done in revision 1912, which gives fx4 :

fx4 <- cxxfunction(signature(Kr="integer"), '
     IntegerVector K(Kr);
     int n = K.size() ;
     NumericMatrix bpsum(n, n);
     IntegerVector cmin0 ;

     for(int i=0; i<n; i++) {
     	cmin0 = seq(0, K[i]);
     	for(int j=0; j<n; j++) { 

         	bpsum(i, j) = sum(
         		stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5)
         	);
         }
     }
     return bpsum;
     ',  plugin="Rcpp")

Here we don't need the "bp" variable and we are yet a little bit better:

         test replications elapsed relative user.self sys.self
1  fx(0:100)          100  67.927 5.180917    67.221    0.701
2 fx2(0:100)          100  15.911 1.213561    15.505    0.405
3 fx3(0:100)          100  13.643 1.040577    13.537    0.105
4 fx4(0:100)          100  13.111 1.000000    13.088    0.023
5 fxR(0:100)          100  27.084 2.065746    26.434    0.650

Note that the game here is not to achieve the best performance, but the 
best convenience/performance ratio ...

Romain

> Couple of other things:
>
> - I've added Rcpp::seq so that we don't have to do cmin and cmin0, and I
> need to investigate why " seq_len(K[i]+1)-1 " does not work.
>
> - The assignment operator for IntegerVector is smart enough to not
> allocate memory if the existing vector is of the same size as the
> expression it is given. This is why I am declaring cmin0 once and then
> allocate into it instead of allocating each time. It might even be that
> we don't need cmin at all.
>
> - I am also declaring bp upfront for the same reason
>
> Romain
>
> Le 04/08/10 22:10, Dirk Eddelbuettel a écrit :
>>
>> Richard,
>>
>> Great question.
>>
>> One obvious consideration for improvement is to not call dbinom and dpois
>> from R, but rather using the C entry points provided by Rmath.h. With
>> that, I
>> get these performances:
>>
>>> require(rbenchmark)
>> Loading required package: rbenchmark
>>> benchmark(fx(0:100),
>> + fx2(0:100),
>> + fxR(0:100))
>> test replications elapsed relative user.self
>> 1 fx(0:100) 100 89.426 3.811850 89.42
>> 2 fx2(0:100) 100 23.460 1.000000 23.46
>> 3 fxR(0:100) 100 41.699 1.777451 41.69
>>> # deleted empty sys.self, user,child, sys.child columns here
>>
>> The bad news is that I don't get the same results:
>
> Richard already identified the problem here: lambda needs to be a
> double, not an int.
>
>>> all.equal(fx(0:10), fxR(0:10))
>> [1] TRUE
>>> all.equal(fx2(0:10), fxR(0:10))
>> [1] "Mean relative difference: 0.2535309"
>>>
>>
>> But as Doug suggested, there may be a need for reworking things and
>> checks
>> anyway. Or maybe I just introduced a bug -- dunno. I merely meant to
>> help on
>> pointing out that Rmath.h is there too, and I did this as a quick and
>> dirty
>> wrap around the 'atomistic' Rmath functions. Maybe someone would want to
>> contribute vectorised versions of these ? Patches welcome, as they say...
>>
>> Dirk

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/aAyra4 : highlight 0.2-2
|- http://bit.ly/94EBKx : inline 0.3.6
`- http://bit.ly/aryfrk : useR! 2010