[Rcpp-devel] speed

Thu Aug 5 12:32:12 CEST 2010

Le 05/08/10 12:11, Romain Francois a écrit :
> Le 05/08/10 11:35, Romain Francois a écrit :
>> Hello,
>>
>> I've put my spin on Dirk's solution and added a few things in Rcpp to
>> combine convenience and performance.
>>
>> I've added Rcpp::stats::dpois and Rcpp::stats::dbinom as sugar
>> functions, and more will come (time permitting, this is actually very
>> easy and mechanical, so if someone wants to add other (dnorm, etc ...) I
>> can guide them)
>>
>> Anyway, this means we can call dbinom and dpois on integer vectors (and
>> actually on any integer sugar expression).
>>
>> With this, here comes fx3 :
>>
>> fx3 <- cxxfunction(signature(Kr="integer"), '
>> IntegerVector K(Kr);
>> int n = K.size() ;
>> NumericMatrix bpsum(n, n);
>> IntegerVector cmin0 ;
>> NumericVector bp ;
>>
>> for(int i=0; i<n; i++) {
>> cmin0 = seq(0, K[i]);
>>
>> for(int j=0; j<n; j++) {
>> bp = stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5) ;
>> bpsum(i, j) = std::accumulate(bp.begin(), bp.end(), 0.0);
>> }
>> }
>> return bpsum;
>> ', plugin="Rcpp")
>>
>>
>> which on my mac does a better job than Dirk's version :
>>
>> test replications elapsed relative user.self sys.self
>> 1 fx(0:100) 100 68.113 4.903744 67.409 0.704
>> 2 fx2(0:100) 100 15.913 1.145644 15.523 0.391
>> 3 fx3(0:100) 100 13.890 1.000000 13.782 0.108
>> 4 fxR(0:100) 100 27.100 1.951044 26.536 0.564
>>
>> This is because Dirk's wrap_dbinom and wrap_dpois are not lazy, so they
>> allocate memory for the result, etc ... where stats::dbinom and
>> stats::dpois are lazy so they don't require any memory allocation.
>>
>>
>> I guess we can do even better when someone (perhaps me) implements
>> Rcpp::sum because we would not even have to allocate memory for bp.
>
> Done in revision 1912, which gives fx4 :
>
> fx4 <- cxxfunction(signature(Kr="integer"), '
> IntegerVector K(Kr);
> int n = K.size() ;
> NumericMatrix bpsum(n, n);
> IntegerVector cmin0 ;
>
> for(int i=0; i<n; i++) {
> cmin0 = seq(0, K[i]);
> for(int j=0; j<n; j++) {
> bpsum(i, j) = sum(
> stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5)
> );
> }
> }
> return bpsum;
> ', plugin="Rcpp")
>
>
> Here we don't need the "bp" variable and we are yet a little bit better:
>
> test replications elapsed relative user.self sys.self
> 1 fx(0:100) 100 67.927 5.180917 67.221 0.701
> 2 fx2(0:100) 100 15.911 1.213561 15.505 0.405
> 3 fx3(0:100) 100 13.643 1.040577 13.537 0.105
> 4 fx4(0:100) 100 13.111 1.000000 13.088 0.023
> 5 fxR(0:100) 100 27.084 2.065746 26.434 0.650
>
>
> Note that the game here is not to achieve the best performance, but the
> best convenience/performance ratio ...
>
> Romain

And here is another version, for completeness, using Rcpp::outer

inc5 <- '

class Calculation : public std::binary_function<int,int,double> {
public:
	Calculation(){}
	inline double operator()(int ki, int kj) const {
		IntegerVector cmin0 = seq(0, ki ) ;
		return sum(
         		stats::dbinom(cmin0, kj, 0.5) * stats::dpois( kj - cmin0 , 1.5)
         		) ;
	}
} ;
'

fx5 <- cxxfunction(signature(Kr="integer"), '
     IntegerVector K(Kr);
     return wrap( outer( K, K, Calculation() ) );
',  plugin="Rcpp", includes = inc5 )

This one does not do as good as fx4 because cmin0 is calculated each 
time instead of one for each i, but I still like this version because 
things are nicely separated.

Romain

>> Couple of other things:
>>
>> - I've added Rcpp::seq so that we don't have to do cmin and cmin0, and I
>> need to investigate why " seq_len(K[i]+1)-1 " does not work.
>>
>> - The assignment operator for IntegerVector is smart enough to not
>> allocate memory if the existing vector is of the same size as the
>> expression it is given. This is why I am declaring cmin0 once and then
>> allocate into it instead of allocating each time. It might even be that
>> we don't need cmin at all.
>>
>> - I am also declaring bp upfront for the same reason
>>
>> Romain
>>
>> Le 04/08/10 22:10, Dirk Eddelbuettel a écrit :
>>>
>>> Richard,
>>>
>>> Great question.
>>>
>>> One obvious consideration for improvement is to not call dbinom and
>>> dpois
>>> from R, but rather using the C entry points provided by Rmath.h. With
>>> that, I
>>> get these performances:
>>>
>>>> require(rbenchmark)
>>> Loading required package: rbenchmark
>>>> benchmark(fx(0:100),
>>> + fx2(0:100),
>>> + fxR(0:100))
>>> test replications elapsed relative user.self
>>> 1 fx(0:100) 100 89.426 3.811850 89.42
>>> 2 fx2(0:100) 100 23.460 1.000000 23.46
>>> 3 fxR(0:100) 100 41.699 1.777451 41.69
>>>> # deleted empty sys.self, user,child, sys.child columns here
>>>
>>> The bad news is that I don't get the same results:
>>
>> Richard already identified the problem here: lambda needs to be a
>> double, not an int.
>>
>>>> all.equal(fx(0:10), fxR(0:10))
>>> [1] TRUE
>>>> all.equal(fx2(0:10), fxR(0:10))
>>> [1] "Mean relative difference: 0.2535309"
>>>>
>>>
>>> But as Doug suggested, there may be a need for reworking things and
>>> checks
>>> anyway. Or maybe I just introduced a bug -- dunno. I merely meant to
>>> help on
>>> pointing out that Rmath.h is there too, and I did this as a quick and
>>> dirty
>>> wrap around the 'atomistic' Rmath functions. Maybe someone would want to
>>> contribute vectorised versions of these ? Patches welcome, as they
>>> say...
>>>
>>> Dirk
>
>

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/aAyra4 : highlight 0.2-2
|- http://bit.ly/94EBKx : inline 0.3.6
`- http://bit.ly/aryfrk : useR! 2010