# [Rcpp-devel] speed

Richard Chandler richard.chandlers at gmail.com
Thu Aug 5 15:55:58 CEST 2010

```Thanks so much Romain, I like that sugar, and Rcpp has made my work
much easier. I have one more question if anyone is willing to
entertain it. My end goal was to replace the binomial probability 0.5
with a vector (omega), and the poisson mean 1.5 with a vector (gamma).
This would result in bpsum being a lk x lk x length(omega) array
instead of a lk x lk matrix. I thought I had it figured out, but for
some reason it doesn't work. Perhaps I'm just messing up the indexing
of the array in C++ but I can't find the problem. I am comparing it to
my best shot at a vectorized R version, which may look a little odd,
but is much faster than the looped version. Note also that all this
was sped up using cmin0 = 0:min(K[i], K[j]), which avoids zero
probabilities.

# Requires latest SVN version
fx5 <- cxxfunction(signature(Kr="integer", omegaR="numeric",
gammaR="numeric"), '
IntegerVector K(Kr);
int lk = K.size();
NumericVector omega(omegaR);
NumericVector gamma(gammaR);
NumericVector bpsum(lk * lk * omega.size());
for(int r=0; r<omega.size(); r++) {
for(int i=0; i<lk; i++) {
for(int j=0; j<lk; j++) {
int Km = std::min(K[i], K[j]);
IntegerVector cmin0 = seq(0, Km);
bpsum(r*lk*lk + i*lk + j) = sum(
stats::dbinom(cmin0, K[j], omega[r]) *
stats::dpois(K[j]-cmin0 , gamma[r])
);
}
}
}
return bpsum;
',  plugin="Rcpp")

round(array(fx5(0:4, runif(3), runif(3)), c(5, 5, 3)), 2)
length(unique(fx5(0:4, runif(3), runif(3)))) # Only 45. Why are there
duplicates?

fxR5 <- function(K, omega, gamma)
{
lk <- length(K)
bpargs <- cbind(rep(K, times=lk), rep(K, each=lk),
rep(omega, each=lk*lk),
rep(gamma, each=lk*lk)) # This could possibly run into memory limits
bp <- matrix(0, nrow(bpargs), lk)
for(i in K) {
notzero <- which(bpargs[,2] >= i & bpargs[,1] >= i)
bp[notzero,i+1] <- dbinom(i, bpargs[notzero,2], bpargs[notzero,3]) *
dpois(bpargs[notzero, 1] - i, bpargs[notzero, 4])
}
bpsum <- rowSums(bp)
bpsum <- array(bpsum, c(lk, lk, length(omega)))
return(bpsum)
}

round(fxR5(0:4, runif(3), runif(3)), 2)
length(unique(fxR5(0:4, runif(3), runif(3)))) #75 like it should be

K <- 0:4
omega <- runif(3)
gamma <- runif(3)

isTRUE(all.equal(
array(fx5(0:4, omega, gamma), c(5,3,3)),
fxR5(0:4, omega, gamma))) # FALSE ?

system.time(array(fx5(0:100, omega, gamma),
c(length(K),length(K),length(gamma)))) #  2.36s
system.time(fxR5(0:100, omega, gamma)) # 3.41s

On Thu, Aug 5, 2010 at 6:32 AM, Romain Francois
<romain at r-enthusiasts.com> wrote:
> Le 05/08/10 12:11, Romain Francois a écrit :
>>
>> Le 05/08/10 11:35, Romain Francois a écrit :
>>>
>>> Hello,
>>>
>>> I've put my spin on Dirk's solution and added a few things in Rcpp to
>>> combine convenience and performance.
>>>
>>> I've added Rcpp::stats::dpois and Rcpp::stats::dbinom as sugar
>>> functions, and more will come (time permitting, this is actually very
>>> easy and mechanical, so if someone wants to add other (dnorm, etc ...) I
>>> can guide them)
>>>
>>> Anyway, this means we can call dbinom and dpois on integer vectors (and
>>> actually on any integer sugar expression).
>>>
>>> With this, here comes fx3 :
>>>
>>> fx3 <- cxxfunction(signature(Kr="integer"), '
>>> IntegerVector K(Kr);
>>> int n = K.size() ;
>>> NumericMatrix bpsum(n, n);
>>> IntegerVector cmin0 ;
>>> NumericVector bp ;
>>>
>>> for(int i=0; i<n; i++) {
>>> cmin0 = seq(0, K[i]);
>>>
>>> for(int j=0; j<n; j++) {
>>> bp = stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5) ;
>>> bpsum(i, j) = std::accumulate(bp.begin(), bp.end(), 0.0);
>>> }
>>> }
>>> return bpsum;
>>> ', plugin="Rcpp")
>>>
>>>
>>> which on my mac does a better job than Dirk's version :
>>>
>>> test replications elapsed relative user.self sys.self
>>> 1 fx(0:100) 100 68.113 4.903744 67.409 0.704
>>> 2 fx2(0:100) 100 15.913 1.145644 15.523 0.391
>>> 3 fx3(0:100) 100 13.890 1.000000 13.782 0.108
>>> 4 fxR(0:100) 100 27.100 1.951044 26.536 0.564
>>>
>>> This is because Dirk's wrap_dbinom and wrap_dpois are not lazy, so they
>>> allocate memory for the result, etc ... where stats::dbinom and
>>> stats::dpois are lazy so they don't require any memory allocation.
>>>
>>>
>>> I guess we can do even better when someone (perhaps me) implements
>>> Rcpp::sum because we would not even have to allocate memory for bp.
>>
>> Done in revision 1912, which gives fx4 :
>>
>> fx4 <- cxxfunction(signature(Kr="integer"), '
>> IntegerVector K(Kr);
>> int n = K.size() ;
>> NumericMatrix bpsum(n, n);
>> IntegerVector cmin0 ;
>>
>> for(int i=0; i<n; i++) {
>> cmin0 = seq(0, K[i]);
>> for(int j=0; j<n; j++) {
>> bpsum(i, j) = sum(
>> stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5)
>> );
>> }
>> }
>> return bpsum;
>> ', plugin="Rcpp")
>>
>>
>> Here we don't need the "bp" variable and we are yet a little bit better:
>>
>> test replications elapsed relative user.self sys.self
>> 1 fx(0:100) 100 67.927 5.180917 67.221 0.701
>> 2 fx2(0:100) 100 15.911 1.213561 15.505 0.405
>> 3 fx3(0:100) 100 13.643 1.040577 13.537 0.105
>> 4 fx4(0:100) 100 13.111 1.000000 13.088 0.023
>> 5 fxR(0:100) 100 27.084 2.065746 26.434 0.650
>>
>>
>> Note that the game here is not to achieve the best performance, but the
>> best convenience/performance ratio ...
>>
>> Romain
>
> And here is another version, for completeness, using Rcpp::outer
>
> inc5 <- '
>
> class Calculation : public std::binary_function<int,int,double> {
> public:
>        Calculation(){}
>        inline double operator()(int ki, int kj) const {
>                IntegerVector cmin0 = seq(0, ki ) ;
>                return sum(
>                        stats::dbinom(cmin0, kj, 0.5) * stats::dpois( kj -
> cmin0 , 1.5)
>                        ) ;
>        }
> } ;
> '
>
> fx5 <- cxxfunction(signature(Kr="integer"), '
>    IntegerVector K(Kr);
>    return wrap( outer( K, K, Calculation() ) );
> ',  plugin="Rcpp", includes = inc5 )
>
>
> This one does not do as good as fx4 because cmin0 is calculated each time
> instead of one for each i, but I still like this version because things are
> nicely separated.
>
> Romain
>
>
>>> Couple of other things:
>>>
>>> - I've added Rcpp::seq so that we don't have to do cmin and cmin0, and I
>>> need to investigate why " seq_len(K[i]+1)-1 " does not work.
>>>
>>> - The assignment operator for IntegerVector is smart enough to not
>>> allocate memory if the existing vector is of the same size as the
>>> expression it is given. This is why I am declaring cmin0 once and then
>>> allocate into it instead of allocating each time. It might even be that
>>> we don't need cmin at all.
>>>
>>> - I am also declaring bp upfront for the same reason
>>>
>>> Romain
>>>
>>> Le 04/08/10 22:10, Dirk Eddelbuettel a écrit :
>>>>
>>>> Richard,
>>>>
>>>> Great question.
>>>>
>>>> One obvious consideration for improvement is to not call dbinom and
>>>> dpois
>>>> from R, but rather using the C entry points provided by Rmath.h. With
>>>> that, I
>>>> get these performances:
>>>>
>>>>> require(rbenchmark)
>>>>
>>>>>
>>>>> benchmark(fx(0:100),
>>>>
>>>> + fx2(0:100),
>>>> + fxR(0:100))
>>>> test replications elapsed relative user.self
>>>> 1 fx(0:100) 100 89.426 3.811850 89.42
>>>> 2 fx2(0:100) 100 23.460 1.000000 23.46
>>>> 3 fxR(0:100) 100 41.699 1.777451 41.69
>>>>>
>>>>> # deleted empty sys.self, user,child, sys.child columns here
>>>>
>>>> The bad news is that I don't get the same results:
>>>
>>> Richard already identified the problem here: lambda needs to be a
>>> double, not an int.
>>>
>>>>> all.equal(fx(0:10), fxR(0:10))
>>>>
>>>> [1] TRUE
>>>>>
>>>>> all.equal(fx2(0:10), fxR(0:10))
>>>>
>>>> [1] "Mean relative difference: 0.2535309"
>>>>>
>>>>
>>>> But as Doug suggested, there may be a need for reworking things and
>>>> checks
>>>> anyway. Or maybe I just introduced a bug -- dunno. I merely meant to
>>>> help on
>>>> pointing out that Rmath.h is there too, and I did this as a quick and
>>>> dirty
>>>> wrap around the 'atomistic' Rmath functions. Maybe someone would want to
>>>> contribute vectorised versions of these ? Patches welcome, as they
>>>> say...
>>>>
>>>> Dirk
>>
>>
>
>
> --
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/aAyra4 : highlight 0.2-2
> |- http://bit.ly/94EBKx : inline 0.3.6
> `- http://bit.ly/aryfrk : useR! 2010
>
>
```