[Rcpp-devel] speed

Thu Aug 5 21:19:08 CEST 2010

I want to close out this thread by saying that I did get the array
version to work, and it is > 30% faster than the vectorized R
function. Both are below in case they might help others new to this.
Thanks for everyone's help.

Richard

fx5 <- cxxfunction(signature(Kr="integer", omegaR="numeric",
gammaR="numeric"), '
    IntegerVector K(Kr);
    int lk = K.size();
    NumericVector omega(omegaR);
    NumericVector gamma(gammaR);
    int lom = omega.size();
    NumericVector bpsum(lom * lk * lk);
    for(int r=0; r<lom; r++) {
        for(int i=0; i<lk; i++) {
            for(int j=0; j<lk; j++) {
                int Km = std::min(K[i], K[j]);
                IntegerVector cmin0 = seq(0, Km);
                bpsum(r*lk*lk + i + j*lk) = sum(
                    stats::dbinom(cmin0, K[j], omega[r], false) *
                        stats::dpois(K[i]-cmin0, gamma[r], false)
                        );
                }
            }
        }
   return bpsum;
   ',  plugin="Rcpp")

fxR5 <- function(K, omega, gamma)
{
    lk <- length(K)
    bpargs <- cbind(rep(K, times=lk), rep(K, each=lk),
        rep(omega, each=lk*lk),
        rep(gamma, each=lk*lk))
    bp <- matrix(0, nrow(bpargs), lk)
    for(i in K) {
        notzero <- which(bpargs[,2] >= i & bpargs[,1] >= i)
        bp[notzero,i+1] <- dbinom(i, bpargs[notzero,2], bpargs[notzero,3]) *
            dpois(bpargs[notzero, 1] - i, bpargs[notzero, 4])
        }
    bpsum <- rowSums(bp)
    bpsum <- array(bpsum, c(lk, lk, length(omega)))
    return(bpsum)
}

K <- 0:4
omega <- runif(3)
gamma <- runif(3)

isTRUE(all.equal(
    array(fx5(0:4, omega, gamma), c(5,5,3)),
    fxR5(0:4, omega, gamma))) # TRUE

On Thu, Aug 5, 2010 at 9:55 AM, Richard Chandler
<richard.chandlers at gmail.com> wrote:
> Thanks so much Romain, I like that sugar, and Rcpp has made my work
> much easier. I have one more question if anyone is willing to
> entertain it. My end goal was to replace the binomial probability 0.5
> with a vector (omega), and the poisson mean 1.5 with a vector (gamma).
> This would result in bpsum being a lk x lk x length(omega) array
> instead of a lk x lk matrix. I thought I had it figured out, but for
> some reason it doesn't work. Perhaps I'm just messing up the indexing
> of the array in C++ but I can't find the problem. I am comparing it to
> my best shot at a vectorized R version, which may look a little odd,
> but is much faster than the looped version. Note also that all this
> was sped up using cmin0 = 0:min(K[i], K[j]), which avoids zero
> probabilities.
>
>
>
>
> # Requires latest SVN version
> fx5 <- cxxfunction(signature(Kr="integer", omegaR="numeric",
> gammaR="numeric"), '
>    IntegerVector K(Kr);
>    int lk = K.size();
>    NumericVector omega(omegaR);
>    NumericVector gamma(gammaR);
>    NumericVector bpsum(lk * lk * omega.size());
>    for(int r=0; r<omega.size(); r++) {
>        for(int i=0; i<lk; i++) {
>            for(int j=0; j<lk; j++) {
>                int Km = std::min(K[i], K[j]);
>                IntegerVector cmin0 = seq(0, Km);
>                bpsum(r*lk*lk + i*lk + j) = sum(
>                    stats::dbinom(cmin0, K[j], omega[r]) *
>                        stats::dpois(K[j]-cmin0 , gamma[r])
>                        );
>                }
>            }
>        }
>   return bpsum;
>   ',  plugin="Rcpp")
>
>
> round(array(fx5(0:4, runif(3), runif(3)), c(5, 5, 3)), 2)
> length(unique(fx5(0:4, runif(3), runif(3)))) # Only 45. Why are there
> duplicates?
>
>
> fxR5 <- function(K, omega, gamma)
> {
>    lk <- length(K)
>    bpargs <- cbind(rep(K, times=lk), rep(K, each=lk),
>        rep(omega, each=lk*lk),
>        rep(gamma, each=lk*lk)) # This could possibly run into memory limits
>    bp <- matrix(0, nrow(bpargs), lk)
>    for(i in K) {
>        notzero <- which(bpargs[,2] >= i & bpargs[,1] >= i)
>        bp[notzero,i+1] <- dbinom(i, bpargs[notzero,2], bpargs[notzero,3]) *
>            dpois(bpargs[notzero, 1] - i, bpargs[notzero, 4])
>        }
>    bpsum <- rowSums(bp)
>    bpsum <- array(bpsum, c(lk, lk, length(omega)))
>    return(bpsum)
> }
>
> round(fxR5(0:4, runif(3), runif(3)), 2)
> length(unique(fxR5(0:4, runif(3), runif(3)))) #75 like it should be
>
> K <- 0:4
> omega <- runif(3)
> gamma <- runif(3)
>
> isTRUE(all.equal(
>    array(fx5(0:4, omega, gamma), c(5,3,3)),
>    fxR5(0:4, omega, gamma))) # FALSE ?
>
> system.time(array(fx5(0:100, omega, gamma),
> c(length(K),length(K),length(gamma)))) #  2.36s
> system.time(fxR5(0:100, omega, gamma)) # 3.41s
>
>
>
>
>
>
>
>
>
>
> On Thu, Aug 5, 2010 at 6:32 AM, Romain Francois
> <romain at r-enthusiasts.com> wrote:
>> Le 05/08/10 12:11, Romain Francois a écrit :
>>>
>>> Le 05/08/10 11:35, Romain Francois a écrit :
>>>>
>>>> Hello,
>>>>
>>>> I've put my spin on Dirk's solution and added a few things in Rcpp to
>>>> combine convenience and performance.
>>>>
>>>> I've added Rcpp::stats::dpois and Rcpp::stats::dbinom as sugar
>>>> functions, and more will come (time permitting, this is actually very
>>>> easy and mechanical, so if someone wants to add other (dnorm, etc ...) I
>>>> can guide them)
>>>>
>>>> Anyway, this means we can call dbinom and dpois on integer vectors (and
>>>> actually on any integer sugar expression).
>>>>
>>>> With this, here comes fx3 :
>>>>
>>>> fx3 <- cxxfunction(signature(Kr="integer"), '
>>>> IntegerVector K(Kr);
>>>> int n = K.size() ;
>>>> NumericMatrix bpsum(n, n);
>>>> IntegerVector cmin0 ;
>>>> NumericVector bp ;
>>>>
>>>> for(int i=0; i<n; i++) {
>>>> cmin0 = seq(0, K[i]);
>>>>
>>>> for(int j=0; j<n; j++) {
>>>> bp = stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5) ;
>>>> bpsum(i, j) = std::accumulate(bp.begin(), bp.end(), 0.0);
>>>> }
>>>> }
>>>> return bpsum;
>>>> ', plugin="Rcpp")
>>>>
>>>>
>>>> which on my mac does a better job than Dirk's version :
>>>>
>>>> test replications elapsed relative user.self sys.self
>>>> 1 fx(0:100) 100 68.113 4.903744 67.409 0.704
>>>> 2 fx2(0:100) 100 15.913 1.145644 15.523 0.391
>>>> 3 fx3(0:100) 100 13.890 1.000000 13.782 0.108
>>>> 4 fxR(0:100) 100 27.100 1.951044 26.536 0.564
>>>>
>>>> This is because Dirk's wrap_dbinom and wrap_dpois are not lazy, so they
>>>> allocate memory for the result, etc ... where stats::dbinom and
>>>> stats::dpois are lazy so they don't require any memory allocation.
>>>>
>>>>
>>>> I guess we can do even better when someone (perhaps me) implements
>>>> Rcpp::sum because we would not even have to allocate memory for bp.
>>>
>>> Done in revision 1912, which gives fx4 :
>>>
>>> fx4 <- cxxfunction(signature(Kr="integer"), '
>>> IntegerVector K(Kr);
>>> int n = K.size() ;
>>> NumericMatrix bpsum(n, n);
>>> IntegerVector cmin0 ;
>>>
>>> for(int i=0; i<n; i++) {
>>> cmin0 = seq(0, K[i]);
>>> for(int j=0; j<n; j++) {
>>> bpsum(i, j) = sum(
>>> stats::dbinom(cmin0, K[j], 0.5) * stats::dpois( K[j]-cmin0 , 1.5)
>>> );
>>> }
>>> }
>>> return bpsum;
>>> ', plugin="Rcpp")
>>>
>>>
>>> Here we don't need the "bp" variable and we are yet a little bit better:
>>>
>>> test replications elapsed relative user.self sys.self
>>> 1 fx(0:100) 100 67.927 5.180917 67.221 0.701
>>> 2 fx2(0:100) 100 15.911 1.213561 15.505 0.405
>>> 3 fx3(0:100) 100 13.643 1.040577 13.537 0.105
>>> 4 fx4(0:100) 100 13.111 1.000000 13.088 0.023
>>> 5 fxR(0:100) 100 27.084 2.065746 26.434 0.650
>>>
>>>
>>> Note that the game here is not to achieve the best performance, but the
>>> best convenience/performance ratio ...
>>>
>>> Romain
>>
>> And here is another version, for completeness, using Rcpp::outer
>>
>> inc5 <- '
>>
>> class Calculation : public std::binary_function<int,int,double> {
>> public:
>>        Calculation(){}
>>        inline double operator()(int ki, int kj) const {
>>                IntegerVector cmin0 = seq(0, ki ) ;
>>                return sum(
>>                        stats::dbinom(cmin0, kj, 0.5) * stats::dpois( kj -
>> cmin0 , 1.5)
>>                        ) ;
>>        }
>> } ;
>> '
>>
>> fx5 <- cxxfunction(signature(Kr="integer"), '
>>    IntegerVector K(Kr);
>>    return wrap( outer( K, K, Calculation() ) );
>> ',  plugin="Rcpp", includes = inc5 )
>>
>>
>> This one does not do as good as fx4 because cmin0 is calculated each time
>> instead of one for each i, but I still like this version because things are
>> nicely separated.
>>
>> Romain
>>
>>
>>>> Couple of other things:
>>>>
>>>> - I've added Rcpp::seq so that we don't have to do cmin and cmin0, and I
>>>> need to investigate why " seq_len(K[i]+1)-1 " does not work.
>>>>
>>>> - The assignment operator for IntegerVector is smart enough to not
>>>> allocate memory if the existing vector is of the same size as the
>>>> expression it is given. This is why I am declaring cmin0 once and then
>>>> allocate into it instead of allocating each time. It might even be that
>>>> we don't need cmin at all.
>>>>
>>>> - I am also declaring bp upfront for the same reason
>>>>
>>>> Romain
>>>>
>>>> Le 04/08/10 22:10, Dirk Eddelbuettel a écrit :
>>>>>
>>>>> Richard,
>>>>>
>>>>> Great question.
>>>>>
>>>>> One obvious consideration for improvement is to not call dbinom and
>>>>> dpois
>>>>> from R, but rather using the C entry points provided by Rmath.h. With
>>>>> that, I
>>>>> get these performances:
>>>>>
>>>>>> require(rbenchmark)
>>>>>
>>>>> Loading required package: rbenchmark
>>>>>>
>>>>>> benchmark(fx(0:100),
>>>>>
>>>>> + fx2(0:100),
>>>>> + fxR(0:100))
>>>>> test replications elapsed relative user.self
>>>>> 1 fx(0:100) 100 89.426 3.811850 89.42
>>>>> 2 fx2(0:100) 100 23.460 1.000000 23.46
>>>>> 3 fxR(0:100) 100 41.699 1.777451 41.69
>>>>>>
>>>>>> # deleted empty sys.self, user,child, sys.child columns here
>>>>>
>>>>> The bad news is that I don't get the same results:
>>>>
>>>> Richard already identified the problem here: lambda needs to be a
>>>> double, not an int.
>>>>
>>>>>> all.equal(fx(0:10), fxR(0:10))
>>>>>
>>>>> [1] TRUE
>>>>>>
>>>>>> all.equal(fx2(0:10), fxR(0:10))
>>>>>
>>>>> [1] "Mean relative difference: 0.2535309"
>>>>>>
>>>>>
>>>>> But as Doug suggested, there may be a need for reworking things and
>>>>> checks
>>>>> anyway. Or maybe I just introduced a bug -- dunno. I merely meant to
>>>>> help on
>>>>> pointing out that Rmath.h is there too, and I did this as a quick and
>>>>> dirty
>>>>> wrap around the 'atomistic' Rmath functions. Maybe someone would want to
>>>>> contribute vectorised versions of these ? Patches welcome, as they
>>>>> say...
>>>>>
>>>>> Dirk
>>>
>>>
>>
>>
>> --
>> Romain Francois
>> Professional R Enthusiast
>> +33(0) 6 28 91 30 30
>> http://romainfrancois.blog.free.fr
>> |- http://bit.ly/aAyra4 : highlight 0.2-2
>> |- http://bit.ly/94EBKx : inline 0.3.6
>> `- http://bit.ly/aryfrk : useR! 2010
>>
>>
>