[Rcpp-devel] speed fix

Mon Dec 27 15:58:14 CET 2010

My apologies, I accidentally hit send on an incomplete message.

On Mon, Dec 27, 2010 at 8:34 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Mon, Dec 27, 2010 at 6:57 AM, Richard Chandler
> <richard.chandlers at gmail.com> wrote:
>> Hi,
>>
>> I am trying to speed up a vectorized R function without much luck. Perhaps
>> it is not possible, but if it is could someone suggest how to improve the
>> C++/Rcpp code below? The speed is approximately equal on a 64-bit version of
>> R, but the R code is much faster when using 32-bit R (on windows). Here's an
>> example.
>
> You are probably correct that there isn't much to be gained in going
> to compiled code relative to the vectorized R code.  It is likely that
> the majority of the execution time is the actual numerical evaluation,
> not the looping overhead.  However, if you want to try to improve the
> execution time of the compiled code you should look analyze your inner
> loop, as described below.
>
>>  # R function
>> nllR <- function(y, psi, p, nd) {
>>         cp <- (p^y) * ((1 - p)^(1 - y))
>>         cp <- exp(rowSums(log(cp))) # faster than: apply(cp, 1, prod)
>>     loglik <- log(cp * psi + nd * (1 - psi))
>>         -sum(loglik)
>>     }
>>
>>
>> # Rcpp function
>> nllRcpp <- cxxfunction(signature(yR="matrix", psiR="numeric", pR="matrix",
>>     ndR="integer"),
>>     '
>>     NumericMatrix y(yR);
>>     NumericVector psi(psiR);
>>     NumericMatrix p(pR);
>>     IntegerVector nd(ndR);
>>
>>     NumericVector yV(y);
>>     int R = y.nrow();
>>     int J = y.ncol();
>>     NumericMatrix cp(R, J);
>>     NumericVector ll(R);
>>     NumericVector cpsum(R);
>>
>>     for(int i=0; i<R; i++) {
>>         for(int j=0; j<J; j++) {
>>             cp(i,j) = pow(p(i,j), y(i,j)) * pow(1-p(i,j), 1-y(i,j));
>>             }
>>         NumericVector cpi = cp(i, _);
>>         cpsum[i] = exp(sum(log(cpi)));
>>         ll[i] = log(cpsum[i] * psi[i] + nd[i] * (1-psi[i]));
>>         }
>>     double nll = -1*sum(ll);

> Notice that you are going back and forth between the logarithm scale
> and the original scale twice (the pow function needs to take the
> logarithm then multiply by the power then exponentiate the result).
> If you want to return the log likelihood then start with log(p(i,j))
> and log(1 - p(i,j)) and rewrite the rest of the calculation on the
> logarithm scale.  Your definition of cpsum[i] exponentiates the sum of
> the logarithms then, in the next statement, takes the logarithm of a
> product involving that term.
>
> Also you should realize that y(i,j) and p(i,j) are not without cost so
> cache the values once you obtain them (it is likely that the compiler
> will perform this optimization anyway but there is little cost to
> writing the loop effectively in the first place).  And you don't
> really need to allocate the storage for cp, ll and cpsum because you
> are not saving these values.  Finally, there is an advantage in using
> the pointers to the array contents rather than indexing and in running
> down the rows rather than across columns.
>
> Doing all those things at once is perhaps unnecessary and may not be
> worthwhile.  Let's start with evaluating log(cp) then log(cpsum).
>
> NumericMatrix logcp(NumericMatrix p, NumericMatrix y) throw (std::exception) {
>    int nr = p.nrow(), nc = p.ncol();
>
>    if (y.nrow() != nr || y.ncol() != nc()) throw
> std::exception("dimension mismatch");
>    NumericMatrix ans = clone(y);
>
>    // Can write the next part as a loop
>    double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
>    int ntot = nr * nc;
>    for (int i = 0; i < ntot; i++, ap++,) {
       *ap = *yp * log(*pp) + (1 - *yp) * log(1 - *pp);
    }
    return ans;

or you could use std::transform, by first defining a transformation
function of the form
double logcpscalar(double p, double y) {return y * log(p) + (1-y) * log(1-p);}

and replacing the entire loop by
   std::transform(p.begin(), p.end(), y.begin(), ans.begin(), logcpscalar);

Of course, if you will only ever use the row sums of the logarithms
then you could accumulate them while evaluating the elements and not
bother storing the elements at all.  It would look like

NumericVector rowsumsLog(> NumericMatrix p, NumericMatrix y) throw
(std::exception) {
    int nr = p.nrow(), nc = p.ncol();

    if (y.nrow() != nr || y.ncol() != nc()) throw
std::exception("dimension mismatch");
    NumericVector ans(nr, 0.);    // ensure that it is initialized to zeros
    double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();

    for (int j = 0; j < nc; j++)
        for (int i = 0; i < nr; i++, pp++, yp++)
            ap[i] += *yp * log(*pp) + (1 - *yp) * log(1 - *pp);

    return ans;

>>     return wrap(nll);
>>     ', plugin="Rcpp")
>>
>>
>>
>> # Test
>> M <- 10000
>> J <- 100
>> psi <- runif(M)
>> Z <- rbinom(M, 1, psi)
>> p <- matrix(runif(M*J), M, J)
>> y <- matrix(rbinom(M*J, 1, p*Z), M, J)
>> nd <- ifelse(rowSums(y, na.rm=TRUE) == 0, 1, 0)
>>
>>
>> all.equal(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>>
>> # 64-bit
>>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>>                     test replications elapsed relative user.self
>> 1    nllR(y, psi, p, nd)          100   10.58 1.046489      9.35
>> 2 nllRcpp(y, psi, p, nd)          100   10.11 1.000000      9.81
>>   sys.self user.child sys.child
>> 1     1.14         NA        NA
>> 2     0.26         NA        NA
>>
>>
>> # 32-bit
>>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>>                     test replications elapsed relative user.self
>> 1    nllR(y, psi, p, nd)          100   16.75 1.000000     15.54
>> 2 nllRcpp(y, psi, p, nd)          100   29.97 1.789254     29.81
>>   sys.self user.child sys.child
>> 1     1.08         NA        NA
>> 2     0.01         NA        NA
>>
>>
>>
>>> sessionInfo()
>> R version 2.12.0 (2010-10-15)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>
>> ...
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods
>> [7] base
>>
>> other attached packages:
>> [1] rbenchmark_0.3       inline_0.3.8         RcppArmadillo_0.2.10
>> [4] Rcpp_0.9.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.12.0
>>>
>>
>>
>>
>> Thanks,
>> Richard
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>>
>