[Rcpp-devel] speed fix

Mon Dec 27 15:34:04 CET 2010

On Mon, Dec 27, 2010 at 6:57 AM, Richard Chandler
<richard.chandlers at gmail.com> wrote:
> Hi,
>
> I am trying to speed up a vectorized R function without much luck. Perhaps
> it is not possible, but if it is could someone suggest how to improve the
> C++/Rcpp code below? The speed is approximately equal on a 64-bit version of
> R, but the R code is much faster when using 32-bit R (on windows). Here's an
> example.

You are probably correct that there isn't much to be gained in going
to compiled code relative to the vectorized R code.  It is likely that
the majority of the execution time is the actual numerical evaluation,
not the looping overhead.  However, if you want to try to improve the
execution time of the compiled code you should look analyze your inner
loop, as described below.

>  # R function
> nllR <- function(y, psi, p, nd) {
>         cp <- (p^y) * ((1 - p)^(1 - y))
>         cp <- exp(rowSums(log(cp))) # faster than: apply(cp, 1, prod)
>     loglik <- log(cp * psi + nd * (1 - psi))
>         -sum(loglik)
>     }
>
>
> # Rcpp function
> nllRcpp <- cxxfunction(signature(yR="matrix", psiR="numeric", pR="matrix",
>     ndR="integer"),
>     '
>     NumericMatrix y(yR);
>     NumericVector psi(psiR);
>     NumericMatrix p(pR);
>     IntegerVector nd(ndR);
>
>     NumericVector yV(y);
>     int R = y.nrow();
>     int J = y.ncol();
>     NumericMatrix cp(R, J);
>     NumericVector ll(R);
>     NumericVector cpsum(R);
>
>     for(int i=0; i<R; i++) {
>         for(int j=0; j<J; j++) {
>             cp(i,j) = pow(p(i,j), y(i,j)) * pow(1-p(i,j), 1-y(i,j));
>             }
>         NumericVector cpi = cp(i, _);
>         cpsum[i] = exp(sum(log(cpi)));
>         ll[i] = log(cpsum[i] * psi[i] + nd[i] * (1-psi[i]));
>         }
>     double nll = -1*sum(ll);

Notice that you are going back and forth between the logarithm scale
and the original scale twice (the pow function needs to take the
logarithm then multiply by the power then exponentiate the result).
If you want to return the log likelihood then start with log(p(i,j))
and log(1 - p(i,j)) and rewrite the rest of the calculation on the
logarithm scale.  Your definition of cpsum[i] exponentiates the sum of
the logarithms then, in the next statement, takes the logarithm of a
product involving that term.

Also you should realize that y(i,j) and p(i,j) are not without cost so
cache the values once you obtain them (it is likely that the compiler
will perform this optimization anyway but there is little cost to
writing the loop effectively in the first place).  And you don't
really need to allocate the storage for cp, ll and cpsum because you
are not saving these values.  Finally, there is an advantage in using
the pointers to the array contents rather than indexing and in running
down the rows rather than across columns.

Doing all those things at once is perhaps unnecessary and may not be
worthwhile.  Let's start with evaluating log(cp) then log(cpsum).

NumericMatrix logcp(NumericMatrix p, NumericMatrix y) throw (std::exception) {
    int nr = p.nrow(), nc = p.ncol();

    if (y.nrow() != nr || y.ncol() != nc()) throw
std::exception("dimension mismatch");
    NumericMatrix ans = clone(y);

    // Can write the next part as a loop
    double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
    int ntot = nr * nc;
    for (int i = 0; i < ntot; i++, ap++,) {

>     return wrap(nll);
>     ', plugin="Rcpp")
>
>
>
> # Test
> M <- 10000
> J <- 100
> psi <- runif(M)
> Z <- rbinom(M, 1, psi)
> p <- matrix(runif(M*J), M, J)
> y <- matrix(rbinom(M*J, 1, p*Z), M, J)
> nd <- ifelse(rowSums(y, na.rm=TRUE) == 0, 1, 0)
>
>
> all.equal(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>
> # 64-bit
>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>                     test replications elapsed relative user.self
> 1    nllR(y, psi, p, nd)          100   10.58 1.046489      9.35
> 2 nllRcpp(y, psi, p, nd)          100   10.11 1.000000      9.81
>   sys.self user.child sys.child
> 1     1.14         NA        NA
> 2     0.26         NA        NA
>
>
> # 32-bit
>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
>                     test replications elapsed relative user.self
> 1    nllR(y, psi, p, nd)          100   16.75 1.000000     15.54
> 2 nllRcpp(y, psi, p, nd)          100   29.97 1.789254     29.81
>   sys.self user.child sys.child
> 1     1.08         NA        NA
> 2     0.01         NA        NA
>
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> ...
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> [7] base
>
> other attached packages:
> [1] rbenchmark_0.3       inline_0.3.8         RcppArmadillo_0.2.10
> [4] Rcpp_0.9.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.0
>>
>
>
>
> Thanks,
> Richard
>
>
>
>
>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
>