[Rcpp-devel] speed fix

Mon Dec 27 18:24:10 CET 2010

Thanks, that's very helpful. I'll experiment with this and report back if
things speed up substantially.

Richard

On Mon, Dec 27, 2010 at 9:58 AM, Douglas Bates <bates at stat.wisc.edu> wrote:

> My apologies, I accidentally hit send on an incomplete message.
>
> On Mon, Dec 27, 2010 at 8:34 AM, Douglas Bates <bates at stat.wisc.edu>
> wrote:
> > On Mon, Dec 27, 2010 at 6:57 AM, Richard Chandler
> > <richard.chandlers at gmail.com> wrote:
> >> Hi,
> >>
> >> I am trying to speed up a vectorized R function without much luck.
> Perhaps
> >> it is not possible, but if it is could someone suggest how to improve
> the
> >> C++/Rcpp code below? The speed is approximately equal on a 64-bit
> version of
> >> R, but the R code is much faster when using 32-bit R (on windows).
> Here's an
> >> example.
> >
> > You are probably correct that there isn't much to be gained in going
> > to compiled code relative to the vectorized R code.  It is likely that
> > the majority of the execution time is the actual numerical evaluation,
> > not the looping overhead.  However, if you want to try to improve the
> > execution time of the compiled code you should look analyze your inner
> > loop, as described below.
> >
> >>  # R function
> >> nllR <- function(y, psi, p, nd) {
> >>         cp <- (p^y) * ((1 - p)^(1 - y))
> >>         cp <- exp(rowSums(log(cp))) # faster than: apply(cp, 1, prod)
> >>     loglik <- log(cp * psi + nd * (1 - psi))
> >>         -sum(loglik)
> >>     }
> >>
> >>
> >> # Rcpp function
> >> nllRcpp <- cxxfunction(signature(yR="matrix", psiR="numeric",
> pR="matrix",
> >>     ndR="integer"),
> >>     '
> >>     NumericMatrix y(yR);
> >>     NumericVector psi(psiR);
> >>     NumericMatrix p(pR);
> >>     IntegerVector nd(ndR);
> >>
> >>     NumericVector yV(y);
> >>     int R = y.nrow();
> >>     int J = y.ncol();
> >>     NumericMatrix cp(R, J);
> >>     NumericVector ll(R);
> >>     NumericVector cpsum(R);
> >>
> >>     for(int i=0; i<R; i++) {
> >>         for(int j=0; j<J; j++) {
> >>             cp(i,j) = pow(p(i,j), y(i,j)) * pow(1-p(i,j), 1-y(i,j));
> >>             }
> >>         NumericVector cpi = cp(i, _);
> >>         cpsum[i] = exp(sum(log(cpi)));
> >>         ll[i] = log(cpsum[i] * psi[i] + nd[i] * (1-psi[i]));
> >>         }
> >>     double nll = -1*sum(ll);
>
> > Notice that you are going back and forth between the logarithm scale
> > and the original scale twice (the pow function needs to take the
> > logarithm then multiply by the power then exponentiate the result).
> > If you want to return the log likelihood then start with log(p(i,j))
> > and log(1 - p(i,j)) and rewrite the rest of the calculation on the
> > logarithm scale.  Your definition of cpsum[i] exponentiates the sum of
> > the logarithms then, in the next statement, takes the logarithm of a
> > product involving that term.
> >
> > Also you should realize that y(i,j) and p(i,j) are not without cost so
> > cache the values once you obtain them (it is likely that the compiler
> > will perform this optimization anyway but there is little cost to
> > writing the loop effectively in the first place).  And you don't
> > really need to allocate the storage for cp, ll and cpsum because you
> > are not saving these values.  Finally, there is an advantage in using
> > the pointers to the array contents rather than indexing and in running
> > down the rows rather than across columns.
> >
> > Doing all those things at once is perhaps unnecessary and may not be
> > worthwhile.  Let's start with evaluating log(cp) then log(cpsum).
> >
> > NumericMatrix logcp(NumericMatrix p, NumericMatrix y) throw
> (std::exception) {
> >    int nr = p.nrow(), nc = p.ncol();
> >
> >    if (y.nrow() != nr || y.ncol() != nc()) throw
> > std::exception("dimension mismatch");
> >    NumericMatrix ans = clone(y);
> >
> >    // Can write the next part as a loop
> >    double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
> >    int ntot = nr * nc;
> >    for (int i = 0; i < ntot; i++, ap++,) {
>        *ap = *yp * log(*pp) + (1 - *yp) * log(1 - *pp);
>    }
>    return ans;
>
> or you could use std::transform, by first defining a transformation
> function of the form
> double logcpscalar(double p, double y) {return y * log(p) + (1-y) *
> log(1-p);}
>
> and replacing the entire loop by
>   std::transform(p.begin(), p.end(), y.begin(), ans.begin(), logcpscalar);
>
> Of course, if you will only ever use the row sums of the logarithms
> then you could accumulate them while evaluating the elements and not
> bother storing the elements at all.  It would look like
>
> NumericVector rowsumsLog(> NumericMatrix p, NumericMatrix y) throw
> (std::exception) {
>    int nr = p.nrow(), nc = p.ncol();
>
>    if (y.nrow() != nr || y.ncol() != nc()) throw
> std::exception("dimension mismatch");
>     NumericVector ans(nr, 0.);    // ensure that it is initialized to zeros
>     double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
>
>     for (int j = 0; j < nc; j++)
>        for (int i = 0; i < nr; i++, pp++, yp++)
>            ap[i] += *yp * log(*pp) + (1 - *yp) * log(1 - *pp);
>
>    return ans;
>
>
> >>     return wrap(nll);
> >>     ', plugin="Rcpp")
> >>
> >>
> >>
> >> # Test
> >> M <- 10000
> >> J <- 100
> >> psi <- runif(M)
> >> Z <- rbinom(M, 1, psi)
> >> p <- matrix(runif(M*J), M, J)
> >> y <- matrix(rbinom(M*J, 1, p*Z), M, J)
> >> nd <- ifelse(rowSums(y, na.rm=TRUE) == 0, 1, 0)
> >>
> >>
> >> all.equal(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >>
> >> # 64-bit
> >>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >>                     test replications elapsed relative user.self
> >> 1    nllR(y, psi, p, nd)          100   10.58 1.046489      9.35
> >> 2 nllRcpp(y, psi, p, nd)          100   10.11 1.000000      9.81
> >>   sys.self user.child sys.child
> >> 1     1.14         NA        NA
> >> 2     0.26         NA        NA
> >>
> >>
> >> # 32-bit
> >>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >>                     test replications elapsed relative user.self
> >> 1    nllR(y, psi, p, nd)          100   16.75 1.000000     15.54
> >> 2 nllRcpp(y, psi, p, nd)          100   29.97 1.789254     29.81
> >>   sys.self user.child sys.child
> >> 1     1.08         NA        NA
> >> 2     0.01         NA        NA
> >>
> >>
> >>
> >>> sessionInfo()
> >> R version 2.12.0 (2010-10-15)
> >> Platform: x86_64-pc-mingw32/x64 (64-bit)
> >>
> >> ...
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods
> >> [7] base
> >>
> >> other attached packages:
> >> [1] rbenchmark_0.3       inline_0.3.8         RcppArmadillo_0.2.10
> >> [4] Rcpp_0.9.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.12.0
> >>>
> >>
> >>
> >>
> >> Thanks,
> >> Richard
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Rcpp-devel mailing list
> >> Rcpp-devel at lists.r-forge.r-project.org
> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> >>
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20101227/3c18c04b/attachment-0001.htm>