[Rcpp-devel] speed fix
Richard Chandler
richard.chandlers at gmail.com
Mon Dec 27 18:24:10 CET 2010
Thanks, that's very helpful. I'll experiment with this and report back if
things speed up substantially.
Richard
On Mon, Dec 27, 2010 at 9:58 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> My apologies, I accidentally hit send on an incomplete message.
>
> On Mon, Dec 27, 2010 at 8:34 AM, Douglas Bates <bates at stat.wisc.edu>
> wrote:
> > On Mon, Dec 27, 2010 at 6:57 AM, Richard Chandler
> > <richard.chandlers at gmail.com> wrote:
> >> Hi,
> >>
> >> I am trying to speed up a vectorized R function without much luck.
> Perhaps
> >> it is not possible, but if it is could someone suggest how to improve
> the
> >> C++/Rcpp code below? The speed is approximately equal on a 64-bit
> version of
> >> R, but the R code is much faster when using 32-bit R (on windows).
> Here's an
> >> example.
> >
> > You are probably correct that there isn't much to be gained in going
> > to compiled code relative to the vectorized R code. It is likely that
> > the majority of the execution time is the actual numerical evaluation,
> > not the looping overhead. However, if you want to try to improve the
> > execution time of the compiled code you should look analyze your inner
> > loop, as described below.
> >
> >> # R function
> >> nllR <- function(y, psi, p, nd) {
> >> cp <- (p^y) * ((1 - p)^(1 - y))
> >> cp <- exp(rowSums(log(cp))) # faster than: apply(cp, 1, prod)
> >> loglik <- log(cp * psi + nd * (1 - psi))
> >> -sum(loglik)
> >> }
> >>
> >>
> >> # Rcpp function
> >> nllRcpp <- cxxfunction(signature(yR="matrix", psiR="numeric",
> pR="matrix",
> >> ndR="integer"),
> >> '
> >> NumericMatrix y(yR);
> >> NumericVector psi(psiR);
> >> NumericMatrix p(pR);
> >> IntegerVector nd(ndR);
> >>
> >> NumericVector yV(y);
> >> int R = y.nrow();
> >> int J = y.ncol();
> >> NumericMatrix cp(R, J);
> >> NumericVector ll(R);
> >> NumericVector cpsum(R);
> >>
> >> for(int i=0; i<R; i++) {
> >> for(int j=0; j<J; j++) {
> >> cp(i,j) = pow(p(i,j), y(i,j)) * pow(1-p(i,j), 1-y(i,j));
> >> }
> >> NumericVector cpi = cp(i, _);
> >> cpsum[i] = exp(sum(log(cpi)));
> >> ll[i] = log(cpsum[i] * psi[i] + nd[i] * (1-psi[i]));
> >> }
> >> double nll = -1*sum(ll);
>
> > Notice that you are going back and forth between the logarithm scale
> > and the original scale twice (the pow function needs to take the
> > logarithm then multiply by the power then exponentiate the result).
> > If you want to return the log likelihood then start with log(p(i,j))
> > and log(1 - p(i,j)) and rewrite the rest of the calculation on the
> > logarithm scale. Your definition of cpsum[i] exponentiates the sum of
> > the logarithms then, in the next statement, takes the logarithm of a
> > product involving that term.
> >
> > Also you should realize that y(i,j) and p(i,j) are not without cost so
> > cache the values once you obtain them (it is likely that the compiler
> > will perform this optimization anyway but there is little cost to
> > writing the loop effectively in the first place). And you don't
> > really need to allocate the storage for cp, ll and cpsum because you
> > are not saving these values. Finally, there is an advantage in using
> > the pointers to the array contents rather than indexing and in running
> > down the rows rather than across columns.
> >
> > Doing all those things at once is perhaps unnecessary and may not be
> > worthwhile. Let's start with evaluating log(cp) then log(cpsum).
> >
> > NumericMatrix logcp(NumericMatrix p, NumericMatrix y) throw
> (std::exception) {
> > int nr = p.nrow(), nc = p.ncol();
> >
> > if (y.nrow() != nr || y.ncol() != nc()) throw
> > std::exception("dimension mismatch");
> > NumericMatrix ans = clone(y);
> >
> > // Can write the next part as a loop
> > double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
> > int ntot = nr * nc;
> > for (int i = 0; i < ntot; i++, ap++,) {
> *ap = *yp * log(*pp) + (1 - *yp) * log(1 - *pp);
> }
> return ans;
>
> or you could use std::transform, by first defining a transformation
> function of the form
> double logcpscalar(double p, double y) {return y * log(p) + (1-y) *
> log(1-p);}
>
> and replacing the entire loop by
> std::transform(p.begin(), p.end(), y.begin(), ans.begin(), logcpscalar);
>
> Of course, if you will only ever use the row sums of the logarithms
> then you could accumulate them while evaluating the elements and not
> bother storing the elements at all. It would look like
>
> NumericVector rowsumsLog(> NumericMatrix p, NumericMatrix y) throw
> (std::exception) {
> int nr = p.nrow(), nc = p.ncol();
>
> if (y.nrow() != nr || y.ncol() != nc()) throw
> std::exception("dimension mismatch");
> NumericVector ans(nr, 0.); // ensure that it is initialized to zeros
> double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();
>
> for (int j = 0; j < nc; j++)
> for (int i = 0; i < nr; i++, pp++, yp++)
> ap[i] += *yp * log(*pp) + (1 - *yp) * log(1 - *pp);
>
> return ans;
>
>
> >> return wrap(nll);
> >> ', plugin="Rcpp")
> >>
> >>
> >>
> >> # Test
> >> M <- 10000
> >> J <- 100
> >> psi <- runif(M)
> >> Z <- rbinom(M, 1, psi)
> >> p <- matrix(runif(M*J), M, J)
> >> y <- matrix(rbinom(M*J, 1, p*Z), M, J)
> >> nd <- ifelse(rowSums(y, na.rm=TRUE) == 0, 1, 0)
> >>
> >>
> >> all.equal(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >>
> >> # 64-bit
> >>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >> test replications elapsed relative user.self
> >> 1 nllR(y, psi, p, nd) 100 10.58 1.046489 9.35
> >> 2 nllRcpp(y, psi, p, nd) 100 10.11 1.000000 9.81
> >> sys.self user.child sys.child
> >> 1 1.14 NA NA
> >> 2 0.26 NA NA
> >>
> >>
> >> # 32-bit
> >>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))
> >> test replications elapsed relative user.self
> >> 1 nllR(y, psi, p, nd) 100 16.75 1.000000 15.54
> >> 2 nllRcpp(y, psi, p, nd) 100 29.97 1.789254 29.81
> >> sys.self user.child sys.child
> >> 1 1.08 NA NA
> >> 2 0.01 NA NA
> >>
> >>
> >>
> >>> sessionInfo()
> >> R version 2.12.0 (2010-10-15)
> >> Platform: x86_64-pc-mingw32/x64 (64-bit)
> >>
> >> ...
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods
> >> [7] base
> >>
> >> other attached packages:
> >> [1] rbenchmark_0.3 inline_0.3.8 RcppArmadillo_0.2.10
> >> [4] Rcpp_0.9.0
> >>
> >> loaded via a namespace (and not attached):
> >> [1] tools_2.12.0
> >>>
> >>
> >>
> >>
> >> Thanks,
> >> Richard
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Rcpp-devel mailing list
> >> Rcpp-devel at lists.r-forge.r-project.org
> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> >>
> >>
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20101227/3c18c04b/attachment-0001.htm>
More information about the Rcpp-devel
mailing list