Thanks, that's very helpful. I'll experiment with this and report back if things speed up substantially. <br><br>Richard<br><br><br><div class="gmail_quote">On Mon, Dec 27, 2010 at 9:58 AM, Douglas Bates <span dir="ltr"><<a href="mailto:bates@stat.wisc.edu">bates@stat.wisc.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">My apologies, I accidentally hit send on an incomplete message.<br>
<div><div></div><div class="h5"><br>
On Mon, Dec 27, 2010 at 8:34 AM, Douglas Bates <<a href="mailto:bates@stat.wisc.edu">bates@stat.wisc.edu</a>> wrote:<br>
> On Mon, Dec 27, 2010 at 6:57 AM, Richard Chandler<br>
> <<a href="mailto:richard.chandlers@gmail.com">richard.chandlers@gmail.com</a>> wrote:<br>
>> Hi,<br>
>><br>
>> I am trying to speed up a vectorized R function without much luck. Perhaps<br>
>> it is not possible, but if it is could someone suggest how to improve the<br>
>> C++/Rcpp code below? The speed is approximately equal on a 64-bit version of<br>
>> R, but the R code is much faster when using 32-bit R (on windows). Here's an<br>
>> example.<br>
><br>
> You are probably correct that there isn't much to be gained in going<br>
> to compiled code relative to the vectorized R code. It is likely that<br>
> the majority of the execution time is the actual numerical evaluation,<br>
> not the looping overhead. However, if you want to try to improve the<br>
> execution time of the compiled code you should look analyze your inner<br>
> loop, as described below.<br>
><br>
>> # R function<br>
>> nllR <- function(y, psi, p, nd) {<br>
>> cp <- (p^y) * ((1 - p)^(1 - y))<br>
>> cp <- exp(rowSums(log(cp))) # faster than: apply(cp, 1, prod)<br>
>> loglik <- log(cp * psi + nd * (1 - psi))<br>
>> -sum(loglik)<br>
>> }<br>
>><br>
>><br>
>> # Rcpp function<br>
>> nllRcpp <- cxxfunction(signature(yR="matrix", psiR="numeric", pR="matrix",<br>
>> ndR="integer"),<br>
>> '<br>
>> NumericMatrix y(yR);<br>
>> NumericVector psi(psiR);<br>
>> NumericMatrix p(pR);<br>
>> IntegerVector nd(ndR);<br>
>><br>
>> NumericVector yV(y);<br>
>> int R = y.nrow();<br>
>> int J = y.ncol();<br>
>> NumericMatrix cp(R, J);<br>
>> NumericVector ll(R);<br>
>> NumericVector cpsum(R);<br>
>><br>
>> for(int i=0; i<R; i++) {<br>
>> for(int j=0; j<J; j++) {<br>
>> cp(i,j) = pow(p(i,j), y(i,j)) * pow(1-p(i,j), 1-y(i,j));<br>
>> }<br>
>> NumericVector cpi = cp(i, _);<br>
>> cpsum[i] = exp(sum(log(cpi)));<br>
>> ll[i] = log(cpsum[i] * psi[i] + nd[i] * (1-psi[i]));<br>
>> }<br>
>> double nll = -1*sum(ll);<br>
<br>
> Notice that you are going back and forth between the logarithm scale<br>
> and the original scale twice (the pow function needs to take the<br>
> logarithm then multiply by the power then exponentiate the result).<br>
> If you want to return the log likelihood then start with log(p(i,j))<br>
> and log(1 - p(i,j)) and rewrite the rest of the calculation on the<br>
> logarithm scale. Your definition of cpsum[i] exponentiates the sum of<br>
> the logarithms then, in the next statement, takes the logarithm of a<br>
> product involving that term.<br>
><br>
> Also you should realize that y(i,j) and p(i,j) are not without cost so<br>
> cache the values once you obtain them (it is likely that the compiler<br>
> will perform this optimization anyway but there is little cost to<br>
> writing the loop effectively in the first place). And you don't<br>
> really need to allocate the storage for cp, ll and cpsum because you<br>
> are not saving these values. Finally, there is an advantage in using<br>
> the pointers to the array contents rather than indexing and in running<br>
> down the rows rather than across columns.<br>
><br>
> Doing all those things at once is perhaps unnecessary and may not be<br>
> worthwhile. Let's start with evaluating log(cp) then log(cpsum).<br>
><br>
> NumericMatrix logcp(NumericMatrix p, NumericMatrix y) throw (std::exception) {<br>
> int nr = p.nrow(), nc = p.ncol();<br>
><br>
> if (y.nrow() != nr || y.ncol() != nc()) throw<br>
> std::exception("dimension mismatch");<br>
> NumericMatrix ans = clone(y);<br>
><br>
> // Can write the next part as a loop<br>
> double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();<br>
> int ntot = nr * nc;<br>
> for (int i = 0; i < ntot; i++, ap++,) {<br>
</div></div> *ap = *yp * log(*pp) + (1 - *yp) * log(1 - *pp);<br>
}<br>
return ans;<br>
<br>
or you could use std::transform, by first defining a transformation<br>
function of the form<br>
double logcpscalar(double p, double y) {return y * log(p) + (1-y) * log(1-p);}<br>
<br>
and replacing the entire loop by<br>
std::transform(p.begin(), p.end(), y.begin(), ans.begin(), logcpscalar);<br>
<br>
Of course, if you will only ever use the row sums of the logarithms<br>
then you could accumulate them while evaluating the elements and not<br>
bother storing the elements at all. It would look like<br>
<br>
NumericVector rowsumsLog(> NumericMatrix p, NumericMatrix y) throw<br>
<div class="im">(std::exception) {<br>
int nr = p.nrow(), nc = p.ncol();<br>
<br>
if (y.nrow() != nr || y.ncol() != nc()) throw<br>
std::exception("dimension mismatch");<br>
</div> NumericVector ans(nr, 0.); // ensure that it is initialized to zeros<br>
<div class="im"> double *ap = ans.begin(), *pp = p.begin(), *yp = y.begin();<br>
<br>
</div> for (int j = 0; j < nc; j++)<br>
for (int i = 0; i < nr; i++, pp++, yp++)<br>
ap[i] += *yp * log(*pp) + (1 - *yp) * log(1 - *pp);<br>
<br>
return ans;<br>
<div><div></div><div class="h5"><br>
<br>
>> return wrap(nll);<br>
>> ', plugin="Rcpp")<br>
>><br>
>><br>
>><br>
>> # Test<br>
>> M <- 10000<br>
>> J <- 100<br>
>> psi <- runif(M)<br>
>> Z <- rbinom(M, 1, psi)<br>
>> p <- matrix(runif(M*J), M, J)<br>
>> y <- matrix(rbinom(M*J, 1, p*Z), M, J)<br>
>> nd <- ifelse(rowSums(y, na.rm=TRUE) == 0, 1, 0)<br>
>><br>
>><br>
>> all.equal(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))<br>
>><br>
>> # 64-bit<br>
>>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))<br>
>> test replications elapsed relative user.self<br>
>> 1 nllR(y, psi, p, nd) 100 10.58 1.046489 9.35<br>
>> 2 nllRcpp(y, psi, p, nd) 100 10.11 1.000000 9.81<br>
>> sys.self user.child sys.child<br>
>> 1 1.14 NA NA<br>
>> 2 0.26 NA NA<br>
>><br>
>><br>
>> # 32-bit<br>
>>> benchmark(nllR(y, psi, p, nd), nllRcpp(y, psi, p, nd))<br>
>> test replications elapsed relative user.self<br>
>> 1 nllR(y, psi, p, nd) 100 16.75 1.000000 15.54<br>
>> 2 nllRcpp(y, psi, p, nd) 100 29.97 1.789254 29.81<br>
>> sys.self user.child sys.child<br>
>> 1 1.08 NA NA<br>
>> 2 0.01 NA NA<br>
>><br>
>><br>
>><br>
>>> sessionInfo()<br>
>> R version 2.12.0 (2010-10-15)<br>
>> Platform: x86_64-pc-mingw32/x64 (64-bit)<br>
>><br>
>> ...<br>
>><br>
>> attached base packages:<br>
>> [1] stats graphics grDevices utils datasets methods<br>
>> [7] base<br>
>><br>
>> other attached packages:<br>
>> [1] rbenchmark_0.3 inline_0.3.8 RcppArmadillo_0.2.10<br>
>> [4] Rcpp_0.9.0<br>
>><br>
>> loaded via a namespace (and not attached):<br>
>> [1] tools_2.12.0<br>
>>><br>
>><br>
>><br>
>><br>
>> Thanks,<br>
>> Richard<br>
>><br>
>><br>
>><br>
>><br>
>><br>
>><br>
>><br>
>> _______________________________________________<br>
>> Rcpp-devel mailing list<br>
>> <a href="mailto:Rcpp-devel@lists.r-forge.r-project.org">Rcpp-devel@lists.r-forge.r-project.org</a><br>
>> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br>
>><br>
>><br>
><br>
</div></div></blockquote></div><br>