[Rcpp-devel] Using sugar expressions for evaluating deviance residuals

Douglas Bates bates at stat.wisc.edu
Sat Aug 14 20:28:20 CEST 2010

```In profiling code for generalized linear mixed models for binary
responses I found that a substantial portion of the execution time was
being spend in evaluating the R functions for inverse links and for
the deviance residuals.  As I result I wrote C code (in
\$RSRC/src/library/stats/src/family.c) or some of those.

The way that some research is going I expect that I will soon be in
the position where I need to evaluate such expressions even more
frequently so it is worthwhile to me to tune it up.

In general there will be three NumericVector objects -- y (observed
responses), wt (weights) and eta (linear predictors) -- plus an
IntegerVector ind.  y, wt and ind will all have length n.  eta's
length can be a multiple of n.

The index vector, ind, is a factor with k < n levels so all of its
elements are between 1 and k.

The objective is to apply the inverse link function to eta, producing
the predicted mean response, mu, evaluate the deviance residuals for
y, mu and wt and sum the deviance residuals according to the indices
in ind.

NumericVector mu = 1./(1. + exp(-eta));

For the deviance residuals I defined an helper function

static R_INLINE
double y_log_y(double y, double mu)
{
return (y) ? (y * log(y/mu)) : 0;
}

which has the appropriate limiting behavior when y is zero.  Using
that, the deviance residuals can be evaluated as

2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu))

That last expression could have different lengths of y and mu but I
could use rep to extend y, wt and ind to the desired length.

The conditional evaluation in y_log_y is not something that can be
ignored because, in many cases y consists only of 0's and 1's so
either y_log_y(y, mu) or y_log_y(1.-y, 1.-mu) will fail the condition
in the ? expression.

Are there suggestions on how best to structure the calculation to make
it blazingly fast?
```