[Rcpp-devel] Using sugar expressions for evaluating deviance residuals

Romain Francois romain at r-enthusiasts.com
Mon Aug 16 15:10:19 CEST 2010


Nothing really conclusive here ...

I've added a preliminary version of SUGAR_BLOCK_3 (only for the case 
where all inputs are vectors) so that one can write :

incl8 <- '
inline double y_log_y(double y, double mu) {
     return y ? y*log(y/mu) : 0.;
}
inline double yMu(double y, double mu) {
     return 2. * (y_log_y(y, mu) + y_log_y(1.-y,1.-mu));
}
double yEta(double y, double eta, double w) {
     return w * yMu(y, 1./(1. + exp(-eta)));
}
SUGAR_BLOCK_3(yeta, ::yEta)
'
code8 <- '
     NumericVector ans = yeta(NumericVector(yy), NumericVector(ee), 
NumericVector(ww) );
     return ans ;
'

I get :

$ Rscript ylogy.R
Le chargement a nécessité le package : methods
              test replications elapsed user.self
1 fx0(yy, ee, ww)          500   0.953     0.947
2 fx1(yy, ee, ww)          500   2.576     2.521
3 fx2(yy, ee, ww)          500   2.887     2.843
4 fx3(yy, ee, ww)          500   2.898     2.854
5 fx4(yy, ee, ww)          500   1.273     1.241
6 fx5(yy, ee, ww)          500   1.502     1.450
7 fx6(yy, ee, ww)          500   1.408     1.356
8 fx7(yy, ee, ww)          500   1.303     1.273
9 fx8(yy, ee, ww)          500   1.452     1.389

So slightly better than fx5 but not as good as fx4


I guess there needs to be some investigation on why fx1, fx2 and fx3 
don't perform "well".

Romain

Le 15/08/10 22:40, Douglas Bates a écrit :
> On Sun, Aug 15, 2010 at 10:34 AM, Douglas Bates<bates at stat.wisc.edu>  wrote:
>> On Sun, Aug 15, 2010 at 3:39 AM, Romain Francois
>> <romain at r-enthusiasts.com>  wrote:
>>> Hi,
>>>
>>> Could you provide some minimal example using inline, etc ...
>>
>> I enclose a comparison of several different versions that I wrote.
>> Interestingly the old code, called by fx3, does reasonably well,
>> although I'm not sure if this is a fair comparison as the levels of
>> optimization by the compiler may not be the same for the code
>> generated by inline and for the code that is part of R.
>>
>> As we have seen in the past, extracting the pointers and calling
>> scalar functions on individual elements does well.  The direct use of
>> the sugar functions is much easier to understand in the code (it is
>> essentially the same as writing R expressions) except that ifelse() is
>> expensive in R but not as a sugar expression.
>>
>>> Here are a few hints in the meantime. In the next version of Rcpp, I've
>>> added a few things that help generating sugar functions. This is how for
>>> example the sugar version of choose is currently implemented :
>>>
>>> SUGAR_BLOCK_2(choose      , ::Rf_choose     )
>>>
>>> The SUGAR_BLOCK_2 macro (perhaps I should prefix it with RCPP_) lives in
>>> Rcpp/sugar/SugarBlock.h
>>>
>>> You can promote y_log_y to a sugar function like this :
>>>
>>> inc<- '
>>> double y_log_y(double y, double mu){
>>>     return (y) ? (y * log(y/mu)) : 0;
>>> }
>>> SUGAR_BLOCK_2( y_log_y , ::y_log_y )
>>> '
>
> I added a couple of other tests based on SUGAR_BLOCK_2 and cleaned up
> the logic a bit.  I enclose the results.
>>> fx<- cxxfunction( signature( y = "numeric", mu = "numeric" ), '
>>>         NumericVector res = y_log_y(
>>>                 NumericVector(y),
>>>                 NumericVector(mu)
>>>         ) ;
>>>         return res ;
>>> ', plugin = "Rcpp", includes = inc )
>>> fx( runif(11) , seq(0, 1, .1 ) )
>>>
>>>
>>> This gives you 3 Rcpp::y_log_y functions :
>>> - one for when y is a double and mu is a NumericVector (or any numeric sugar
>>> expression)
>>> - one for when y is a NumericVector and mu is a double
>>> - one for both are NumericVector.
>>>
>>> However, at the moment it does not take care about recycling so it is up to
>>> the user to make sure that y and mu have the same length. I'm currently
>>> thinking about how to deal with recycling.
>>>
>>>
>>>
>>> rep is a possibility here, but consider this hint in the TODO file:
>>>
>>>     o   not sure rep should be lazy, i.e. rep( x, 4 ) fetches x[i] 4 times,
>>>         maybe we should use LazyVector like in outer to somehow cache the
>>>         result when it is judged expensive to calculate
>>>
>>>
>>>
>>> One other thing about sugar is that it has to do many checks for missing
>>> values so if (with the one defined above) you did call this expression:
>>>
>>> 2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu))
>>>
>>> (and again, currently binary operators +,*,.. don't take care of recycling)
>>>
>>> you would have many checks for missing values. That is fine if you may have
>>> missing values, because it will propagate them correctly, but it can slow
>>> things down because they are tested for at every step.
>>>
>>> If you are sure that y and mu don't contain missing values, then perhaps one
>>> thing we can do is embed that information in the data so that sugar does not
>>> have to check for missing values because it just assumes there are not any.
>>> Most of the code in sugar contains version that ignores missing values,
>>> controlled by a template parameter. For example seq_len creates a sugar
>>> expression where we know for sure that it does not contain missing values.
>>>
>>>
>>> One way perhaps to short circuit this is to first write the code with
>>> double's and then promote it to sugar:
>>>
>>> double resid( double y, double mu, double w){
>>>         return 2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu) ;
>>> }
>>>
>>> But then you need someone to write SUGAR_BLOCK_3 or write it manually.
>>>
>>>
>>> Another idea would be to have something like NumericVector::import_transform
>>> but that would take 3 vector parameters instead of 1.
>>>
>>>
>>> Sorry if this email is a bit of a mess, I sort of wrote the ideas as they
>>> came.
>>>
>>> Romain
>>>
>>> Le 14/08/10 20:28, Douglas Bates a écrit :
>>>>
>>>> In profiling code for generalized linear mixed models for binary
>>>> responses I found that a substantial portion of the execution time was
>>>> being spend in evaluating the R functions for inverse links and for
>>>> the deviance residuals.  As I result I wrote C code (in
>>>> $RSRC/src/library/stats/src/family.c) or some of those.
>>>>
>>>> The way that some research is going I expect that I will soon be in
>>>> the position where I need to evaluate such expressions even more
>>>> frequently so it is worthwhile to me to tune it up.
>>>>
>>>> In general there will be three NumericVector objects -- y (observed
>>>> responses), wt (weights) and eta (linear predictors) -- plus an
>>>> IntegerVector ind.  y, wt and ind will all have length n.  eta's
>>>> length can be a multiple of n.
>>>>
>>>> The index vector, ind, is a factor with k<    n levels so all of its
>>>> elements are between 1 and k.
>>>>
>>>> The objective is to apply the inverse link function to eta, producing
>>>> the predicted mean response, mu, evaluate the deviance residuals for
>>>> y, mu and wt and sum the deviance residuals according to the indices
>>>> in ind.
>>>>
>>>> The simplest inverse link is for the logit link
>>>>
>>>>   NumericVector mu = 1./(1. + exp(-eta));
>>>>
>>>> For the deviance residuals I defined an helper function
>>>>
>>>> static R_INLINE
>>>> double y_log_y(double y, double mu)
>>>> {
>>>>      return (y) ? (y * log(y/mu)) : 0;
>>>> }
>>>>
>>>> which has the appropriate limiting behavior when y is zero.  Using
>>>> that, the deviance residuals can be evaluated as
>>>>
>>>> 2 * wt * (y_log_y(y, mu) + y_log_y(1.-y, 1.-mu))
>>>>
>>>> That last expression could have different lengths of y and mu but I
>>>> could use rep to extend y, wt and ind to the desired length.
>>>>
>>>> The conditional evaluation in y_log_y is not something that can be
>>>> ignored because, in many cases y consists only of 0's and 1's so
>>>> either y_log_y(y, mu) or y_log_y(1.-y, 1.-mu) will fail the condition
>>>> in the ? expression.
>>>>
>>>> Are there suggestions on how best to structure the calculation to make
>>>> it blazingly fast?


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/bzoWrs : Rcpp svn revision 2000
|- http://bit.ly/b8VNE2 : Rcpp at LondonR, oct 5th
`- http://bit.ly/aAyra4 : highlight 0.2-2



More information about the Rcpp-devel mailing list