[Rcpp-devel] Generalized Linear Model funcitonality inside Inline

Fri Jan 13 16:38:25 CET 2012

On Fri, Jan 13, 2012 at 8:53 AM, Jason LaCombe
<jlacombe at naturesourcegenetics.com> wrote:
> Hi Everyone,

> I’m trying to perform the following task more efficiently in Rcpp, and am
> running into some trouble…

> Here’s a simplified example of the sort of R code I’m trying to optimize,
> for Y a binary random variable, X a categorical variable with 3 classes:

> For(I in 1:100000)
> {
>   glm(X~Y, family=”binomial”);
> }

I think you mean glm(Y ~ X, family=binomial) don't you?

If you only have a single categorical predictor there is no need to
fit a GLM because you can reduce the data to a 2 by k contingency
table (k=3 in this case) and simply take the observed ratios in each
column.

Assuming that your actual computation is more challenging than this, I
don't think you will save much by using Rcpp code to evaluate calls to
glm or, as suggested by Jeffrey Pollock, glm.fit.  Those functions are
rather old, dating to the first versions of R, and not terribly well
designed.

To gain substantial performance benefits you really would need to
perform the IRLS algorithm in C++.  I have written a set of classes
for glm family objects using RcppEigen.  These classes are part of the
lme4Eigen package under the lme4 project on R-forge.   I also have
classes for "predictor modules" that are presently specialized to
GLMMs.  Converting to GLMs would not be terribly difficult because
they are a simplification.

One simplification that would help long-running glm fits is to reduce
the model matrix to a set of unique rows and use the fraction of 1's
as the response, incorporating the number of observations in the prior
weights.  That is essentially equivalent to what I mentioned above for
the contingency table representation (because the model matrix would
end up being the identity).

> This seems like an excellent candidate for Rcpp--I simply wish to evaluate
> the ‘for’ loop in c++ to reduce the computational overhead.
>
>
>
> What I would like to avoid is having to re-write code for explicitly
> evaluating a generalized linear model in c++. Any quick-and-dirty solution
> is acceptable. Two approaches that I have tried and been unsuccessful with
> are:
>
>
>
> -Attempting to expose the C glm methods from the R stats package
>
> -Working with the RcppModels package [having trouble with this due to my
> inability to find documentation, not that there isn’t some out there]
>
>
>
> Any thoughts, references, or corrections to my above approaches would be
> appreciated.
>
>
>
> Thanks,
>
>
>
> Jason
>
>
>
>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel