[Rcppdevel] Evaluating Formula's in Rcpp
Dirk Eddelbuettel
edd at debian.org
Sat May 4 04:56:34 CEST 2013
Hi Gabriel,
On 3 May 2013 at 22:04, Gabriel Hoffman wrote:
 Hi,
 I any trying to develop an R function for running 1000's of
 regressions very fast. I will omit the technical reasons for this, but
 I would like to write code to perform the following:

 for(j in 1:ncol(X) ){
 fit = myRegression( y ~ age:X[,j] )
 }

 This uses R's convenient 'formula' functionality to evaluate the
 interaction term in the regression.
Interesting.
I am not sure you can. You probably have to look at the code for formula(),
model.matrix(), ... and redo it. Which will be a royal pain.
 The issue is that 'myRegression' is very complicated, high overhead, and
 takes over arguments which I have omitted for simplicity. Therefore, I
Formulae evaluation is _very_ expensive. With the various version of
fastLm() that we wrote over the years, I think I do have a "full" benchmark
somewheremaybe in the RcppArmadillo package example. [ If you can't find it
it is easy to recreate, just calling benchmark() or microbenchmark(). ] The
gist of it is that a) fastLm() is fast when you use X matrix and y vector,
to call fastLm.default() and b) fastLm() is a lot slower when you use the
formula interface  as R code parses the formula.
 would like to pass the formula "y ~ age:X[,j]" into a Rcpp function, and
 construct the relevant matrices in C++ using Rcpp::Environment, and
 Rcpp::Language, where I change the value of j each time. Because, this
 would require only one entry into my C++ code, I would not have to incur
 the overhead each time. I would like to run my analysis with a call like:

 # return pvalues from fitting ncol(X) regressions
 myRegressionWrapper( y ~ age:X[,j], data=X)

 or something like this.

 Essentially I would to have the nice functionality of lm() in Rcpp to
 evaluate:

 mf < match.call(expand.dots = FALSE)
 m < match(c("formula", "data", "subset", "weights", "na.action",
 "offset"), names(mf), 0L)
 mf < mf[c(1L, m)]
 mf$drop.unused.levels < TRUE
 mf[[1L]] < as.name("model.frame")
 mf < eval(mf, parent.frame())
 y < model.response(mf, "numeric")
 mt < attr(mf, "terms")
 X < model.matrix(mt, mf, contrasts)

 so I can run my custom regression function on y and X quickly for each
 value of j.

 Do you know how to implement the this functionality in Rcpp or through
 some other method?
I do not know of a method, which is why my implementation of fastLm, when
using a formula interface, it still slow as R code does the formula parsing
work.
So, but the "No Free Lunch Theorem" wins again.
Dirk

Dirk Eddelbuettel  edd at debian.org  http://dirk.eddelbuettel.com
More information about the Rcppdevel
mailing list