[Rcpp-devel] Evaluating Formula's in Rcpp

Dirk Eddelbuettel edd at debian.org
Sat May 4 04:56:34 CEST 2013


Hi Gabriel, 

On 3 May 2013 at 22:04, Gabriel Hoffman wrote:
| Hi,
|      I any trying to develop an R function for running 1000's of 
| regressions very fast.  I will omit the technical reasons for this, but 
| I would like to write code to perform the following:
| 
| for(j in 1:ncol(X) ){
|      fit = myRegression( y ~ age:X[,j] )
| }
| 
| This uses R's convenient 'formula' functionality to evaluate the 
| interaction term in the regression.

Interesting. 

I am not sure you can.  You probably have to look at the code for formula(),
model.matrix(), ... and redo it. Which will be a royal pain.
 
| The issue is that 'myRegression' is very complicated, high overhead, and 
| takes over arguments which I have omitted for simplicity. Therefore, I 

Formulae evaluation is _very_ expensive.  With the various version of
fastLm() that we wrote over the years, I think I do have a "full" benchmark
somewhere--maybe in the RcppArmadillo package example. [ If you can't find it
it is easy to recreate, just calling benchmark() or microbenchmark(). ] The
gist of it is that a) fastLm() is fast when you use X matrix and y vector,
to call fastLm.default() and b) fastLm() is a lot slower when you use the
formula interface -- as R code parses the formula.

| would like to pass the formula "y ~ age:X[,j]" into a Rcpp function, and 
| construct the relevant matrices in C++ using Rcpp::Environment, and 
| Rcpp::Language, where I change the value of j each time.  Because, this 
| would require only one entry into my C++ code, I would not have to incur 
| the overhead each time.  I would like to run my analysis with a call like:
| 
| # return p-values from fitting ncol(X) regressions
| myRegressionWrapper( y ~ age:X[,j], data=X)
| 
| or something like this.
| 
| Essentially I would to have the nice functionality of lm() in Rcpp to 
| evaluate:
| 
| mf <- match.call(expand.dots = FALSE)
| m <- match(c("formula", "data", "subset", "weights", "na.action", 
| "offset"), names(mf), 0L)
| mf <- mf[c(1L, m)]
| mf$drop.unused.levels <- TRUE
| mf[[1L]] <- as.name("model.frame")
| mf <- eval(mf, parent.frame())
| y <- model.response(mf, "numeric")
| mt <- attr(mf, "terms")
| X <- model.matrix(mt, mf, contrasts)
| 
| so I can run my custom regression function on y and X quickly for each 
| value of j.
| 
| Do you know how to implement the this functionality in Rcpp or through 
| some other method?

I do not know of a method, which is why my implementation of fastLm, when
using a formula interface, it still slow as R code does the formula parsing
work. 

So, but the "No Free Lunch Theorem" wins again.

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com


More information about the Rcpp-devel mailing list