[Rcpp-devel] Evaluating Formula's in Rcpp
Dirk Eddelbuettel
edd at debian.org
Sat May 4 04:56:34 CEST 2013
Hi Gabriel,
On 3 May 2013 at 22:04, Gabriel Hoffman wrote:
| Hi,
| I am trying to develop an R function for running 1000's of
| regressions very fast. I will omit the technical reasons for this, but
| I would like to write code to perform the following:
|
| for(j in 1:ncol(X) ){
| fit = myRegression( y ~ age:X[,j] )
| }
|
| This uses R's convenient 'formula' functionality to evaluate the
| interaction term in the regression.
Interesting.
I am not sure you can. You probably have to look at the code for formula(),
model.matrix(), ... and redo it. Which will be a royal pain.
| The issue is that 'myRegression' is very complicated, high overhead, and
| takes other arguments which I have omitted for simplicity. Therefore, I
Formula evaluation is _very_ expensive. With the various versions of
fastLm() that we wrote over the years, I think I do have a "full" benchmark
somewhere--maybe in the RcppArmadillo package examples. [ If you can't find
it, it is easy to recreate by just calling benchmark() or microbenchmark(). ]
The gist of it is that a) fastLm() is fast when you pass an X matrix and a y
vector, which calls fastLm.default(), and b) fastLm() is a lot slower when
you use the formula interface -- as R code parses the formula.
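To illustrate the matrix-and-vector route for this particular model, here is
a minimal sketch in plain C++ (standard library only, no Rcpp). It assumes
that y, age and every column of X are numeric with no missing values, in
which case the formula y ~ age:X[,j] amounts to an intercept plus the single
product column age * X[,j]. The names fit_interactions and InteractionFit,
and the vector-of-columns layout for X, are hypothetical and chosen only for
this example; this is not how fastLm() is implemented.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Result of one fit of y ~ 1 + z, where z = age * X[, j].
// (Hypothetical type, introduced for this sketch only.)
struct InteractionFit {
    double intercept;
    double slope;
    double t_stat;  // slope divided by its standard error
};

// Fit one simple regression per column of X without any formula parsing.
// X is assumed to be stored as a vector of columns, each of length y.size().
// Assumes more than two observations and a non-constant product column.
std::vector<InteractionFit> fit_interactions(
        const std::vector<double>& y,
        const std::vector<double>& age,
        const std::vector<std::vector<double>>& X) {
    const std::size_t n = y.size();
    std::vector<InteractionFit> out;
    out.reserve(X.size());

    // The mean of y does not depend on j, so compute it once.
    double y_mean = 0.0;
    for (double v : y) y_mean += v;
    y_mean /= static_cast<double>(n);

    std::vector<double> z(n);  // reused buffer for the interaction column
    for (const auto& col : X) {
        // Build the interaction column by hand: z = age * X[, j].
        double z_mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            z[i] = age[i] * col[i];
            z_mean += z[i];
        }
        z_mean /= static_cast<double>(n);

        // Centered sums of squares and cross-products.
        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double dz = z[i] - z_mean;
            const double dy = y[i] - y_mean;
            sxx += dz * dz;
            sxy += dz * dy;
            syy += dy * dy;
        }

        // Closed-form ordinary least squares for one predictor.
        const double slope = sxy / sxx;
        const double intercept = y_mean - slope * z_mean;
        double rss = syy - slope * sxy;  // residual sum of squares
        if (rss < 0.0) rss = 0.0;        // guard against rounding error
        const double se = std::sqrt(rss / static_cast<double>(n - 2) / sxx);
        out.push_back({intercept, slope, slope / se});
    }
    return out;
}
```

The loop over j lives inside the compiled code, so a wrapper around it would
be entered only once. Turning each t statistic into a p-value is left out
here; on the R side that could be done with pt() on n - 2 degrees of freedom.
Factors, subsets, weights and NA handling, which model.frame() and
model.matrix() take care of, are not covered by this sketch.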
| would like to pass the formula "y ~ age:X[,j]" into a Rcpp function, and
| construct the relevant matrices in C++ using Rcpp::Environment, and
| Rcpp::Language, where I change the value of j each time. Because, this
| would require only one entry into my C++ code, I would not have to incur
| the overhead each time. I would like to run my analysis with a call like:
|
| # return p-values from fitting ncol(X) regressions
| myRegressionWrapper( y ~ age:X[,j], data=X)
|
| or something like this.
|
| Essentially I would like to have the nice functionality of lm() in Rcpp to
| evaluate:
|
| mf <- match.call(expand.dots = FALSE)
| m <- match(c("formula", "data", "subset", "weights", "na.action",
| "offset"), names(mf), 0L)
| mf <- mf[c(1L, m)]
| mf$drop.unused.levels <- TRUE
| mf[[1L]] <- as.name("model.frame")
| mf <- eval(mf, parent.frame())
| y <- model.response(mf, "numeric")
| mt <- attr(mf, "terms")
| X <- model.matrix(mt, mf, contrasts)
|
| so I can run my custom regression function on y and X quickly for each
| value of j.
|
| Do you know how to implement this functionality in Rcpp or through
| some other method?
I do not know of a method, which is why my implementation of fastLm, when
using the formula interface, is still slow, as R code does the formula
parsing work.
Sorry, but the "No Free Lunch Theorem" wins again.
Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com