[Rcpp-devel] rex2arma: from R expression to RcppArmadillo code

Mon Nov 3 17:25:34 CET 2014

Hi,

I am new on this list so let me say that in my opinion
RcppArmadillo have a bright future among R developers
especially among those doing scientific calculus.

That's why and to facilitate my life with it, I have developed
a small tool rex2arma() for automatic conversion of R code to
RcppArmadillo inlined code. For instance, it is written in pure R
but if the conversion time becomes an issue it can be easily ported
to C or Rcpp.

Now, let me take a small case to present how rex2arma can be
used and what it gives. The case "fastLM" from RcppDistrubution
is a perfect example for this.

The most straightforward way is to write an R function that you want
to convert, e.g.:

# pure R solution which can be automatically converted to RcppArmadillo code
fastLm_r=function(X, y) {
    df=nrow(X)-ncol(X) # more generally, it should do nb_of_rows(X)-rank(X) but I keep the original code
    coef=qr.solve(X, y)
    res=y-c(X%*%coef)
    s2=(res%*%res)/df
    std_err=sqrt(s2*diag(ginv(t(X)%*%X)))
    return(list("coefficients"=coef, "stderr"=std_err, "df"=df))
}

Then you do
 > # NB. The input params X and y must exist before rex2arma() call
 > y <- log(trees$Volume)
 > X <- cbind(1, log(trees$Girth))
 > code=rex2arma(fastLm_r, fname="fastLm_rex", exec=1)

Here "fastLm_rex" is the function name that is created by
the conversion. To call it, do like with the original R function:

result <- fastLm_rex(X, y)

That's it. Adding fastLM_rex to the whole benchmark gives:
 > print(res[,1:4])
                         test replications relative elapsed
1           fLmOneCast(X, y)         5000    1.000   0.198
11   fastLm_rex_nocopy(X, y)         5000    1.010   0.200
3          fLmConstRef(X, y)         5000    1.025   0.203
2          fLmTwoCasts(X, y)         5000    1.030   0.204
10          fastLm_rex(X, y)         5000    1.035   0.205
5    fastLmPureDotCall(X, y)         5000    1.212   0.240
4           fastLmPure(X, y)         5000    1.823   0.361
7               lm.fit(X, y)         5000    2.904   0.575
9             fastLm_r(X, y)         5000   10.025   1.985
6  fastLm(frm, data = trees)         5000   39.399   7.801
8      lm(frm, data = trees)         5000   50.919  10.082

Let glimps the cpp code now.
 > cat(code)

cppFunction(depends='RcppArmadillo', rebuild=FALSE,
'SEXP fastLm_rex(
NumericMatrix X_in_,
NumericVector y_in_) {
    using namespace arma;
    using namespace Rcpp;
    // Variable declarations
    mat X(X_in_.begin(), X_in_.nrow(), X_in_.ncol(), true);
    vec y(y_in_.begin(), y_in_.size(), true);
    // Translated code starts here

    double df=(X).n_rows-(X).n_cols;
    vec coef=solve(X, y);
    vec res=y-vectorise(X*coef);
    double s2=(dot(res, res))/df;
    vec std_err=sqrt(s2*diagvec(pinv((X).t()*X)));
    return(List::create(Named("coefficients")=coef, Named("stderr")=std_err, Named("df")=df));

}'
)

Quite straightforward and faithful to R source, isn't it?

By default, the input parameters are copied in a new memory but if you know what you are doing
you can call rex2arma with an optional parameter copy=FALSE (it corresponds to a row
"fastLm_rex_nocopy" in the benchmark above) so the cpp code will do operations in-place.

Other options exist, like rebuild or not the compiled cpp function or
produce the code with or without its execution.

Another point is that different call modes are possible. Instead of passing an R function,
like above, one can pass directly an R code (which obviously won't be executed in R
but in the produced cpp function), e.g.:

 > result=rex2arma(diag(ginv(t(X)%*%X)))

or you can just see the produced code without function creation
 > cat(rex2arma(diag(ginv(t(X)%*%X)), exec=FALSE))
This can be helpful as kind of "cheat-sheet".

You can also pass a text containing an R code
 > x=1:5; src="inner=x%*%x; outer=x%o%x"
 > (result <- rex2arma(src))

Generally speaking, some limitations apply to
the R code that can be converted:
- only numeric types are allowed (a list() can appear only
as the last returned expression)
- no indexing (for now)
- only few and basic R functions can be converted to their Armadillo homologues
- etc. (cf. comments in the source code)

So, how can I verse this code to RcppArmadillo (in hypothesis that you
are interested in and willing to test it on your side)? Do I just post
the code here? Do I commit it somewhere to a guest branch of your git
repository? Something else?

Looking forward to here from you,
Serguei.