[Rcpp-devel] [rcpp-devel] arma::mat constructor vs Rcpp::as and interaction with .insert_col

Dirk Eddelbuettel edd at debian.org
Fri Apr 12 03:52:01 CEST 2013


Hi Dale,

On 11 April 2013 at 21:12, Dale Smith wrote:
| I had a problem this afternoon with an arma::mat constructor and the
| mat::insert_col method.

That is not a method I've used, I think, so I will defer comments.  

Whenever I did what you do below (ie expand a N*k matrix with a standard
'iota' column for a constant in the regression) then I did so _before calling
the C++ code_ as can be seen in all versions of fastLm we shipped.

| Since I got the example from Dirk, I thought I would
| point out the problem I had and a solution. See

I am not sure I call that a "problem and a solution".  It is worth a
discussion on that page, possibly.  

It is a little like the clone() example I give in the Rcpp classes --
sometimes the side benefit of a copy is noticeable and even welcome -- as eg
here where you want to modify the data structure.

I don't do that each time as fastLm is for _pure speed_.
 
| http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/
| 
| for background. Before I get to the example to reproduce the problem and the
| fix, let me say I'm posting from home as I can't post code from my work email;
| otherwise I'm fired.
| 
| I do believe Rcpp::as should be used to marshall from NumericMatrix to

That is the default.

| arma::mat, not the arma::mat constructor that accesses raw memory, as Dirk
| uses.

a) By choice, and b) for faster performance speed, and c) after consulting on
this with Conrad "way back when". Nobody got bitten yet.  And we unit test
this each and every time RcppArmadillo is built and tested.

And note that that is _not_ the default constructor.  In fact, I had
contemplated replacing the default constructor but what we have now is fine.

| I read the Armadillo documentation this afternoon and do believe the
| problem with .insert_col is due to the handling of memory in the constructor
| 
| mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) 
| 
| see
| 
| http://arma.sourceforge.net/docs.html

So just create a copy.
 
| Note that I've run similar examples on my Windows 7 box at work and on my OS X
| laptop (Mountain Lion) at home to confirm this is not os-dependent.

I think this just a subtle misunderstanding.  _If_ one wants to muck with the
data _then_ the "promise" made to R in the constructor

      arma::mat X(Xr.begin(), n, k, false); 

is not a good one.   

But for illustration of _why_ we do what we do, consider this simple
example. We sum up a matrix -- a cheap operation -- and can compare the cost
of instantiating the matrix:

------------------------------------------------------------------------------

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
double sum1(Rcpp::NumericMatrix M) {
  arma::mat A(M.begin(), M.rows(), M.cols(), false);
  return sum(sum(A));
}

// [[Rcpp::export]]
double sum2(arma::mat M) {
  return sum(sum(M));
}


/*** R
set.seed(42)
M <- matrix(rnorm(100*100),100,100)
library(microbenchmark)
print(sum1(M))
print(sum2(M))
microbenchmark(sum1(M), sum2(M), times=100)
*/

------------------------------------------------------------------------------

which when running yields

------------------------------------------------------------------------------
R> sourceCpp("/tmp/dale.cpp")

R> set.seed(42)

R> M <- matrix(rnorm(100*100),100,100)

R> library(microbenchmark)

R> print(sum1(M))
[1] -113.094

R> print(sum2(M))
[1] -113.094

R> microbenchmark(sum1(M), sum2(M), times=100)
Unit: microseconds
    expr    min      lq  median      uq    max neval
 sum1(M)  8.256  8.4805  9.5750  9.9160 47.225   100
 sum2(M) 21.852 22.2845 22.8535 23.5475 60.001   100
R> 
------------------------------------------------------------------------------

There is nice pickup in speed when we forego the extra copy.  You created
yourself a need for the copy, and you're free to make one.

We should possibly amend the Rcpp Gallery page to note this.  Could you send
a suggested patch or added paragraph?

Thanks,  Dirk


| 
| Thanks,
| Dale Smith
| dtsmith at mindspring.com
| 
| mat <- cbind(rnorm(50), rnorm(50), rnorm(50))
| y <- rnorm(50)
| 
| cppFunction('
| List fastLm(NumericVector yr, NumericMatrix Xr) {
|   int n = Xr.nrow(), k = Xr.ncol();
|    
|   arma::mat X(Xr.begin(), n, k, false); 
|   arma::colvec y(yr.begin(), yr.size(), false);
|    
|   arma::colvec coef = arma::solve(X, y);
|   arma::colvec resid = y - X*coef;
| 
|   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
|   X.insert_cols(1, ones);
|    
|   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
|   arma::colvec stderrest = arma::sqrt(
|        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
|    
|   return List::create(Named("coefficients") = coef,
|                       Named("stderr")       = stderrest);
| }', depends = c("RcppArmadillo"))
| 
| 
| cppFunction('
| List fastLmMod(NumericVector yr, NumericMatrix Xr) {
|   int n = Xr.nrow(), k = Xr.ncol();
|    
|   arma::mat X = Rcpp::as<arma::mat>(Xr);
|   arma::colvec y(yr.begin(), yr.size(), false);
|    
|   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
|   X.insert_cols(0, ones);
|    
|   arma::colvec coef = arma::solve(X, y);
|   arma::colvec resid = y - X*coef;
| 
|   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
|   arma::colvec stderrest = arma::sqrt(
|        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
|    
|   return List::create(Named("coefficients") = coef,
|                       Named("stderr")       = stderrest);
| }', depends = c("RcppArmadillo"))
| 
| fastLm(y, mat) # error is "error: Mat::init(): mismatch between size of
| auxiliary memory and requested size"
| fastLmMod(y, mat) # coefficients are -0.15785722, -0.05668155, 0.15409366,
| -0.10079147
| lm(y ~ mat) # confirm fastLmMod via lm function in R
| 
| 
| ----------------------------------------------------------------------
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com


More information about the Rcpp-devel mailing list