[Rcpp-devel] [rcpp-devel] arma::mat constructor vs Rcpp::as and interaction with .insert_col

Dale Smith dtsmith at mindspring.com
Mon Apr 15 03:10:08 CEST 2013


As Dirk suggested below, I'm working on a paragraph to point out the constructor used implies no modification of the arma::mat memory.

Dale

On Apr 11, 2013, at 9:52 PM, Dirk Eddelbuettel <edd at debian.org> wrote:

> 
> Hi Dale,
> 
> On 11 April 2013 at 21:12, Dale Smith wrote:
> | I had a problem this afternoon with an arma::mat constructor and the
> | mat::insert_col method.
> 
> That is not a method I've used, I think, so I will defer comments.  
> 
> Whenever I did what you do below (ie expand a N*k matrix with a standard
> 'iota' column for a constant in the regression) then I did so _before calling
> the C++ code_ as can be seen in all versions of fastLm we shipped.
> 
> | Since I got the example from Dirk, I thought I would
> | point out the problem I had and a solution. See
> 
> I am not sure I call that a "problem and a solution".  It is worth a
> discussion on that page, possibly.  
> 
> It is a little like the clone() example I give in the Rcpp classes --
> sometimes the side benefit of a copy is noticeable and even welcome -- as eg
> here where you want to modify the data structure.
> 
> I don't do that each time as fastLm is for _pure speed_.
> 
> | http://gallery.rcpp.org/articles/fast-linear-model-with-armadillo/
> | 
> | for background. Before I get to the example to reproduce the problem and the
> | fix, let me say I'm posting from home as I can't post code from my work email;
> | otherwise I'm fired.
> | 
> | I do believe Rcpp::as should be used to marshall from NumericMatrix to
> 
> That is the default.
> 
> | arma::mat, not the arma::mat constructor that accesses raw memory, as Dirk
> | uses.
> 
> a) By choice, and b) for faster performance speed, and c) after consulting on
> this with Conrad "way back when". Nobody got bitten yet.  And we unit test
> this each and every time RcppArmadillo is built and tested.
> 
> And note that that is _not_ the default constructor.  In fact, I had
> contemplated replacing the default constructor but what we have now is fine.
> 
> | I read the Armadillo documentation this afternoon and do believe the
> | problem with .insert_col is due to the handling of memory in the constructor
> | 
> | mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) 
> | 
> | see
> | 
> | http://arma.sourceforge.net/docs.html
> 
> So just create a copy.
> 
> | Note that I've run similar examples on my Windows 7 box at work and on my OS X
> | laptop (Mountain Lion) at home to confirm this is not os-dependent.
> 
> I think this just a subtle misunderstanding.  _If_ one wants to muck with the
> data _then_ the "promise" made to R in the constructor
> 
>      arma::mat X(Xr.begin(), n, k, false); 
> 
> is not a good one.   
> 
> But for illustration of _why_ we do what we do, consider this simple
> example. We sum up a matrix -- a cheap operation -- and can compare the cost
> of instantiating the matrix:
> 
> ------------------------------------------------------------------------------
> 
> #include <RcppArmadillo.h>
> 
> // [[Rcpp::depends(RcppArmadillo)]]
> 
> // [[Rcpp::export]]
> double sum1(Rcpp::NumericMatrix M) {
>  arma::mat A(M.begin(), M.rows(), M.cols(), false);
>  return sum(sum(A));
> }
> 
> // [[Rcpp::export]]
> double sum2(arma::mat M) {
>  return sum(sum(M));
> }
> 
> 
> /*** R
> set.seed(42)
> M <- matrix(rnorm(100*100),100,100)
> library(microbenchmark)
> print(sum1(M))
> print(sum2(M))
> microbenchmark(sum1(M), sum2(M), times=100)
> */
> 
> ------------------------------------------------------------------------------
> 
> which when running yields
> 
> ------------------------------------------------------------------------------
> R> sourceCpp("/tmp/dale.cpp")
> 
> R> set.seed(42)
> 
> R> M <- matrix(rnorm(100*100),100,100)
> 
> R> library(microbenchmark)
> 
> R> print(sum1(M))
> [1] -113.094
> 
> R> print(sum2(M))
> [1] -113.094
> 
> R> microbenchmark(sum1(M), sum2(M), times=100)
> Unit: microseconds
>    expr    min      lq  median      uq    max neval
> sum1(M)  8.256  8.4805  9.5750  9.9160 47.225   100
> sum2(M) 21.852 22.2845 22.8535 23.5475 60.001   100
> R> 
> ------------------------------------------------------------------------------
> 
> There is nice pickup in speed when we forego the extra copy.  You created
> yourself a need for the copy, and you're free to make one.
> 
> We should possibly amend the Rcpp Gallery page to note this.  Could you send
> a suggested patch or added paragraph?
> 
> Thanks,  Dirk
> 
> 
> | 
> | Thanks,
> | Dale Smith
> | dtsmith at mindspring.com
> | 
> | mat <- cbind(rnorm(50), rnorm(50), rnorm(50))
> | y <- rnorm(50)
> | 
> | cppFunction('
> | List fastLm(NumericVector yr, NumericMatrix Xr) {
> |   int n = Xr.nrow(), k = Xr.ncol();
> |    
> |   arma::mat X(Xr.begin(), n, k, false); 
> |   arma::colvec y(yr.begin(), yr.size(), false);
> |    
> |   arma::colvec coef = arma::solve(X, y);
> |   arma::colvec resid = y - X*coef;
> | 
> |   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
> |   X.insert_cols(1, ones);
> |    
> |   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
> |   arma::colvec stderrest = arma::sqrt(
> |        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
> |    
> |   return List::create(Named("coefficients") = coef,
> |                       Named("stderr")       = stderrest);
> | }', depends = c("RcppArmadillo"))
> | 
> | 
> | cppFunction('
> | List fastLmMod(NumericVector yr, NumericMatrix Xr) {
> |   int n = Xr.nrow(), k = Xr.ncol();
> |    
> |   arma::mat X = Rcpp::as<arma::mat>(Xr);
> |   arma::colvec y(yr.begin(), yr.size(), false);
> |    
> |   arma::colvec ones = arma::ones<arma::colvec>(X.n_rows);
> |   X.insert_cols(0, ones);
> |    
> |   arma::colvec coef = arma::solve(X, y);
> |   arma::colvec resid = y - X*coef;
> | 
> |   double sig2 = arma::as_scalar(arma::trans(resid)*resid/(n-k));
> |   arma::colvec stderrest = arma::sqrt(
> |        sig2 * arma::diagvec( arma::inv(arma::trans(X)*X)) );
> |    
> |   return List::create(Named("coefficients") = coef,
> |                       Named("stderr")       = stderrest);
> | }', depends = c("RcppArmadillo"))
> | 
> | fastLm(y, mat) # error is "error: Mat::init(): mismatch between size of
> | auxiliary memory and requested size"
> | fastLmMod(y, mat) # coefficients are -0.15785722, -0.05668155, 0.15409366,
> | -0.10079147
> | lm(y ~ mat) # confirm fastLmMod via lm function in R
> | 
> | 
> | ----------------------------------------------------------------------
> | _______________________________________________
> | Rcpp-devel mailing list
> | Rcpp-devel at lists.r-forge.r-project.org
> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com



More information about the Rcpp-devel mailing list