[Rcpp-devel] Forcing a shallow versus deep copy

Romain Francois romain at r-enthusiasts.com
Fri Sep 13 12:53:59 CEST 2013


Hello,

(sorry this had to wait a few months, but now I have an idea about this)

We all love to be able to do that:

#include <RcppArmadillo.h>
using namespace Rcpp ;

// [[Rcpp::depends("RcppArmadillo")]]

// [[Rcpp::export]]
void do_stuff( arma::mat m ){
     // do stuff with m
}

This is really nice. But R does not know about arma::mat, so Rcpp 
creates an arma::mat object out of the numeric matrix it is given. This 
means data copy and therefore it will be inefficient. check this:

        arma::mat 6490.124 6546.9065 6589.4080 6651.9825 8279.696   100
    NumericMatrix    3.941    4.2195    5.2890    7.3570   22.686   100

Those numbers are microseconds that are taken by these functions:

// [[Rcpp::export]]
void test_copy( arma::mat m ){}

// [[Rcpp::export]]
void test_NumericMatrix( NumericMatrix m ){}


So now, you believe me that passing an arma::mat by copy as a parameter 
of a function that is exposed through attributes is expensive, at least 
compared to doing so with a numeric matrix where there is no data copy 
because of Rcpp's design choices.

But we know how to make this more efficient right, we just use the 
advanced constructor in armadillo, the one that uses auxiliary memory 
and we end up with something like this:

// [[Rcpp::export]]
void test_arma_adv( NumericMatrix m_ ){
     arma::mat m(m_.begin(), m_.nrow(), m_.ncol(), false) ;
     // do stuff with m
}

That's a lot more typing and some of us are frsutrated by this.

I've had a look at this this morning, and I have a partial solution. We 
could pretend that when we use pass by reference with armadillo objects, 
what happens is that behind the scenes the advanced constructor using 
the existing memory is used.

This means implementing versions of as for references and const 
references to arma matrices. This actually means implementing 
specializations of the Exporter class for these.

Here is one right here:

template <typename T>
	class Exporter< arma::Mat<T>& > {
	public:
		typedef typename Rcpp::Matrix< 
Rcpp::traits::r_sexptype_traits<T>::rtype > MATRIX ;
		
		Exporter(SEXP x) : mat(x) {}
		
		arma::Mat<T>& get(){
			arma::Mat<T>* m = new arma::Mat<T>( mat.begin(), mat.nrow(), 
mat.ncol(), false ) ;
			pointers.push_back( m ) ;
			return *m ;
		}
		
		static std::vector< arma::Mat<T>* > pointers ;
		
	private:
		MATRIX mat ;
	};
	
With this, we can now have functions like this:

// [[Rcpp::export]]
void test_ref( arma::mat& m ){
     m(0,0) = 4 ;
}

call them from R, and let armadillo work in place:

 > x <- matrix(rep(1, 16), 4)

 > test_ref(x)

 > x
      [,1] [,2] [,3] [,4]
[1,]    4    1    1    1
[2,]    1    1    1    1
[3,]    1    1    1    1
[4,]    1    1    1    1


Now the catch is that the static std::vector< arma::Mat<T>* > pointers 
grows by 1 each time one such reference is used and currently I don't 
know of a mechanism to remove the pointers when I no longer need them. I 
have a start of a plan for that, see below.

But the main question I'm asking here:


would it make sense to be able to have functions like this ? :

// [[Rcpp::export]]
void foo( arma::mat& m ){
     // do stuff with m, given m shares memory with the
     // R matrix it was constructed for
}

Is it a big deal that we would cheat on chat reference passing means ?







JJ, Maybe the attributes parser can help. At the moment this function

// [[Rcpp::export]]
void test_ref( arma::mat& m ){
     m(0,0) = 4 ;
}

gets wrapped to this:

RcppExport SEXP sourceCpp_21667_test_ref(SEXP mSEXP) {
BEGIN_RCPP
     {
         Rcpp::RNGScope __rngScope;
         arma::mat& m = Rcpp::as<arma::mat& >(mSEXP);
         test_ref(m);
     }
     return R_NilValue;
END_RCPP
}

Maybe we can add something extra after the call to test_ref that would 
"let go" of the reference, something like this:

RcppExport SEXP sourceCpp_21667_test_ref(SEXP mSEXP) {
BEGIN_RCPP
     {
         Rcpp::RNGScope __rngScope;
         arma::mat& m = Rcpp::as<arma::mat& >(mSEXP);
         test_ref(m);
         Rcpp::release< arma::mat& >( m ) ;
     }
     return R_NilValue;
END_RCPP
}

Where release would be a dummy most of the time, i.e

template <typename T>
void release( const T& x){}

but have specific implementations when needed, e.g. for T = 
arma::Mat<K>& and T = const arma::Mat<K>&

Romain


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

Le 11/07/13 15:22, romain at r-enthusiasts.com a écrit :
>
> Hello,
>
> This comes up every now and then, I think we can find a syntax to
> initiate an arma::mat that would allow what you want.
>
> It is not likely it will come via attributes. The idea is to keep them
> simple. The solutions I see below would eventually lead to clutter, and
> we are heading in the less clutter direction.
>
> I'll think about it and propose something.
>
> Romain
>
> Le 2013-07-11 14:32, Changi Han a écrit :
>> Hello,
>>
>> I think I (superficially) understand the difference between:
>>
>> // [[Rcpp::export]]
>> double sum1(Rcpp::NumericMatrix M) {
>>     arma::mat A(M.begin(), M.rows(), M.cols(), false);
>>      return sum(sum(A));
>> }
>>
>> // [[Rcpp::export]]
>> double sum2(arma::mat A) {
>>     return sum(sum(A));
>> }
>>
>> Partly out of laziness, partly because sum2 is more elegant, and
>> partly to avoid namespace pollution, I was wondering if there is a way
>> to "force" a "shallow" copy in sum2.
>>
>> If not, then may I submit a low priority feature request. An
>> attribute? Some thing like:
>>
>> // [[Rcpp::export]]
>> double sum2(arma::mat A) {
>>     // [[ Rcpp::shallow ( A ) ]]
>>      return sum(sum(A));
>> }
>>
>> Or (akin to C++11 generalized attributes)
>>
>> // [[Rcpp::export]] { [[ Rcpp::shallow ( A ) ]] }
>> double sum2(arma::mat A) {
>>     return sum(sum(A));
>>  }
>>
>> An alternative is to have an argument in sourceCpp that takes a
>> list/vector of objects that are to be shallow or deep copied.
>>
>> For example in sum1, if M is changed within the function before
>> casting to the arma::mat, then might be cleaner to add M to a
>> list/vector of objects to be deep copied rather than cloning M within
>> sum1: leads to one fewer variable name.
>>
>> Just a thought. I can certainly live with the additional step. As
>> always, thanks for all the Rcpp goodness.
>>
>> Cheers,
>> Changi Han


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30



More information about the Rcpp-devel mailing list