[Rcpp-devel] Possible unprotected memory problems

Douglas Bates bates at stat.wisc.edu
Fri Jul 22 18:51:42 CEST 2011


On Fri, Jul 22, 2011 at 11:24 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi D(oug|irk),
>
> Thanks for taking the time to respond. Both of your emails have been
> quite helpful in shedding some light on how the guts of these things
> are working.
>
> Still, I think I didn't describe the scenario I *think* I want well
> (it might be that I might not actually want to do this when I realize
> how dumb it might be :-) It could also be that I'm misunderstanding
> what you folks are telling me, but the nut of where I think our wires
> are crossed is here:
>
>> | The "hand off" of the data/pointer to a C library would be like
>> | calling PROTECT. After your C function returns control back to R, it
>> | would still claim ownership/usage of the data. Things would hum along
>>
>> I think you have it inverse.  If you create an object in C++ and hand it to
>> R, you typically do not expect to see that object ever again in C++ -- and
>> hence you let R go about its business and even gc it.
>
> I actually think I want the inverse :-)
>
> The situation I am thinking of is constructing a large object (matrix
> of whatever type) in R, then passing that down to my C++ layer. I want
> the C++ layer to lay claim to that data, maybe as if there was another
> ('ghost') object on the R side that had an additional reference to it.
>
> I see this will require some digging into R internals. I'm sorry I cut
> out most of Dirk's message by this point. I think the point you make
> below about setting some "bit" is what I think I want to chase down:
>
>> We don't do PROTECT / UNPROTECT but a bit gets set that corresponds to the
>> same. I would have to look up the details as it has been a while....
>
> I'll try to paint the scenario I think I want using the *assumption*
> that R does some simple reference counting to do its GC. In this
> imaginary situation, this is what I want:
>
> + I want R to create the object (so there is 1 ref to it now).
>
> + Shoot this object down to C++, at which point I want my C++ library
> to get no-copy access to the data (so, hold on to its pointer), and
> add a +1 to R's refcount for this object, so that R won't free the
> memory "under my feet"
>
> + My C++ function will do whatever it does and return to R (but it
> won't decrement this ref count) .. it is still holding on to the data
> when it returns to R.
>
> + R will then go about its business as usual. On the R side, the
> object I originally passed down to C may (or may not) fall out of
> scope and the refcount will go -1, but it won't hit 0 since there is
> still 1 refcount to it from the C++ side of things.
>
> + Eventually, the C++ object will be destroyed (a situation I will be
> notified about), at which point I will decrement the refcount to my
> object, which will set it to 0. Here two things can then happen:
>
> (1) I can just free this memory myself in my C++ code (actually, I
> guess this is dangerous if R will try to clean it again later); or
> (2) Since the refcount is now 0 for this object, I can wait for R's GC
> to free the memory whenever it decides to kick in.
>
> As I said, this is kind of what I want to do. It is described in the
> hypothetical situation where R does refcount-ing for its GC stuff.
>
> All that having been said, I see that I need to really read some
> R-internals stuff if I'm going to get serious about trying this out.
>
> I guess this is the document I need to read to start grokking these types
> of things a bit more:
> http://cran.r-project.org/doc/manuals/R-ints.html
>
> Is there other material I can/should be looking at as well?
>
> Thanks again for sharing your expertise,
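
The lifetime described in the quoted list can be modeled outside of R.
The sketch below is only an analogy built on std::shared_ptr, whose
use_count() plays the role of the assumed refcount.  R's garbage
collector is a tracing collector rather than a reference-counting one,
and in real code the R C API functions R_PreserveObject and
R_ReleaseObject are the closest tools for pinning an object across
calls.  Holder, Trace and runScenario are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Buffer = std::vector<double>;

// Hypothetical C++-side object that keeps its own reference to the data.
class Holder {
    std::shared_ptr<Buffer> d_data;
public:
    explicit Holder(const std::shared_ptr<Buffer>& data) : d_data(data) {}
    const double* ptr() const { return d_data->data(); }
};

// Reference counts and liveness observed at each step of the scenario.
struct Trace {
    long afterCreate, afterHandOff, afterRDrop;
    bool noCopy, aliveWhileHeld, freedAtEnd;
};

inline Trace runScenario() {
    Trace t{};
    auto rSide = std::make_shared<Buffer>(1000, 0.0); // "R" creates the object
    std::weak_ptr<Buffer> watch = rSide;              // observer, adds no reference
    const double* original = rSide->data();
    t.afterCreate = rSide.use_count();

    std::unique_ptr<Holder> held(new Holder(rSide));  // hand-off: +1, no copy
    t.afterHandOff = rSide.use_count();
    t.noCopy = (held->ptr() == original);

    rSide.reset();                                    // R-side name goes out of scope
    t.afterRDrop = watch.use_count();
    t.aliveWhileHeld = !watch.expired();

    held.reset();                                     // C++ object destroyed: -1
    t.freedAtEnd = watch.expired();
    return t;
}
```

Calling runScenario() walks through the steps in order: the count goes
from one to two at the hand-off, back to one when the R-side reference
disappears, and the buffer is released only once the Holder is gone.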

Are you using the Rcpp Module mechanism?  It seems that if you want to
create an instance of a C++ class in one call from R and have it
behave autonomously afterwards then you really should consider the
Module mechanism where you can register one or more constructors to be
called from R.

I have two situations like that, one where I am passing a read-only
vector (the response vector) to the constructor and one where I am
passing several read-only S4 objects representing different types of
model matrices to the constructor.  For something like a vector you
incorporate it in the C++ class as a const member and initialize it
from the argument.  The trick is that even if you want to use the
storage in some other form, an Eigen::Map<Eigen::VectorXd> in my case,
you also need to keep it around as an Rcpp object, so that the garbage
collection is handled properly.  Rcpp objects that are members of a
class instance are protected when the instance is created and
unprotected when the instance is destroyed.

So the "model response" class is declared as

    class modResp {
    protected:
	double                     d_wrss;
	const Rcpp::NumericVector  d_yR;
	const MVectorXd            d_y;
	VectorXd                   d_weights, d_offset, d_mu, d_sqrtXwt,
	                           d_sqrtrwt, d_wtres;
    public:
	modResp(Rcpp::S4)                                 throw (std::invalid_argument);
	modResp(Rcpp::NumericVector)                      throw (std::invalid_argument);
	modResp(Rcpp::NumericVector, Rcpp::NumericVector) throw (std::invalid_argument);
	modResp(Rcpp::NumericVector, Rcpp::NumericVector,
		Rcpp::NumericVector)                      throw (std::invalid_argument);

	const VectorXd&           mu() const {return d_mu;}
	const VectorXd&       offset() const {return d_offset;}
	const VectorXd&      sqrtXwt() const {return d_sqrtXwt;}
	const VectorXd&      sqrtrwt() const {return d_sqrtrwt;}
	const VectorXd&      weights() const {return d_weights;}
	const VectorXd&        wtres() const {return d_wtres;}
	const MVectorXd&           y() const {return d_y;}
	double                  wrss() const {return d_wrss;}
	double              updateMu(const VectorXd&);
	double             updateWts()       {return updateWrss();}
	double            updateWrss();
	void                    init();
    };

and one of the constructors is

    modResp::modResp(NumericVector y) throw (invalid_argument)
	: d_yR(y),
	  d_y(d_yR.begin(), d_yR.size()),
	  d_weights(VectorXd::Constant(y.size(), 1.0)),
	  d_offset( VectorXd::Zero(y.size())),
	  d_mu(     y.size()),
	  d_sqrtXwt(y.size()),
	  d_sqrtrwt(y.size()),
	  d_wtres(  y.size()) {
	init();
    }

The argument to the constructor is used to construct both the d_yR
member, which is never read again but keeps the R vector protected,
and the d_y member.  Both share the storage originally allocated in R.
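
As a minimal sketch of that pattern using only the standard library
(an analogy, not Rcpp or Eigen code): std::shared_ptr stands in for the
Rcpp::NumericVector handle and a hypothetical View struct stands in for
the Eigen::Map.  The names Owner, View, Resp and makeResp are made up
for illustration.  The point it tries to show is that the owning member
has to be declared before the view, because members are initialized in
declaration order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for Eigen::Map<VectorXd>: a non-owning view.
struct View {
    const double* data;
    std::size_t   size;
    View(const double* d, std::size_t n) : data(d), size(n) {}
    double sum() const { return std::accumulate(data, data + size, 0.0); }
};

// Hypothetical stand-in for Rcpp::NumericVector: a handle that keeps
// the underlying storage alive for as long as the handle exists.
using Owner = std::shared_ptr<std::vector<double>>;

class Resp {
    // Declaration order matters: d_owner is initialized first, so the
    // storage is already pinned when d_view borrows a pointer into it.
    const Owner d_owner;  // never read after construction; it only pins the data
    const View  d_view;   // borrows the storage pinned by d_owner
public:
    explicit Resp(Owner y)
        : d_owner(std::move(y)),
          d_view(d_owner->data(), d_owner->size()) {}
    const View& y() const { return d_view; }
};

// Build a Resp from a handle that goes away when this function returns.
inline Resp makeResp() {
    Owner tmp = std::make_shared<std::vector<double>>(
        std::vector<double>{1.0, 2.0, 3.0});
    return Resp(tmp);  // tmp is destroyed here; d_owner keeps the data alive
}
```

After makeResp() returns, the view is still valid only because the
otherwise unused d_owner member holds the last handle to the data.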

When I initialize from an S4 object from the Matrix or MatrixModels
package, I simply save the S4 object as a member of the class

    class ddenseModelMatrix : public MMatrixXd, public modelMatrix {
    protected:
	Rcpp::S4             d_xp;
    public:
	typedef MatrixXd           MatrixType;
	ddenseModelMatrix(const Rcpp::S4& xp)
	    : MMatrixXd(Rcpp::NumericVector(xp.slot("x")).begin(),
			::Rf_asInteger(xp.slot("Dim")),
			Rcpp::IntegerVector(xp.slot("Dim"))[1]),
	      modelMatrix(xp), d_xp(xp) {}
    };

where MMatrixXd is a typedef for Eigen::Map<Eigen::MatrixXd>.

These classes are exposed to R through the Rcpp Modules mechanism.

