[Rcpp-devel] Possible unprotected memory problems

Douglas Bates bates at stat.wisc.edu
Fri Jul 22 19:46:05 CEST 2011

By the way, R doesn't have a true reference counting mechanism on
objects.  The mechanism for determining if an object must be copied or
not is somewhat messier and relies on the NAMED field in an SEXPREC.

The CXXR project does have a reference counting mechanism for R
objects but I haven't tried it in some time.
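
To make the distinction concrete, here is a toy sketch in plain C++. It is not R's actual code: `ToyObject`, `add_binding`, `drop_binding` and the two `copy_needed_*` helpers are hypothetical names. It assumes the marker behaves like NAMED as described in R-ints for R of this era, taking the values 0, 1 or 2 and never being lowered once raised.

```cpp
#include <cassert>

// Toy stand-in for an R object header; this is NOT R's SEXPREC.
struct ToyObject {
    int named    = 0;  // saturating marker in the spirit of NAMED: 0, 1 or 2
    int refcount = 0;  // what a true reference count would track instead
};

// Record a new binding to the object under both schemes.
void add_binding(ToyObject& obj) {
    if (obj.named < 2) ++obj.named;  // only ever goes up, capped at 2
    ++obj.refcount;
}

// Record that a binding went away.
void drop_binding(ToyObject& obj) {
    assert(obj.refcount > 0);
    --obj.refcount;                  // the marker is deliberately left alone
}

// "Must this object be copied before it is modified?" under each scheme.
bool copy_needed_by_marker(const ToyObject& obj) { return obj.named > 1; }
bool copy_needed_by_count(const ToyObject& obj)  { return obj.refcount > 1; }

struct Verdict {
    bool marker;
    bool count;
};

// Bind the object twice, drop one binding, then ask both schemes.
Verdict two_bindings_then_drop_one() {
    ToyObject obj;
    add_binding(obj);
    add_binding(obj);
    drop_binding(obj);
    return Verdict{copy_needed_by_marker(obj), copy_needed_by_count(obj)};
}
```

In this toy model the marker still asks for a copy after one of the two bindings is gone, while the count does not. That conservatism is the "messier" part: the marker can say that a copy is needed, but it cannot say how many references remain.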

On Fri, Jul 22, 2011 at 11:51 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Fri, Jul 22, 2011 at 11:24 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi D(oug|irk),
>> Thanks for taking the time to respond. Both of your emails have been
>> quite helpful in shedding some light on how the guts of these things
>> are working.
>> Still, I think I didn't describe the scenario I *think* I want well
>> (it might be that I might not actually want to do this when I realize
>> how dumb it might be :-) It could also be that I'm misunderstanding
>> what you folks are telling me, but the nut of where I think our wires
>> are crossed is here:
>>> | The "hand off" of the data/pointer to a C library would be like
>>> | calling PROTECT. After your C function returns control back to R, it
>>> | would still claim ownership/usage of the data. Things would hum along
>>> I think you have it inverse.  If you create an object in C++ and hand it to
>>> R, you typically do not expect to see that object ever again in C++ -- and
>>> hence you let R go about its business and even gc it.
>> I actually think I want the inverse :-)
>> The situation I am thinking of is constructing a large object (matrix
>> of whatever type) in R, then passing that down to my C++ layer. I want
>> the C++ layer to lay claim to that data, maybe as if there was another
>> ('ghost') object on the R side that had an additional reference to it.
>> I see this will require some digging into R internals. I'm sorry I cut
>> out most of Dirk's message by this point. I think the point you make
>> below about setting some "bit" is what I think I want to chase down:
>>> We don't do PROTECT / UNPROTECT but a bit gets set that corresponds to the
>>> same. I would have to look up the details as it has been a while....
>> I'll try to paint the scenario I think I want using the *assumption*
>> that R does some simple reference counting to do its GC. In this
>> imaginary situation, this is what I want:
>> + I want R to create the object (so there is 1 ref to it now).
>> + Shoot this object down to C++, at which point I want my C++ library
>> to get no-copy access to the data (so, hold on to its pointer), and
>> add a +1 to R's refcount for this object, so that R won't free the
>> memory "under my feet"
>> + My C++ function will do whatever it does and return to R (but it
>> won't decrement this ref count) .. it is still holding on to the data
>> when it returns to R.
>> + R will then go about its business as usual. On the R side, the
>> object I originally passed down to C may (or may not) fall out of
>> scope and the refcount will go -1, but it won't hit 0 since there is
>> still 1 refcount to it from the C++ side of things.
>> + Eventually, the C++ object will be destroyed (a situation I will be
>> notified about), at which point I will decrement the refcount to my
>> object, which will set it to 0. Here two things can then happen:
>> (1) I can just free this memory myself in my C++ code (actually, I
>> guess this is dangerous if R will try to clean it again later); or
>> (2) Since the refcount is now 0 for this object, I can wait for R's GC
>> to free the memory whenever it decides to kick in.
>> As I said, this is kind of what I want to do. It is described in the
>> hypothetical situation where R does refcount-ing for its GC stuff.
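
The hypothetical steps above can be sketched in plain C++ (C++14 for `std::make_unique`), with `std::shared_ptr` standing in for the imagined reference count. This only models the scenario as described. It says nothing about how R's collector works, and `Buffer`, `NativeHolder` and `run_scenario` are made-up names.

```cpp
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<double>;

// Hypothetical C++-side holder: it keeps a shared handle, so the buffer
// outlives the "R-side" variable that created it.
class NativeHolder {
public:
    explicit NativeHolder(std::shared_ptr<Buffer> data) : data_(std::move(data)) {}
    const double* raw() const { return data_->data(); }  // no-copy access
private:
    std::shared_ptr<Buffer> data_;
};

// Walks through the steps and returns the count observed after each one.
std::vector<long> run_scenario() {
    std::vector<long> counts;

    // "R" creates the object: one reference.
    auto r_side = std::make_shared<Buffer>(Buffer{1.0, 2.0, 3.0});
    std::weak_ptr<Buffer> probe = r_side;  // observer, does not add to the count
    counts.push_back(probe.use_count());

    // The C++ layer takes its own reference and keeps it after "returning".
    auto holder = std::make_unique<NativeHolder>(r_side);
    counts.push_back(probe.use_count());

    // The R-side variable falls out of scope; the data is still alive.
    r_side.reset();
    counts.push_back(probe.use_count());

    // The C++ object is destroyed; the last reference goes and the buffer is freed.
    holder.reset();
    counts.push_back(probe.use_count());

    return counts;
}
```

Under these assumptions the counts come out as 1, 2, 1, 0, and option (1) from the list never arises: whoever drops the last reference triggers the release, so nothing is freed twice.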
>> All that having been said, I see that I need to really read some
>> R-internals stuff if I'm going to get serious about trying this out.
>> I guess this is the document I need to read to start grokking these types
>> of things a bit more:
>> http://cran.r-project.org/doc/manuals/R-ints.html
>> Is there other material I can/should be looking at as well?
>> Thanks again for sharing your expertise,
> Are you using the Rcpp Module mechanism?  It seems that if you want to
> create an instance of a C++ class in one call from R and have it
> behave autonomously afterwards then you really should consider the
> Module mechanism where you can register one or more constructors to be
> called from R.
> I have two situations like that, one where I am passing a read-only
> vector (the response vector) to the constructor and one where I am
> passing several read-only S4 objects representing different types of
> model matrices to the constructor.  For something like a vector you
> incorporate it in the C++ class as a const member and initialize it
> from the argument.  The trick is that even if you want to use the
> storage in some other form, an Eigen::Map<Eigen::VectorXd> in my case,
> you also need to keep it around as an Rcpp object, so that the garbage
> collection is handled properly.  Rcpp objects that are members of an
> instance of a class will be protected when the instance is created and
> unprotected when the object goes out of scope.
> So the "model response" class is declared as
>    class modResp {
>    protected:
>        double                     d_wrss;
>        const Rcpp::NumericVector  d_yR;
>        const MVectorXd            d_y;
>        VectorXd                   d_weights, d_offset, d_mu, d_sqrtXwt,
>                                   d_sqrtrwt, d_wtres;
>    public:
>        modResp(Rcpp::S4) throw (std::invalid_argument);
>        modResp(Rcpp::NumericVector) throw (std::invalid_argument);
>        modResp(Rcpp::NumericVector, Rcpp::NumericVector)
>            throw (std::invalid_argument);
>        modResp(Rcpp::NumericVector, Rcpp::NumericVector,
>                Rcpp::NumericVector) throw (std::invalid_argument);
>        const VectorXd&           mu() const {return d_mu;}
>        const VectorXd&       offset() const {return d_offset;}
>        const VectorXd&      sqrtXwt() const {return d_sqrtXwt;}
>        const VectorXd&      sqrtrwt() const {return d_sqrtrwt;}
>        const VectorXd&      weights() const {return d_weights;}
>        const VectorXd&        wtres() const {return d_wtres;}
>        const MVectorXd&           y() const {return d_y;}
>        double                  wrss() const {return d_wrss;}
>        double              updateMu(const VectorXd&);
>        double             updateWts()       {return updateWrss();}
>        double            updateWrss();
>        void                    init();
>    };
> and one of the constructors is
>    modResp::modResp(NumericVector y) throw (invalid_argument)
>        : d_yR(y),
>          d_y(d_yR.begin(), d_yR.size()),
>          d_weights(VectorXd::Constant(y.size(), 1.0)),
>          d_offset( VectorXd::Zero(y.size())),
>          d_mu(     y.size()),
>          d_sqrtXwt(y.size()),
>          d_sqrtrwt(y.size()),
>          d_wtres(  y.size()) {
>        init();
>    }
> The argument to the constructor is used to construct both the d_yR
> member, which is never used, and the d_y member, both of which share
> the storage originally allocated in R.
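
The same owner-plus-view idea can be shown without Rcpp or Eigen. In this minimal sketch a `std::shared_ptr` plays the role of the Rcpp member that keeps the storage alive, and a small hypothetical `View` struct plays the role of the `Eigen::Map`. `View`, `Response` and `demo_sum` are illustrative names, not part of either library.

```cpp
#include <cstddef>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Non-owning view over contiguous doubles (stand-in for Eigen::Map).
struct View {
    const double* ptr;
    std::size_t   size;
    double sum() const { return std::accumulate(ptr, ptr + size, 0.0); }
};

class Response {
    // Members are initialized in declaration order, so the owner has to be
    // declared before the view that points into its storage.
    const std::shared_ptr<const std::vector<double>> owner_;  // never read again
    const View                                       view_;   // used for the work
public:
    explicit Response(std::shared_ptr<const std::vector<double>> y)
        : owner_(std::move(y)),
          view_{owner_->data(), owner_->size()} {}
    const View& y() const { return view_; }
};

// The caller drops its own handle; the view stays valid because the
// Response object still holds the owner.
double demo_sum() {
    std::shared_ptr<const std::vector<double>> y =
        std::make_shared<std::vector<double>>(std::vector<double>{1.0, 2.5, 4.0});
    Response resp(y);
    y.reset();
    return resp.y().sum();
}
```

Dropping `owner_` from this sketch would leave `view_` pointing at freed memory as soon as the caller's handle went away. That is the situation the unused `d_yR` member guards against in the class above.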
> When I initialize from an S4 object from the Matrix or MatrixModels
> package, I simply save the S4 object as a member of the class
>    class ddenseModelMatrix : public MMatrixXd, public modelMatrix {
>    protected:
>        Rcpp::S4             d_xp;
>    public:
>        typedef MatrixXd           MatrixType;
>        ddenseModelMatrix(const Rcpp::S4& xp)
>            : MMatrixXd(Rcpp::NumericVector(xp.slot("x")).begin(),
>                        ::Rf_asInteger(xp.slot("Dim")),
>                        Rcpp::IntegerVector(xp.slot("Dim"))[1]),
>              modelMatrix(xp), d_xp(xp) {}
>    };
> where MMatrixXd is a typedef for Eigen::Map<Eigen::MatrixXd>
> These classes are exposed to R through the Rcpp Modules mechanism.
