[Rcpp-devel] Possible unprotected memory problems

Douglas Bates bates at stat.wisc.edu
Thu Jul 21 20:39:39 CEST 2011


I actually meant to send my message to another list (rcpp-core), which
is why it refers to a package, lme4Eigen, that I have not discussed on
this list.

Nonetheless, it is good to discuss some of these issues here for clarification.

On Thu, Jul 21, 2011 at 1:15 PM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Thu, Jul 21, 2011 at 2:04 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
>> On Thu, Jul 21, 2011 at 12:52 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
>>> In testing this new version of lme4 based on RcppEigen, I have
>>> encountered what looks to be a memory protection problem.  Naturally
>>> it only occurs on the large examples but I may be able to provoke it
>>> with gctorture even for a small example.
>>>
>>> I'm using Rcpp Modules to expose some classes in C++.  The symptom is
>>> that suddenly, probably as the result of a garbage collection, the
>>> value of one of the methods jumps.  In this case the residuals
>>> suddenly appear to be zero.
>>>
>>> I'll start trying to track this but may end up running out of time as
>>> I have some presentations for the next few weeks to prepare.
>>
>> This is probably my fault.  I was being careful to avoid copying some
>> large structures.  I used the Eigen::Map capability to share R's
>> storage but, unlike a case where the object in the C structure would
>> be a NumericVector or NumericMatrix, the contents of the R object were
>> not being protected because it would go out of scope.  I'll need to
>> rewrite that part of the logic.  Sorry for the false alarm.
>
> I'm glad you brought this up!
>
> I've been meaning to ask if there is a way to do this successfully, or
> is it impossible?
>
> Where "it" is the ability to not copy the contents of (a potentially
> large) numeric vector that we pass into a C/++ function, but rather
> just "pass the pointer/data off" to the C side of the equation, and
> let that worry about GC'ing the data when appropriate.

That's actually the way that Rcpp objects are formed.  If you create,
say, an Rcpp::NumericVector object from an SEXP it doesn't copy the
contents of the vector.  It accesses the storage allocated by R.  In
fact, all Rcpp objects use storage controlled by R; the only question
is whether this storage is PROTECT'ed or not.  The constructor for an
Rcpp object PROTECTs the storage and the destructor UNPROTECTs it.

That was what I missed.  I went straight to an Eigen::Map object that
uses the storage allocated by R in the constructor of a class.  Once
the constructor returned, the storage controlled by R could be, and
was, garbage collected.
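
The failure described above can be reproduced in a toy model that needs
neither R nor Eigen.  Everything in this sketch is hypothetical: ToyHeap
stands in for R's heap (its preserve/release names only loosely echo R's
R_PreserveObject/R_ReleaseObject), Handle plays the role of an Rcpp
vector, and RawView plays the role of an Eigen::Map.  The toy collector
zeroes unprotected blocks instead of freeing them, so that reading the
stale data stays well-defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy stand-in for R's heap (NOT R's real API).
class ToyHeap {
public:
    // Allocate n doubles, all equal to value; returns a block id.
    std::size_t alloc(std::size_t n, double value) {
        blocks_.push_back(Block{std::vector<double>(n, value), 0});
        return blocks_.size() - 1;
    }
    void preserve(std::size_t id) { ++blocks_[id].count; }
    void release(std::size_t id) { --blocks_[id].count; }
    double* data(std::size_t id) { return blocks_[id].values.data(); }
    // Simulated collection: wipe every block nobody is protecting.
    void collect() {
        for (auto& b : blocks_)
            if (b.count == 0)
                std::fill(b.values.begin(), b.values.end(), 0.0);
    }
private:
    struct Block { std::vector<double> values; int count; };
    std::deque<Block> blocks_;  // deque keeps element addresses stable
};

// Plays the role of an Rcpp vector: protects in the constructor,
// unprotects in the destructor.
class Handle {
public:
    Handle(ToyHeap& heap, std::size_t id) : heap_(heap), id_(id) {
        heap_.preserve(id_);
    }
    ~Handle() { heap_.release(id_); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    double* data() const { return heap_.data(id_); }
private:
    ToyHeap& heap_;
    std::size_t id_;
};

// Plays the role of Eigen::Map: a bare pointer with no ownership.
struct RawView { double* ptr; };

// The mistake: the protecting handle is a local of the constructor.
class Unsafe {
public:
    Unsafe(ToyHeap& heap, std::size_t id) : view_{nullptr} {
        Handle tmp(heap, id);   // protected only until the ctor returns
        view_.ptr = tmp.data();
    }
    double first() const { return view_.ptr[0]; }
private:
    RawView view_;
};

// The fix: keep the handle as a member so protection lasts as long
// as the view does.
class Safe {
public:
    Safe(ToyHeap& heap, std::size_t id)
        : keep_(heap, id), view_{keep_.data()} {}
    double first() const { return view_.ptr[0]; }
private:
    Handle keep_;   // declared first, so it is initialized first
    RawView view_;
};

double unsafe_after_gc() {
    ToyHeap heap;
    std::size_t id = heap.alloc(3, 1.5);
    Unsafe u(heap, id);
    heap.collect();
    return u.first();   // 0.0: the data was wiped
}

double safe_after_gc() {
    ToyHeap heap;
    std::size_t id = heap.alloc(3, 1.5);
    Safe s(heap, id);
    heap.collect();
    return s.first();   // 1.5: the data survived
}
```

In real code the analogous fix would presumably be to store the
Rcpp::NumericVector (or NumericMatrix) as a class member next to the
Eigen::Map that points into it, rather than letting it go out of scope
in the constructor.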

> In theory, I guess it would be like having an unbalanced
> PROTECT/UNPROTECT going on.

As I said, the PROTECT/UNPROTECT operations are done in the
constructor and destructor for an instance of an Rcpp object.  I'm not
exactly sure how Dirk and Romain get around the problem of balancing
PROTECT and UNPROTECT in a single .Call but it is done somehow.

> The "hand off" of the data/pointer to a C library would be like
> calling PROTECT. After your C function returns control back to R, it
> would still claim ownership/usage of the data. Things would hum along
> "as usual", but the data in that part of memory wouldn't be GC'd by R
> until your C library decides to call its UNPROTECT at some point
> later, at which point the normal R GC functionality would happen when
> it happens.

> Is that even possible?

Yes, possible and, in fact, done.

This is why, if you are going to treat an argument to an Rcpp-based
.Call function as read-write you should clone it rather than working
with it directly.  Otherwise you will violate R's functional
programming semantics that state that you cannot change the value of
an argument as it appears in the calling environment.
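
The aliasing behind this advice can be shown with a small self-contained
sketch.  SharedVec is a hypothetical stand-in for an Rcpp vector, not
Rcpp itself: copying the handle shares the storage, much as wrapping an
SEXP does, while its clone() member makes a deep copy in the way that
Rcpp::clone is used in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy handle whose copies all point at the same storage.
class SharedVec {
public:
    explicit SharedVec(std::vector<double> v)
        : store_(std::make_shared<std::vector<double>>(std::move(v))) {}
    double& operator[](std::size_t i) { return (*store_)[i]; }
    double operator[](std::size_t i) const { return (*store_)[i]; }
    std::size_t size() const { return store_->size(); }
    // Deep copy: the result owns fresh storage.
    SharedVec clone() const { return SharedVec(*store_); }
private:
    std::shared_ptr<std::vector<double>> store_;
};

// Writes through the shared storage: the caller's vector changes too.
void scale_in_place(SharedVec x, double k) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] *= k;
}

// Clones first, so the caller's vector is left untouched.
SharedVec scale_copy(SharedVec x, double k) {
    SharedVec y = x.clone();
    for (std::size_t i = 0; i < y.size(); ++i) y[i] *= k;
    return y;
}
```

Calling scale_in_place on a vector alters the caller's data even though
the handle was passed by value, which is the behaviour that breaks R's
semantics; scale_copy pays for one copy and avoids it.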

