[Rcpp-devel] Idiom for preserving an Rcpp object while passing back an external pointer?
djsamperi at gmail.com
Sat Feb 5 18:52:33 CET 2011
On Sat, Feb 5, 2011 at 9:52 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> As Dirk and Romain know I have been struggling to debug a memory
> protection problem that I encounter in code based on Rcpp Modules. As
> with all memory protection problems, it is very difficult to track
> down and I have kind-of run out of options right now.
> I plan, for the time being, to use code that acts like the
> module-based code without going through the modules. I plan to create
> the object in C++ and pass back an external pointer that will be used
> to locate the object for later method calls. Perhaps it is lack of
> imagination but I haven't thought of a way to construct an object and
> make it persistent until I decide to release the pointer. The best
> way I have been able to derive is to put the C++ object on a global
> stack but that approach screams "error-prone".
> My question may be somewhat vaguely stated. What I am trying to avoid
> is creating an object in C++ and returning an Rcpp::XPtr then, on
> return to R, having the C++ object go out of scope so the external
> pointer's target is gone. Somehow I need to hold on to the C++ object
> until the XPtr object is garbage collected.
> Suggestions welcome.
It might be helpful to sketch what I think are the basic assumptions
behind xptr-based implementations of C++ persistence in the R
environment. Input from experts on R internals and garbage
collection algorithms might be helpful here.
I assume that R uses C's malloc to grab chunks of memory as
needed and manages this memory using garbage collection.
On the other hand, C++ uses new to allocate objects
in the heap, and a pointer to this memory is typically assigned to
an R external pointer so that R has a handle to it. R can pass this
handle as one of the parameters in a function call to C++ so that
the called code can manipulate the C++ object pointed to.
This quickly gets tedious, and an important contribution of
Rcpp modules is to automate some of the steps by using the
C++ compiler (and templates) to capture type information that
would otherwise have to be specified by hand.
As you mention, all stack-based objects created during the C++
function call are destroyed when it returns to R. On the other hand,
heap-based objects are not (a common source of memory leaks).
In this case this is not a memory leak: we have simply passed
the responsibility for managing the lifetime of objects pointed to
by external pointers to the R side.
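To make the lifetime argument concrete, here is a minimal sketch in plain C++ that does not touch R at all. ToyExternalPtr, make_model and toy_collect are hypothetical stand-ins invented for illustration: the first models an external pointer with a finalizer, the last models the garbage collector reclaiming it. In Rcpp itself the corresponding idiom is, as far as I understand it, to wrap the result of new in an Rcpp::XPtr, whose constructor by default registers a finalizer that deletes the object when the external pointer is collected.

```cpp
#include <cassert>

// Hypothetical stand-in for the C++ object we want to keep alive.
struct Model {
    static int live;                      // number of Model objects currently alive
    double value;
    explicit Model(double v) : value(v) { ++live; }
    ~Model() { --live; }
    Model(const Model&) = delete;
    Model& operator=(const Model&) = delete;
};
int Model::live = 0;

// Toy model of an external pointer: an address plus a finalizer that the
// owner (the garbage collector, in the real setting) runs once.
struct ToyExternalPtr {
    void* addr;
    void (*finalizer)(void*);
};

static void delete_model(void* p) { delete static_cast<Model*>(p); }

// The object handed back lives on the heap, so it survives the return.
ToyExternalPtr make_model(double v) {
    Model scratch(0.0);                   // stack object: destroyed on return
    Model* m = new Model(v);              // heap object: outlives this call
    return ToyExternalPtr{m, &delete_model};
}

// A later "method call" locates the object through the handle.
double model_value(const ToyExternalPtr& h) {
    return static_cast<Model*>(h.addr)->value;
}

// Stand-in for the collector reclaiming the handle: run the finalizer once.
void toy_collect(ToyExternalPtr& h) {
    if (h.addr != nullptr) {
        h.finalizer(h.addr);
        h.addr = nullptr;
    }
}
```

The point of the sketch is that nothing has to "hold on" to the heap object between calls: it stays alive until the finalizer runs, and the finalizer is the only place it is deleted.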
An Rcpp application has the following dependencies:
R's gc'ed memory <-- Rcpp --> C++'s heap memory (and transient stack memory)
A fundamental assumption is that R's managed memory will not crash
into C++'s heap memory. Since R ultimately gets its memory from the OS
using C/malloc, and C++ gets its heap memory from the OS using new,
the OS should take care of keeping these memory areas separate. Thus
the right-pointing arrow above is unlikely to lead to problems.
On the other hand, the left-pointing arrow arises because C++ applications
can have many pointers to R's managed memory, all of which must be
protected from the garbage collection process. In particular, objects
allocated on the C++ heap that are not necessarily part of any running
C++ code may have such pointers.
Here are some basic assumptions underlying this strategy:
1. R's memory management does not use any non-standard features that
would break the compatibility between C/malloc and C++/new.
2. Mixed use of PROTECT/UNPROTECT and R_PreserveObject/R_ReleaseObject
is safe. (The latter pair are currently undocumented.)
3. Once a pointer to R's memory is protected this memory is not moved by
the garbage collector while a C++ function holds a pointer to this memory.
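The left-pointing arrow and item 2 can be illustrated with a small model, again in plain C++ with no R involved. PROTECT/UNPROTECT is stack-like and suits the duration of a single call, whereas a long-lived C++ heap object needs something like the R_PreserveObject/R_ReleaseObject pair, tied to its own lifetime. ToyHeap, Preserved and Fitter below are hypothetical names made up for this sketch; ToyHeap only imitates a collector that reclaims whatever nobody has preserved.

```cpp
#include <cassert>
#include <map>

// Toy collector: objects are ids; an id with preserve count 0 is reclaimed
// by collect(). This only imitates the preserve/release idea.
class ToyHeap {
public:
    int allocate() {
        int id = next_id_++;
        preserve_count_[id] = 0;
        return id;
    }
    void preserve(int id) { ++preserve_count_.at(id); }
    void release(int id) {
        auto it = preserve_count_.find(id);
        if (it != preserve_count_.end() && it->second > 0) --it->second;
    }
    bool is_alive(int id) const { return preserve_count_.count(id) != 0; }
    // Reclaim every object that nobody has preserved.
    void collect() {
        for (auto it = preserve_count_.begin(); it != preserve_count_.end();) {
            if (it->second == 0) it = preserve_count_.erase(it);
            else ++it;
        }
    }
private:
    int next_id_ = 1;
    std::map<int, int> preserve_count_;   // id -> preserve count
};

// RAII guard: preserve in the constructor, release in the destructor.
class Preserved {
public:
    Preserved(ToyHeap& heap, int id) : heap_(heap), id_(id) { heap_.preserve(id_); }
    ~Preserved() { heap_.release(id_); }
    Preserved(const Preserved&) = delete;
    Preserved& operator=(const Preserved&) = delete;
    int id() const { return id_; }
private:
    ToyHeap& heap_;
    int id_;
};

// A long-lived C++ object that refers to collector-managed data. The data
// stays protected exactly as long as the Fitter exists.
struct Fitter {
    Preserved coefficients;
    Fitter(ToyHeap& heap, int id) : coefficients(heap, id) {}
};
```

Tying the release to the destructor means the protection cannot be forgotten when the C++ object is eventually deleted, for instance from an external pointer's finalizer.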
It is interesting to observe that the Windows .NET Framework includes
the kind of C++ interface that we are talking about here. In particular,
C++ code can point to objects in the CLR (Common Language
Runtime---think Java, or R) through what are called tracking pointers.
These pointers are automatically updated when the object pointed to is
moved by the garbage collector, addressing a problem like the
one described in item 3 above.
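For readers unfamiliar with the idea, the following toy sketch shows why a handle that goes through a table stays valid when a collector moves objects, while a raw address would not. It is plain C++ and is not how the CLR or R is implemented; MovingHeap and its members are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for a handle whose object has been reclaimed.
constexpr std::size_t kGone = static_cast<std::size_t>(-1);

// Toy moving collector. A handle is an index into table_, and table_ maps
// each handle to the slot where its object currently lives.
class MovingHeap {
public:
    using Handle = std::size_t;

    Handle allocate(int value) {
        slots_.push_back(Slot{value, true});
        table_.push_back(slots_.size() - 1);
        return table_.size() - 1;
    }

    // Mark the object as garbage; it is reclaimed on the next compact().
    void release(Handle h) {
        if (table_[h] != kGone) slots_[table_[h]].live = false;
    }

    // Always look the object up through the table, never cache its address.
    int& deref(Handle h) {
        assert(table_[h] != kGone);
        return slots_[table_[h]].value;
    }

    std::size_t slot_of(Handle h) const { return table_[h]; }

    // Slide live objects to the front and update every table entry, which
    // is the "automatic update" a tracking pointer relies on.
    void compact() {
        std::vector<std::size_t> moved_to(slots_.size(), kGone);
        std::vector<Slot> packed;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].live) {
                moved_to[i] = packed.size();
                packed.push_back(slots_[i]);
            }
        }
        for (std::size_t& entry : table_) {
            if (entry != kGone) entry = moved_to[entry];
        }
        slots_.swap(packed);
    }

private:
    struct Slot { int value; bool live; };
    std::vector<Slot> slots_;
    std::vector<std::size_t> table_;
};
```

After a compaction the surviving objects sit in different slots, yet deref through the same handle still reaches them. A C++ function holding a raw pointer has no such indirection, which is why item 3 matters.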
If I recall correctly, Brian Ripley has warned that R's memory management
may not play well with the C++ memory allocator (or even with C/malloc).
It might be helpful to know precisely what his concerns were.