[Rcpp-devel] Idiom for preserving an Rcpp object while passing back an external pointer?

Dominick Samperi djsamperi at gmail.com
Sat Feb 5 18:52:33 CET 2011


On Sat, Feb 5, 2011 at 9:52 AM, Douglas Bates <bates at stat.wisc.edu> wrote:
> As Dirk and Romain know I have been struggling to debug a memory
> protection problem that I encounter in code based on Rcpp Modules.  As
> with all memory protection problems, it is very difficult to track
> down and I have kind-of run out of options right now.
>
> I plan, for the time being, to use code that acts like the
> module-based code without going through the modules.  I plan to create
> the object in C++ and pass back an external pointer that will be used
> to locate the object for later method calls.  Perhaps it is lack of
> imagination but I haven't thought of a way to construct an object and
> make it persistent until I decide to release the pointer.  The best
> way I have been able to derive is to put the C++ object on a global
> stack but that approach screams "error-prone".
>
> My question may be somewhat vaguely stated.  What I am trying to avoid
> is creating an object in C++ and returning an Rcpp::XPtr then, on
> return to R, having the C++ object go out of scope so the external
> pointer's target is gone.  Somehow I need to hold on to the C++ object
> until the XPtr object is garbage collected.
>
> Suggestions welcome.

It might be helpful to sketch what I think are the basic assumptions
behind xptr-based implementations of C++ persistence in the R
environment. Input from experts on R internals and garbage
collection algorithms would be welcome here.

I assume that R uses C's malloc to grab chunks of memory as
needed and manages this memory using garbage collection
algorithms.

On the other hand, C++ uses new to allocate objects
on the heap, and a pointer to this memory is typically assigned to
an R external pointer so that R has a handle to it. R can pass this
handle as one of the parameters in a function call to C++ so that
the called code can manipulate the C++ object pointed to.
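
To make the round trip concrete, here is a minimal standalone C++
model of it, written as a sketch under stated assumptions. It does not
call R's API: the hypothetical struct ExternalHandle stands in for an
R external pointer (in R's C API the real counterparts are
R_MakeExternalPtr and R_ExternalPtrAddr, which Rcpp wraps in
Rcpp::XPtr), and Counter, counter_new, counter_increment and
counter_free are illustrative names only.

```cpp
// Hypothetical C++ class whose instances must persist between calls.
class Counter {
public:
    int increment() { return ++count_; }
private:
    int count_ = 0;
};

// Stand-in (not R's API) for an R external pointer: an untyped address
// that the caller can store and hand back without knowing the C++ type.
struct ExternalHandle {
    void* addr;
};

// "Constructor" entry point: allocate on the C++ heap, return a handle.
ExternalHandle counter_new() {
    return ExternalHandle{new Counter()};
}

// "Method" entry point: recover the typed pointer from the handle.
int counter_increment(ExternalHandle h) {
    return static_cast<Counter*>(h.addr)->increment();
}

// Release; with R this is the job of a finalizer run by the collector.
void counter_free(ExternalHandle& h) {
    delete static_cast<Counter*>(h.addr);
    h.addr = nullptr;
}
```

Calling counter_increment twice on the handle returned by counter_new
yields 1 and then 2: the Counter lives on the heap, not in
counter_new's stack frame, so it survives between calls until
counter_free is called.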

This quickly gets tedious, and an important contribution of
Rcpp modules is to automate some of the steps by using the
C++ compiler (and templates) to capture type information that
would otherwise have to be specified by hand.

As you mention, all stack-based objects created during the C++
function call are destroyed when it returns to R. On the other hand,
heap-based objects are not (a common source of memory leaks).
In this case it is not a memory leak: we have simply passed
the responsibility for managing the lifetime of objects pointed to
by external pointers to the R side.
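
The difference between the two lifetimes can be shown in standard C++
alone. In this sketch the hypothetical Tracer type counts live
instances: the stack object dies when make_persistent returns, while
the heap object survives until the last handle to it is dropped.
std::shared_ptr<void> with a custom deleter is only an analogy for an
external pointer whose finalizer is run by R's garbage collector (in
R's C API, R_RegisterCFinalizerEx; Rcpp::XPtr registers a delete
finalizer by default). Reference counting is not how R decides
reachability.

```cpp
#include <memory>

// Counts live instances so the lifetimes below can be observed.
struct Tracer {
    static int live;
    Tracer() { ++live; }
    ~Tracer() { --live; }
};
int Tracer::live = 0;

// Opaque, type-erased handle; its deleter plays the role of a finalizer.
using Handle = std::shared_ptr<void>;

Handle make_persistent() {
    Tracer on_stack;                 // destroyed when this function returns
    Tracer* on_heap = new Tracer();  // survives the return
    // The deleter remembers the real type, as a registered finalizer would.
    return Handle(on_heap, [](void* p) { delete static_cast<Tracer*>(p); });
}
```

After make_persistent returns, exactly one Tracer is alive; it is
deleted only when the last Handle referring to it goes away, which is
the responsibility that has been handed to the caller.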

An Rcpp application has the following dependencies:

R's gc'ed memory <-- Rcpp --> C++'s heap memory (and transient stack memory)

A fundamental assumption is that R's managed memory will not collide
with C++'s heap memory. Since R ultimately gets its memory through
C's malloc, and C++ gets its heap memory through new (which common
implementations build on malloc), the allocators and the OS should
keep these memory areas separate. Thus the right-pointing arrow above
is unlikely to lead to problems.

On the other hand, the left-pointing arrow arises because C++ applications
can have many pointers to R's managed memory, all of which must be
protected from the garbage collection process. In particular, objects
allocated on the C++ heap that are not necessarily part of any running
C++ code may have such pointers.
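
One way to keep such a pointer protected for exactly as long as the
heap object that holds it is to tie preservation to construction and
destruction (RAII). The sketch below is a standalone model, not R's
API: the hypothetical fake_preserve and fake_release functions and the
precious map stand in for R_PreserveObject, R_ReleaseObject and R's
list of preserved objects, and FakeSEXP stands in for a SEXP. As with
the real pair, preserving an object twice requires releasing it twice.

```cpp
#include <map>

// Stand-in for an R object address (a SEXP in the real API).
using FakeSEXP = const void*;

// Stand-in for R's list of preserved objects: address -> preserve count.
std::map<FakeSEXP, int> precious;

void fake_preserve(FakeSEXP x) { ++precious[x]; }

void fake_release(FakeSEXP x) {
    auto it = precious.find(x);
    if (it != precious.end() && --it->second == 0) precious.erase(it);
}

bool is_protected(FakeSEXP x) { return precious.count(x) != 0; }

// RAII guard: x stays protected from construction to destruction, so the
// protection lasts as long as whatever C++ object owns the guard.
class PreservedRef {
public:
    explicit PreservedRef(FakeSEXP x) : x_(x) { fake_preserve(x_); }
    ~PreservedRef() { fake_release(x_); }
    PreservedRef(const PreservedRef&) = delete;             // no double release
    PreservedRef& operator=(const PreservedRef&) = delete;
    FakeSEXP get() const { return x_; }
private:
    FakeSEXP x_;
};

// Hypothetical heap-allocated C++ object holding a pointer into "R memory".
struct Model {
    PreservedRef data;
    explicit Model(FakeSEXP x) : data(x) {}
};
```

A Model created with new keeps its R object protected even when no C++
function is running; deleting the Model (for example from an external
pointer's finalizer) releases the protection.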

Here are some basic assumptions underlying this strategy:

1. R's memory management does not use any non-standard features that
    would break the compatibility between C/malloc and C++/new.
2. Mixed use of PROTECT/UNPROTECT and R_PreserveObject/R_ReleaseObject
    is safe. (The latter pair are currently undocumented.)
3. Once a pointer to R's memory is protected, this memory is not moved by
    the garbage collector while a C++ function holds a pointer to this memory.

It is interesting to observe that the Windows .NET framework includes
the kind of C++ interface that we are talking about here. In particular,
C++ code can point to objects in the CLR (Common Language
Runtime; think Java, or R) through what are called tracking pointers.
These pointers are automatically updated when the object pointed to is
moved by the garbage collector, addressing a problem like the
one described in item 3 above.

If I recall correctly, Brian Ripley has warned that R's memory management
may not play well with the C++ memory allocator (or even with C/malloc).
It would be helpful to know precisely what his concerns were.

Dominick


