[Rcpp-devel] [Rd] must .Call C functions return SEXP?

Thu Oct 28 20:44:12 CEST 2010

See comments on Rcpp below.

On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <wdunlap at tibco.com> wrote:

> > -----Original Message-----
> > From: r-devel-bounces at r-project.org
> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Piskorski
> > Sent: Thursday, October 28, 2010 6:48 AM
> > To: Simon Urbanek
> > Cc: r-devel at r-project.org
> > Subject: Re: [Rd] must .Call C functions return SEXP?
> >
> > On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:
> >
> > > > Reason I ask, is I've written some R code which allocates two long
> > > > lists, and then calls a C function with .Call.  My C code writes to
> > > > those two pre-allocated lists,
> >
> > > That's bad! All arguments are essentially read-only so you should
> > > never write into them!
> >
> > I don't see how.  (So, what am I missing?)  The R docs themselves
> > state that the main point of using .Call rather than .C is that .Call
> > does not do any extra copying and gives one direct access to the R
> > objects.  (This is indeed very useful, e.g. to reorder a large matrix
> > in seconds rather than hours.)
> >
> > I could allocate the two lists in my C code, but so far it was more
> > convenient to do so in R.  What possible difference in behavior can there
> > be between the two approaches?
>
> Here is an example of how you break the rule that R-language functions
> do not change their arguments if you use .Call in the way that you
> describe.  The C code is in alter_argument.c:
>
> #include <R.h>
> #include <Rinternals.h>
>
> SEXP alter_argument(SEXP arg)
> {
>    SEXP dim ;
>    PROTECT(dim = allocVector(INTSXP, 2));
>    INTEGER(dim)[0] = 1 ;
>    INTEGER(dim)[1] = LENGTH(arg) ;
>    setAttrib(arg, R_DimSymbol, dim);
>    UNPROTECT(1) ;
>    return dim ;
> }
>
> Make a shared library out of this.  E.g., on Linux do
>    R CMD SHLIB -o Ralter_argument.so alter_argument.c
> and load it into R with
>    dyn.load("./Ralter_argument.so")
> (Or, on any platform, put it into a package along with
> the following R code and build it.)
>
> The associated R code is
>     myDim <- function(v) .Call("alter_argument", v)
>     f <- function(z) myDim(z)[2]
> Now try using it:
>     > myData <- 6:10
>     > myData
>     [1]  6  7  8  9 10
>     > f(myData)
>     [1] 5
>     > myData
>          [,1] [,2] [,3] [,4] [,5]
>     [1,]    6    7    8    9   10
> The argument to f was changed!  This should never happen in R.
>
> If you are very careful you might be able to ensure that
> no part of the argument to be altered can come from
> outside the function calling .Call().  It can be tricky
> to ensure that, especially when the argument is more complicated
> than an atomic vector.
>
> "If you live outside the law you must be honest" - Bob Dylan.
>

This thread seems to suggest (following Bob Dylan) that one needs
to be very careful when using C/C++ to modify R's memory
directly, because you may also modify other R variables that point
to the same memory (R has pass-by-value semantics, but internally
it avoids copying as an optimization, so variables can share storage).

What are the implications for the Rcpp package where R
objects are exposed to the C++ side in precisely this way,
permitting unrestricted modifications? (In the original
or "classic" version of this package direct writes to R's
memory were done only for performance reasons.)

Seems like extra precautions need to be taken to
avoid the aliasing problem.

Dominick

> In R, .Call() does not copy its arguments but the C code
> writer is expected to do so if they will be altered.
> In S+ (and S), .Call() copies the arguments if altering
> them would make a user-visible change in the environment,
> unless you specify that the C code will not be altering them.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
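
To make the "copy before you alter" rule concrete, here is a minimal
sketch in plain C. It is only a toy model: toy_vector, toy_duplicate,
set_dim_in_place and set_dim_on_copy are hypothetical names invented
for illustration, not part of R's C API. In real .Call code the copy
would be made with duplicate() from Rinternals.h and the result
PROTECTed; the sketch only shows why the copy matters.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R integer vector (hypothetical, not R's SEXP). */
typedef struct {
    int *data;      /* element storage */
    size_t length;  /* number of elements */
    size_t ncol;    /* 0 means "no dim attribute set" */
} toy_vector;

/* Deep copy of src into dst; plays the role duplicate() plays in R.
   Returns 0 on success, -1 if allocation fails. */
int toy_duplicate(const toy_vector *src, toy_vector *dst)
{
    dst->length = src->length;
    dst->ncol = src->ncol;
    dst->data = NULL;
    if (src->length == 0)
        return 0;
    dst->data = malloc(src->length * sizeof *src->data);
    if (dst->data == NULL)
        return -1;
    memcpy(dst->data, src->data, src->length * sizeof *src->data);
    return 0;
}

/* Unsafe pattern: writes the "dim" straight into the caller's object,
   so every variable sharing that object sees the change. */
void set_dim_in_place(toy_vector *v)
{
    v->ncol = v->length;
}

/* Safe pattern: copy first, alter only the copy, hand the copy back.
   The caller owns out->data and must free() it. */
int set_dim_on_copy(const toy_vector *v, toy_vector *out)
{
    if (toy_duplicate(v, out) != 0)
        return -1;
    out->ncol = out->length;
    return 0;
}
```

With the second pattern the caller's object is left exactly as it was,
which is the behaviour R code expects from any function it calls.
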
> > > R has pass-by-value(!) semantics, so semantically your code has
> > > nothing to do with the result.1 and result.2 variables since only
> > > their *values* are guaranteed to be passed (possibly a copy).
> >
> > Clearly C code called from .Call must be allowed to construct R
> > objects, as that's how much of R itself is implemented, and further
> > down, it's what you recommend I should do instead.
> >
> > But why does it follow that C code must never modify an object
> > initially allocated by R code?  Are you saying there is some special
> > magic difference in the state of an object allocated by R's C code
> > vs. one allocated by R code?  If so, what is it?
> >
> > What is the potential problem here, that the garbage collector will
> > suddenly run while my C code is in the middle of writing to an R list?
> > Yes, if the gc is going to move the object elsewhere, that would be
> > very bad.  But it looks to me like that cannot happen, because lots of
> > the R implementation itself would fail badly if it did.
> >
> > E.g.:  The PROTECT call is used to increment reference counts, but I
> > see no guarantees that it is atomic with the operations that allocate
> > objects.  I see no mutexes or other barriers in C code to prevent the
> > gc from running, thus implying that it *can't* run until the C
> > function completes.
> >
> > And R is single threaded, of course.  But what about signal handlers,
> > could they ever invoke R's gc?
> >
> > Also, I was initially surprised not to find any matrix C APIs, but
> > grepping for examples (sorry, I don't remember exactly which
> > functions) showed me that the apparently accepted way to do matrix
> > operations from C is to simply assume R's column-first dense matrix
> > order, and access the 2D matrix as a flat 1D vector.  (Which is easy.)
> >
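
The flat column-major access described above can be sketched as
follows. This is a self-contained illustration in plain C, and the
helper names col_major_index and row_sum are hypothetical. In real
.Call code the pointer would presumably come from REAL(x) and the row
count from nrows(x); that part is an assumption about how you would
wire it up, not something shown in this thread.

```c
#include <stddef.h>

/* Offset of element (row, col), both 0-based, in a dense column-major
   matrix with nrow rows: columns are stored one after another. */
size_t col_major_index(size_t row, size_t col, size_t nrow)
{
    return row + col * nrow;
}

/* Sum one row of an nrow x ncol column-major matrix held in the flat
   array m. Walking along a row means striding by nrow elements. */
double row_sum(const double *m, size_t nrow, size_t ncol, size_t row)
{
    double total = 0.0;
    for (size_t col = 0; col < ncol; ++col)
        total += m[col_major_index(row, col, nrow)];
    return total;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the columns are
(1, 2), (3, 4) and (5, 6), so element (1, 2) sits at offset 5.
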
> > > The fact that internally R attempts to avoid copying for performance
> > > reasons is the only reason why your code may have appeared to work,
> > > but it's invalid!
> >
> > I will probably change my code to allocate a new list from the C code
> > and return that, as you recommend.  My main reason for doing the
> > allocation in R was just that it was simpler, especially given the
> > very limited documentation of R's C API.
> >
> > But, I didn't see anything in the "Writing R Extensions" doc saying
> > that what my code is doing is "invalid", and more importantly, I don't
> > see why it would or should be invalid...
> >
> > I'd still like to better understand why you think doing the initial
> > allocation of an object in R rather than C code is such a problem.  So
> > far, I don't see any way that the R interpreter could ever tell the
> > difference.
> >
> > Wait, or is the only objection here that I'm using C in a way that
> > makes pass-by-reference semantics visible to my R code?  Which will
> > work completely correctly, but is not The Proper R Way?
> >
> > I don't actually need pass-by-reference behavior here at all, but I
> > can imagine cases where I might want it, so I'd like to understand
> > your objections better.  Is using C to implement pass-by-reference
> > actually Broken, or merely Ugly?  From my reasons above, I think it
> > will always work correctly and thus is not Broken.  But of course
> > given R's devotion to pass-by-value, it could be considered
> > unacceptably Ugly.
> >
> > --
> > Andrew Piskorski <atp at piskorski.com>
> > http://www.piskorski.com/
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >