[Rcpp-devel] [Rd] must .Call C functions return SEXP?

Douglas Bates bates at stat.wisc.edu
Fri Oct 29 00:04:20 CEST 2010


On Thu, Oct 28, 2010 at 1:44 PM, Dominick Samperi <djsamperi at gmail.com> wrote:
> See comments on Rcpp below.
>
> On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <wdunlap at tibco.com> wrote:
>>
>> > -----Original Message-----
>> > From: r-devel-bounces at r-project.org
>> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Piskorski
>> > Sent: Thursday, October 28, 2010 6:48 AM
>> > To: Simon Urbanek
>> > Cc: r-devel at r-project.org
>> > Subject: Re: [Rd] must .Call C functions return SEXP?
>> >
>> > On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:
>> >
>> > > > Reason I ask, is I've written some R code which allocates two long
>> > > > lists, and then calls a C function with .Call.  My C code writes to
>> > > > those two pre-allocated lists,
>> >
>> > > That's bad! All arguments are essentially read-only so you should
>> > > never write into them!
>> >
>> > I don't see how.  (So, what am I missing?)  The R docs themselves
>> > state that the main point of using .Call rather than .C is that .Call
>> > does not do any extra copying and gives one direct access to the R
>> > objects.  (This is indeed very useful, e.g. to reorder a large matrix
>> > in seconds rather than hours.)
>> >
>> > I could allocate the two lists in my C code, but so far it was more
>> > convenient to do so in R.  What possible difference in behavior can there
>> > be between the two approaches?
>>
>> Here is an example of how you break the rule that R-language functions
>> do not change their arguments if you use .Call in the way that you
>> describe.  The C code is in alter_argument.c:
>>
>> #include <R.h>
>> #include <Rinternals.h>
>>
>> SEXP alter_argument(SEXP arg)
>> {
>>    SEXP dim ;
>>    PROTECT(dim = allocVector(INTSXP, 2));
>>    INTEGER(dim)[0] = 1 ;
>>    INTEGER(dim)[1] = LENGTH(arg) ;
>>    setAttrib(arg, R_DimSymbol, dim);
>>    UNPROTECT(1) ;
>>    return dim ;
>> }
>>
>> Make a shared library out of this.  E.g., on Linux do
>>    R CMD SHLIB -o Ralter_argument.so alter_argument.c
>> and load it into R with
>>    dyn.load("./Ralter_argument.so")
>> (Or, on any platform, put it into a package along with
>> the following R code and build it.)
>>
>> The associated R code is
>>     myDim <- function(v) .Call("alter_argument", v)
>>     f <- function(z) myDim(z)[2]
>> Now try using it:
>>     > myData <- 6:10
>>     > myData
>>     [1]  6  7  8  9 10
>>     > f(myData)
>>     [1] 5
>>     > myData
>>          [,1] [,2] [,3] [,4] [,5]
>>     [1,]    6    7    8    9   10
>> The argument to f was changed!  This should never happen in R.
>>
>> If you are very careful you might be able to ensure that
>> no part of the argument to be altered can come from
>> outside the function calling .Call().  It can be tricky
>> to ensure that, especially when the argument is more complicated
>> than an atomic vector.
>>
>> "If you live outside the law you must be honest" - Bob Dylan.
>
> This thread seems to suggest (following Bob Dylan) that one needs
> to be very careful when using C/C++ to modify R's memory
> directly, because you may modify other R variables that point
> to the same memory (due to R's copy-by-value semantics and
> optimizations).
>
> What are the implications for the Rcpp package where R
> objects are exposed to the C++ side in precisely this way,
> permitting unrestricted modifications? (In the original
> or "classic" version of this package direct writes to R's
> memory were done only for performance reasons.)
>
> Seems like extra precautions need to be taken to
> avoid the aliasing problem.

The current Rcpp facilities have the same benefits and dangers as the C
macros used in .Call.  You get access to the memory of the R object
passed as an argument, saving a copy step.  You shouldn't modify that
memory.  If you do, bad things can happen and they will be your fault.
If you want to get a read-write copy you clone the argument (in Rcpp
terminology).
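
To make the aliasing hazard concrete, here is a minimal sketch in plain C.
It is a toy model under stated assumptions, not R's or Rcpp's actual API:
the vec struct and the helpers vec_new, vec_clone, vec_free,
set_first_in_place and set_first_on_copy are hypothetical names.  vec
stands in for an R vector, and vec_clone plays the role of clone in Rcpp
(or duplicate() at the C level).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R integer vector (hypothetical; not R's SEXP). */
typedef struct {
    size_t len;
    int *data;
} vec;

/* Allocate a zero-filled vector of len elements; returns NULL on failure. */
vec *vec_new(size_t len)
{
    vec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = calloc(len, sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->len = len;
    return v;
}

/* Deep copy: the result owns its own buffer, independent of src. */
vec *vec_clone(const vec *src)
{
    vec *v = vec_new(src->len);
    if (v == NULL)
        return NULL;
    memcpy(v->data, src->data, src->len * sizeof *src->data);
    return v;
}

void vec_free(vec *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* The risky pattern: write through the pointer we were handed.
   Every other pointer to the same vec sees the change. */
void set_first_in_place(vec *arg, int value)
{
    assert(arg != NULL && arg->len > 0);
    arg->data[0] = value;
}

/* The safe pattern: clone, modify the clone, return it.
   The argument is left untouched; the caller frees the result. */
vec *set_first_on_copy(const vec *arg, int value)
{
    assert(arg != NULL && arg->len > 0);
    vec *copy = vec_clone(arg);
    if (copy != NULL)
        copy->data[0] = value;
    return copy;
}
```

If x holds 6, 7, 8 and y is a second pointer to the same vec (much as
y <- x shares memory in R until one of them is modified), then
set_first_on_copy(x, 99) leaves both x and y untouched, whereas
set_first_in_place(x, 99) changes what y sees as well.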

To Bill:  I seem to remember the Dylan quote as "To live outside the
law you must be honest."


>
> Dominick
>
>> In R, .Call() does not copy its arguments but the C code
>> writer is expected to do so if they will be altered.
>> In S+ (and S), .Call() copies the arguments if altering
>> them would make a user-visible change in the environment,
>> unless you specify that the C code will not be altering them.
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>> > > R has pass-by-value(!) semantics, so semantically your code has
>> > > nothing to do with the result.1 and result.2 variables since only
>> > > their *values* are guaranteed to be passed (possibly a copy).
>> >
>> > Clearly C code called from .Call must be allowed to construct R
>> > objects, as that's how much of R itself is implemented, and further
>> > down, it's what you recommend I should do instead.
>> >
>> > But why does it follow that C code must never modify an object
>> > initially allocated by R code?  Are you saying there is some special
>> > magic difference in the state of an object allocated by R's C code
>> > vs. one allocated by R code?  If so, what is it?
>> >
>> > What is the potential problem here, that the garbage collector will
>> > suddenly run while my C code is in the middle of writing to an R list?
>> > Yes, if the gc is going to move the object elsewhere, that would be
>> > very bad.  But it looks to me like that cannot happen, because lots of
>> > the R implementation itself would fail badly if it did.
>> >
>> > E.g.:  The PROTECT call is used to increment reference counts, but I
>> > see no guarantees that it is atomic with the operations that allocate
>> > objects.  I see no mutexes or other barriers in C code to prevent the
>> > gc from running, thus implying that it *can't* run until the C
>> > function completes.
>> >
>> > And R is single threaded, of course.  But what about signal handlers,
>> > could they ever invoke R's gc?
>> >
>> > Also, I was initially surprised not to find any matrix C APIs, but
>> > grepping for examples (sorry, I don't remember exactly which
>> > functions) showed me that the apparently accepted way to do matrix
>> > operations from C is to simply assume R's column-first dense matrix
>> > order, and access the 2D matrix as a flat 1D vector.  (Which is easy.)
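
On the matrix point just above: a small sketch of that flat, column-major
indexing, in plain C with no R headers.  The helper names col_major_index,
mat_get and col_sum are hypothetical.  The only fact taken from R is that,
with zero-based indices, element (i, j) of a matrix with nrow rows sits at
offset i + j * nrow.  In a real .Call function the pointer would typically
come from REAL() on the matrix argument, which this sketch does not show.

```c
#include <stddef.h>

/* Offset of element (row, col) in a column-major matrix with nrow rows.
   Indices are zero-based, as usual in C. */
size_t col_major_index(size_t row, size_t col, size_t nrow)
{
    return row + col * nrow;
}

/* Read element (row, col) from a matrix stored as a flat array. */
double mat_get(const double *m, size_t nrow, size_t row, size_t col)
{
    return m[col_major_index(row, col, nrow)];
}

/* Sum one column.  Its nrow elements are contiguous in memory,
   starting at offset col * nrow. */
double col_sum(const double *m, size_t nrow, size_t col)
{
    const double *start = m + col * nrow;
    double total = 0.0;
    for (size_t i = 0; i < nrow; i++)
        total += start[i];
    return total;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the columns are (1, 2),
(3, 4) and (5, 6), so mat_get(m, 2, 0, 1) reads the value 3 and
col_sum(m, 2, 1) adds 3 and 4.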
>> >
>> > > The fact that internally R attempts to avoid copying for performance
>> > > reasons is the only reason why your code may have appeared to work,
>> > > but it's invalid!
>> >
>> > I will probably change my code to allocate a new list from the C code
>> > and return that, as you recommend.  My main reason for doing the
>> > allocation in R was just that it was simpler, especially given the
>> > very limited documentation of R's C API.
>> >
>> > But, I didn't see anything in the "Writing R Extensions" doc saying
>> > that what my code is doing is "invalid", and more importantly, I don't
>> > see why it would or should be invalid...
>> >
>> > I'd still like to better understand why you think doing the initial
>> > allocation of an object in R rather than C code is such a problem.  So
>> > far, I don't see any way that the R interpreter could ever tell the
>> > difference.
>> >
>> > Wait, or is the only objection here that I'm using C in a way that
>> > makes pass-by-reference semantics visible to my R code?  Which will
>> > work completely correctly, but is not The Proper R Way?
>> >
>> > I don't actually need pass-by-reference behavior here at all, but I
>> > can imagine cases where I might want it, so I'd like to understand
>> > your objections better.  Is using C to implement pass-by-reference
>> > actually Broken, or merely Ugly?  From my reasons above, I think it
>> > will always work correctly and thus is not Broken.  But of course
>> > given R's devotion to pass-by-value, it could be considered
>> > unacceptably Ugly.
>> >
>> > --
>> > Andrew Piskorski <atp at piskorski.com>
>> > http://www.piskorski.com/
>> >
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
>

