[Rcpp-devel] [Rd] must .Call C functions return SEXP?
bates at stat.wisc.edu
Fri Oct 29 00:04:20 CEST 2010
On Thu, Oct 28, 2010 at 1:44 PM, Dominick Samperi <djsamperi at gmail.com> wrote:
> See comments on Rcpp below.
> On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <wdunlap at tibco.com> wrote:
>> > -----Original Message-----
>> > From: r-devel-bounces at r-project.org
>> > [mailto:r-devel-bounces at r-project.org] On Behalf Of Andrew Piskorski
>> > Sent: Thursday, October 28, 2010 6:48 AM
>> > To: Simon Urbanek
>> > Cc: r-devel at r-project.org
>> > Subject: Re: [Rd] must .Call C functions return SEXP?
>> > On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:
>> > > > Reason I ask, is I've written some R code which allocates two long
>> > > > lists, and then calls a C function with .Call. My C code writes to
>> > > > those two pre-allocated lists,
>> > > That's bad! All arguments are essentially read-only so you should
>> > > never write into them!
>> > I don't see how. (So, what am I missing?) The R docs themselves
>> > state that the main point of using .Call rather than .C is that .Call
>> > does not do any extra copying and gives one direct access to the R
>> > objects. (This is indeed very useful, e.g. to reorder a large matrix
>> > in seconds rather than hours.)
>> > I could allocate the two lists in my C code, but so far it was more
>> > convenient to do so in R. What possible difference in behavior can there
>> > be between the two approaches?
>> Here is an example of how you break the rule that R-language functions
>> do not change their arguments if you use .Call in the way that you
>> describe. The C code is in alter_argument.c:
>> #include <R.h>
>> #include <Rinternals.h>
>> SEXP alter_argument(SEXP arg)
>> {
>>     SEXP dim ;
>>     PROTECT(dim = allocVector(INTSXP, 2));
>>     INTEGER(dim)[0] = 1 ;
>>     INTEGER(dim)[1] = LENGTH(arg) ;
>>     setAttrib(arg, R_DimSymbol, dim);  /* modifies the caller's object in place */
>>     UNPROTECT(1) ;
>>     return dim ;
>> }
>> Make a shared library out of this. E.g., on Linux do
>>    R CMD SHLIB -o Ralter_argument.so alter_argument.c
>> and load it into R with
>>    dyn.load("Ralter_argument.so")
>> (Or, on any platform, put it into a package along with
>> the following R code and build it.)
>> The associated R code is
>> myDim <- function(v) .Call("alter_argument", v)
>> f <- function(z) myDim(z)
>> Now try using it:
>> > myData <- 6:10
>> > myData
>> [1]  6  7  8  9 10
>> > f(myData)
>> [1] 1 5
>> > myData
>>      [,1] [,2] [,3] [,4] [,5]
>> [1,]    6    7    8    9   10
>> The argument to f was changed! This should never happen in R.
>> If you are very careful you might be able to ensure that
>> no part of the argument to be altered can come from
>> outside the function calling .Call(). It can be tricky
>> to ensure that, especially when the argument is more complicated
>> than an atomic vector.
>> "If you live outside the law you must be honest" - Bob Dylan.
> This thread seems to suggest (following Bob Dylan) that one needs
> to be very careful when using C/C++ to modify R's memory
> directly, because you may modify other R variables that point
> to the same memory (due to R's copy-by-value semantics and
> What are the implications for the Rcpp package where R
> objects are exposed to the C++ side in precisely this way,
> permitting unrestricted modifications? (In the original
> or "classic" version of this package direct writes to R's
> memory were done only for performance reasons.)
> Seems like extra precautions need to be taken to
> avoid the aliasing problem.
The current Rcpp facilities have the same benefits and dangers as the C
macros used in .Call. You get access to the memory of the R object
passed as an argument, saving a copy step. You shouldn't modify that
memory. If you do, bad things can happen and they will be your fault.
If you want to get a read-write copy you clone the argument (in Rcpp
To Bill: I seem to remember the Dylan quote as "To live outside the
law you must be honest."
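The copy-before-writing advice above can be sketched in plain C, without
R's headers. This is only a model under stated assumptions: IntVec and
every function below are hypothetical names invented for the illustration,
and intvec_copy merely stands in for the deep copy that duplicate()
provides in R's C API (or clone() in Rcpp).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an R integer vector with an optional dim. */
typedef struct {
    int *data;
    int length;
    int nrow, ncol;              /* both 0 means "no dim attribute" */
} IntVec;

/* Build first, first+1, ..., like first:(first+length-1) in R. */
IntVec *intvec_seq(int first, int length)
{
    assert(length > 0);
    IntVec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = malloc((size_t)length * sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    for (int i = 0; i < length; i++)
        v->data[i] = first + i;
    v->length = length;
    v->nrow = v->ncol = 0;
    return v;
}

/* Deep copy: the role duplicate() plays in R's C API. */
IntVec *intvec_copy(const IntVec *src)
{
    assert(src != NULL);
    IntVec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    *v = *src;                   /* copies length and dim fields */
    v->data = malloc((size_t)src->length * sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    memcpy(v->data, src->data, (size_t)src->length * sizeof *v->data);
    return v;
}

void intvec_free(IntVec *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* Unsafe: writes into the caller's object, as alter_argument does. */
void set_row_dim_in_place(IntVec *v)
{
    assert(v != NULL);
    v->nrow = 1;
    v->ncol = v->length;
}

/* Safe: copies first, so the caller's object is left untouched. */
IntVec *set_row_dim_on_copy(const IntVec *v)
{
    IntVec *copy = intvec_copy(v);
    if (copy != NULL)
        set_row_dim_in_place(copy);
    return copy;
}
```

Calling set_row_dim_on_copy on a vector built with intvec_seq(6, 5) leaves
the original without a dim, while set_row_dim_in_place changes it for
everyone holding a pointer to it, which is the aliasing problem discussed
in this thread.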
>> In R, .Call() does not copy its arguments but the C code
>> writer is expected to do so if they will be altered.
>> In S+ (and S), .Call() copies the arguments if altering
>> them would make a user-visible change in the environment,
>> unless you specify that the C code will not be altering them.
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>> > > R has pass-by-value(!) semantics, so semantically your code has
>> > > nothing to do with the result.1 and result.2 variables since only
>> > > their *values* are guaranteed to be passed (possibly a copy).
>> > Clearly C code called from .Call must be allowed to construct R
>> > objects, as that's how much of R itself is implemented, and further
>> > down, it's what you recommend I should do instead.
>> > But why does it follow that C code must never modify an object
>> > initially allocated by R code? Are you saying there is some special
>> > magic difference in the state of an object allocated by R's C code
>> > vs. one allocated by R code? If so, what is it?
>> > What is the potential problem here, that the garbage collector will
>> > suddenly run while my C code is in the middle of writing to an R list?
>> > Yes, if the gc is going to move the object elsewhere, that would be
>> > very bad. But it looks to me like that cannot happen, because lots of
>> > the R implementation itself would fail badly if it did.
>> > E.g.: The PROTECT call is used to increment reference counts, but I
>> > see no guarantees that it is atomic with the operations that allocate
>> > objects. I see no mutexes or other barriers in C code to prevent the
>> > gc from running, thus implying that it *can't* run until the C
>> > function completes.
>> > And R is single threaded, of course. But what about signal handlers,
>> > could they ever invoke R's gc?
>> > Also, I was initially surprised not to find any matrix C APIs, but
>> > grepping for examples (sorry, I don't remember exactly which
>> > functions) showed me that the apparently accepted way to do matrix
>> > operations from C is to simply assume R's column-first dense matrix
>> > order, and access the 2D matrix as a flat 1D vector. (Which is easy.)
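The flat, column-major access described above can be sketched in plain C.
The names col_major_offset and row_sum are hypothetical helpers made up
for this illustration, not part of R's API; in real .Call code the data
pointer would presumably come from REAL() and the row count from nrows().

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), both 0-based, in a column-major matrix
   with nrow rows: columns are stored one after another. */
size_t col_major_offset(size_t i, size_t j, size_t nrow)
{
    assert(i < nrow);
    return i + j * nrow;
}

/* Sum of row i of an nrow x ncol column-major matrix held in the
   flat array x. */
double row_sum(const double *x, size_t nrow, size_t ncol, size_t i)
{
    double total = 0.0;
    for (size_t j = 0; j < ncol; j++)
        total += x[col_major_offset(i, j, nrow)];
    return total;
}
```

For a 2 x 3 matrix stored as {1, 2, 3, 4, 5, 6}, the columns are (1, 2),
(3, 4) and (5, 6), so element (1, 2) sits at offset 5 and row 0 is made up
of the entries at offsets 0, 2 and 4.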
>> > > The fact that internally R attempts to avoid copying for performance
>> > > reasons is the only reason why your code may have appeared to work,
>> > > but it's invalid!
>> > I will probably change my code to allocate a new list from the C code
>> > and return that, as you recommend. My main reason for doing the
>> > allocation in R was just that it was simpler, especially given the
>> > very limited documentation of R's C API.
>> > But, I didn't see anything in the "Writing R Extensions" doc saying
>> > that what my code is doing is "invalid", and more importantly, I don't
>> > see why it would or should be invalid...
>> > I'd still like to better understand why you think doing the initial
>> > allocation of an object in R rather than C code is such a problem. So
>> > far, I don't see any way that the R interpreter could ever tell the
>> > difference.
>> > Wait, or is the only objection here that I'm using C in a way that
>> > makes pass-by-reference semantics visible to my R code? Which will
>> > work completely correctly, but is not The Proper R Way?
>> > I don't actually need pass-by-reference behavior here at all, but I
>> > can imagine cases where I might want it, so I'd like to understand
>> > your objections better. Is using C to implement pass-by-reference
>> > actually Broken, or merely Ugly? From my reasons above, I think it
>> > will always work correctly and thus is not Broken. But of course
>> > given R's devotion to pass-by-value, it could be considered
>> > unacceptably Ugly.
>> > --
>> > Andrew Piskorski <atp at piskorski.com>
>> > http://www.piskorski.com/
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org