See comments on Rcpp below.<br><br><div class="gmail_quote">On Thu, Oct 28, 2010 at 11:28 AM, William Dunlap <span dir="ltr"><<a href="mailto:wdunlap@tibco.com">wdunlap@tibco.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">> -----Original Message-----<br>
> From: <a href="mailto:r-devel-bounces@r-project.org">r-devel-bounces@r-project.org</a><br>
> [mailto:<a href="mailto:r-devel-bounces@r-project.org">r-devel-bounces@r-project.org</a>] On Behalf Of Andrew Piskorski<br>
> Sent: Thursday, October 28, 2010 6:48 AM<br>
> To: Simon Urbanek<br>
> Cc: <a href="mailto:r-devel@r-project.org">r-devel@r-project.org</a><br>
> Subject: Re: [Rd] must .Call C functions return SEXP?<br>
><br>
> On Thu, Oct 28, 2010 at 12:15:56AM -0400, Simon Urbanek wrote:<br>
><br>
> > > Reason I ask, is I've written some R code which allocates two long<br>
> > > lists, and then calls a C function with .Call. My C code writes to<br>
> > > those two pre-allocated lists,<br>
><br>
> > That's bad! All arguments are essentially read-only so you should<br>
> > never write into them!<br>
><br>
> I don't see how. (So, what am I missing?) The R docs themselves<br>
> state that the main point of using .Call rather than .C is that .Call<br>
> does not do any extra copying and gives one direct access to the R<br>
> objects. (This is indeed very useful, e.g. to reorder a large matrix<br>
> in seconds rather than hours.)<br>
><br>
> I could allocate the two lists in my C code, but so far it was more<br>
> convenient to do so in R. What possible difference in behavior can there<br>
> be between the two approaches?<br>
<br>
</div>Here is an example of how you break the rule that R-language functions<br>
do not change their arguments if you use .Call in the way that you<br>
describe. The C code is in alter_argument.c:<br>
<br>
#include <R.h><br>
#include <Rinternals.h><br>
<br>
SEXP alter_argument(SEXP arg)<br>
{<br>
SEXP dim ;<br>
PROTECT(dim = allocVector(INTSXP, 2));<br>
INTEGER(dim)[0] = 1 ;<br>
INTEGER(dim)[1] = LENGTH(arg) ;<br>
setAttrib(arg, R_DimSymbol, dim);<br>
UNPROTECT(1) ;<br>
return dim ;<br>
}<br>
<br>
Make a shared library out of this. E.g., on Linux do<br>
R CMD SHLIB -o Ralter_argument.so alter_argument.c<br>
and load it into R with<br>
dyn.load("./Ralter_argument.so")<br>
(Or, on any platform, put it into a package along with<br>
the following R code and build it.)<br>
<br>
The associated R code is<br>
myDim <- function(v) .Call("alter_argument", v)<br>
f <- function(z) myDim(z)[2]<br>
Now try using it:<br>
> myData <- 6:10<br>
> myData<br>
[1] 6 7 8 9 10<br>
> f(myData)<br>
[1] 5<br>
> myData<br>
[,1] [,2] [,3] [,4] [,5]<br>
[1,] 6 7 8 9 10<br>
The argument to f was changed! This should never happen in R.<br>
<br>
If you are very careful you might be able to ensure that<br>
no part of the argument to be altered can come from<br>
outside the function calling .Call(). It can be tricky<br>
to ensure that, especially when the argument is more complicated<br>
than an atomic vector.<br>
<br>
"If you live outside the law you must be honest" - Bob Dylan.<br></blockquote><div><br>This thread seems to suggest (following Bob Dylan) that one needs<br>to be very careful when using C/C++ to modify R's memory<br>
directly, because you may modify other R variables that point<br>to the same memory (due to R's copy-by-value semantics and<br>optimizations).<br><br>What are the implications for the Rcpp package where R<br>objects are exposed to the C++ side in precisely this way,<br>
permitting unrestricted modifications? (In the original<br>or "classic" version of this package direct writes to R's<br>memory were done only for performance reasons.)<br><br>Seems like extra precautions need to be taken to<br>
avoid the aliasing problem.<br><br>Dominick<br><br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
In R, .Call() does not copy its arguments but the C code<br>
writer is expected to do so if they will be altered.<br>
In S+ (and S), .Call() copies the arguments if altering<br>
them would make a user-visible change in the environment,<br>
unless you specify that the C code will not be altering them.<br>
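<br>
The copy-before-alter rule can be sketched with a toy model in plain C.<br>
ToyVec and every function name below are hypothetical stand-ins, not part<br>
of R's API. In real .Call code the copy would typically be made with<br>
duplicate() from Rinternals.h before setAttrib() is called, and the copy<br>
would be returned instead of altering the argument.<br>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R integer vector with an optional "dim" attribute.
   This is NOT R's SEXP; it only models the aliasing issue. */
typedef struct {
    int *data;
    int length;
    int dim[2];   /* {0, 0} means "no dim attribute" */
} ToyVec;

/* Deep copy, playing the role that duplicate() plays in R's C API. */
static ToyVec *toy_duplicate(const ToyVec *v)
{
    size_t n = v->length > 0 ? (size_t)v->length : 1;  /* avoid malloc(0) */
    ToyVec *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    *copy = *v;
    copy->data = malloc(n * sizeof *copy->data);
    assert(copy->data != NULL);
    memcpy(copy->data, v->data, (size_t)v->length * sizeof *copy->data);
    return copy;
}

static void toy_free(ToyVec *v)
{
    free(v->data);
    free(v);
}

/* Unsafe pattern: writes the attribute into the caller's object,
   so every name that refers to that object sees the change. */
static void set_dim_in_place(ToyVec *arg)
{
    arg->dim[0] = 1;
    arg->dim[1] = arg->length;
}

/* Safe pattern: copy first, alter only the copy, return the copy. */
static ToyVec *set_dim_on_copy(const ToyVec *arg)
{
    ToyVec *result = toy_duplicate(arg);
    set_dim_in_place(result);
    return result;
}
```

With set_dim_on_copy() the caller's vector keeps its original (empty) dim,<br>
which is the behaviour an R user expects from a function call.<br>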
<br>
Bill Dunlap<br>
Spotfire, TIBCO Software<br>
wdunlap <a href="http://tibco.com" target="_blank">tibco.com</a><br>
<div><div></div><div class="h5"><br>
> > R has pass-by-value(!) semantics, so semantically your code has<br>
> > nothing to do with the result.1 and result.2 variables since only<br>
> > their *values* are guaranteed to be passed (possibly a copy).<br>
><br>
> Clearly C code called from .Call must be allowed to construct R<br>
> objects, as that's how much of R itself is implemented, and further<br>
> down, it's what you recommend I should do instead.<br>
><br>
> But why does it follow that C code must never modify an object<br>
> initially allocated by R code? Are you saying there is some special<br>
> magic difference in the state of an object allocated by R's C code<br>
> vs. one allocated by R code? If so, what is it?<br>
><br>
> What is the potential problem here, that the garbage collector will<br>
> suddenly run while my C code is in the middle of writing to an R list?<br>
> Yes, if the gc is going to move the object elsewhere, that would be<br>
> very bad. But it looks to me like that cannot happen, because lots of<br>
> the R implementation itself would fail badly if it did.<br>
><br>
> E.g.: The PROTECT call is used to increment reference counts, but I<br>
> see no guarantees that it is atomic with the operations that allocate<br>
> objects. I see no mutexes or other barriers in C code to prevent the<br>
> gc from running, thus implying that it *can't* run until the C<br>
> function completes.<br>
><br>
> And R is single threaded, of course. But what about signal handlers,<br>
> could they ever invoke R's gc?<br>
><br>
> Also, I was initially surprised not to find any matrix C APIs, but<br>
> grepping for examples (sorry, I don't remember exactly which<br>
> functions) showed me that the apparently accepted way to do matrix<br>
> operations from C is to simply assume R's column-first dense matrix<br>
> order, and access the 2D matrix as a flat 1D vector. (Which is easy.)<br>
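<br>
The flat, column-first indexing described above can be sketched as follows.<br>
mat_get and mat_set are hypothetical helper names, not R API functions, and<br>
the indices are zero-based as is usual in C. In real .Call code the pointer<br>
would come from REAL() on a numeric matrix.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Read element (i, j), zero-based, of an nrow-by-ncol matrix stored
   column by column in one flat array. */
static double mat_get(const double *x, size_t nrow, size_t ncol,
                      size_t i, size_t j)
{
    assert(i < nrow && j < ncol);
    return x[i + nrow * j];
}

/* Write element (i, j) using the same column-major layout. */
static void mat_set(double *x, size_t nrow, size_t ncol,
                    size_t i, size_t j, double value)
{
    assert(i < nrow && j < ncol);
    x[i + nrow * j] = value;
}
```

For a 2-by-3 matrix the six values are stored as column 0, then column 1,<br>
then column 2, so element (i, j) sits at offset i + 2 * j.<br>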
><br>
> > The fact that internally R attempts to avoid copying for performance<br>
> > reasons is the only reason why your code may have appeared to work,<br>
> > but it's invalid!<br>
><br>
> I will probably change my code to allocate a new list from the C code<br>
> and return that, as you recommend. My main reason for doing the<br>
> allocation in R was just that it was simpler, especially given the<br>
> very limited documentation of R's C API.<br>
><br>
> But, I didn't see anything in the "Writing R Extensions" doc saying<br>
> that what my code is doing is "invalid", and more importantly, I don't<br>
> see why it would or should be invalid...<br>
><br>
> I'd still like to better understand why you think doing the initial<br>
> allocation of an object in R rather than C code is such a problem. So<br>
> far, I don't see any way that the R interpreter could ever tell the<br>
> difference.<br>
><br>
> Wait, or is the only objection here that I'm using C in a way that<br>
> makes pass-by-reference semantics visible to my R code? Which will<br>
> work completely correctly, but is not The Proper R Way?<br>
><br>
> I don't actually need pass-by-reference behavior here at all, but I<br>
> can imagine cases where I might want it, so I'd like to understand<br>
> your objections better. Is using C to implement pass-by-reference<br>
> actually Broken, or merely Ugly? From my reasons above, I think it<br>
> will always work correctly and thus is not Broken. But of course<br>
> given R's devotion to pass-by-value, it could be considered<br>
> unacceptably Ugly.<br>
><br>
> --<br>
> Andrew Piskorski <<a href="mailto:atp@piskorski.com">atp@piskorski.com</a>><br>
> <a href="http://www.piskorski.com/" target="_blank">http://www.piskorski.com/</a><br>
><br>
> ______________________________________________<br>
> <a href="mailto:R-devel@r-project.org">R-devel@r-project.org</a> mailing list<br>
> <a href="https://stat.ethz.ch/mailman/listinfo/r-devel" target="_blank">https://stat.ethz.ch/mailman/listinfo/r-devel</a><br>
><br>
<br>
______________________________________________<br>
<a href="mailto:R-devel@r-project.org">R-devel@r-project.org</a> mailing list<br>
<a href="https://stat.ethz.ch/mailman/listinfo/r-devel" target="_blank">https://stat.ethz.ch/mailman/listinfo/r-devel</a><br>
</div></div></blockquote></div><br>