Hi, <div>I defined a small C++ function, sourced it with sourceCpp, and tested it; it works great. I also develop a package that interfaces R and Hadoop, and I need to use the same function as the map or reduce argument of a mapreduce call. This means that the function definition has to be saved, distributed to a cluster, and then loaded and executed in multiple R instances. This works fine with R functions, but not with sourceCpp-ed functions. In fact, the same error occurs after simply restarting the interpreter in RStudio, so I am fairly sure it is not specific to my code; I give the context only to explain why this matters to me and to the users of my package. This is a session in RStudio:</div>
<div><br></div><div><div>> sourceCpp(file="rmr2/pkg/src/psum.cpp")</div><div>> psum(list(1:4, 1:5))</div><div>[1] 10 15</div><div><br></div><div>Restarting R session...</div><div><br></div><div>> psum(list(1:4, 1:5))</div>
<div>Error in .External(list(name = "InternalFunction_invoke", address = <pointer: 0x0>, : </div><div> NULL value passed as symbol address</div></div><div><br></div><div>The C++ code, probably irrelevant to this issue:</div>
<div><br></div><div><div>#include <vector></div><div>#include <Rcpp.h></div><div><br></div><div>// [[Rcpp::export]]</div><div>std::vector<double> psum(Rcpp::List xx) {</div><div> std::vector<double> results(xx.size());</div>
<div> for(int i = 0; i < xx.size(); i ++) {</div><div> std::vector<double> x = Rcpp::as<std::vector<double> >(xx[i]);</div><div> for(int j = 0; j < x.size(); j++) {</div><div> results[i] += x[j];}}</div>
<div> return results;}</div></div><div><br></div><div><br></div><div>I remember something similar happening with cxxfunction, and someone recommended creating a package for C extensions that aren't a one-off. But I would like to make the case that this is not a satisfactory solution, because it raises the bar for users who can write a C function but may not be ready to write a complete package. For instance, they may just want to replace a</div>
<div><br></div><div>sapply(data, sum)</div><div><br></div><div>with the above function to see what kind of speed boost they can get. In the context of a mapreduce job they can't, because they will get the above error; they would need to write a package around that single function just to test their idea, which seems excessive. If they are not using RHadoop but, say, developing in RStudio, then every time they rebuild a package they have to re-source their C code: not a show-stopper, but an inconvenience. Am I missing something? Suggestions? Thanks</div>
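<div><br></div><div>For reference, the computation being swapped in for sapply(data, sum) can be sketched in plain C++ without any Rcpp types, which makes it easy to check outside R. The name psum_plain is hypothetical and only for illustration; it is not part of rmr2 or Rcpp, and the exported psum above would still need the Rcpp conversion.</div>

```cpp
#include <numeric>
#include <vector>

// Hypothetical plain-C++ counterpart of psum: one sum per inner vector.
// No Rcpp types are involved, so this compiles with any C++11 compiler.
std::vector<double> psum_plain(const std::vector<std::vector<double> >& xx) {
    std::vector<double> results;
    results.reserve(xx.size());
    for (const std::vector<double>& x : xx) {
        // Start from 0.0 so the accumulation is done in double precision.
        results.push_back(std::accumulate(x.begin(), x.end(), 0.0));
    }
    return results;
}
```

<div>Called on the two vectors 1..4 and 1..5, this yields 10 and 15, matching the session output above.</div>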
<div><br></div><div><br></div><div>Antonio</div>