[Rcpp-devel] loading a sourceCpp-ed function

JJ Allaire jj.allaire at gmail.com
Tue Jan 29 22:48:54 CET 2013


Hi Antonio,

You are correct in your analysis: C++ functions compiled with sourceCpp
currently do not persist across sessions and require recompilation. The
answer is of course to create a package, but, as you point out, for ad-hoc
work this might be considered too heavyweight.
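
(For what it's worth, Rcpp.package.skeleton() can generate most of the
package boilerplate from a single attributes-annotated .cpp file. A rough
sketch, assuming your source file is the psum.cpp shown below:

library(Rcpp)
# create a minimal package "psumpkg" wrapping the exported function
Rcpp.package.skeleton("psumpkg", cpp_files = "psum.cpp",
                      attributes = TRUE, example_code = FALSE)
# then, from the shell: R CMD INSTALL psumpkg

Still more ceremony than a single sourceCpp() call, which is exactly your
point.)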

Two possible longer term solutions exist:

(1) Save the contents of the shared library along with the function
(this is trickier to do properly than it sounds)

(2) More likely, we could allow you to optionally provide an external
(non-temporary) directory for build output. Since that directory would
survive across sessions, your C++ function would as well (see the sketch
below).
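
For illustration only, the sort of interface we have in mind; nothing like
this exists today, and the argument name below (cacheDir) is just a
placeholder:

# hypothetical: keep build output in a persistent directory
sourceCpp("psum.cpp", cacheDir = "~/.rcpp_cache")

# in a later R session, the cached shared library would be reloaded
# instead of recompiled, so psum() would keep working

Again, this is speculative; today the build directory is temporary and goes
away with the session.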

J.J.

On Tue, Jan 29, 2013 at 4:20 PM, Antonio Piccolboni
<antonio at piccolboni.info> wrote:
> Hi,
> I defined a little C++ function and sourced it with sourceCpp; I tested it
> and it works great. It so happens that I develop a package to interface R
> and Hadoop, and I need to use the same function in the map or reduce
> parameter of a mapreduce call. This means that the definition of the
> function has to be saved, distributed to a cluster, and loaded and executed
> in multiple R instances. This works fine with R functions, but not with
> sourceCpp-ed functions. In fact, the same error happens just by restarting
> the interpreter in RStudio, so I am pretty sure there is nothing specific
> to my code, but I wanted to give you the context to explain why this
> matters to me and to the users of my package. This is a session in RStudio:
>
>> sourceCpp(file="rmr2/pkg/src/psum.cpp")
>> psum(list(1:4, 1:5))
> [1] 10 15
>
> Restarting R session...
>
>> psum(list(1:4, 1:5))
> Error in .External(list(name = "InternalFunction_invoke", address =
> <pointer: 0x0>,  :
>   NULL value passed as symbol address
>
> The C++ code, probably irrelevant to this issue:
>
> #include <vector>
> #include <Rcpp.h>
>
> // [[Rcpp::export]]
> std::vector<double> psum(Rcpp::List xx) {
>   std::vector<double> results(xx.size());
>   for(int i = 0; i < xx.size(); i++) {
>     std::vector<double> x = Rcpp::as<std::vector<double> >(xx[i]);
>     for(int j = 0; j < x.size(); j++) {
>       results[i] += x[j];
>     }
>   }
>   return results;
> }
>
>
> I remember something of this sort also happening with cxxfunction, and that
> someone recommended creating a package for C++ extensions that aren't a
> one-off thing. But I would like to make the case that this is not a
> satisfactory solution, because it raises the bar for users who can write a
> C++ function but may not be ready to write a complete package. For
> instance, they may just want to replace a
>
> sapply(data, sum)
>
> with the above function to see what kind of speed boost they can get. If
> they are doing this in the context of a mapreduce job, they can't, as they
> will get the above error. They would need to write a package around that
> single function just to test their idea. That seems a bit too much. If they
> are not using RHadoop but, say, developing in RStudio, every time they
> rebuild a package they have to re-source their C++ code; not a show-stopper,
> but an inconvenience. Am I missing something? Suggestions? Thanks
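>
> (For concreteness, this is the kind of quick check I have in mind; the data
> is made up for illustration and assumes psum.cpp is in the working
> directory:
>
> sourceCpp("psum.cpp")
> data <- replicate(1000, runif(1000), simplify = FALSE)
> system.time(for (i in 1:100) sapply(data, sum))
> system.time(for (i in 1:100) psum(data))
>
> If psum first has to live in a package before it can even be tried, that
> experiment becomes a lot less casual.)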
>
>
> Antonio
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

