[Rcpp-devel] loading a sourceCpp-ed function

JJ Allaire jj.allaire at gmail.com
Wed Jan 30 18:44:23 CET 2013


Antonio,

Thanks for all of the detailed feedback on this. We'll take a closer
look at this issue and hopefully come up with an improved
implementation that overcomes the current session-bound nature of
sourceCpp.

J.J.

On Tue, Jan 29, 2013 at 6:35 PM, Antonio Piccolboni
<antonio at piccolboni.info> wrote:
> I wonder: if (2) is possible, why not have a default path? It seems to me
> that leaving the R object broken is not the best user experience. It would
> also help me on the RHadoop end, since I could pack the directory and
> broadcast it to the cluster using the distributed cache feature of RHadoop.
> But I may also be imagining a need where there is none: how many R users are
> willing to write ad hoc C functions for one-off use? Anyone putting in the
> work to write in C may as well be willing to organize that work in a package.
> Thanks
>
>
> Antonio
>
>
> On Tue, Jan 29, 2013 at 1:48 PM, JJ Allaire <jj.allaire at gmail.com> wrote:
>>
>> Hi Antonio,
>>
>> You are correct in your analysis: currently C++ functions do not
>> persist across sessions and require recompilation. The answer is of
>> course to create a package, but as you point out for ad-hoc work this
>> might be considered too heavyweight.
>>
>> Two possible longer term solutions exist:
>>
>> (1) Save the contents of the shared library along with the function
>> (this is trickier to do properly than it sounds)
>>
>> (2) More likely, we could allow you to optionally provide an external
>> (non-temporary) directory for build output. Since that directory will
>> survive across sessions, your C++ function would of course survive too.
>>
>> J.J.
>>
>> On Tue, Jan 29, 2013 at 4:20 PM, Antonio Piccolboni
>> <antonio at piccolboni.info> wrote:
>> > Hi,
>> > I defined a little C++ function and sourced it with sourceCpp; tested, it
>> > works great. It so happens that I develop a package to interface R and
>> > Hadoop, and I need to use the same function in the map or reduce parameter
>> > of a mapreduce call. This means that the definition of the function will
>> > have to be saved, distributed to a cluster, and loaded and executed in
>> > multiple R instances. This works fine with R functions, but not with
>> > sourceCpp-ed functions. In fact, the same error happens just by restarting
>> > the interpreter in RStudio, so I am pretty sure there is nothing specific
>> > to my code, but I wanted to give you the context to explain why this
>> > matters to me and the users of my package. This is a session in RStudio:
>> >
>> >> sourceCpp(file="rmr2/pkg/src/psum.cpp")
>> >> psum(list(1:4, 1:5))
>> > [1] 10 15
>> >
>> > Restarting R session...
>> >
>> >> psum(list(1:4, 1:5))
>> > Error in .External(list(name = "InternalFunction_invoke", address =
>> > <pointer: 0x0>,  :
>> >   NULL value passed as symbol address
>> >
>> > The C++ code, probably irrelevant to this issue:
>> >
>> > #include <vector>
>> > #include <Rcpp.h>
>> >
>> > // [[Rcpp::export]]
>> > std::vector<double> psum(Rcpp::List xx) {
>> >   // One running sum per list element, each starting at 0.
>> >   std::vector<double> results(xx.size());
>> >   for (int i = 0; i < xx.size(); i++) {
>> >     std::vector<double> x = Rcpp::as<std::vector<double> >(xx[i]);
>> >     for (std::size_t j = 0; j < x.size(); j++) {
>> >       results[i] += x[j];
>> >     }
>> >   }
>> >   return results;
>> > }
>> >
>> >
>> > I remember something of this sort happening also with cxxfunction, and
>> > that someone recommended creating a package for C extensions that aren't
>> > a one-off thing. But I would like to make the case that this is not a
>> > satisfactory solution, because it raises the bar for users who can write
>> > a C function but may not be ready to write a complete package. For
>> > instance, they may just want to replace a
>> >
>> > sapply(data, sum)
>> >
>> > with the above function to see what kind of speed boost they can get. If
>> > they are doing this in the context of a mapreduce job, they can't, as
>> > they will get the above error. They need to write a package around that
>> > single function to test their idea. That seems a bit too much. If they
>> > are not using RHadoop but, say, developing in RStudio, every time they
>> > rebuild a package they have to re-source their C code: not a show-stopper
>> > but an inconvenience. Am I missing something? Suggestions? Thanks
>> >
>> >
>> > Antonio
>> >
>> > _______________________________________________
>> > Rcpp-devel mailing list
>> > Rcpp-devel at lists.r-forge.r-project.org
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
>
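
The summation performed by the psum function quoted above can also be
checked outside an R session. What follows is only a minimal sketch in plain
C++, not the code from the thread: it assumes a nested std::vector in place
of Rcpp::List, and the name psum_plain is hypothetical. It illustrates the
per-element sum only and does nothing about the session-bound behaviour
discussed above.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-alone counterpart of psum: one sum per inner vector.
// Assumption: std::vector<std::vector<double> > stands in for Rcpp::List.
std::vector<double> psum_plain(const std::vector<std::vector<double> >& xx) {
  std::vector<double> results;
  results.reserve(xx.size());
  for (std::size_t i = 0; i < xx.size(); ++i) {
    // Seeding std::accumulate with 0.0 keeps the running sum in double.
    results.push_back(std::accumulate(xx[i].begin(), xx[i].end(), 0.0));
  }
  return results;
}
```

Called with the two sequences 1..4 and 1..5, this returns 10 and 15, the
same values shown in the R session above.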

