[Rcpp-devel] Struggling with cppFunction() and clusterExport()

Romain Francois romain at r-enthusiasts.com
Fri Sep 27 11:28:32 CEST 2013


On 27/09/13 00:20, Matteo Fasiolo wrote:
> Thanks a lot for your replies, I will go for the package solution, given
> that I'm organizing all my code in a package in any case.
> Matteo

That's the preferred approach.

It might not, however, give you the flexibility of passing arbitrary 
functions at runtime. So you have to know in advance all the functions 
you are going to apply and include them in your utility package.

It might be worth thinking about ways to transport the result of a 
cppFunction call.

The problem also occurs when you close and restart an R session.

 > fun <- cppFunction( "int foo(int i){ return i ; }" )
 > body(fun)[[2L]]
<pointer: 0x102f7de00>
attr(,"class")
[1] "NativeSymbol"
 > q("yes")


restarting the R session:

 > body(fun)[[2L]]
<pointer: 0x0>
attr(,"class")
[1] "NativeSymbol"

It might be worth handling this internally. The object could carry its 
definition with it and recompile itself when the pointer is the null 
pointer, which is apparently what you get after a reload.

Unless things have evolved in R, there is no way to control how an 
external pointer is serialized and reloaded.
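
As a rough illustration of the "carry its definition" idea applied to the 
cluster case, here is a minimal, untested sketch that reuses auto_function 
from my earlier message (quoted below). The only change is calling 
Rcpp::cppFunction explicitly, so the workers do not need Rcpp attached. It 
assumes every worker has Rcpp installed and a working C++ toolchain, and 
each worker pays the compilation cost once.

```r
library(parallel)

# auto_function() as in the quoted message: the closure stores the source
# code and arguments, not the compiled pointer, so it survives serialization.
auto_function <- function(code, ...) {
  dots <- list(code, ...)
  function(...) {
    # Compiles on first use in each R process; later calls should hit the
    # sourceCpp cache of that process.
    do.call(Rcpp::cppFunction, dots)(...)
  }
}

inner_Cpp <- auto_function("double inner_Cpp(double a){ return 1; }")

cl <- makeCluster(2)
# The closure is shipped to the workers together with its enclosing
# environment, so no clusterExport() call is needed here.
res <- clusterApply(cl, 1:2, inner_Cpp)
stopCluster(cl)
res
```

Because only the source travels to the workers, no native symbol address 
has to survive serialization.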

Romain

> On Thu, Sep 26, 2013 at 6:15 PM, Romain Francois
> <romain at r-enthusiasts.com <mailto:romain at r-enthusiasts.com>> wrote:
>
>     The usual way is to put the function in a package and load the package.
>
>     Otherwise, you could do something along these lines
>
>     auto_function <- function( code, ... ){
>          dots <- list(code, ...)
>          function(...){
>              do.call( cppFunction, dots )( ... )
>          }
>     }
>
>     This way the function knows how to compile itself, for example:
>
>      > fun <- auto_function(' double inner_Cpp(double a){ return 1;  } ' )
>     # this takes a while the first time
>      > fun( 2 )
>     [1] 1
>     # this is instant thanks to caching of sourceCpp
>      > fun( 2 )
>     [1] 1
>
>     Romain
>
>     On 26/09/13 18:57, Matteo Fasiolo wrote:
>
>         Dear Rcpp developers,
>
>            I'm trying to parallelize some of my algorithms but I have
>         encountered the following problem:
>
>         # I have a cppFunction
>         cppFunction(' double inner_Cpp(double a){ return 1;  } ')
>
>         # And an R wrapper around it
>         wrapper_R<- function(input)
>         {
>             inner_Cpp(input)
>         }
>
>         # And I want to call the wrapper in parallel within an algorithm
>         algo <- function(input)
>         {
>             cl <- makeCluster(2)
>             clusterExport(cl, "inner_Cpp")
>             a <- clusterApply(cl, 1:2, wrapper_R)
>             stopCluster(cl)
>             a
>         }
>
>         algo(2)
>
>         Error in checkForRemoteErrors(val) :
>             2 nodes produced errors; first error: NULL value passed as symbol address
>
>         It seems that I'm unable to find the address of inner_Cpp.
>         If the inner function is an R function there is no problem:
>
>         inner_R <- function(input) 1
>
>         wrapper_R<- function(input)
>         {
>             inner_R(input)
>         }
>
>         algo <- function(input)
>         {
>             cl <- makeCluster(2)
>             clusterExport(cl, "inner_R")
>             a <- clusterApply(cl, 1:2, wrapper_R)
>             stopCluster(cl)
>             a
>         }
>
>         algo(2)
>
>         [[1]]
>         [1] 1
>
>         [[2]]
>         [1] 1
>
>         Do you have any idea about why this is happening?
>         Given that I have just started parallelizing my algorithms in this
>         way, any suggestion/criticism about the overall approach is
>         more than welcome!
>
>         Matteo


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30


