[Rcpp-devel] dynamic library from Rcpp/Rapi library
Michael Malecki
michael.malecki at columbia.edu
Wed Sep 28 02:18:19 CEST 2011
A colleague has already distilled my long post:
What we want is three R functions that
A. call a C++ program to read a Stan graphical model specification
and generate C++ code for a class extending an abstract base class
as well as factory methods to create/destroy instances,
B. compile the C++ code into a dynamically linkable library, using
header libraries Stan, Boost and Eigen, the versions of which we need
to control, and
C. create an instance of the class implemented in the generated
C++ code (using a pre-specified factory method returning the base
class type), pass it data from R, run it, return the results to R.
Our current prototype is set up to
1. use Rcpp to call the C++ program that reads the model and
generates C++ code,
2. exec an external compiler like g++ or clang++ on the generated code, and
3. use Rcpp to call a C++ program that uses dlopen() to load the lib
created in 2, pass data from R to the C++ program and pass data
from C++ back to R.
Is there a cleaner, cross-platform way than dlopen() to load the
library from within R?
If we use dlopen(), can we dlopen()/dlclose() multiple times
without leaking memory to support repeated A/B/C steps during
model development and fitting?
Can we dlopen()/dlclose() and then dlopen() a library exporting a
function with the same name? Or should we generate new factory/class
namespaces each time?
On Tue, Sep 27, 2011 at 7:18 PM, Michael Malecki <malecki at gmail.com> wrote:
> *1 Some questions about R and dynamic libraries. What I describe below
> appears quite nonstandard, but I think makes sense. We could be way off, in
> which case we'd like to know now; or there could just be some pitfalls that
> others might have encountered that we would benefit from knowing about.*
>
> *1.1 Overview: we are developing a package (Stan, as in Ulam) to perform
> Hamiltonian MCMC sampling from densities we can write down in C++, or can
> use a graphical modeling language (something that looks somewhat like BUGS
> describing nodes and their distributions).*
>
> The graphical model is used to generate C++ code that depends on Boost,
> Eigen, and Stan, which implements automatic differentiation of these
> high-dimension hard-to-sample densities. A stan::mcmc::hmc sampler is
> instantiated on a model, which is a stan::mcmc::prob_grad_ad, and a
> stan::mcmc::sample contains a std::vector<double> real_params, a
> std::vector<int> int_params, and a double log_prob.
>
> That's all well and good, but this is definitely an unusual thing to be
> doing from R.
>
> The inline package has been suggested, but this would mean that each model
> would have to contain Rcpp hooks into R, to get data in and samples out. We
> would rather have a standalone Stan model compiled as a shared object, that Rstan
> would interact with.
>
> Rstan would implement, for a concrete example, an R_data_reader class, with
> a public virtual method values("varname") returning a std::vector<double>
> via the handy Rcpp::Environment env["varname"] operator (and dimensions of
> real and integer parameters, using the sometimes-tricky but slick
> Rcpp::RObject).
>
> *1.2 Rstan would interact with compiled Stan shared objects via dlopen (or
> its windowsy friend), with Rstan::sample calling the model's sample
> method, exposed to Rstan via dlsym(obj, "sample"). Is this
> not done in R package compiled code because it is especially OS-dependent
> (and how much are we talking?)*
>
> Is there some function in an R header that we don't know about that wraps
> dlopen in a less fragile way? We do not intend to expose any of the compiled
> stan .so methods directly to R – they would all be stan objects (like a
> vector of samples described above), and we'd use Rcpp to wrap the returns to
> R.
>
> We want to do this because it means Stan shared objects can then interact
> with python or something else (or dump files in and out) through reader
> classes.
>
> *2 Some Questions for the Rcpp crowd*
>
> *2.1 We find no instances of this being done because most estimation is
> static model-fitting where only the data change. But if we want to change a
> model, we have a new sampler to compile, but as far as R is concerned, the
> only thing new is its name. The methods R needs to interact with are going
> to be the same (see the reader class above), and separating this way keeps R
> headers out of stan proper and only in the Rstan io. So first: does this
> sound reasonable given the description above?*
>
> *2.2 The inline package has been suggested, which would take our Stan C++
> code and compile it with R CMD SHLIB. If we generated Stan C++ code that
> contained Rcpp headers and methods directly, we could inline it, right?*
>
> *2.3 R CMD SHLIB would have to be aware of our other headers (Eigen,
> Boost, Stan). For certain version dependencies (Eigen's version number has
> been incremented seven times, sometimes breaking backward compatibility,
> since we started), we plan to distribute Stan with these libraries bundled
> locally. But more
> generally, is there a reason for or (as we are inclined) against using R CMD
> SHLIB (inline) to build Stan shared objects?*
>
> *3 I (or one of my much more C++-savvy colleagues) will be happy to
> provide more details.*
>
>
> Thanks for your input!
>
> Michael Malecki
>