A colleague has already distilled my long post:<div><br></div><div><span class="Apple-style-span" style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255); ">What we want is three R functions that<br>
<br>A. call a C++ program to read a Stan graphical model specification<br>and generate C++ code for a class extending an abstract base class<br>as well as factory methods to create/destroy instances,<br><br>B. compile the C++ code into a dynamically linkable library, using<br>
header-only libraries Stan, Boost, and Eigen, whose versions we need<br>to control, and<br><br>C. create an instance of the class implemented in the generated<br>C++ code (using a pre-specified factory method returning the base<br>
class type), pass it data from R, run it, and return the results to R.<br><br>Our current prototype is set up to<br><br>1. use Rcpp to call the C++ program that reads the model and<br>generates C++ code,<br><br>2. exec an external compiler such as g++ or clang++ on the generated code, and<br>
<br>3. use Rcpp to call a C++ program that uses dlopen() to load the lib<br>created in 2, pass data from R to the C++ program, and pass data<br>from C++ back to R.<br><br>Is there a cleaner cross-platform way than dlopen() to link from<br>
within R?<br><br>If we use dlopen(), can we dlopen()/dlclose() multiple times<br>without leaking memory to support repeated A/B/C steps during<br>model development and fitting?<br><br>Can we dlopen()/dlclose() and then dlopen() a library exporting a function with<br>
the same name? Or should we generate new factory/class namespaces<br></span><span class="Apple-style-span" style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255); ">each time?</span></div>
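<p>A minimal sketch of what steps A and C could look like on the C++ side, assuming a hypothetical base class model_base and hypothetical factory names create_model and destroy_model (none of these are Stan's actual API). Declaring the factories extern "C" keeps their symbol names unmangled, so a loader can find them by a fixed string; pairing the create function with a destroy function in the same library means the object is deleted by the same code that allocated it.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical abstract base class that the host (e.g. Rstan) compiles against.
// The name model_base and its methods are illustrative, not Stan's API.
class model_base {
public:
  virtual ~model_base() = default;
  virtual std::string name() const = 0;
  virtual std::size_t num_params() const = 0;
  virtual double log_prob(const std::vector<double>& params) const = 0;
};

// What code generated from one model specification might look like.
// Here: two independent standard normal parameters.
class generated_model : public model_base {
public:
  std::string name() const override { return "generated_model"; }
  std::size_t num_params() const override { return 2; }
  double log_prob(const std::vector<double>& params) const override {
    double lp = 0.0;
    for (double x : params) lp -= 0.5 * x * x;  // unnormalized N(0, 1)
    return lp;
  }
};

// Factory functions with C linkage: their symbol names are not mangled,
// so dlsym() can look them up as "create_model" and "destroy_model".
extern "C" model_base* create_model() { return new generated_model(); }
extern "C" void destroy_model(model_base* m) { delete m; }
```

<p>A host that has loaded such a library would fetch both factories with dlsym(), use the object only through the model_base interface, and call destroy_model before dlclose(). Keeping these names fixed and varying only the library file is one option; generating unique names per model is the alternative raised in the question.</p>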
<div><font class="Apple-style-span" color="#222222" face="arial, sans-serif"><br></font><br><div class="gmail_quote">On Tue, Sep 27, 2011 at 7:18 PM, Michael Malecki <span dir="ltr"><<a href="mailto:malecki@gmail.com">malecki@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><p><b>1 Some questions about R and dynamic libraries. What I describe below appears quite nonstandard, but I think it makes sense. We could be way off, in which case we'd like to know now; or there may simply be pitfalls that others have encountered and that we would benefit from knowing about.</b></p>
<p><b>1.1 Overview: we are developing a package (Stan, as in Ulam) to perform Hamiltonian MCMC sampling from densities that we can either write down directly in C++ or specify in a graphical modeling language (something that looks somewhat like BUGS, describing nodes and their distributions).</b></p>
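<p>To make the HMC part of the overview concrete, here is a toy sketch (not Stan's sampler) of Hamiltonian Monte Carlo on a one-dimensional standard normal: draw a fresh momentum, follow leapfrog steps along the gradient of the log density, then accept or reject with a Metropolis step on the change in the Hamiltonian. The function names, the hand-coded gradient, and the fixed step size and trajectory length are assumptions for illustration; Stan itself obtains gradients by automatic differentiation.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Toy HMC for a 1-D standard normal target (illustrative, not Stan's sampler).
// log p(q) = -q^2 / 2 up to a constant, so d/dq log p(q) = -q.
// The gradient is hand-coded here; Stan gets it by automatic differentiation.
double log_prob(double q) { return -0.5 * q * q; }
double grad_log_prob(double q) { return -q; }

std::vector<double> hmc_sample(std::size_t num_draws, double step_size,
                               int num_leapfrog, unsigned seed) {
  std::mt19937 rng(seed);
  std::normal_distribution<double> momentum(0.0, 1.0);
  std::uniform_real_distribution<double> unif(0.0, 1.0);

  std::vector<double> draws;
  draws.reserve(num_draws);
  double q = 0.0;  // current position
  for (std::size_t i = 0; i < num_draws; ++i) {
    const double p = momentum(rng);  // fresh momentum each iteration
    double q_new = q;
    double p_new = p;
    // Leapfrog: half momentum step, alternating full steps, half momentum step.
    p_new += 0.5 * step_size * grad_log_prob(q_new);
    for (int l = 0; l < num_leapfrog; ++l) {
      q_new += step_size * p_new;
      if (l + 1 < num_leapfrog) p_new += step_size * grad_log_prob(q_new);
    }
    p_new += 0.5 * step_size * grad_log_prob(q_new);
    // Metropolis correction on the Hamiltonian H(q, p) = -log p(q) + p^2 / 2.
    const double h_old = -log_prob(q) + 0.5 * p * p;
    const double h_new = -log_prob(q_new) + 0.5 * p_new * p_new;
    if (unif(rng) < std::exp(h_old - h_new)) q = q_new;
    draws.push_back(q);  // on rejection the previous position is repeated
  }
  return draws;
}
```

<p>For example, hmc_sample(5000, 0.1, 20, 42) returns 5000 draws whose sample mean and variance should land close to 0 and 1.</p>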
<p>The graphical model is used to generate C++ code that depends on Boost, Eigen, and Stan; Stan implements automatic differentiation of these high-dimensional, hard-to-sample densities. A stan::mcmc::hmc sampler is instantiated on a model, which is a stan::mcmc::prob_grad_ad, and each stan::mcmc::sample contains a std::vector<double> real_params, a std::vector<int> int_params, and a double log_prob.</p>
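<p>The gradient a sampler needs is what automatic differentiation supplies. The sketch below uses forward-mode dual numbers, a simpler technique swapped in for illustration and not how Stan implements its autodiff, together with a sample struct that merely mirrors the three fields listed above (its exact layout is an assumption). The log density is written once as a template, roughly the way generated code could be, and evaluated on dual numbers to get the value and the derivative in one pass.</p>

```cpp
#include <cassert>
#include <vector>

// Forward-mode dual number: val holds f(x), der holds df/dx.
// A swapped-in illustration of autodiff, not Stan's implementation.
struct dual {
  double val;
  double der;
};

dual operator+(dual a, dual b) { return {a.val + b.val, a.der + b.der}; }
dual operator-(dual a, dual b) { return {a.val - b.val, a.der - b.der}; }
dual operator*(dual a, dual b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};  // product rule
}
dual operator*(double c, dual a) { return {c * a.val, c * a.der}; }

// log N(y | mu, 1) up to a constant, written once for any scalar type T
// (double gives the value only; dual gives value and derivative).
template <typename T>
T log_density(const T& mu, const T& y) {
  T diff = y - mu;
  return -0.5 * (diff * diff);
}

// Hypothetical mirror of the fields described for stan::mcmc::sample.
struct sample {
  std::vector<double> real_params;
  std::vector<int> int_params;
  double log_prob;
};

// Evaluates sum_i log N(y_i | mu, 1) and its derivative with respect to mu.
sample evaluate(double mu, const std::vector<double>& ys, double& grad) {
  dual mu_d{mu, 1.0};  // seed: d mu / d mu = 1
  dual lp{0.0, 0.0};
  for (double y : ys) lp = lp + log_density(mu_d, dual{y, 0.0});  // data are constants
  grad = lp.der;
  return sample{{mu}, {}, lp.val};
}
```

<p>With data {1, 2, 3}, evaluate(2.0, data, g) yields log_prob equal to -1 and g equal to 0, since the derivative of the sum of -(y - mu)^2 / 2 is the sum of (y - mu).</p>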
<p>That's all well and good, but this is definitely an unusual thing to be doing from R.</p><p>The inline package has been suggested, but that would mean each model has to contain Rcpp hooks into R to get data in and samples out. We would rather have a standalone Stan model compiled as a shared object with which Rstan would interact.</p>
<p>As a concrete example, Rstan would implement an R_data_reader class with a public virtual method values("varname") that returns a std::vector<double> via the handy operator[] of Rcpp::Environment (as in env["varname"]), along with the dimensions of real and integer parameters, using the sometimes-tricky but slick Rcpp::RObject.</p>
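<p>A sketch of this reader separation, assuming a hypothetical data_reader interface with values() and dims(). The model library would compile only against this interface; the R-specific R_data_reader would live in Rstan and fill its results from an Rcpp::Environment. Here a std::map-backed stand-in takes that role so the example builds without R, and sum_values is a hypothetical piece of model-side code.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader interface compiled into the model library.
// It names no R types, so the library needs no R or Rcpp headers.
class data_reader {
public:
  virtual ~data_reader() = default;
  virtual std::vector<double> values(const std::string& name) const = 0;
  virtual std::vector<std::size_t> dims(const std::string& name) const = 0;
};

// Stand-in for R_data_reader: an R-side version would pull from an
// Rcpp::Environment; this one is filled directly so it compiles without R.
class map_data_reader : public data_reader {
public:
  void add(const std::string& name, std::vector<double> vals,
           std::vector<std::size_t> dims) {
    values_[name] = std::move(vals);
    dims_[name] = std::move(dims);
  }
  std::vector<double> values(const std::string& name) const override {
    auto it = values_.find(name);
    if (it == values_.end()) throw std::out_of_range("no variable: " + name);
    return it->second;
  }
  std::vector<std::size_t> dims(const std::string& name) const override {
    auto it = dims_.find(name);
    if (it == dims_.end()) throw std::out_of_range("no variable: " + name);
    return it->second;
  }

private:
  std::map<std::string, std::vector<double>> values_;
  std::map<std::string, std::vector<std::size_t>> dims_;
};

// Model-side code sees only the abstract interface.
double sum_values(const data_reader& reader, const std::string& name) {
  double total = 0.0;
  for (double v : reader.values(name)) total += v;
  return total;
}
```

<p>Because generated model code would take its data through a const reference to data_reader, swapping the R-backed reader for another implementation requires no change to the model library.</p>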
<p><b>1.2 Rstan would interact with compiled Stan shared objects via dlopen (or its Windows counterpart, LoadLibrary), with Rstan::sample calling the model's sample method, which is exposed to Rstan through dlsym(obj, "sample"). Is this not done in compiled code in R packages because it is especially OS-dependent? If so, how OS-dependent are we talking?</b></p>
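<p>A minimal RAII sketch of the dlopen/dlsym/dlclose cycle on POSIX systems; the class name shared_library is hypothetical. On Windows, LoadLibrary, GetProcAddress, and FreeLibrary play the same roles, and on glibc older than 2.34 the program must be linked with -ldl. RTLD_LOCAL keeps one model library's symbols from resolving references in libraries loaded later, which matters if every model exports the same factory names. Note that POSIX does not require dlclose() to actually remove the library from the address space, so whether repeated load/unload cycles reclaim memory is platform-dependent.</p>

```cpp
#include <cassert>
#include <dlfcn.h>
#include <stdexcept>

// RAII wrapper around POSIX dlopen/dlsym/dlclose (hypothetical helper).
class shared_library {
public:
  // A null path opens the running program itself.
  explicit shared_library(const char* path)
      : handle_(dlopen(path, RTLD_NOW | RTLD_LOCAL)) {
    if (!handle_) {
      const char* err = dlerror();
      throw std::runtime_error(err ? err : "dlopen failed");
    }
  }
  ~shared_library() { dlclose(handle_); }
  shared_library(const shared_library&) = delete;
  shared_library& operator=(const shared_library&) = delete;

  // Looks up a symbol and casts it to the function-pointer type F.
  template <typename F>
  F symbol(const char* name) const {
    dlerror();  // clear any stale error before calling dlsym
    void* sym = dlsym(handle_, name);
    const char* err = dlerror();
    if (err) throw std::runtime_error(err);
    return reinterpret_cast<F>(sym);  // POSIX permits void* to function pointer
  }

private:
  void* handle_;
};
```

<p>For a model library, one would open its path, fetch the factory functions with symbol(), and destroy every object created from the library before the shared_library goes out of scope, since the object's code lives in that library.</p>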
<p>Is there some function in an R header, which we don't know about, that wraps dlopen in a less fragile way? We do not intend to expose any of the compiled Stan .so methods directly to R; they would all deal in Stan objects (like the vector of samples described above), and we'd use Rcpp to wrap the results for R.</p>
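<p>For the wrapping step, one workable approach is to flatten the draws into a single column-major buffer, since R stores matrices column by column; such a buffer could then be copied into an Rcpp::NumericMatrix on the Rstan side. The sample struct below only mirrors the fields described earlier, and putting log_prob in the last column is an arbitrary choice for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of the fields described for stan::mcmc::sample.
struct sample {
  std::vector<double> real_params;
  std::vector<int> int_params;
  double log_prob;
};

// Flattens draws into one column-major buffer (R's matrix layout):
// row i = draw i, column j = real parameter j, last column = log_prob.
std::vector<double> to_column_major(const std::vector<sample>& draws) {
  if (draws.empty()) return {};
  const std::size_t n = draws.size();
  const std::size_t k = draws[0].real_params.size();
  std::vector<double> out(n * (k + 1));
  for (std::size_t i = 0; i < n; ++i) {
    if (draws[i].real_params.size() != k)
      throw std::invalid_argument("draws have different parameter counts");
    for (std::size_t j = 0; j < k; ++j) out[j * n + i] = draws[i].real_params[j];
    out[k * n + i] = draws[i].log_prob;
  }
  return out;
}
```

<p>Two draws with parameters (1, 2) and (4, 5) and log_prob values -3 and -6 flatten to 1, 4, 2, 5, -3, -6: each column of the eventual R matrix is contiguous.</p>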
<p>We want to do this because Stan shared objects could then interact with Python or anything else (or read and write files) through reader classes.</p><p><b>2 Some questions for the Rcpp crowd</b></p><p><b>2.1 We find no instances of this being done because most estimation is static model fitting in which only the data change. When we change a model, we have a new sampler to compile, but as far as R is concerned, the only thing new is its name. The methods R needs to interact with stay the same (see the reader class above), and separating things this way keeps R headers out of Stan proper and confines them to the Rstan I/O layer. So first: does this sound reasonable given the description above?</b></p>
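<p>The same separation is what would let non-R hosts drive a model. As a sketch, here is another implementation of a hypothetical, cut-down reader interface that parses a made-up plain-text format with one variable per line ("name v1 v2 ..."); a Python script or shell tool that writes such a file could feed the model without the model library linking against either.</p>

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Cut-down version of a hypothetical reader interface.
class data_reader {
public:
  virtual ~data_reader() = default;
  virtual std::vector<double> values(const std::string& name) const = 0;
};

// Reads a made-up text format, one variable per line: "name v1 v2 ...".
// Blank lines are skipped; a repeated name overwrites the earlier values.
class stream_data_reader : public data_reader {
public:
  explicit stream_data_reader(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
      std::istringstream fields(line);
      std::string name;
      if (!(fields >> name)) continue;  // blank line
      std::vector<double> vals;
      double v;
      while (fields >> v) vals.push_back(v);
      data_[name] = vals;
    }
  }
  std::vector<double> values(const std::string& name) const override {
    auto it = data_.find(name);
    if (it == data_.end()) throw std::out_of_range("no variable: " + name);
    return it->second;
  }

private:
  std::map<std::string, std::vector<double>> data_;
};
```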
<p><b>2.2 The inline package has been suggested; it would take our Stan C++ code and compile it with R CMD SHLIB. If we generated Stan C++ code that included Rcpp headers and methods directly, we could use inline on it, right?</b></p>
<p><b>2.3 R CMD SHLIB would have to be aware of our other headers (Eigen, Boost, Stan). Because of certain version dependencies (Eigen's version number has incremented 7 times since we started, sometimes breaking backward compatibility), we plan to distribute Stan with local copies of these libraries. But more generally, is there a reason for, or (as we are inclined to think) against, using R CMD SHLIB (via inline) to build Stan shared objects?</b></p>
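<p>If the build stays outside R CMD SHLIB, step B amounts to assembling a compiler command that points at the bundled headers. A sketch, assuming hypothetical directory names and a Linux-style toolchain (-O2, -shared, -fPIC, -I, and -o are standard g++/clang++ flags; macOS and Windows linking differ). Paths are not quoted, so this assumes they contain no spaces.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds the shell command for step B: compile generated model code into a
// shared library against bundled, version-pinned header-only dependencies.
// Assumes paths contain no spaces or shell metacharacters.
std::string compile_command(const std::string& compiler,
                            const std::string& source,
                            const std::string& output,
                            const std::vector<std::string>& include_dirs) {
  std::string cmd = compiler + " -O2 -shared -fPIC";
  for (const auto& dir : include_dirs) cmd += " -I" + dir;  // bundled headers first
  cmd += " -o " + output + " " + source;
  return cmd;
}
```

<p>The resulting string could be passed to std::system. Because -I directories are searched before the system include paths, pointing them at the copies shipped with Stan keeps the build independent of whatever Eigen or Boost happens to be installed.</p>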
<p><b>3 I (or one of my much more C++-savvy colleagues) will be happy to provide more details.</b></p><p><br></p><p>Thanks for your input!</p><p>Michael Malecki</p>
</blockquote></div><br></div>