[Rcpp-devel] dynamic library from Rcpp/Rapi library

Dirk Eddelbuettel edd at debian.org
Wed Sep 28 03:52:10 CEST 2011


Hi Michael,

Thanks for posting here.  There is a lot of meat in this post, so I'll try to
be brief. It is also late, so my concentration may not be at full tilt.

I think in principle dlopen() should work. I never looked at exactly how R's
own dyn.load() is implemented but I suspect it uses dlopen() so you may get
this to work.  dlopen() can have issues across platforms.  Do you need
this to work "wherever", ie Linux, OS X and even on that other unspeakable
platform?  Much harder -- but as R has done the abstracting of it, I'd try as
much as possible to lean on R and its dynamic extensions. That simply
works...

Also, it is sometimes good to remember that Rcpp is after all 'just' glue
between R and C++. It alters neither R nor C++, it 'just' gets them a little
closer together which is A Good Thing (TM) in my book.  With that, I often
attempt to build proofs of concept in C++ alone (ie work out header creation,
compile, link, and here for you dlopen() ...) before trying to do it from R.

On 27 September 2011 at 20:18, Michael Malecki wrote:
| Already a colleague distills my long post:
| 
| What we want is three R functions, that
| 
| A.  call a C++ program to read a Stan graphical model specification
| and generate C++ code for a class extending an abstract base class
| as well as factory methods to create/destroy instances,
| 
| B.  compile the C++ code into a dynamically linkable library, using
| header libraries Stan, Boost and Eigen, the versions of which we need
| to control, and
| 
| C.  create an instance of the class implemented in the generated
| C++ code (using a pre-specified factory method returning the base
| class type), pass it data from R, run it, return the results to R.

Looks good.
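
For what it is worth, the base class plus factory functions in A. and C. could
be prototyped in plain C++ first. A minimal sketch follows; the names
model_base, generated_model, create_model and destroy_model are made up for
illustration and are not Stan's actual classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical abstract base class; illustrative only, not Stan's API.
class model_base {
public:
  virtual ~model_base() {}
  // Log density at the given parameter values.
  virtual double log_prob(const std::vector<double>& params) const = 0;
  virtual std::size_t num_params() const = 0;
};

// Stand-in for generated code: an independent standard normal density
// (up to an additive constant) on each parameter.
class generated_model : public model_base {
public:
  explicit generated_model(std::size_t n) : n_(n) {}
  double log_prob(const std::vector<double>& params) const override {
    double lp = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i)
      lp -= 0.5 * params[i] * params[i];
    return lp;
  }
  std::size_t num_params() const override { return n_; }
private:
  std::size_t n_;
};

// Factory functions with C linkage: the symbol names are not mangled, so
// they can be looked up by name once the library has been loaded.
extern "C" model_base* create_model(std::size_t n) {
  return new generated_model(n);
}

extern "C" void destroy_model(model_base* m) {
  delete m;
}
```

The caller only ever sees model_base* and the two factory symbols, which is
what makes it possible to swap in a freshly generated model without changing
the calling code.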

| Our current prototype is set up to
| 
| 1.  use RCpp to call the C++ program that reads the model and
| generates C++ code,

"use Rcpp to call the C++ program" is not correct language.  

Rcpp is an interface between C++ and R data structures. It is not a running /
working data broker, or compiler, or ...

You can use Rcpp to interface your Stan, Boost, Eigen, ... headers. Rcpp
modules can help with interfaces (but are not yet the most robust solution,
though very promising); otherwise you can do it by hand too.
 
| 2.  exec an external compiler like g++ or clang++ on the generated code, and
| 
| 3.  use RCpp to call a C++ program that uses dlopen() to load the lib
| created in 2, pass data from R to the C++ program and pass data
| from C++ back to R.

2. and 3. are pretty close to what inline does for Rcpp. inline can be
extended to other headers as we have done for Armadillo, (parts of) GSL,
and Eigen.
 
| Is there a cleaner way than dlopen() to link from within R cross
| platform?

R's own dyn.load() which is what library() does, and which is what inline's
cfunction() and cxxfunction() do.
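
To make that concrete, here is a minimal sketch of a function that R can load
with dyn.load() and call through the .C() interface. The function name
scale_vector is made up for illustration.

```cpp
// Built with R CMD SHLIB, this needs no R headers at all: the .C()
// interface passes every argument as a pointer to a plain C array.
extern "C" void scale_vector(double* x, int* n, double* factor) {
  // Multiply each of the *n elements of x in place.
  for (int i = 0; i < *n; ++i)
    x[i] *= *factor;
}
```

Assuming the file is called scale.cpp, one would build it with
"R CMD SHLIB scale.cpp", load the result in R with
dyn.load(paste0("scale", .Platform$dynlib.ext)), and call
.C("scale_vector", x = as.double(1:3), n = 3L, factor = 2.0).
R takes care of the platform-specific loading.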
 
| If we use dlopen(), can we dlopen()/dlclose() multiple times
| without leaking memory to support repeated A/B/C steps during
| model development and fitting?

I see no reason why not.
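
If you do go the dlopen() route, pairing every dlopen() with a dlclose() is
the part to get right, as the handles are reference counted. A minimal POSIX
sketch follows; the helper names dl_closer, dl_handle, open_library and
find_symbol are made up here. Whether all memory really comes back also
depends on what the loaded library itself does, which this sketch does not
address.

```cpp
#include <dlfcn.h>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter so that a std::unique_ptr calls dlclose() when the handle goes
// out of scope.
struct dl_closer {
  void operator()(void* handle) const {
    if (handle) dlclose(handle);
  }
};
typedef std::unique_ptr<void, dl_closer> dl_handle;

// Open a shared library. A null path yields a handle for the running
// program itself, which is handy for experiments.
inline dl_handle open_library(const char* path) {
  void* h = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (!h) {
    const char* err = dlerror();
    throw std::runtime_error(std::string("dlopen failed: ") +
                             (err ? err : "unknown error"));
  }
  return dl_handle(h);
}

// Look up a symbol by name, throwing if dlsym() reports an error.
inline void* find_symbol(const dl_handle& lib, const char* name) {
  dlerror();  // clear any stale error state
  void* sym = dlsym(lib.get(), name);
  const char* err = dlerror();
  if (err) throw std::runtime_error(std::string("dlsym failed: ") + err);
  return sym;
}
```

On older glibc versions this needs -ldl at link time. The handle is closed
automatically at the end of each scope, so a loop that rebuilds and reloads
a model does not accumulate open handles.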
 
| Can we dlopen()/dlclose() and then dlopen() a function with
| the same name?  Or should we generate new factory/class namespaces
| each time?

Isn't dlopen() a C function?   Protecting namespaces is good practice anyway.
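
One thing to keep in mind: a function declared extern "C" is not qualified by
its C++ namespace, so a namespace alone does not make the factory symbol
unique. A minimal sketch of a code generator that tags both the namespace and
the C-linkage factory per build follows; the functions model_tag and
factory_source, and the naming scheme, are made up for illustration.

```cpp
#include <cstdio>
#include <string>

// Build a per-build tag such as "model_0007" from a running counter.
std::string model_tag(unsigned build_id) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "model_%04u", build_id);
  return std::string(buf);
}

// Emit a skeleton of generated source in which both the C++ namespace and
// the extern "C" factory name carry the tag.
std::string factory_source(unsigned build_id) {
  const std::string tag = model_tag(build_id);
  return "namespace " + tag + " { struct model { }; }\n"
         "extern \"C\" void* create_" + tag + "() "
         "{ return new " + tag + "::model(); }\n";
}
```

With unique names per build, the question of what happens when a symbol of
the same name is loaded twice does not arise in the first place.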

| On Tue, Sep 27, 2011 at 7:18 PM, Michael Malecki <malecki at gmail.com> wrote:
| 
| 
|     1 Some questions about R and dynamic libraries. What I describe below
|     appears quite nonstandard, but I think makes sense. We could be way off, in
|     which case we'd like to know now; or there could just be some pitfalls that
|     others might have encountered that we would benefit from knowing about.
| 
|     1.1 Overview: we are developing a package (Stan, as in Ulam) to perform
|     Hamiltonian MCMC sampling from densities we can write down in C++, or can
|     use a graphical modeling language (something that looks somewhat like BUGS
|     describing nodes and their distributions).
| 
|     The graphical model is used to generate c++ code that depends on Boost,
|     Eigen, and Stan, which implements automatic differentiation of these
|     high-dimension hard-to-sample densities. A stan::mcmc::hmc sampler is
|     instantiated on a model, which is a stan::mcmc::prob_grad_ad, and a
|     stan::mcmc::sample contains a std::vector<double> real_params, a
|     std::vector<int> int_params, and a double log_prob.
| 
|     That's all well and good, but this is definitely an unusual thing to be
|     doing from R. 
| 
|     Inline has been suggested, but this would mean that each model would have
|     to contain Rcpp hooks into R, to get data in and samples out. We would
|     rather have a standalone stan model compiled as a shared object, that Rstan
|     would interact with.

inline is a hook to get C, C++ or Fortran code into R.  We adapted it to also
be workable with Rcpp, but inline does not impose Rcpp if you do not want
Rcpp.  So in that sense what you write is not correct.
 
|     Rstan would implement, for a concrete example, an R_data_reader class, with
|     a public virtual method values("varname") returning a std::vector<double>
|     via the handy Rcpp::environment::env["varname"] operator (and dimensions of
|     real and integer parameters, using the sometimes-tricky but slick
|     Rcpp::RObjects).
| 
|     1.2 Rstan would interact with compiled stan shared objects via dlopen (or
|     its windowsy friend), with Rstan::sample being a call to the model's sample
|     method loaded with sample exposed to Rstan by dlsym(obj, "sample"). Is this
|     not done in R package compiled code because it is especially OS-dependent,
|     (and how much are we talking?)

I would always use packages as a first instance. They work. End of story.
Prove to the world why you need more ...
 
|     Is there some function in an R header that we don't know about that wraps
|     dlopen in a less fragile way? We do not intend to expose any of the
|     compiled stan .so methods directly to R – they would all be stan objects
|     (like a vector of samples described above), and we'd use Rcpp to wrap the
|     returns to R.

dlopen is not referenced in the 'Writing R Extensions' manual, and AFAIK not
part of the API. So the question is malformed.
 
|     We want to do this because it means stan shared objects can then interact
|     with python or something else (or dump files in and out) through reader
|     classes.
| 
|     2 Some Questions for the Rcpp crowd
| 
|     2.1 We find no instances of this being done because most estimation is
|     static model-fitting where only the data change. But if we want to change a
|     model, we have a new sampler to compile, but as far as R is concerned, the
|     only thing new is its name. The methods R needs to interact with are going
|     to be the same (see the reader class above), and separating this way keeps
|     R headers out of stan proper and only in the Rstan io. So first: does this
|     sound reasonable given the description above?

I think you want to watch a bit more closely what Doug Bates is doing in
lme4eigen which seems to me (as an innocent bystander) to be related to
sophisticated model updates.
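
On keeping R headers out of the model code: a minimal sketch of that
separation follows, assuming a reader interface along the lines of the
values("varname") method you describe. The class names data_reader and
map_data_reader and the function sum_of are made up; the in-memory backend
merely stands in for one that would look names up in an R environment.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader interface: model code only ever sees this class.
class data_reader {
public:
  virtual ~data_reader() {}
  virtual std::vector<double> values(const std::string& name) const = 0;
};

// Stand-in backend that holds the data in memory. An R backend, a Python
// backend or a dump-file backend would derive from data_reader the same way.
class map_data_reader : public data_reader {
public:
  void set(const std::string& name, const std::vector<double>& v) {
    data_[name] = v;
  }
  std::vector<double> values(const std::string& name) const override {
    std::map<std::string, std::vector<double> >::const_iterator it =
        data_.find(name);
    if (it == data_.end())
      throw std::out_of_range("no variable named " + name);
    return it->second;
  }
private:
  std::map<std::string, std::vector<double> > data_;
};

// Model-side code: depends on the interface only, not on any backend.
double sum_of(const data_reader& reader, const std::string& name) {
  const std::vector<double> v = reader.values(name);
  double s = 0.0;
  for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
  return s;
}
```

Only the backend that talks to R would need to include R or Rcpp headers.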
 
|     2.2 The inline package has been suggested, which would take our stan c++
|     code and compile it with R CMD SHLIB. If we generated stan-c++ code that
|     contained Rcpp headers and methods directly, we could inline it, right?

Inline is orthogonal to how you lay out and organise your code. It is a helper.
It doesn't impose anything, really -- it "merely" makes Rcpp easier for
experimentation as it already did for C, C++ and Fortran without the added
Rcpp API.
 
|     2.3 R CMD SHLIB would have to be aware of our other headers (Eigen, boost,
|     stan). For certain version dependencies (eigen has incremented 7 version
|     numbers, sometimes breaking backward compatibility, since we started), we
|     plan to distribute stan with these libraries local. But more generally, is
|     there a reason for or (as we are inclined) against using R CMD SHLIB
|     (inline) to build stan shared objects?

Just distribute source and people rebuild. Boost has changed way more often,
and people have gotten used to the need for rebuilds.

Dirk
 
|     3 I (or one of my much more c++ savvy colleagues) will be happy to provide
|     more details.
| 
| 
|     Thanks for your input!
| 
|     Michael Malecki
| 
-- 
New Rcpp master class for R and C++ integration is scheduled for 
San Francisco (Oct 8), more details / reg.info available at
http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php