[Rcpp-devel] dynamic library from Rcpp/Rapi library

Wed Sep 28 15:16:55 CEST 2011

On Tue, Sep 27, 2011 at 8:52 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Hi Michael,
>
> Thanks for posting here.  There is a lot of meat in this post, so I'll try to
> be brief. It is also late, so my concentration may not be at full tilt.
>
> I think in principle dlopen() should work. I never looked at exactly how R's
> own dyn.load() is implemented but I suspect it uses dlopen() so you may get
> this to work.  dlopen() can have issues across platforms.  Do you need
> this to work "wherever", i.e. Linux, OS X and even on that other unspeakable
> platform?  Much harder -- but as R has done the abstracting of it, I'd try as
> much as possible to lean on R and its dynamic extensions. That simply
> works...
>
> Also, it is sometimes good to remember that Rcpp is after all 'just' glue
> between R and C++. It alters neither R nor C++, it 'just' gets them a little
> closer together which is A Good Thing (TM) in my book.  With that, I often
> attempt to build proofs of concept in C++ alone (ie work out header creation,
> compile, link, and here for you dlopen() ...) before trying to do it from R.
>
> On 27 September 2011 at 20:18, Michael Malecki wrote:
> | Already a colleague distills my long post:
> |
> | What we want is three R functions, that
> |
> | A.  call a C++ program to read a Stan graphical model specification
> | and generate C++ code for a class extending an abstract base class
> | as well as factory methods to create/destroy instances,
> |
> | B.  compile the C++ code into a dynamically linkable library, using
> | header libraries Stan, Boost and Eigen, the versions of which we need
> | to control, and
> |
> | C.  create an instance of the class implemented in the generated
> | C++ code (using a pre-specified factory method returning the base
> | class type), pass it data from R, run it, return the results to R.
>
> Looks good.
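
To make steps A and C concrete, here is a minimal sketch of what the
generated code could look like.  The names model_base, generated_model,
create_model and destroy_model are hypothetical (none of them appear in
this thread), and the toy log density only stands in for whatever the
Stan generator would actually emit.  The factory functions get C linkage
so that a loader can look them up by their unmangled names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical abstract base class; the real Stan base class is not
// shown in this thread.
class model_base {
public:
  virtual ~model_base() {}
  virtual std::size_t num_params() const = 0;
  virtual double log_prob(const std::vector<double>& params) const = 0;
};

// Stand-in for a generated model: independent standard normals,
// log density up to an additive constant.
class generated_model : public model_base {
public:
  explicit generated_model(std::size_t n) : n_(n) {}
  std::size_t num_params() const override { return n_; }
  double log_prob(const std::vector<double>& params) const override {
    assert(params.size() == n_);  // caller must pass one value per parameter
    double lp = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
      lp -= 0.5 * params[i] * params[i];
    }
    return lp;
  }
private:
  std::size_t n_;
};

// Factory functions with C linkage.  They return the base class type,
// so the caller never needs the definition of generated_model.
extern "C" model_base* create_model() { return new generated_model(2); }
extern "C" void destroy_model(model_base* m) { delete m; }
```

Pairing every create_model() with a destroy_model() from the same
library keeps allocation and deallocation on the same side of the
library boundary.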
>
> | Our current prototype is set up to
> |
> | 1.  use Rcpp to call the C++ program that reads the model and
> | generates C++ code,
>
> "use Rcpp to call the C++ program" is not correct language.
>
> Rcpp is an interface between C++ and R data structures. It is not a running /
> working data broker, or compiler, or ...
>
> You can use Rcpp to interface your Stan, Boost, Eigen, ... headers. Rcpp
> modules can help with interfaces (they are not yet the most robust solution,
> though very promising); otherwise you can do it by hand too.
>
> | 2.  exec an external compiler like g++ or clang++ on the generated code, and
> |
> | 3.  use Rcpp to call a C++ program that uses dlopen() to load the lib
> | created in 2, pass data from R to the C++ program and pass data
> | from C++ back to R.
>
> 2. and 3. are pretty close to what inline does for Rcpp. inline can be
> extended to other headers as we have done for Armadillo, (parts of) GSL,
> and Eigen.
>
> | Is there a cleaner way than dlopen() to link from within R cross
> | platform?
>
> R's own dyn.load() which is what library() does, and which is what inline's
> cfunction() and cxxfunction() do.
>
> | If we use dlopen(), can we dlopen()/dlclose() multiple times
> | without leaking memory to support repeated A/B/C steps during
> | model development and fitting?
>
> I see no reason why not.
>
> | Can we dlopen()/dlclose() and then dlopen() a function with
> | the same name?  Or should we generate new factory/class namespaces
> | each time?
>
> Isn't dlopen() a C function?   Protecting namespaces is good practice anyway.
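
One detail on the naming question: functions declared extern "C" are
exported under their plain name and the enclosing C++ namespace plays
no part in it.  A fresh namespace per generated model therefore
protects the class names but not the factory names.  A minimal sketch,
with hypothetical names (model_0001, create_model_0001, ...) standing
in for whatever the generator would emit:

```cpp
#include <cassert>
#include <string>

// Hypothetical output of two successive code-generation runs.
namespace model_0001 {
class model {
public:
  std::string name() const { return "first draft"; }
};
}  // namespace model_0001

namespace model_0002 {
class model {  // same class name, no clash: the namespace differs
public:
  std::string name() const { return "second draft"; }
};
}  // namespace model_0002

// extern "C" names ignore namespaces, so the factory and destroy
// functions carry the unique suffix in their own names.
extern "C" void* create_model_0001() { return new model_0001::model(); }
extern "C" void* create_model_0002() { return new model_0002::model(); }
extern "C" void destroy_model_0001(void* p) {
  delete static_cast<model_0001::model*>(p);
}
extern "C" void destroy_model_0002(void* p) {
  delete static_cast<model_0002::model*>(p);
}
```

If each model lives in its own shared object the factory names could
stay fixed, since lookups go through a per-library handle; the suffix
matters when two generated models can end up in the same process image
under the same symbol name.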
>
> | On Tue, Sep 27, 2011 at 7:18 PM, Michael Malecki <malecki at gmail.com> wrote:
> |
> |
> |     1 Some questions about R and dynamic libraries. What I describe below
> |     appears quite nonstandard, but I think makes sense. We could be way off, in
> |     which case we'd like to know now; or there could just be some pitfalls that
> |     others might have encountered that we would benefit from knowing about.
> |
> |     1.1 Overview: we are developing a package (Stan, as in Ulam) to perform
> |     Hamiltonian MCMC sampling from densities we can write down in C++, or can
> |     use a graphical modeling language (something that looks somewhat like BUGS
> |     describing nodes and their distributions).
> |
> |     The graphical model is used to generate c++ code that depends on Boost,
> |     Eigen, and Stan, which implements automatic differentiation of these
> |     high-dimension hard-to-sample densities. A stan::mcmc::hmc sampler is
> |     instantiated on a model, which is a stan::mcmc::prob_grad_ad, and a
> |     stan::mcmc::sample contains a std::vector<double> real_params, a
> |     std::vector<int> int_params, and a double log_prob.
> |
> |     That's all well and good, but this is definitely an unusual thing to be
> |     doing from R.
> |
> |     Inline has been suggested, but this would mean that each model would have
> |     to contain Rcpp hooks into R, to get data in and samples out. We would
> |     rather have a standalone stan model compiled as a shared object, that Rstan
> |     would interact with.
>
> inline is a hook to get C, C++ or Fortran code into R.  We adapted it to also
> be workable with Rcpp, but inline does not impose Rcpp if you do not want
> Rcpp.  So in that sense what you write is not correct.
>
> |     Rstan would implement, for a concrete example, an R_data_reader class, with
> |     a public virtual method values("varname") returning a std::vector<double>
> |     via the handy Rcpp::environment::env["varname"] operator (and dimensions of
> |     real and integer parameters, using the sometimes-tricky but slick
> |     Rcpp::RObjects).
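
The reader idea can be sketched without any R headers.  In the sketch
below data_reader, map_data_reader and mean_of are hypothetical names;
the map-backed reader is only a stand-in, and an R-facing reader would
implement the same values() method by looking names up in an R
environment (the Rcpp class for that is Rcpp::Environment).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader interface: the only thing a compiled model sees.
class data_reader {
public:
  virtual ~data_reader() {}
  virtual std::vector<double> values(const std::string& name) const = 0;
};

// Stand-in backend that reads from an in-memory map.
class map_data_reader : public data_reader {
public:
  void set(const std::string& name, const std::vector<double>& v) {
    data_[name] = v;
  }
  std::vector<double> values(const std::string& name) const override {
    std::map<std::string, std::vector<double> >::const_iterator it =
        data_.find(name);
    if (it == data_.end()) {
      throw std::invalid_argument("no variable named " + name);
    }
    return it->second;
  }
private:
  std::map<std::string, std::vector<double> > data_;
};

// Model-side code depends only on data_reader, never on R headers.
double mean_of(const data_reader& reader, const std::string& name) {
  const std::vector<double> y = reader.values(name);
  if (y.empty()) {
    throw std::invalid_argument("variable " + name + " is empty");
  }
  double sum = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i) sum += y[i];
  return sum / static_cast<double>(y.size());
}
```

A Python or dump-file reader would be one more subclass of data_reader,
which is the separation the paragraph above is after.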
> |
> |     1.2 Rstan would interact with compiled stan shared objects via dlopen (or
> |     its windowsy friend), with Rstan::sample being a call to the model's sample
> |     method loaded with sample exposed to Rstan by dlsym(obj, "sample"). Is this
> |     not done in R package compiled code because it is especially OS-dependent,
> |     (and how much are we talking?)
>
> I would always use packages as a first instance. They work. End of story.
> Prove to the world why you need more ...
>
> |     Is there some function in an R header that we don't know about that wraps
> |     dlopen in a less fragile way? We do not intend to expose any of the
> |     compiled stan .so methods directly to R – they would all be stan objects
> |     (like a vector of samples described above), and we'd use Rcpp to wrap the
> |     returns to R.
>
> dlopen is not referenced in the 'Writing R Extensions' manual, and AFAIK not
> part of the API. So the question is malformed.
>
> |     We want to do this because it means stan shared objects can then interact
> |     with python or something else (or dump files in and out) through reader
> |     classes.
> |
> |     2 Some Questions for the Rcpp crowd
> |
> |     2.1 We find no instances of this being done because most estimation is
> |     static model-fitting where only the data change. But if we want to change a
> |     model, we have a new sampler to compile, but as far as R is concerned, the
> |     only thing new is its name. The methods R needs to interact with are going
> |     to be the same (see the reader class above), and separating this way keeps
> |     R headers out of stan proper and only in the Rstan io. So first: does this
> |     sound reasonable given the description above?
>
> I think you want to watch a bit more closely what Doug Bates is doing in
> lme4eigen which seems to me (as an innocent bystander) to be related to
> sophisticated model updates.

It seems to me that this is an overly complex approach but that may
just be my only having had one cup of coffee so far.  To me this feels
like trying to take a particular approach that works for separately
compiled C++ programs and wedging it into R in some way.  It may be
possible but what you will end up with may not be pretty.

What Dirk is referring to in the lme4Eigen (capitalization is slightly
different from what he wrote) is the combination of reference class
objects in R and C++ objects from particular classes.  The S4 classes
in R and the more commonly used S3 method dispatch mechanism are a
different design from what C++ or Java programmers are accustomed to.
The recently introduced reference classes, which are not overly well
documented (start with ?setRefClass in an R session), store references
to data members and incorporate methods as part of the class.  If you
change a data member in an object from such a class the data member
is changed in the object itself, not in a copy of that object.

In lme4Eigen I have objects representing statistical models for which
the parameter estimates are those that optimize a criterion, say
maximizing the likelihood or the posterior density.  When the
reference class instance is constructed it does little more than
generate a corresponding object in a C++ class and return an "external
pointer" to that object which is one of the data members of the R
reference class.  So, in other words, there is an R object that, for
all intents and purposes, just holds a pointer to an instance of a C++
class.  All the action - setting a new value of the parameters,
evaluating the objective function, etc., - takes place in the C++
class instance but becomes visible to R through methods defined on the
R reference class object.
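
The C++ side of that pattern can be sketched roughly as follows.  This
is not the lme4Eigen code: ls_model and the ls_* functions are
hypothetical, and a plain void* handle stands in for the R external
pointer (with Rcpp one would typically hold it in an Rcpp::XPtr) so
that the sketch compiles without R.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: the objective is the sum of squared deviations
// of the data from a single location parameter theta.
class ls_model {
public:
  explicit ls_model(const std::vector<double>& y) : y_(y), theta_(0.0) {}
  void set_theta(double theta) { theta_ = theta; }  // mutates in place
  double theta() const { return theta_; }
  double objective() const {
    double ss = 0.0;
    for (std::size_t i = 0; i < y_.size(); ++i) {
      const double r = y_[i] - theta_;
      ss += r * r;
    }
    return ss;
  }
private:
  std::vector<double> y_;
  double theta_;
};

// Entry points taking an opaque handle.  The R reference class object
// would store the handle as a data member and forward its methods to
// these functions.
extern "C" void* ls_create(const double* y, int n) {
  return new ls_model(std::vector<double>(y, y + n));
}
extern "C" void ls_set_theta(void* h, double theta) {
  static_cast<ls_model*>(h)->set_theta(theta);
}
extern "C" double ls_objective(void* h) {
  return static_cast<ls_model*>(h)->objective();
}
extern "C" void ls_destroy(void* h) { delete static_cast<ls_model*>(h); }
```

Because the state lives in the C++ object, a call to ls_set_theta()
changes what every later ls_objective() call sees, which matches the
reference semantics described above.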

There is a native mechanism in Rcpp, called Rcpp modules, that does
this implicitly, but I ended up rolling my own mechanism for this
because the internals of Rcpp modules got too complicated for me and
there are some subtle issues related to serializing/unserializing R
objects that, for me at least, were difficult to address in Rcpp
modules.

When you speak of graphical models I begin to think of MCMC.  If that
is indeed your goal, I think this would be a good mechanism because
you can have hooks in R to query and update the state of an instance
of a C++ class representing the state of the chain.  However, as I
said, I am just starting on the second cup of coffee and it is not
unlikely that I missed the point entirely.
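
As a rough illustration of such hooks, here is a sketch of a chain
object with query and update methods.  Everything in it is an
assumption made for the example: the class name chain, the standard
normal target, and the random-walk Metropolis update, which merely
stands in for a real sampler such as HMC.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical chain object holding the sampler state between calls.
class chain {
public:
  chain(double start, double step_size, unsigned seed)
      : x_(start), step_(step_size), iter_(0), accepted_(0), rng_(seed) {}

  // Query hooks.
  double state() const { return x_; }
  int iterations() const { return iter_; }
  int accepted() const { return accepted_; }

  // Update hook: advance the chain by n Metropolis iterations.
  void advance(int n) {
    std::normal_distribution<double> jump(0.0, step_);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (int i = 0; i < n; ++i) {
      const double proposal = x_ + jump(rng_);
      const double log_ratio = log_target(proposal) - log_target(x_);
      if (std::log(unif(rng_)) < log_ratio) {  // accept the proposal
        x_ = proposal;
        ++accepted_;
      }
      ++iter_;
    }
  }

  // Update hook: overwrite the current state, e.g. to restart elsewhere.
  void set_state(double x) { x_ = x; }

private:
  // Standard normal log density up to an additive constant.
  static double log_target(double x) { return -0.5 * x * x; }

  double x_;
  double step_;
  int iter_;
  int accepted_;
  std::mt19937 rng_;
};
```

From R, each of these methods would be reached through the reference
class object holding the pointer, so the chain can be inspected,
nudged and resumed without copying its state.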

> |     2.2 The inline package has been suggested, which would take our stan c++
> |     code and compile it with R CMD SHLIB. If we generated stan-c++ code that
> |     contained Rcpp headers and methods directly, we could inline it, right?
>
> Inline is orthogonal to how you lay out and organise your code. It is a helper.
> It doesn't impose anything, really -- it "merely" makes Rcpp easier for
> experimentation as it already did for C, C++ and Fortran without the added
> Rcpp API.
>
> |     2.3 R CMD SHLIB would have to be aware of our other headers (Eigen, boost,
> |     stan). For certain version dependencies (eigen has incremented 7 version
> |     numbers, sometimes breaking backward compatibility, since we started), we
> |     plan to distribute stan with these libraries local. But more generally, is
> |     there a reason for or (as we are inclined) against using R CMD SHLIB
> |     (inline) to build stan shared objects?
>
> Just distribute source and people rebuild. Boost has changed way more often,
> and people have gotten used to the need for rebuilds.
>
> Dirk
>
> |     3 I (or one of my much more c++ savvy colleagues) will be happy to provide
> |     more details.
> |
> |
> |     Thanks for your input!
> |
> |     Michael Malecki
> |
> |
> |
> | ----------------------------------------------------------------------
> | _______________________________________________
> | Rcpp-devel mailing list
> | Rcpp-devel at lists.r-forge.r-project.org
> | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
> --
> New Rcpp master class for R and C++ integration is scheduled for
> San Francisco (Oct 8), more details / reg.info available at
> http://www.revolutionanalytics.com/products/training/public/rcpp-master-class.php