[Rcpp-devel] RcppArmadillo

Dirk Eddelbuettel edd at debian.org
Sun Feb 3 15:06:34 CET 2013


Hi Simon,

On 3 February 2013 at 14:23, Simon Zehnder wrote:
| Thanks Dirk! 
| 
| I worked a lot on it already yesterday. 
| 
| Regarding OpenMP: I haven't yet found the underlying data structure for SEXPs but there should be something in the R Internals Guide.

I think you misunderstood.  Let me be more clear:

1)  R is single-threaded.  Always has been, probably always will be. 

    (As the story as far as I know goes, someone now in R Core and rather
    well-known for deep understanding of R / S internals spent a good chunk
    of his PhD research time trying to alter that and could not, even while
    working under the direction of one the R / S founder.)

2)  So you need to 'set a mutex' when you leave R, invoke parallel code which
    you can get via OpenMP from the compiler, remove the mutex and return to R.  

    In fact based on the work Luke Tierney did years ago in pnmath and pnmath0,
    parts of R's libraries now do this (very carefully) themselves. It is
    expected that more parts of R will as the multicore world beckons, but
    this is hard work given the overall constraint.

3)  For the reasons outlined above, there cannot be SEXP type as there is no
    native multithreading.
 
| As the first-touch-principle has to be applied when the memory is allocated, I would rather copy the data from an SEXP to an arma::mat and run the '#pragma' over the arma::mat.  So for allocating I have to pass an arma::mat an array, that is already allocated via first-touch. memptr() should do this work. I hope, though, that it does not copy the array :)

As we said before, that is all fine.  

You can get memory from R, take the SEXP down to C++ level via Rcpp and
inject it into Armadillo types.  The most efficient way is the two-step we
use in the fastLm.cpp example which does all that without copying by invoking
the most appropriate constructor from Armadillo:

    extern "C" SEXP fastLm(SEXP Xs, SEXP ys) {

        try {
            Rcpp::NumericVector yr(ys);                     // creates Rcpp vector from SEXP
            Rcpp::NumericMatrix Xr(Xs);                     // creates Rcpp matrix from SEXP
            int n = Xr.nrow(), k = Xr.ncol();
            arma::mat X(Xr.begin(), n, k, false);           // reuses memory and avoids extra copy
            arma::colvec y(yr.begin(), yr.size(), false);

    [....]

At this point you can go crazy and OpenMP away to your heart's content.  Just
do not call back into R while in an OpenMP threaded context or some kittens
may get killed somewhere.  Or R will likely crash. At least one these is true.

Ok?

Dirk
 
| Best
| 
| Simon
| 
| On Feb 2, 2013, at 4:17 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
| 
| > 
| > Hi Simon,
| > 
| > Welcome to the list.
| > 
| > On 2 February 2013 at 14:47, Simon Zehnder wrote:
| > | Dear Rcpp-Devels,
| > | 
| > | this list was suggested to me by Dirk Eddelbüttel in regard to a question using C++ Extensions in relation with the Armadillo library. 
| > | 
| > | At first I have to make compliments to the developers of Rcpp/RcppArmadillo. Dirk, Francois, this is a marvelous work! As someone programming a lot in C++ and using R Extensions regularly, it is cleaning away all this cumbersome programming dirt connected to SEXPs. 
| > 
| > Thanks.
| > 
| > | There are still remaining questions for me, which can be surely answered by the subscribers to this list: 
| > | 
| > | 1. I saw the Rcpp.package.skeleton function and I ask myself, if a similar offer is made for RcppArmadillo for including automatically the Armadillo library in a package? 
| > 
| > Sure -- there is RcppArmadillo.package.skeleton() in the RcppArmadillo
| > package. Dito for RcppGSL and RcppEigen.
| > 
| > | 2. If I want to use Armadillo solely in the C++ files, not via inline functions as shown in the RcppArmadillo paper, should I use solely the Rcpp package? No, right? The RcppArmadillo package provides objects wrappers for Armadillo objects to be passed to R? 
| > 
| > If you want just C++ from R, use Rcpp.
| > 
| > If you want C++ and Armadillo from R, use RcppArmadillo. It depends on Rcpp
| > and brings it in.
| > 
| > Etc pp for other libraries. Rcpp and RcppArmadillo should provide the
| > scaffolding to build upon. 
| > 
| > | 3. My package is based on S4 classes and I saw the S4 class wrapping in the Rcpp package. I miss an example on this. Can you refer to any document or website for this issue? 
| > 
| > Look at the Rcpp documentation, including the recently introduced Rcpp
| > Gallery at http://gallery.rcpp.org for some examples.  I don't use S4 all
| > that much so I do not write many examples, but eg in RcppEigen context a few
| > more are found (as some of that work was motivated bty sparse matrices which
| > already have an S4 representation in R).
| > 
| > | 4. Further: What is your experience regarding performance with S4 classes and OOP in C++: Does it make a difference mapping the S4 class to a struct in C++ or using directly the attributes of the S4 class (like vectors, etc.) as Armadillo vectors etc. in C++?
| > | As I work a lot on the HPC in Aachen/Germany together with some of the contributors to the OpenMP API, I am highly influenced by the parsimonious approach, i.e. use only basic objects in C++ to get high performance (although I know, that one of the main work now in the OpenMP API is the extension to complex/user-defined objects inside the #pragmas). 
| > 
| > As I said, I do not use S4 all that much. But I am in favour of OOP :)
| > 
| > | 5. Using OpenMP with RcppArmadillo: Up to now I used almost exclusively the Scythe Statistical Library (http://scythe.wustl.edu), which is pretty fast. I encountered lately problems with it, using parallel computing (OpenMP). Also important in this regard is the possibility to apply a 'first-touch-principle' where all my approaches failed in Scythe, due to the object structure. 
| > | Now, I would like to use RccpArmadillo with OpenMP and 
| > | 		a) I want to get best performance: how would I proceed?
| > 
| > Just do it. We have OpenMP examples in the Rcpp docs. Ensure you have locks
| > from R, do not call back, and it tends to just work to be "multithreaded in
| > chunks" at the C++ level..
| > 
| > Scythe was very important and a first C++ library for R when it came out. I
| > could imagine that Armadillo is faster, but I have not seen comparisons.
| > 	
| > | 		b) I want to apply the 'first-touch-principle': where do I apply it? What is the internal data structure in Rcpp-/RcppArmadillo-Objects that allocates the memory?
| > 
| > It's whatever you would do with native R objects at the C level, only easier :)
| > 
| > | I am very excited now to start work with the Rcpp/RcppArmadillo package in my own one, which is at least planned to be pushed to CRAN one day. 
| > 
| > We look forward to your contributions.  These are open projects, so if you
| > can think of patches to code or documentation, please let us know.
| > 
| > | I am looking forward to your answers
| > 
| > Hope this helps.
| > 
| > Dirk
| > 
| > | 
| > | Best
| > | 
| > | Simon 
| > | _______________________________________________
| > | Rcpp-devel mailing list
| > | Rcpp-devel at lists.r-forge.r-project.org
| > | https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
| > 
| > -- 
| > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  
| 

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com  


More information about the Rcpp-devel mailing list