[Rcpp-devel] Input on code design implementing a MCMC algorithm in R and improving performance with RcppArmadillo
Dirk Eddelbuettel
edd at debian.org
Fri Apr 10 14:03:38 CEST 2020
Hi Eva,
Thanks for resending in text, and providing all the detail.
On 10 April 2020 at 11:21, eva lehner wrote:
| I want to „translate“ a MCMC algorithm written in matlab to R to eventually release it in a package. In the matlab version model input, state of the chain and chain output is stored in structs. Updating different parameters of the underlying model is performed with different functions. I already have a very raw R version where these structs are implemented as S4 classes and updates are performed with generic functions taking both classes as input. I would like to increase performance with RcppArmadillo. However, I struggle to choose a good approach since I am not very experienced in any of the above subjects and have difficulties putting together the different concepts. It would be very helpful if you could give me your thoughts on what I think my main options are, tell me in case I misunderstood some concepts or point me into a better direction.
|
| Option 1: mainly R
| Keeping R S4 classes (or rewrite them as Reference Classes since I think they would have been a better choice for defining a class storing states of a model) and the algorithm written in R and write updating functions in RcppArmadillo. These functions can be exposed to R with Rcpp modules or attributes.
| + possibly the easiest version since I only have to rewrite the functions in Rcpp
| - Still a lot of R which could come with performance costs
|
| Option2: mainly Rcpp
| Define one Cpp class which stores input and updated parameters as member variables and the updating functions as member functions. Then I could write the main MCMC algorithm as a separate function that creates an instance of this class, calls the updating functions and returns a list containing the output. I could only expose this main algorithm (with Rcpp modules or attributes) to R and avoid exposing the internal classes. (Currently I don’t think exposure of the classes is necessary and seems more difficult than exposing functions).The part involving the class is taken from the Kalman filter example (RcppArmadillo-intro vignette) and I imagine the structure of the algorithm function and returned list similar to the gallery entry on the EM Algorithm for Probit Regressions (https://gallery.rcpp.org/articles/EM-algorithm-example/#attempt-1[https://deref-gmx.net/mail/client/Q6JdcN6Rp84/dereferrer/?redirectUrl=https%3A%2F%2Fgallery.rcpp.org%2Farticles%2FEM-algorithm-example%2F%23attempt-1]).
| + probably faster
| + possibly cleaner
| - more difficult and I have very little Cpp experience (especially with classes)
|
| Personally, I aim for a compromise between performance, implementation difficulty and the best approach since this is for my master’s thesis and I also have to consider the deadline. I think this compromise would be to stay with the main R version but rewrite the S4 classes as Reference Class.
| I'm looking forward to hearing your feedback, thoughts and experiences,
There is a lot to unpack here, and it is difficult to recommend something in
generalities. You also seem already pretty familiar with R, Rcpp and
RcppArmadillo which is good :)
As for Matlab to Armadillo, I have had pretty good experiences translating
Matlab code to RcppArmadillo on more than occassion. What you describe above
all sounds reasonable (also in terms of deadline).
Maybe a pragmatic approach is simplest: do some timings, find some
bottlenecks and replace this with Rcpp? Or easier still (but less
measurement-guided) start with the innermost loops and rethink / reimplement
those?
It is a little tricky to recommend anything more concrete at this stage.
Cheers, Dirk
--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
More information about the Rcpp-devel
mailing list