[Rcpp-devel] CppBugs vs WinBugs

Whit Armstrong armstrong.whit at gmail.com
Mon Dec 19 16:22:45 CET 2011


You have the model specified wrong.

in your wrapper function you call:
> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);

However, the constructor requires the 'level' variable to be in the
first position and group in the third position :
>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):

Additionally, when you pass the 'group' variable it has to be 0
indexed instead of 1 indexed.  So, had it been in the correct
position, you would have corrupted memory when you ran the program.

One additional thing I changed was the 'adapt' variable.  I set it to
1000 instead of 2000.  The adapt phase is very slow because it
perturbs each node individually and then estimates the impact to the
acceptance ratio.

In any case, after fixing these issues, these are the results I get
(using jags instead of winbugs):


           user.self sys.self   elapsed user.child sys.child
jags.time    16.461000    0.064 16.586000          0         0
cppbugs.time  2.416000    0.000  2.420000          0         0
             6.813328      Inf  6.853719        NaN       NaN
>


Which I think it more typical of the speedup I would expect.  The
front page of my github project was written over a year ago, and I
don't even remember which example had a 100x speedup.

I still think you could see a 20x speedup on a large dataset.  CppBugs
uses Armadillo as the backend, so on a large dataset, you can speed up
the linear algebra significantly if you use a multi-threaded blas (and
your dataset is large enough to benefit from it).  Most cases of using
multithreaded linear algebra on the R list are actually worse for
small datasets, and I'm sure the same would be true in this case.

I'll post this revised code up to my github, so you can see the changes I made.

One last thing.  CppBugs is still under rapid development, even though
you don't see the commits on the public branch.  A lot is changing on
the backend, and I want to make it stable before it's released.  The
closure features in C++0x make it much easier to define cppbugs models
without needing to declare a new class.

-Whit



On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
> The radon model is as follows (I haven’t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
>
> BUGS code (to be saved in WinBugs working directory as "radon.bug"):
>
> model {
>  for (i in 1:919){
>    log.radon[i] ~ dnorm (y.hat[i], tau.y)
>    y.hat[i] <- a[county2[i]] + b*floor[i]
>  }
>  b ~ dnorm (0, .0001)
>  tau.y <- pow(sigma.y, -2)
>  sigma.y ~ dunif (0, 100)
>
>  for (j in 1:85){
>    a[j] ~ dnorm (mu.a, tau.a)
>  }
>  mu.a ~ dnorm (0, .0001)
>  tau.a <- pow(sigma.a, -2)
>  sigma.a ~ dunif (0, 100)
> }
> -------------------------------------
> R code:
>
> require(inline)
> require(Rcpp)
> require(R2WinBUGS)
>
> model<-'
> #include <iostream>
> #include <fstream>
> #include <vector>
> #include <string>
> #include <algorithm>
> #include <cmath>
> #include <armadillo>
> #include <boost/random.hpp>
> #include <boost/algorithm/string.hpp>
> #include <cppbugs/cppbugs.hpp>
>
> using namespace arma;
> using namespace cppbugs;
> using std::vector;
> using std::string;
> using std::cout;
> using std::endl;
> using std::ifstream;
>
> class RadonVaryingInterceptModel: public MCModel {
> public:
>  const vec& level;
>  const vec& basement;
>  const mat& group;
>  int N, N_counties;
>  mat indicator_matrix; //indicator matrix used for county level random effects
>
>
>  Normal<vec> a;
>  Normal<double> b;
>  Deterministic<double> tau_y;
>  Uniform<double> sigma_y;
>  Normal<double> mu_a;
>  Deterministic<double> tau_a;
>  Uniform<double> sigma_a;
>  Deterministic<mat> y_hat;
>  Normal<mat> likelihood;
>
>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>    level(level_),basement(basement_),group(group_),
>    a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
>    indicator_matrix(N,N_counties),
>    tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
>    y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
>
>  {
>    indicator_matrix.fill(0.0);
>    for(int i=0;i<group.n_elem;i++){
>    indicator_matrix(i,group[i])=1.0;
>    }
>    add(a);
>    add(b);
>    add(tau_y);
>    add(sigma_y);
>    add(mu_a);
>    add(tau_a);
>    add(sigma_a);
>    add(y_hat);
>    add(likelihood);
>  }
>
>  void update() {
>    y_hat.value = indicator_matrix*a.value + b.value*basement;
>    tau_y.value = pow(sigma_y.value, -2.0);
>    tau_a.value = pow(sigma_a.value, -2.0);
>    a.dnorm(mu_a.value, tau_a.value);
>    b.dnorm(0, 0.0001);
>    sigma_y.dunif(0, 100);
>    mu_a.dnorm(0, 0.0001);
>    sigma_a.dunif(0, 100);
>    likelihood.dnorm(y_hat.value,tau_y.value);
>  }
> };
> '
> Src<-'
>
> mat group = Rcpp::as<arma::mat>(GR);
> vec level = Rcpp::as<arma::vec>(lev);
> vec basement = Rcpp::as<arma::vec>(basm);
> int N = 919;
> int N_counties = 85;
> int iterations_ = as<int>(iterations);
> int burn_ = as<int>(burn);
> int adapt_ = as<int>(adapt);
> int thin_ = as<int>(thin);
>
> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
> m.sample(iterations_, burn_, adapt_, thin_);
>
> return Rcpp::List::create(Rcpp::Named("b", m.b.mean()), Rcpp::Named("ar", m.acceptance_ratio()),
> Rcpp::Named("a", m.a.mean()));
> '
> radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F)
>
> df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat", header=T, sep=",")
> df.mn <- df[df[,2]=="MN",]
> radon <- df.mn$activity
> log.radon <- log (ifelse (radon==0, .1, radon))
> floor <- df.mn$floor
>
> # get county index variable
> county.name <- as.vector(df.mn$county)
> uniq <- unique(county.name)
> J <- length(uniq)
> county <- rep (NA, J)
> for (i in 1:J){
>  county[county.name==uniq[i]] <- i
> }
>
> system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
>
> #its necessary to change the structure of the data files or else winbugs has an error
> log.radon<-as.numeric(log.radon)
> floor<-as.numeric(floor)
> county<-as.numeric(county)
> radon.data<-list(log.radon=log.radon,floor=floor,county=county)
> radon.param<-c("a","b")
> radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),sigma.y=runif(1),mu.a=rnorm(1)}
>
> #n.burnin = n.iter/2 for bugs
> system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bugs.directory="D:/R/WinBUGS14",n.thin=5))
>
> -----------------------------------------------------------------------
> The difference I get is 16.53s vs 13.85s
>
> Kind regards,
> Sam Watson
>
> -----Original Message-----
> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
> Sent: 19 December 2011 12:39
> To: Watson, Samuel
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>
> post your code.
>
> The speedup depends on  your model.  My comparisons were based on R/JAGS and pymc on linux.
>
> -Whit
>
>
> On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>> I am currently using CppBugs with Rcpp through R. I am very interested
>> to use CppBugs as I am finding WinBugs to be prohibitively slow, I use
>> large amounts of data in large multilevel models, so when I found
>> cppbugs I was excited. It says on the Github page for cppbugs that I
>> can achieve speeds of 20-100x faster than winbugs.
>>
>> I have been testing some examples: for the linear model example that
>> is provided at Github, the time difference is about 2x
>>
>> I also tested the Radon model with Rcpp and Andrew Gelmans data and
>> for
>> 10000 iterations and 10000 burnin with a thinning parameter of 5 the
>> difference is only 16 seconds vs 13 seconds.
>>
>>
>>
>> I can run multiple chains in parallel through R with the ‘snow’
>> package,
>> cxxfunction() and package.skeleton() and this does seem to provide a
>> little bit of a boost (parallel winbugs vs parallel cppbugs). But
>> nothing greater than 4x.
>>
>>
>>
>> Is it possible to achieve the kind of difference mentioned on Whit
>> Armstrong’s page for cppbugs with R?
>>
>>
>>
>> I am happy to post all the code if required.
>>
>>
>>
>> Many thanks
>>
>> Sam Watson
>>
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-deve
>> l


More information about the Rcpp-devel mailing list