[Rcpp-devel] CppBugs vs WinBugs

Mon Dec 19 17:04:48 CET 2011

Hi Whit,

Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself.
I amended the mistakes you mentioned, for example I have included the line 

group -= 1;

in the run.model.cpp function.

Here are the results with your suggested changes:

system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.thin=5))
   user  system elapsed 
  11.03    0.00   11.12

system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=1e3L,thin=5L))
   user  system elapsed 
  10.86    0.00   10.86

If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you. 
Would you have any other recommendations of things that could be slowing the model down?

I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM

Many thanks,
Sam Watson

-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com] 
Sent: 19 December 2011 15:23
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs

You have the model specified wrong.

in your wrapper function you call:
> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);

However, the constructor requires the 'level' variable to be in the first position and group in the third position :
>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):

Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed.  So, had it been in the correct position, you would have corrupted memory when you ran the program.

One additional thing I changed was the 'adapt' variable.  I set it to
1000 instead of 2000.  The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio.

In any case, after fixing these issues, these are the results I get (using jags instead of winbugs):

           user.self sys.self   elapsed user.child sys.child jags.time    16.461000    0.064 16.586000          0         0 cppbugs.time  2.416000    0.000  2.420000          0         0
             6.813328      Inf  6.853719        NaN       NaN
>

Which I think it more typical of the speedup I would expect.  The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup.

I still think you could see a 20x speedup on a large dataset.  CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it).  Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case.

I'll post this revised code up to my github, so you can see the changes I made.

One last thing.  CppBugs is still under rapid development, even though you don't see the commits on the public branch.  A lot is changing on the backend, and I want to make it stable before it's released.  The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class.

-Whit

On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
> The radon model is as follows (I haven’t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
>
> BUGS code (to be saved in WinBugs working directory as "radon.bug"):
>
> model {
>  for (i in 1:919){
>    log.radon[i] ~ dnorm (y.hat[i], tau.y)
>    y.hat[i] <- a[county2[i]] + b*floor[i]
>  }
>  b ~ dnorm (0, .0001)
>  tau.y <- pow(sigma.y, -2)
>  sigma.y ~ dunif (0, 100)
>
>  for (j in 1:85){
>    a[j] ~ dnorm (mu.a, tau.a)
>  }
>  mu.a ~ dnorm (0, .0001)
>  tau.a <- pow(sigma.a, -2)
>  sigma.a ~ dunif (0, 100)
> }
> -------------------------------------
> R code:
>
> require(inline)
> require(Rcpp)
> require(R2WinBUGS)
>
> model<-'
> #include <iostream>
> #include <fstream>
> #include <vector>
> #include <string>
> #include <algorithm>
> #include <cmath>
> #include <armadillo>
> #include <boost/random.hpp>
> #include <boost/algorithm/string.hpp>
> #include <cppbugs/cppbugs.hpp>
>
> using namespace arma;
> using namespace cppbugs;
> using std::vector;
> using std::string;
> using std::cout;
> using std::endl;
> using std::ifstream;
>
> class RadonVaryingInterceptModel: public MCModel {
> public:
>  const vec& level;
>  const vec& basement;
>  const mat& group;
>  int N, N_counties;
>  mat indicator_matrix; //indicator matrix used for county level random 
> effects
>
>
>  Normal<vec> a;
>  Normal<double> b;
>  Deterministic<double> tau_y;
>  Uniform<double> sigma_y;
>  Normal<double> mu_a;
>  Deterministic<double> tau_a;
>  Uniform<double> sigma_a;
>  Deterministic<mat> y_hat;
>  Normal<mat> likelihood;
>
>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>    level(level_),basement(basement_),group(group_),
>    a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
>    indicator_matrix(N,N_counties),
>    tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
>    y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
>
>  {
>    indicator_matrix.fill(0.0);
>    for(int i=0;i<group.n_elem;i++){
>    indicator_matrix(i,group[i])=1.0;
>    }
>    add(a);
>    add(b);
>    add(tau_y);
>    add(sigma_y);
>    add(mu_a);
>    add(tau_a);
>    add(sigma_a);
>    add(y_hat);
>    add(likelihood);
>  }
>
>  void update() {
>    y_hat.value = indicator_matrix*a.value + b.value*basement;
>    tau_y.value = pow(sigma_y.value, -2.0);
>    tau_a.value = pow(sigma_a.value, -2.0);
>    a.dnorm(mu_a.value, tau_a.value);
>    b.dnorm(0, 0.0001);
>    sigma_y.dunif(0, 100);
>    mu_a.dnorm(0, 0.0001);
>    sigma_a.dunif(0, 100);
>    likelihood.dnorm(y_hat.value,tau_y.value);
>  }
> };
> '
> Src<-'
>
> mat group = Rcpp::as<arma::mat>(GR);
> vec level = Rcpp::as<arma::vec>(lev);
> vec basement = Rcpp::as<arma::vec>(basm); int N = 919; int N_counties 
> = 85; int iterations_ = as<int>(iterations); int burn_ = 
> as<int>(burn); int adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
>
> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
> m.sample(iterations_, burn_, adapt_, thin_);
>
> return Rcpp::List::create(Rcpp::Named("b", m.b.mean()), 
> Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a", 
> m.a.mean())); '
> radon.model <- cxxfunction(signature(GR="numeric", 
> lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
> ="adapt",thin="integer"), body=src, include=model, 
> plugin="RcppArmadillo",verbose=F)
>
> df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rado
> n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon <- 
> df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon)) floor 
> <- df.mn$floor
>
> # get county index variable
> county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J 
> <- length(uniq) county <- rep (NA, J) for (i in 1:J){
>  county[county.name==uniq[i]] <- i
> }
>
> system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterati
> ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
>
> #its necessary to change the structure of the data files or else 
> winbugs has an error
> log.radon<-as.numeric(log.radon)
> floor<-as.numeric(floor)
> county<-as.numeric(county)
> radon.data<-list(log.radon=log.radon,floor=floor,county=county)
> radon.param<-c("a","b")
> radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),s
> igma.y=runif(1),mu.a=rnorm(1)}
>
> #n.burnin = n.iter/2 for bugs
> system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
> file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bug
> s.directory="D:/R/WinBUGS14",n.thin=5))
>
> ----------------------------------------------------------------------
> - The difference I get is 16.53s vs 13.85s
>
> Kind regards,
> Sam Watson
>
> -----Original Message-----
> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
> Sent: 19 December 2011 12:39
> To: Watson, Samuel
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>
> post your code.
>
> The speedup depends on  your model.  My comparisons were based on R/JAGS and pymc on linux.
>
> -Whit
>
>
> On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>> I am currently using CppBugs with Rcpp through R. I am very 
>> interested to use CppBugs as I am finding WinBugs to be prohibitively 
>> slow, I use large amounts of data in large multilevel models, so when 
>> I found cppbugs I was excited. It says on the Github page for cppbugs 
>> that I can achieve speeds of 20-100x faster than winbugs.
>>
>> I have been testing some examples: for the linear model example that 
>> is provided at Github, the time difference is about 2x
>>
>> I also tested the Radon model with Rcpp and Andrew Gelmans data and 
>> for
>> 10000 iterations and 10000 burnin with a thinning parameter of 5 the 
>> difference is only 16 seconds vs 13 seconds.
>>
>>
>>
>> I can run multiple chains in parallel through R with the ‘snow’
>> package,
>> cxxfunction() and package.skeleton() and this does seem to provide a 
>> little bit of a boost (parallel winbugs vs parallel cppbugs). But 
>> nothing greater than 4x.
>>
>>
>>
>> Is it possible to achieve the kind of difference mentioned on Whit 
>> Armstrong’s page for cppbugs with R?
>>
>>
>>
>> I am happy to post all the code if required.
>>
>>
>>
>> Many thanks
>>
>> Sam Watson
>>
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-dev
>> e
>> l