[Rcpp-devel] CppBugs vs WinBugs

Watson, Samuel S.I.Watson at warwick.ac.uk
Mon Dec 19 19:14:57 CET 2011


I changed the options for cxxfunction() in R using:

 settings=getPlugin("RcppArmadillo")
 settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep=' ')
 radon.model <- cxxfunction(signature(GR="numeric", lev="numeric", basm="numeric",
                                      iterations="integer", burn="integer",
                                      adapt="integer", thin="integer"),
                            body=src, includes=model, plugin="RcppArmadillo",
                            verbose=FALSE, settings=settings)

This didn’t make any difference at all. 
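A quick way to check whether the extra `-O2` actually reached the compiler is a tiny probe compiled with the same flags. This is a sketch assuming the MinGW g++ that Rtools uses on Windows: GCC (and Clang) define `__OPTIMIZE__` whenever `-O1` or higher is active. The function name `optimization_report` is made up for illustration. The same `#if` block could be placed in the `cxxfunction()` body and its string returned to R. If the default flags already include `-O2`, adding it again would be expected to change nothing.

```cpp
#include <string>

// Reports how this translation unit was compiled. GCC and Clang define
// __OPTIMIZE__ for -O1 and above (so -O2 reports "optimized"); other
// compilers may not define the macro, in which case the answer is inconclusive.
std::string optimization_report() {
#if defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized (or compiler does not define __OPTIMIZE__)";
#endif
}
```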

For the linear model example you provide, the difference I see between WinBUGS and CppBugs is 40s vs 10s, which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.

Thanks for a great package!

Sam

-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com] 
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs

Not sure.

You might have a look at what compiler options are used for the inline cpp function.

If you are not using -O2, that could make a big difference, but I'm not sure it would drop the time from 10s to 2.8s.

-Whit


On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
> Hi Whit,
>
> Many thanks for your reply, and my apologies for those mistakes in the code; I had actually just noticed them myself.
> I amended the mistakes you mentioned; for example, I have included the line
>
> group -= 1;
>
> in the run.model.cpp function.
>
> Here are the results with your suggested changes:
>
> system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.thin=5))
>   user  system elapsed
>  11.03    0.00   11.12
>
> system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=1e3L,thin=5L))
>   user  system elapsed
>  10.86    0.00   10.86
>
> If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't get the same speed as you.
> Would you have any other recommendations of things that could be slowing the model down?
>
> I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/ 8GB DDR3 RAM
>
> Many thanks,
> Sam Watson
>
> -----Original Message-----
> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
> Sent: 19 December 2011 15:23
> To: Watson, Samuel
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>
> You have the model specified wrong.
>
> in your wrapper function you call:
>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>
> However, the constructor requires the 'level' variable to be in the first position and 'group' in the third position:
>>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>
> Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed.  So, had it been in the correct position, you would have corrupted memory when you ran the program.
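The 0-indexing point is easy to trip over when passing R codes into C++. Below is a defensive sketch, assuming the codes arrive as doubles (as `Rcpp::as<arma::mat>` would give) and using plain `std::vector` instead of Armadillo: convert the 1-based codes to 0-based indices once, and reject anything out of range instead of writing past the end of a row. `to_zero_based` and `build_indicator` are hypothetical helper names, not part of CppBugs.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Converts 1-based group codes from R (e.g. county numbers 1..n_groups)
// into 0-based indices. Throws on non-integer, NaN, or out-of-range codes
// rather than producing an index that would address memory out of bounds.
std::vector<std::size_t> to_zero_based(const std::vector<double>& codes,
                                       std::size_t n_groups) {
    std::vector<std::size_t> idx;
    idx.reserve(codes.size());
    for (double c : codes) {
        // The negated range test is also true for NaN, which fails every comparison.
        if (!(c >= 1.0 && c <= static_cast<double>(n_groups)) || std::floor(c) != c) {
            throw std::out_of_range("group code must be an integer in 1..n_groups");
        }
        idx.push_back(static_cast<std::size_t>(c) - 1);
    }
    return idx;
}

// Builds the N x n_groups 0/1 indicator matrix as rows of std::vector.
// at() throws std::out_of_range instead of writing past the end of a row.
std::vector<std::vector<double>> build_indicator(const std::vector<std::size_t>& idx,
                                                 std::size_t n_groups) {
    std::vector<std::vector<double>> m(idx.size(), std::vector<double>(n_groups, 0.0));
    for (std::size_t i = 0; i < idx.size(); ++i) {
        m[i].at(idx[i]) = 1.0;
    }
    return m;
}
```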
>
> One additional thing I changed was the 'adapt' variable. I set it to 1000 instead of 2000. The adapt phase is very slow because it perturbs each node individually and then estimates the impact on the acceptance ratio.
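The description of the adapt phase can be illustrated with a generic per-node tuning loop for a random-walk Metropolis sampler. This is a sketch under stated assumptions, not CppBugs' actual algorithm: the batch scheme, the multiplicative scale update, and the 0.44 target acceptance rate (a commonly used value for one-dimensional random-walk proposals) are choices made here for illustration, and `adapt_scales` is a hypothetical name. What it does show is why adaptation is costly: each node is perturbed separately, and every perturbation needs a full re-evaluation of the log density, so one sweep costs roughly (number of nodes) times (cost of one evaluation).

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Illustrative per-node step-size adaptation (not CppBugs' implementation).
struct AdaptResult {
    std::vector<double> scale;        // tuned proposal standard deviation per node
    std::vector<double> accept_rate;  // acceptance rate observed in the last batch
};

// batch_size must be positive.
AdaptResult adapt_scales(const std::function<double(const std::vector<double>&)>& log_density,
                         std::vector<double> x, int batches, int batch_size, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> z(0.0, 1.0);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const std::size_t n = x.size();
    AdaptResult r{std::vector<double>(n, 1.0), std::vector<double>(n, 0.0)};
    double lp = log_density(x);
    for (int b = 0; b < batches; ++b) {
        std::vector<int> accepted(n, 0);
        for (int s = 0; s < batch_size; ++s) {
            for (std::size_t j = 0; j < n; ++j) {      // perturb one node at a time
                const double old = x[j];
                x[j] += r.scale[j] * z(rng);
                const double lp_new = log_density(x); // full re-evaluation for each node
                if (std::log(u(rng)) < lp_new - lp) {
                    lp = lp_new;                      // accept the move
                    ++accepted[j];
                } else {
                    x[j] = old;                       // reject: restore the node
                }
            }
        }
        for (std::size_t j = 0; j < n; ++j) {
            r.accept_rate[j] = static_cast<double>(accepted[j]) / batch_size;
            // Grow the step if accepting more often than the target, shrink it otherwise.
            r.scale[j] *= std::exp(r.accept_rate[j] - 0.44);
        }
    }
    return r;
}
```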
>
> In any case, after fixing these issues, these are the results I get (using jags instead of winbugs):
>
>
>              user.self sys.self   elapsed user.child sys.child
> jags.time    16.461000    0.064 16.586000          0         0
> cppbugs.time  2.416000    0.000  2.420000          0         0
>               6.813328      Inf  6.853719        NaN       NaN
>
>
> Which I think is more typical of the speedup I would expect. The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup.
>
> I still think you could see a 20x speedup on a large dataset. CppBugs uses Armadillo as the backend, so on a large dataset you can speed up the linear algebra significantly with a multi-threaded BLAS (provided the dataset is large enough to benefit from it). Most reports on the R list about multi-threaded linear algebra find it is actually slower for small datasets, and I'm sure the same would be true here.
>
> I'll post this revised code up to my github, so you can see the changes I made.
>
> One last thing.  CppBugs is still under rapid development, even though you don't see the commits on the public branch.  A lot is changing on the backend, and I want to make it stable before it's released.  The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class.
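The closure idea can be illustrated with a C++11 (formerly C++0x) lambda that captures the model's variables by reference, so the update logic is supplied inline instead of in a new subclass. `LambdaModel` and `step()` are hypothetical names for illustration only; this is not the CppBugs API.

```cpp
#include <functional>
#include <utility>

// Hypothetical model wrapper that stores its deterministic update as a closure,
// so no new class has to be derived for each model.
class LambdaModel {
public:
    explicit LambdaModel(std::function<void()> update) : update_(std::move(update)) {}
    // A real sampler would call this after every proposal.
    void step() { update_(); }
private:
    std::function<void()> update_;
};
```

For example, `LambdaModel m([&] { tau = 1.0 / (sigma * sigma); });` plays the role of the `tau_y.value = pow(sigma_y.value, -2.0)` line in a hand-written `update()`, and re-runs it each time `m.step()` is called.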
>
> -Whit
>
>
>
> On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>> The radon model is as follows (I haven’t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
>>
>> BUGS code (to be saved in WinBugs working directory as "radon.bug"):
>>
>> model {
>>  for (i in 1:919){
>>    log.radon[i] ~ dnorm (y.hat[i], tau.y)
>>    y.hat[i] <- a[county[i]] + b*floor[i]
>>  }
>>  b ~ dnorm (0, .0001)
>>  tau.y <- pow(sigma.y, -2)
>>  sigma.y ~ dunif (0, 100)
>>
>>  for (j in 1:85){
>>    a[j] ~ dnorm (mu.a, tau.a)
>>  }
>>  mu.a ~ dnorm (0, .0001)
>>  tau.a <- pow(sigma.a, -2)
>>  sigma.a ~ dunif (0, 100)
>> }
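For reference, the BUGS model above written out, with $y_i$ = `log.radon[i]`, $x_i$ = `floor[i]`, and $j[i]$ the county index of observation $i$. BUGS' `dnorm(mean, tau)` is parameterized by precision, so `dnorm(0, .0001)` corresponds to variance $1/0.0001 = 10^4$; the second argument of $\mathcal{N}$ below is a variance.

```latex
\begin{aligned}
y_i &\sim \mathcal{N}\!\left(a_{j[i]} + b\,x_i,\; \tau_y^{-1}\right), && i = 1,\dots,919,\\
a_j &\sim \mathcal{N}\!\left(\mu_a,\; \tau_a^{-1}\right), && j = 1,\dots,85,\\
b,\ \mu_a &\sim \mathcal{N}\!\left(0,\; 10^{4}\right), \qquad \sigma_y,\ \sigma_a \sim \mathrm{Uniform}(0, 100),\\
\tau_y &= \sigma_y^{-2}, \qquad \tau_a = \sigma_a^{-2}.
\end{aligned}
```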
>> -------------------------------------
>> R code:
>>
>> require(inline)
>> require(Rcpp)
>> require(R2WinBUGS)
>>
>> model<-'
>> #include <iostream>
>> #include <fstream>
>> #include <vector>
>> #include <string>
>> #include <algorithm>
>> #include <cmath>
>> #include <armadillo>
>> #include <boost/random.hpp>
>> #include <boost/algorithm/string.hpp>
>> #include <cppbugs/cppbugs.hpp>
>>
>> using namespace arma;
>> using namespace cppbugs;
>> using std::vector;
>> using std::string;
>> using std::cout;
>> using std::endl;
>> using std::ifstream;
>>
>> class RadonVaryingInterceptModel: public MCModel {
>> public:
>>  const vec& level;
>>  const vec& basement;
>>  const mat& group;
>>  int N, N_counties;
>>  mat indicator_matrix; //indicator matrix used for county level random effects
>>
>>
>>  Normal<vec> a;
>>  Normal<double> b;
>>  Deterministic<double> tau_y;
>>  Uniform<double> sigma_y;
>>  Normal<double> mu_a;
>>  Deterministic<double> tau_a;
>>  Uniform<double> sigma_a;
>>  Deterministic<mat> y_hat;
>>  Normal<mat> likelihood;
>>
>>  RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>>    level(level_),basement(basement_),group(group_),
>>    a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
>>    indicator_matrix(N,N_counties),
>>    tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
>>    y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
>>
>>  {
>>    indicator_matrix.fill(0.0);
>>    for(uword i=0;i<group.n_elem;i++){
>>      indicator_matrix(i,static_cast<uword>(group[i]))=1.0;
>>    }
>>    add(a);
>>    add(b);
>>    add(tau_y);
>>    add(sigma_y);
>>    add(mu_a);
>>    add(tau_a);
>>    add(sigma_a);
>>    add(y_hat);
>>    add(likelihood);
>>  }
>>
>>  void update() {
>>    y_hat.value = indicator_matrix*a.value + b.value*basement;
>>    tau_y.value = pow(sigma_y.value, -2.0);
>>    tau_a.value = pow(sigma_a.value, -2.0);
>>    a.dnorm(mu_a.value, tau_a.value);
>>    b.dnorm(0, 0.0001);
>>    sigma_y.dunif(0, 100);
>>    mu_a.dnorm(0, 0.0001);
>>    sigma_a.dunif(0, 100);
>>    likelihood.dnorm(y_hat.value,tau_y.value);
>>  }
>> };
>> '
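In the posted `update()`, `y_hat` is computed as `indicator_matrix*a.value`, a dense 919 × 85 product, i.e. 919 × 85 = 78,115 multiply-adds per call, whereas reading each observation's intercept directly needs only 919 lookups. Whether this accounts for any of the timing gap was not measured in this thread. The sketch below, in plain C++ with hypothetical function names rather than Armadillo, only shows that the two computations agree. In Armadillo the direct version would presumably use `.elem()` with a `uvec` of 0-based indices; that is an untested assumption here.

```cpp
#include <cstddef>
#include <vector>

// Mirrors `indicator_matrix * a + b * basement`: N x J multiply-adds.
std::vector<double> dense_yhat(const std::vector<std::vector<double>>& indicator,
                               const std::vector<double>& a, double b,
                               const std::vector<double>& basement) {
    std::vector<double> y(basement.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < a.size(); ++j) s += indicator[i][j] * a[j];
        y[i] = s + b * basement[i];
    }
    return y;
}

// Same result by direct lookup of each observation's intercept: N operations.
// `group` holds 0-based indices into `a`.
std::vector<double> indexed_yhat(const std::vector<std::size_t>& group,
                                 const std::vector<double>& a, double b,
                                 const std::vector<double>& basement) {
    std::vector<double> y(basement.size());
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = a[group[i]] + b * basement[i];
    return y;
}
```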
>> src<-'
>>
>> mat group = Rcpp::as<arma::mat>(GR);
>> vec level = Rcpp::as<arma::vec>(lev);
>> vec basement = Rcpp::as<arma::vec>(basm);
>> int N = 919;
>> int N_counties = 85;
>> int iterations_ = as<int>(iterations);
>> int burn_ = as<int>(burn);
>> int adapt_ = as<int>(adapt);
>> int thin_ = as<int>(thin);
>>
>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>> m.sample(iterations_, burn_, adapt_, thin_);
>>
>> return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
>>                           Rcpp::Named("ar", m.acceptance_ratio()),
>>                           Rcpp::Named("a", m.a.mean()));
>> '
>> radon.model <- cxxfunction(signature(GR="numeric", lev="numeric", basm="numeric",
>>                                      iterations="integer", burn="integer",
>>                                      adapt="integer", thin="integer"),
>>                            body=src, includes=model,
>>                            plugin="RcppArmadillo", verbose=FALSE)
>>
>> df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat", header=T, sep=",")
>> df.mn <- df[df[,2]=="MN",]
>> radon <- df.mn$activity
>> log.radon <- log (ifelse (radon==0, .1, radon))
>> floor <- df.mn$floor
>>
>> # get county index variable
>> county.name <- as.vector(df.mn$county)
>> uniq <- unique(county.name)
>> J <- length(uniq)
>> county <- rep (NA, J)
>> for (i in 1:J){
>>  county[county.name==uniq[i]] <- i
>> }
>>
>> system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterations=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
>>
>> # it's necessary to convert the data to numeric vectors, or else WinBUGS reports an error
>> log.radon<-as.numeric(log.radon)
>> floor<-as.numeric(floor)
>> county<-as.numeric(county)
>> radon.data<-list(log.radon=log.radon,floor=floor,county=county)
>> radon.param<-c("a","b")
>> radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),sigma.y=runif(1),mu.a=rnorm(1))}
>>
>> #n.burnin = n.iter/2 for bugs
>> system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bugs.directory="D:/R/WinBUGS14",n.thin=5))
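The `n.burnin` comment relies on `bugs()` defaulting `n.burnin` to half of `n.iter` (R2WinBUGS documents the default as `floor(n.iter/2)`). Under that default both runs keep the same number of draws: (20000 - 10000)/5 = 2000 for WinBUGS, and 10000/5 = 2000 for CppBugs with `iterations=1e4L, thin=5L`, assuming CppBugs thins only the post-burn-in iterations. Assuming also that the `adapt` iterations run in addition to burn-in, the CppBugs call does somewhat more total work. `kept_draws` is a hypothetical helper for checking such settings.

```cpp
#include <stdexcept>

// Number of draws kept when one draw is saved every `thin` iterations
// after discarding `burn_in` of `total_iter` iterations (integer division).
int kept_draws(int total_iter, int burn_in, int thin) {
    if (thin <= 0 || burn_in < 0 || burn_in > total_iter) {
        throw std::invalid_argument("need thin > 0 and 0 <= burn_in <= total_iter");
    }
    return (total_iter - burn_in) / thin;
}
```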
>>
>> ---------------------------------------------------------------------
>> The difference I get is 16.53s vs 13.85s
>>
>> Kind regards,
>> Sam Watson
>>
>> -----Original Message-----
>> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
>> Sent: 19 December 2011 12:39
>> To: Watson, Samuel
>> Cc: rcpp-devel at r-forge.wu-wien.ac.at
>> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>>
>> post your code.
>>
>> The speedup depends on  your model.  My comparisons were based on R/JAGS and pymc on linux.
>>
>> -Whit
>>
>>
>> On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>>> I am currently using CppBugs with Rcpp through R. I am very
>>> interested in using CppBugs because I find WinBugs prohibitively
>>> slow: I use large amounts of data in large multilevel models, so
>>> when I found cppbugs I was excited. The Github page for cppbugs
>>> says that I can achieve speeds 20-100x faster than winbugs.
>>>
>>> I have been testing some examples: for the linear model example
>>> provided on Github, the time difference is about 2x.
>>>
>>> I also tested the Radon model with Rcpp and Andrew Gelman's data:
>>> for 10000 iterations and 10000 burn-in with a thinning parameter of
>>> 5, the difference is only 16 seconds vs 13 seconds.
>>>
>>>
>>>
>>> I can run multiple chains in parallel through R with the ‘snow’
>>> package, cxxfunction() and package.skeleton(), and this does seem to
>>> provide a little bit of a boost (parallel winbugs vs parallel
>>> cppbugs), but nothing greater than 4x.
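Independent chains can also be run concurrently from C++ rather than through snow. This is a generic sketch with a toy random-walk Metropolis sampler targeting a standard normal, not CppBugs code; `run_chain` and `run_chains_parallel` are hypothetical names, and `thin` must be positive. Each chain gets its own RNG seed so the chains are independent. One caution: R's C API is not thread-safe, so worker threads like these should only touch plain C++ data, never R objects.

```cpp
#include <cmath>
#include <future>
#include <random>
#include <vector>

// One toy random-walk Metropolis chain targeting N(0, 1); returns the mean
// of the kept draws. Stands in for one independent MCMC chain.
double run_chain(unsigned seed, int iterations, int burn, int thin) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, 1.0);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double x = 0.0, sum = 0.0;
    int kept = 0;
    for (int t = 0; t < burn + iterations; ++t) {
        const double prop = x + step(rng);
        // log acceptance ratio for a standard normal: (x^2 - prop^2) / 2
        if (std::log(u(rng)) < 0.5 * (x * x - prop * prop)) x = prop;
        // keep every thin-th draw after burn-in
        if (t >= burn && (t - burn) % thin == 0) { sum += x; ++kept; }
    }
    return kept > 0 ? sum / kept : 0.0;
}

// Launches the chains on separate threads, each with a distinct seed.
std::vector<double> run_chains_parallel(int n_chains, int iterations, int burn, int thin) {
    std::vector<std::future<double>> futures;
    for (int c = 0; c < n_chains; ++c) {
        futures.push_back(std::async(std::launch::async, run_chain,
                                     1000u + static_cast<unsigned>(c), iterations, burn, thin));
    }
    std::vector<double> means;
    for (auto& f : futures) means.push_back(f.get());
    return means;
}
```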
>>>
>>>
>>>
>>> Is it possible to achieve the kind of difference mentioned on Whit 
>>> Armstrong’s page for cppbugs with R?
>>>
>>>
>>>
>>> I am happy to post all the code if required.
>>>
>>>
>>>
>>> Many thanks
>>>
>>> Sam Watson
>>>
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

