[Rcpp-devel] CppBugs vs WinBugs
Watson, Samuel
S.I.Watson at warwick.ac.uk
Mon Dec 19 19:14:57 CET 2011
I changed the options for cxxfunction() in R using:
settings=getPlugin("RcppArmadillo")
settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep=' ')
radon.model <- cxxfunction(signature(GR="numeric", lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt="adapt",thin="integer"), body=src, include=model, plugin="RcppArmadillo",verbose=F,settings=settings)
This didn’t make any difference at all.
For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
I will test some more models and hopefully this model is just an anomaly for me.
Thanks for a great package!
Sam
-----Original Message-----
From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
Sent: 19 December 2011 16:15
To: Watson, Samuel
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
Not sure.
You might have a look at what compiler options are used for the inline cpp function.
If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
-Whit
On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
> Hi Whit,
>
> Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself.
> I amended the mistakes you mentioned, for example I have included the
> line
>
> group -= 1;
>
> in the run.model.cpp function.
>
> Here are the results with your suggested changes:
>
> system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.
> file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.th
> in=5))
> user system elapsed
> 11.03 0.00 11.12
>
> system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,it
> erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L))
> user system elapsed
> 10.86 0.00 10.86
>
> If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you.
> Would you have any other recommendations of things that could be slowing the model down?
>
> I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/
> 8GB DDR3 RAM
>
> Many thanks,
> Sam Watson
>
> -----Original Message-----
> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
> Sent: 19 December 2011 15:23
> To: Watson, Samuel
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>
> You have the model specified wrong.
>
> in your wrapper function you call:
>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>
> However, the constructor requires the 'level' variable to be in the first position and group in the third position :
>> RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>
> Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. So, had it been in the correct position, you would have corrupted memory when you ran the program.
>
> One additional thing I changed was the 'adapt' variable. I set it to
> 1000 instead of 2000. The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio.
>
> In any case, after fixing these issues, these are the results I get (using jags instead of winbugs):
>
>
> user.self sys.self elapsed user.child sys.child jags.time
> 16.461000 0.064 16.586000 0 0 cppbugs.time
> 2.416000 0.000 2.420000 0 0
> 6.813328 Inf 6.853719 NaN NaN
>>
>
>
> Which I think it more typical of the speedup I would expect. The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup.
>
> I still think you could see a 20x speedup on a large dataset. CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case.
>
> I'll post this revised code up to my github, so you can see the changes I made.
>
> One last thing. CppBugs is still under rapid development, even though you don't see the commits on the public branch. A lot is changing on the backend, and I want to make it stable before it's released. The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class.
>
> -Whit
>
>
>
> On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>> The radon model is as follows (I haven’t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
>>
>> BUGS code (to be saved in WinBugs working directory as "radon.bug"):
>>
>> model {
>> for (i in 1:919){
>> log.radon[i] ~ dnorm (y.hat[i], tau.y)
>> y.hat[i] <- a[county2[i]] + b*floor[i]
>> }
>> b ~ dnorm (0, .0001)
>> tau.y <- pow(sigma.y, -2)
>> sigma.y ~ dunif (0, 100)
>>
>> for (j in 1:85){
>> a[j] ~ dnorm (mu.a, tau.a)
>> }
>> mu.a ~ dnorm (0, .0001)
>> tau.a <- pow(sigma.a, -2)
>> sigma.a ~ dunif (0, 100)
>> }
>> -------------------------------------
>> R code:
>>
>> require(inline)
>> require(Rcpp)
>> require(R2WinBUGS)
>>
>> model<-'
>> #include <iostream>
>> #include <fstream>
>> #include <vector>
>> #include <string>
>> #include <algorithm>
>> #include <cmath>
>> #include <armadillo>
>> #include <boost/random.hpp>
>> #include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
>>
>> using namespace arma;
>> using namespace cppbugs;
>> using std::vector;
>> using std::string;
>> using std::cout;
>> using std::endl;
>> using std::ifstream;
>>
>> class RadonVaryingInterceptModel: public MCModel {
>> public:
>> const vec& level;
>> const vec& basement;
>> const mat& group;
>> int N, N_counties;
>> mat indicator_matrix; //indicator matrix used for county level
>> random effects
>>
>>
>> Normal<vec> a;
>> Normal<double> b;
>> Deterministic<double> tau_y;
>> Uniform<double> sigma_y;
>> Normal<double> mu_a;
>> Deterministic<double> tau_a;
>> Uniform<double> sigma_a;
>> Deterministic<mat> y_hat;
>> Normal<mat> likelihood;
>>
>> RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>> level(level_),basement(basement_),group(group_),
>> a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
>> indicator_matrix(N,N_counties),
>> tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
>> y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
>>
>> {
>> indicator_matrix.fill(0.0);
>> for(int i=0;i<group.n_elem;i++){
>> indicator_matrix(i,group[i])=1.0;
>> }
>> add(a);
>> add(b);
>> add(tau_y);
>> add(sigma_y);
>> add(mu_a);
>> add(tau_a);
>> add(sigma_a);
>> add(y_hat);
>> add(likelihood);
>> }
>>
>> void update() {
>> y_hat.value = indicator_matrix*a.value + b.value*basement;
>> tau_y.value = pow(sigma_y.value, -2.0);
>> tau_a.value = pow(sigma_a.value, -2.0);
>> a.dnorm(mu_a.value, tau_a.value);
>> b.dnorm(0, 0.0001);
>> sigma_y.dunif(0, 100);
>> mu_a.dnorm(0, 0.0001);
>> sigma_a.dunif(0, 100);
>> likelihood.dnorm(y_hat.value,tau_y.value);
>> }
>> };
>> '
>> Src<-'
>>
>> mat group = Rcpp::as<arma::mat>(GR);
>> vec level = Rcpp::as<arma::vec>(lev); vec basement =
>> Rcpp::as<arma::vec>(basm); int N = 919; int N_counties = 85; int
>> iterations_ = as<int>(iterations); int burn_ = as<int>(burn); int
>> adapt_ = as<int>(adapt); int thin_ = as<int>(thin);
>>
>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>> m.sample(iterations_, burn_, adapt_, thin_);
>>
>> return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
>> Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
>> m.a.mean())); '
>> radon.model <- cxxfunction(signature(GR="numeric",
>> lev="numeric",basm="numeric",iterations="integer",burn="integer",adap
>> t ="adapt",thin="integer"), body=src, include=model,
>> plugin="RcppArmadillo",verbose=F)
>>
>> df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/rad
>> o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",] radon
>> <- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
>> floor
>> <- df.mn$floor
>>
>> # get county index variable
>> county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
>> <- length(uniq) county <- rep (NA, J) for (i in 1:J){
>> county[county.name==uniq[i]] <- i
>> }
>>
>> system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,iterat
>> i
>> ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
>>
>> #its necessary to change the structure of the data files or else
>> winbugs has an error
>> log.radon<-as.numeric(log.radon)
>> floor<-as.numeric(floor)
>> county<-as.numeric(county)
>> radon.data<-list(log.radon=log.radon,floor=floor,county=county)
>> radon.param<-c("a","b")
>> radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1),
>> s
>> igma.y=runif(1),mu.a=rnorm(1)}
>>
>> #n.burnin = n.iter/2 for bugs
>> system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
>> file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),bu
>> g
>> s.directory="D:/R/WinBUGS14",n.thin=5))
>>
>> ---------------------------------------------------------------------
>> -
>> - The difference I get is 16.53s vs 13.85s
>>
>> Kind regards,
>> Sam Watson
>>
>> -----Original Message-----
>> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
>> Sent: 19 December 2011 12:39
>> To: Watson, Samuel
>> Cc: rcpp-devel at r-forge.wu-wien.ac.at
>> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>>
>> post your code.
>>
>> The speedup depends on your model. My comparisons were based on R/JAGS and pymc on linux.
>>
>> -Whit
>>
>>
>> On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>>> I am currently using CppBugs with Rcpp through R. I am very
>>> interested to use CppBugs as I am finding WinBugs to be
>>> prohibitively slow, I use large amounts of data in large multilevel
>>> models, so when I found cppbugs I was excited. It says on the Github
>>> page for cppbugs that I can achieve speeds of 20-100x faster than winbugs.
>>>
>>> I have been testing some examples: for the linear model example that
>>> is provided at Github, the time difference is about 2x
>>>
>>> I also tested the Radon model with Rcpp and Andrew Gelmans data and
>>> for
>>> 10000 iterations and 10000 burnin with a thinning parameter of 5 the
>>> difference is only 16 seconds vs 13 seconds.
>>>
>>>
>>>
>>> I can run multiple chains in parallel through R with the ‘snow’
>>> package,
>>> cxxfunction() and package.skeleton() and this does seem to provide a
>>> little bit of a boost (parallel winbugs vs parallel cppbugs). But
>>> nothing greater than 4x.
>>>
>>>
>>>
>>> Is it possible to achieve the kind of difference mentioned on Whit
>>> Armstrong’s page for cppbugs with R?
>>>
>>>
>>>
>>> I am happy to post all the code if required.
>>>
>>>
>>>
>>> Many thanks
>>>
>>> Sam Watson
>>>
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-de
>>> v
>>> e
>>> l
More information about the Rcpp-devel
mailing list