[Rcpp-devel] CppBugs vs WinBugs
Whit Armstrong
armstrong.whit at gmail.com
Mon Dec 19 20:02:59 CET 2011
> If its of interest I can post any code I get from this too
yes, please.
> Are you planning on developing a way of specifying initial values for each Markov chain?
Well, you can do that in the constructor. Each stochastic object
takes an argument of the underlying type as the initial value.
For most of my examples, I simply init to randn<mat>(nrow,ncol), but
you can use explicit values as well.
For multiple chains, you can call sample again, but cppbugs will
continue to use the same vector as storage. So, you can either copy
and clear the vector after each run, or simply use the iterations/thin
to determine the boundary between separate chains.
For the dev version of cppbugs, you will not need to write a model
object, so you can simply initialize all the arma objects to the
values you want.
Here is quick example of what the linear model will look like under
the new setup:
Normal<vec> b(randn<vec>(2)); // the initial value is: randn<vec>(2)
Deterministic<mat> y_hat;
Deterministic<double> rsq;
Uniform<double> tau_y(1); // initial value is 1.0
Normal<mat> likelihood(y,true);
b.dnorm(0.0, 0.0001);
tau_y.dunif(0.,100.);
likelihood.dnorm(y_hat,tau_y);
// model now declared as a closure (std::function):
std::function<void ()> model = [&]() {
y_hat = X * b.value;
rsq = as_scalar(1 - var(y - y_hat.value) / var(y));
};
// model declaration still needs a list of the mcmc objects
MCModel m({&b, &y_hat, &tau_y, &likelihood, &rsq},model);
int iterations = 1e5;
m.sample(iterations, 1e4, 1e4, 10);
-Whit
On Mon, Dec 19, 2011 at 1:33 PM, Watson, Samuel
<S.I.Watson at warwick.ac.uk> wrote:
> the 2x was before I changed the adapt parameter as per your advice.
>
> I am getting the cppbugs to return the whole history of iterations list at the moment, from different chains running in parallel using the 'snow' package in R, as I need to perform convergence diagnostics on the chains, and keep running if necessary, something I know only how to do in R.
> If its of interest I can post any code I get from this too
> Are you planning on developing a way of specifying initial values for each Markov chain?
>
> Sam
> -----Original Message-----
> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
> Sent: 19 December 2011 18:24
> To: Watson, Samuel
> Cc: rcpp-devel at r-forge.wu-wien.ac.at
> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>
> Oh, that's great. I thought you said the linear model was only showing 2x in your first email.
>
> Glad you are finding it useful. Eventually, one should be able to specify the model directly in R, which should make a huge difference in ease of use. But I'm not quite there yet. I have more backend changes to make before I can get to the user interface improvements.
>
> Please let me know if you find bugs. Or file them here:
> https://github.com/armstrtw/CppBugs/issues?sort=created&direction=asc&_pjax=true&state=open
>
> Cheers,
> Whit
>
>
> On Mon, Dec 19, 2011 at 1:14 PM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>> I changed the options for cxxfunction() in R using:
>>
>> settings=getPlugin("RcppArmadillo")
>> settings$env$PKG_CXXFLAGS=paste('-O2',settings$env$PKG_CXXFLAGS,sep='
>> ')
>> radon.model <- cxxfunction(signature(GR="numeric",
>> lev="numeric",basm="numeric",iterations="integer",burn="integer",adapt
>> ="adapt",thin="integer"), body=src, include=model,
>> plugin="RcppArmadillo",verbose=F,settings=settings)
>>
>> This didn’t make any difference at all.
>>
>> For the linear model example you provide the difference for me for winbugs vs cppbugs is 40s vs 10s which is a big difference.
>> I will test some more models and hopefully this model is just an anomaly for me.
>>
>> Thanks for a great package!
>>
>> Sam
>>
>> -----Original Message-----
>> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
>> Sent: 19 December 2011 16:15
>> To: Watson, Samuel
>> Cc: rcpp-devel at r-forge.wu-wien.ac.at
>> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>>
>> Not sure.
>>
>> You might have a look at what compiler options are used for the inline cpp function.
>>
>> If you are not using O2, then that could be a big difference, but I'm not sure it would drop the time from 10s to 2.8s.
>>
>> -Whit
>>
>>
>> On Mon, Dec 19, 2011 at 11:04 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>>> Hi Whit,
>>>
>>> Many thanks for your reply, my apologies for those mistakes in that code, I actually had just noticed them myself.
>>> I amended the mistakes you mentioned, for example I have included the
>>> line
>>>
>>> group -= 1;
>>>
>>> in the run.model.cpp function.
>>>
>>> Here are the results with your suggested changes:
>>>
>>> system.time(radon.test<-jags(radon.data,radon.inits,radon.param,model.
>>> file=model.file,n.chains=1,n.iter=20000,working.directory=getwd(),n.t
>>> h
>>> in=5))
>>> user system elapsed
>>> 11.03 0.00 11.12
>>>
>>> system.time(res<-radon.model(GR=county22,lev=log.radon2,basm=floor2,i
>>> t
>>> erations=1e4L,burn=1e4L,adapt=1e3L,thin=5L))
>>> user system elapsed
>>> 10.86 0.00 10.86
>>>
>>> If I set adapt=2e2L I still get 9.14 seconds. The model is definitely producing the correct answer (as compared to WinBUGS and jags), so I can't work out why I can't an equivalent speed to you.
>>> Would you have any other recommendations of things that could be slowing the model down?
>>>
>>> I am running Windows 7 Enterprise on Intel Core i7-2600 @ 3.4 GHz w/
>>> 8GB DDR3 RAM
>>>
>>> Many thanks,
>>> Sam Watson
>>>
>>> -----Original Message-----
>>> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
>>> Sent: 19 December 2011 15:23
>>> To: Watson, Samuel
>>> Cc: rcpp-devel at r-forge.wu-wien.ac.at
>>> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>>>
>>> You have the model specified wrong.
>>>
>>> in your wrapper function you call:
>>>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>>>
>>> However, the constructor requires the 'level' variable to be in the first position and group in the third position :
>>>> RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>>>
>>> Additionally, when you pass the 'group' variable it has to be 0 indexed instead of 1 indexed. So, had it been in the correct position, you would have corrupted memory when you ran the program.
>>>
>>> One additional thing I changed was the 'adapt' variable. I set it to
>>> 1000 instead of 2000. The adapt phase is very slow because it perturbs each node individually and then estimates the impact to the acceptance ratio.
>>>
>>> In any case, after fixing these issues, these are the results I get (using jags instead of winbugs):
>>>
>>>
>>> user.self sys.self elapsed user.child sys.child
>>> jags.time
>>> 16.461000 0.064 16.586000 0 0 cppbugs.time
>>> 2.416000 0.000 2.420000 0 0
>>> 6.813328 Inf 6.853719 NaN NaN
>>>>
>>>
>>>
>>> Which I think it more typical of the speedup I would expect. The front page of my github project was written over a year ago, and I don't even remember which example had a 100x speedup.
>>>
>>> I still think you could see a 20x speedup on a large dataset. CppBugs uses Armadillo as the backend, so on a large dataset, you can speed up the linear algebra significantly if you use a multi-threaded blas (and your dataset is large enough to benefit from it). Most cases of using multithreaded linear algebra on the R list are actually worse for small datasets, and I'm sure the same would be true in this case.
>>>
>>> I'll post this revised code up to my github, so you can see the changes I made.
>>>
>>> One last thing. CppBugs is still under rapid development, even though you don't see the commits on the public branch. A lot is changing on the backend, and I want to make it stable before it's released. The closure features in C++0x make it much easier to define cppbugs models without needing to declare a new class.
>>>
>>> -Whit
>>>
>>>
>>>
>>> On Mon, Dec 19, 2011 at 8:10 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>>>> The radon model is as follows (I haven’t changed the linear model posted on Github), I slightly altered the radon model from Github which is marked inline below:
>>>>
>>>> BUGS code (to be saved in WinBugs working directory as "radon.bug"):
>>>>
>>>> model {
>>>> for (i in 1:919){
>>>> log.radon[i] ~ dnorm (y.hat[i], tau.y)
>>>> y.hat[i] <- a[county2[i]] + b*floor[i]
>>>> }
>>>> b ~ dnorm (0, .0001)
>>>> tau.y <- pow(sigma.y, -2)
>>>> sigma.y ~ dunif (0, 100)
>>>>
>>>> for (j in 1:85){
>>>> a[j] ~ dnorm (mu.a, tau.a)
>>>> }
>>>> mu.a ~ dnorm (0, .0001)
>>>> tau.a <- pow(sigma.a, -2)
>>>> sigma.a ~ dunif (0, 100)
>>>> }
>>>> -------------------------------------
>>>> R code:
>>>>
>>>> require(inline)
>>>> require(Rcpp)
>>>> require(R2WinBUGS)
>>>>
>>>> model<-'
>>>> #include <iostream>
>>>> #include <fstream>
>>>> #include <vector>
>>>> #include <string>
>>>> #include <algorithm>
>>>> #include <cmath>
>>>> #include <armadillo>
>>>> #include <boost/random.hpp>
>>>> #include <boost/algorithm/string.hpp> #include <cppbugs/cppbugs.hpp>
>>>>
>>>> using namespace arma;
>>>> using namespace cppbugs;
>>>> using std::vector;
>>>> using std::string;
>>>> using std::cout;
>>>> using std::endl;
>>>> using std::ifstream;
>>>>
>>>> class RadonVaryingInterceptModel: public MCModel {
>>>> public:
>>>> const vec& level;
>>>> const vec& basement;
>>>> const mat& group;
>>>> int N, N_counties;
>>>> mat indicator_matrix; //indicator matrix used for county level
>>>> random effects
>>>>
>>>>
>>>> Normal<vec> a;
>>>> Normal<double> b;
>>>> Deterministic<double> tau_y;
>>>> Uniform<double> sigma_y;
>>>> Normal<double> mu_a;
>>>> Deterministic<double> tau_a;
>>>> Uniform<double> sigma_a;
>>>> Deterministic<mat> y_hat;
>>>> Normal<mat> likelihood;
>>>>
>>>> RadonVaryingInterceptModel(const vec& level_, const vec& basement_, const mat& group_,int N_, int N_counties_):
>>>> level(level_),basement(basement_),group(group_),
>>>> a(randn<vec>(N_counties_)), b(0),N(N_),N_counties(N_counties_),
>>>> indicator_matrix(N,N_counties),
>>>> tau_y(1),sigma_y(1),mu_a(0),tau_a(1),sigma_a(1),
>>>> y_hat(randn<mat>(level_.n_rows,1)),likelihood(level_,true)
>>>>
>>>> {
>>>> indicator_matrix.fill(0.0);
>>>> for(int i=0;i<group.n_elem;i++){
>>>> indicator_matrix(i,group[i])=1.0;
>>>> }
>>>> add(a);
>>>> add(b);
>>>> add(tau_y);
>>>> add(sigma_y);
>>>> add(mu_a);
>>>> add(tau_a);
>>>> add(sigma_a);
>>>> add(y_hat);
>>>> add(likelihood);
>>>> }
>>>>
>>>> void update() {
>>>> y_hat.value = indicator_matrix*a.value + b.value*basement;
>>>> tau_y.value = pow(sigma_y.value, -2.0);
>>>> tau_a.value = pow(sigma_a.value, -2.0);
>>>> a.dnorm(mu_a.value, tau_a.value);
>>>> b.dnorm(0, 0.0001);
>>>> sigma_y.dunif(0, 100);
>>>> mu_a.dnorm(0, 0.0001);
>>>> sigma_a.dunif(0, 100);
>>>> likelihood.dnorm(y_hat.value,tau_y.value);
>>>> }
>>>> };
>>>> '
>>>> Src<-'
>>>>
>>>> mat group = Rcpp::as<arma::mat>(GR); vec level =
>>>> Rcpp::as<arma::vec>(lev); vec basement = Rcpp::as<arma::vec>(basm);
>>>> int N = 919; int N_counties = 85; int iterations_ =
>>>> as<int>(iterations); int burn_ = as<int>(burn); int adapt_ =
>>>> as<int>(adapt); int thin_ = as<int>(thin);
>>>>
>>>> RadonVaryingInterceptModel m(group,level,basement,N,N_counties);
>>>> m.sample(iterations_, burn_, adapt_, thin_);
>>>>
>>>> return Rcpp::List::create(Rcpp::Named("b", m.b.mean()),
>>>> Rcpp::Named("ar", m.acceptance_ratio()), Rcpp::Named("a",
>>>> m.a.mean())); '
>>>> radon.model <- cxxfunction(signature(GR="numeric",
>>>> lev="numeric",basm="numeric",iterations="integer",burn="integer",ada
>>>> p t ="adapt",thin="integer"), body=src, include=model,
>>>> plugin="RcppArmadillo",verbose=F)
>>>>
>>>> df<-read.table("http://www.stat.columbia.edu/~gelman/arm/examples/ra
>>>> d o n/srrs2.dat", header=T, sep=",") df.mn <- df[df[,2]=="MN",]
>>>> radon
>>>> <- df.mn$activity log.radon <- log (ifelse (radon==0, .1, radon))
>>>> floor
>>>> <- df.mn$floor
>>>>
>>>> # get county index variable
>>>> county.name <- as.vector(df.mn$county) uniq <- unique(county.name) J
>>>> <- length(uniq) county <- rep (NA, J) for (i in 1:J){
>>>> county[county.name==uniq[i]] <- i
>>>> }
>>>>
>>>> system.time(radon.model(GR=county22,lev=log.radon2,basm=floor2,itera
>>>> t
>>>> i
>>>> ons=1e4L,burn=1e4L,adapt=2e3L,thin=5L))
>>>>
>>>> #its necessary to change the structure of the data files or else
>>>> winbugs has an error
>>>> log.radon<-as.numeric(log.radon)
>>>> floor<-as.numeric(floor)
>>>> county<-as.numeric(county)
>>>> radon.data<-list(log.radon=log.radon,floor=floor,county=county)
>>>> radon.param<-c("a","b")
>>>> radon.inits<-function(){list(a=rnorm(85),b=rnorm(1),sigma.a=runif(1)
>>>> ,
>>>> s
>>>> igma.y=runif(1),mu.a=rnorm(1)}
>>>>
>>>> #n.burnin = n.iter/2 for bugs
>>>> system.time(radon.test<-bugs(radon.data,radon.inits,radon.param,model.
>>>> file="radon.bug",n.chains=1,n.iter=20000,working.directory=getwd(),b
>>>> u
>>>> g
>>>> s.directory="D:/R/WinBUGS14",n.thin=5))
>>>>
>>>> --------------------------------------------------------------------
>>>> -
>>>> -
>>>> - The difference I get is 16.53s vs 13.85s
>>>>
>>>> Kind regards,
>>>> Sam Watson
>>>>
>>>> -----Original Message-----
>>>> From: Whit Armstrong [mailto:armstrong.whit at gmail.com]
>>>> Sent: 19 December 2011 12:39
>>>> To: Watson, Samuel
>>>> Cc: rcpp-devel at r-forge.wu-wien.ac.at
>>>> Subject: Re: [Rcpp-devel] CppBugs vs WinBugs
>>>>
>>>> post your code.
>>>>
>>>> The speedup depends on your model. My comparisons were based on R/JAGS and pymc on linux.
>>>>
>>>> -Whit
>>>>
>>>>
>>>> On Mon, Dec 19, 2011 at 7:30 AM, Watson, Samuel <S.I.Watson at warwick.ac.uk> wrote:
>>>>> I am currently using CppBugs with Rcpp through R. I am very
>>>>> interested to use CppBugs as I am finding WinBugs to be
>>>>> prohibitively slow, I use large amounts of data in large multilevel
>>>>> models, so when I found cppbugs I was excited. It says on the
>>>>> Github page for cppbugs that I can achieve speeds of 20-100x faster than winbugs.
>>>>>
>>>>> I have been testing some examples: for the linear model example
>>>>> that is provided at Github, the time difference is about 2x
>>>>>
>>>>> I also tested the Radon model with Rcpp and Andrew Gelmans data and
>>>>> for
>>>>> 10000 iterations and 10000 burnin with a thinning parameter of 5
>>>>> the difference is only 16 seconds vs 13 seconds.
>>>>>
>>>>>
>>>>>
>>>>> I can run multiple chains in parallel through R with the ‘snow’
>>>>> package,
>>>>> cxxfunction() and package.skeleton() and this does seem to provide
>>>>> a little bit of a boost (parallel winbugs vs parallel cppbugs). But
>>>>> nothing greater than 4x.
>>>>>
>>>>>
>>>>>
>>>>> Is it possible to achieve the kind of difference mentioned on Whit
>>>>> Armstrong’s page for cppbugs with R?
>>>>>
>>>>>
>>>>>
>>>>> I am happy to post all the code if required.
>>>>>
>>>>>
>>>>>
>>>>> Many thanks
>>>>>
>>>>> Sam Watson
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Rcpp-devel mailing list
>>>>> Rcpp-devel at lists.r-forge.r-project.org
>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-d
>>>>> e
>>>>> v
>>>>> e
>>>>> l
More information about the Rcpp-devel
mailing list