[Rcpp-devel] How to increase the coding efficiency

terrance savitsky tds151 at gmail.com
Thu Dec 13 03:11:02 CET 2012


The comment of Doug Bates on this thread motivated me to better examine
opportunities for algorithm improvements in my R package c++ code written
under Rcpp and RcppArmadillo.   I performed a few changes:  firstly, i
removed all general matrix inverse computations, replacing many with
triangular solving from cholesky decompositions (e.g. to conduct posterior
sampling from a multivariate gaussian where the precision matrix is readily
available).   secondly, instead of repeatedly performing high-dimensional
matrix computations composing sampled parameter and design objects during a
sequential sampling scan, I made the computation once per iteration and
then operated on the resulting large object by repeatedly extracting
portions, pre-sampling, and inserting back the post-sampled result during a
sequential scan.   thirdly,  while linear algebra compositions of
parameters and data objects must be rendered repeatedly across posterior
sampling iterations, compositions of design/data objects (with each other)
may be performed only once and re-used.   I had been sloppy and was
repeatedly performing computations purely involving data objects inside the
sampling loops.   I replaced this approach by slicing and dicing the design
objects and performing the matrix operations, once, and then repeatedly
inserting the objects during sampling.   While the first two steps -
particularly the second - produced some runtime speed improvements, I was
surprised that the third step produced only a very small improvement where
I had expected a large reduction in runtime.   I used RcppArmadillo and
placed these sliced and diced matrix objects into fields from which I then
extracted elements during sampling.   So I replaced matrix computations
with extractions from field containers.   Is it possible that extraction
speeds from fields and multiplying a chain of moderate-dimensioned matrices
(e.g. 5 x 600)  are of equivalent computational intensity?  Was the
compiler compensating for my hackery?


On Tue, Dec 11, 2012 at 7:07 AM, Honglang Wang
<wanghonglang2008 at gmail.com>wrote:

> I tried out that if the dim of the matrix less than or equal to 4, inv()
> and solve() have the same speed.
>
>
> Best wishes!
>
> Honglang Wang
>
> Office C402 Wells Hall
> Department of Statistics and Probability
> Michigan State University
> 1579 I Spartan Village, East Lansing, MI 48823
> wangho16 at msu.edu
>
>
>
> On Mon, Dec 10, 2012 at 8:05 AM, c s <conradsand.arma at gmail.com> wrote:
>
>> Hi Honglang,
>>
>> I recommend looking into using the solve() function in Armadillo, instead
>> of inv(). Using solve() will be considerably faster.
>>
>> More details here:
>> http://arma.sourceforge.net/docs.html#solve
>>
>>
>>
>> On Thursday, December 6, 2012, Honglang Wang <wanghonglang2008 at gmail.com>
>> wrote:
>> > The following is a full example although I don't know whether it's
>> minimal or not:
>> >
>> > library(Rcpp)
>> > library(RcppArmadillo)
>> > sourceCpp("betahat_mod.cpp")
>> >
>> > #The following is data generation.
>> >   n=200
>> >   m=20
>> >   p=2
>> >   t=runif(m*n,min=0, max=1)
>> >   X1=rnorm(m*n,0,1)
>> >   X1=as.matrix(1+2*exp(t)+X1)
>> >   X2=rnorm(m*n,0,1)
>> >   X2=as.matrix(3-4*t^2+X2)
>> >   X=cbind(X1,X2)
>> >   beta1=0.5*sin(t)
>> >   beta2=beta1
>> >   rho=0.2
>> >   sig2=1
>> >
>> eps=unlist(lapply(1:n,function(x){as.matrix(arima.sim(list(ar=rho),m,sd=sqrt(sig2*(1-rho^2))))}))
>> >   y=as.matrix(X1*beta1+X2*beta2+eps)
>> >
>> > #A simple kernel function.
>> > ker=function(x,h)
>> >   {
>> >     ans=x
>> >     lo=(abs(x)<h)
>> >     ans[lo]=(3/4)*(1-(x[lo]/h)^2)/h
>> >     ans[!lo]=0
>> >     return(ans)
>> >   }
>> >
>> >
>> > h=0.3
>> >
>> > #assess the time for evaluating bethat of 100 t's:
>> > system.time((betahatt=t(apply(as.matrix(t[1:100]),1,function(x)
>> betahat(ker,x,X,y,t,h,m)$betahat))))
>> >
>> > And the .cpp file is the following:
>> >
>> > // [[Rcpp::depends(RcppArmadillo)]]
>> >
>> > #include <RcppArmadillo.h>
>> >
>> > using namespace Rcpp;
>> >
>> > // [[Rcpp::export]]
>> > List betahat(Function ker, double t0, NumericMatrix Xr, NumericMatrix
>> yr, NumericVector tr, double h, int m) {
>> >   int n = Xr.nrow(), p = Xr.ncol();
>> >   arma::mat X(Xr.begin(), n, p, false);
>> >   arma::mat y(yr.begin(), n, 1, false);
>> >   arma::colvec t(tr.begin(), tr.size(), false);
>> >   arma::mat T = X;
>> >   T.each_col() %= (t-t0)/h;
>> >   arma::mat D = arma::join_rows(X,T);
>> >   arma::vec kk =as<arma::vec>(ker(tr-t0,h));
>> >   arma::mat W = (arma::diagmat(kk))/m;
>> >   arma::mat Inv = arma::trans(D)*W*D;
>> >   arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
>> >   arma::colvec betahat0(betahat.begin(),betahat.size()/2,false);
>> >   return List::create(Named("betahat") = betahat0);
>> > }
>> >
>> > Best wishes!
>> >
>> > Honglang Wang
>> >
>> > Office C402 Wells Hall
>> > Department of Statistics and Probability
>> > Michigan State University
>> > 1579 I Spartan Village, East Lansing, MI 48823
>> > wangho16 at msu.edu
>> >
>> >
>> > On Wed, Dec 5, 2012 at 4:07 AM, Christian Gunning <xian at unm.edu> wrote:
>> >>
>> >> Can you post a minimal full example?
>> >> -Christian
>> >>
>> >> On Tue, Dec 4, 2012 at 8:39 PM, Honglang Wang
>> >> <wanghonglang2008 at gmail.com> wrote:
>> >> > Yes, the main issue for my coding is the allocation of memory.  And
>> I have
>> >> > fixed one of the biggest memory allocation issue: 4000 by 4000
>> diagonal
>> >> > matrix. And since I am not familiar with Rcpp and RcppArmadillo, I
>> have no
>> >> > idea how to reuse the memory. I hope I can have some materials to
>> learn
>> >> > this. Thanks.
>> >> >
>> >> >
>> >> >>
>> >> >> What exactly do these timings show?  A single call you your
>> function?
>> >> >> How many calls?
>> >> >>
>> >> > Here I called my function for 100 times.
>> >> >
>> >> >>
>> >> >> Building on Romain's point: -- a portion of your function's runtime
>> is
>> >> >> in memory allocation
>> >> >> (and you have a lot of allocations here).
>> >> >> If you're calling your function thousands or millions of times, then
>> >> >> it might pay to closely
>> >> >> examine your memory allocation strategies and figure out what's
>> >> >> temporary, for example.
>> >> >> It looks like you're already using  copy_aux_mem = false in a number
>> >> >> of places, but you're
>> >> >> allocating a lot of objects -- of approx what size?
>> >> >>
>> >> >> For example, wouldn't this work just as well with one less
>> allocation?
>> >> >> arma::vec kk = t;
>> >> >> arma::uvec q1 = arma::find(arma::abs(tp)<h);
>> >> >> kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
>> >> >> // done with q1.  let's reuse it.
>> >> >> q1 = arma::find(arma::abs(tp)>=h);
>> >> >> // was q2
>> >> >> kk.elem(q1).zeros();
>> >> >>
>> >> >> You could potentially allocate memory for temporary working space in
>> >> >> R, grab it with copy_aux_mem = false, write your temp results there,
>> >> >> and reuse these objects in subsequent function calls.  It doesn't
>> make
>> >> >> sense to go to this trouble, though, if your core algorithm consumes
>> >> >> the bulk of runtime.
>> >> >>
>> >> >> Have you looked on the armadillo notes r.e. inv?  Matrix inversion
>> has
>> >> >> O(>n^2).  You may be aided by pencil-and-paper math here.
>> >> >> http://arma.sourceforge.net/docs.html#inv
>> >> >>
>> >> > Here my matrix for inverse is only 4 by 4, so I think it's ok.
>> >> >
>> >> >>
>> >> >> best,
>> >> >> Christian
>> >> >>
>> >> >> > Dear All,
>> >> >> > I have tried out the first example by using RcppArmadillo, but I
>> am not
>> >> >> > sure whether the code is efficient or not. And I did the
>> comparison of
>> >> >> > the
>> >> >> > computation time.
>> >> >> >
>> >> >> > 1) R code using for loop in R: 87.22s
>> >> >> > 2) R code using apply: 77.86s
>> >> >> > 3) RcppArmadillo by using for loop in C++: 53.102s
>> >> >> > 4) RcppArmadillo together with apply in R: 47.310s
>> >> >> >
>> >> >> > It is kind of not so big increase. I am wondering whether I used
>> an
>> >> >> > inefficient way for the C++ coding:
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!
>> >
>> >
>>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>



-- 
Thank you, Terrance Savitsky
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121212/f29e1fa6/attachment-0001.html>


More information about the Rcpp-devel mailing list