[Rcpp-devel] How to increase the coding efficiency

Honglang Wang wanghonglang2008 at gmail.com
Wed Dec 5 20:14:03 CET 2012

I am sorry. I think you are right. I know little about computation, and it
would be great if you wrote the explanation you had in mind. I am really
learning a lot from this discussion. Before this I had not even heard of
profiling a program. Thanks.

Best wishes!

Honglang Wang

Office C402 Wells Hall
Department of Statistics and Probability
Michigan State University
1579 I Spartan Village, East Lansing, MI 48823
wangho16 at msu.edu

On Wed, Dec 5, 2012 at 1:31 PM, Douglas Bates <bates at stat.wisc.edu> wrote:

> On Tue, Dec 4, 2012 at 9:39 PM, Honglang Wang <wanghonglang2008 at gmail.com>wrote:
>> Yes, the main issue for my coding is the allocation of memory.  And I
>> have fixed one of the biggest memory allocation issue: 4000 by 4000
>> diagonal matrix. And since I am not familiar with Rcpp and RcppArmadillo, I
>> have no idea how to reuse the memory. I hope I can have some materials to
>> learn this. Thanks.
> No, your main issue is that you are not thinking about the computation.
> As soon as you write something like
> arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;
> you are in theory land which has very little relationship to practical
> numerical linear algebra.  If you want to perform linear algebra
> calculations like weighted least squares you should first take a bit of
> time to learn about numerical linear algebra as opposed to theoretical
> linear algebra.  They are very different disciplines.  In theoretical
> linear algebra you write the solution to a system of linear equations as
> above, using the inverse of the system matrix.  The first rule of numerical
> linear algebra is that you never calculate the inverse of a matrix, unless
> you only plan to do toy examples.  You mentioned sizes of 4000 by 4000
> which means that the method you have chosen is doing thousands of times
> more work than necessary (hint: how do you think that the inverse of a
> matrix is calculated in practice? - ans: by solving n systems of equations,
> which you are doing here when you could be solving only one).
> Dirk and I wrote about 7 different methods of solving least squares
> problems in our vignette on RcppEigen.  None of those methods involve
> taking the inverse of an n by n matrix.
> R and Rcpp and whatever other programming technologies come along will
> never be a "special sauce" that takes the place of thinking about what you
> are trying to do in a computation.
> I could explain about using matrix decompositions, especially the Cholesky
> and QR decompositions, to solve such problems but you have already ignored
> all the other suggestions to think about the problem you are addressing and
> decompose it into manageable chunks so I have no expectation that you would
> pay attention to what I would write.
>>> What exactly do these timings show?  A single call to your function?
>>> How many calls?
>>> Here I called my function 100 times.
>>> Building on Romain's point: -- a portion of your function's runtime is
>>> in memory allocation
>>> (and you have a lot of allocations here).
>>> If you're calling your function thousands or millions of times, then
>>> it might pay to closely
>>> examine your memory allocation strategies and figure out what's
>>> temporary, for example.
>>> It looks like you're already using  copy_aux_mem = false in a number
>>> of places, but you're
>>> allocating a lot of objects -- of approx what size?
>>> For example, wouldn't this work just as well with one less allocation?
>>> arma::vec kk = t;
>>> arma::uvec q1 = arma::find(arma::abs(tp)<h);
>>> kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
>>> // done with q1.  let's reuse it.
>>> q1 = arma::find(arma::abs(tp)>=h);
>>> // was q2
>>> kk.elem(q1).zeros();
>>> You could potentially allocate memory for temporary working space in
>>> R, grab it with copy_aux_mem = false, write your temp results there,
>>> and reuse these objects in subsequent function calls.  It doesn't make
>>> sense to go to this trouble, though, if your core algorithm consumes
>>> the bulk of runtime.
>>> Have you looked at the Armadillo notes re: inv()?  Matrix inversion
>>> costs more than O(n^2) (O(n^3) for dense matrices in general).  You may
>>> be aided by pencil-and-paper math here.
>>> http://arma.sourceforge.net/docs.html#inv
>>> Here the matrix I invert is only 4 by 4, so I think it's ok.
>>> best,
>>> Christian
>>> > Dear All,
>>> > I have tried out the first example by using RcppArmadillo, but I am not
>>> > sure whether the code is efficient or not. And I did the comparison of
>>> the
>>> > computation time.
>>> >
>>> > 1) R code using for loop in R: 87.22s
>>> > 2) R code using apply: 77.86s
>>> > 3) RcppArmadillo by using for loop in C++: 53.102s
>>> > 4) RcppArmadillo together with apply in R: 47.310s
>>> >
>>> > That is not a very big improvement.  I am wondering whether I used an
>>> > inefficient way of coding the C++:
>>> --
>>> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel