[Rcpp-devel] How to increase the coding efficiency

Douglas Bates bates at stat.wisc.edu
Wed Dec 5 19:31:49 CET 2012


On Tue, Dec 4, 2012 at 9:39 PM, Honglang Wang <wanghonglang2008 at gmail.com>wrote:

> Yes, the main issue for my coding is the allocation of memory.  And I have
> fixed one of the biggest memory allocation issue: 4000 by 4000 diagonal
> matrix. And since I am not familiar with Rcpp and RcppArmadillo, I have no
> idea how to reuse the memory. I hope I can have some materials to learn
> this. Thanks.
>
> No, your main issue is not thinking about the computation.  As soon as you
write something like

arma::vec betahat = arma::inv(Inv)*arma::trans(D)*W*y;

you are in theory land which has very little relationship to practical
numerical linear algebra.  If you want to perform linear algebra
calculations like weighted least squares you should first take a bit of
time to learn about numerical linear algebra as opposed to theoretical
linear algebra.  They are very different disciplines.  In theoretical
linear algebra you write the solution to a system of linear equations as
above, using the inverse of the system matrix.  The first rule of numerical
linear algebra is that you never calculate the inverse of a matrix, unless
you only plan to do toy examples.  You mentioned sizes of 4000 by 4000
which means that the method you have chosen is doing thousands of times
more work than necessary (hint: how do you think that the inverse of a
matrix is calculated in practice? - ans: by solving n systems of equations,
which you are doing here when you could be solving only one).

Dirk and I wrote about 7 different methods of solving least squares
problems in our vignette on RcppEigen.  None of those methods involve
taking the inverse of an n by n matrix.

R and Rcpp and whatever other programming technologies come along will
never be a "special sauce" that takes the place of thinking about what you
are trying to do in a computation.

I could explain about using matrix decompositions, especially the Cholesky
and QR decompositions, to solve such problems but you have already ignored
all the other suggestions to think about the problem you are addressing and
decompose it into manageable chunks so I have no expectation that you would
pay attention to what I would write.


>
>> What exactly do these timings show?  A single call you your function?
>> How many calls?
>>
>> Here I called my function for 100 times.
>
>
>> Building on Romain's point: -- a portion of your function's runtime is
>> in memory allocation
>> (and you have a lot of allocations here).
>> If you're calling your function thousands or millions of times, then
>> it might pay to closely
>> examine your memory allocation strategies and figure out what's
>> temporary, for example.
>> It looks like you're already using  copy_aux_mem = false in a number
>> of places, but you're
>> allocating a lot of objects -- of approx what size?
>>
>> For example, wouldn't this work just as well with one less allocation?
>> arma::vec kk = t;
>> arma::uvec q1 = arma::find(arma::abs(tp)<h);
>> kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;
>> // done with q1.  let's reuse it.
>> q1 = arma::find(arma::abs(tp)>=h);
>> // was q2
>> kk.elem(q1).zeros();
>>
>> You could potentially allocate memory for temporary working space in
>> R, grab it with copy_aux_mem = false, write your temp results there,
>> and reuse these objects in subsequent function calls.  It doesn't make
>> sense to go to this trouble, though, if your core algorithm consumes
>> the bulk of runtime.
>>
>> Have you looked on the armadillo notes r.e. inv?  Matrix inversion has
>> O(>n^2).  You may be aided by pencil-and-paper math here.
>> http://arma.sourceforge.net/docs.html#inv
>>
>> Here my matrix for inverse is only 4 by 4, so I think it's ok.
>
>
>> best,
>> Christian
>>
>> > Dear All,
>> > I have tried out the first example by using RcppArmadillo, but I am not
>> > sure whether the code is efficient or not. And I did the comparison of
>> the
>> > computation time.
>> >
>> > 1) R code using for loop in R: 87.22s
>> > 2) R code using apply: 77.86s
>> > 3) RcppArmadillo by using for loop in C++: 53.102s
>> > 4) RcppArmadillo together with apply in R: 47.310s
>> >
>> > It is kind of not so big increase. I am wondering whether I used an
>> > inefficient way for the C++ coding:
>>
>>
>>
>> --
>> A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!
>>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121205/0200d30c/attachment.html>


More information about the Rcpp-devel mailing list