I am sorry. I think you are right. I know little about computation. And it will be great that you write what you would like to wrote to explain to me. I am really learning a lot from this discussion. Before I even have not heard of profiling of a program. Thanks.<br>

<br clear="all"><div>Best wishes!</div><div> </div><div>Honglang Wang</div><div> </div><div>Office C402 Wells Hall</div><div>Department of Statistics and Probability</div><div>Michigan State University</div><div>1579 I Spartan Village, East Lansing, MI 48823</div>

<div><a href="mailto:wangho16@msu.edu" target="_blank">wangho16@msu.edu</a></div><br>

<br><br><div class="gmail_quote">On Wed, Dec 5, 2012 at 1:31 PM, Douglas Bates <span dir="ltr"><<a href="mailto:bates@stat.wisc.edu" target="_blank">bates@stat.wisc.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br><div class="gmail_extra"><div class="gmail_quote"><div class="im">On Tue, Dec 4, 2012 at 9:39 PM, Honglang Wang <span dir="ltr"><<a href="mailto:wanghonglang2008@gmail.com" target="_blank">wanghonglang2008@gmail.com</a>></span> wrote:<br>


</div><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Yes, the main issue for my coding is the allocation of memory.  And I have fixed one of the biggest memory allocation issue: 4000 by 4000 diagonal matrix. And since I am not familiar with Rcpp and RcppArmadillo, I have no idea how to reuse the memory. I hope I can have some materials to learn this. Thanks.<br clear="all">


<div><br></div></blockquote></div><div>No, your main issue is not thinking about the computation.  As soon as you write something like</div><div class="im"><div><br></div><div><span style="font-family:arial,sans-serif;font-size:13px">arma::vec betahat = arma::inv(Inv)*arma::trans(D)*</span><span style="font-family:arial,sans-serif;font-size:13px">W*y;</span></div>


<div><br></div></div><div>you are in theory land which has very little relationship to practical numerical linear algebra.  If you want to perform linear algebra calculations like weighted least squares you should first take a bit of time to learn about numerical linear algebra as opposed to theoretical linear algebra.  They are very different disciplines.  In theoretical linear algebra you write the solution to a system of linear equations as above, using the inverse of the system matrix.  The first rule of numerical linear algebra is that you never calculate the inverse of a matrix, unless you only plan to do toy examples.  You mentioned sizes of 4000 by 4000 which means that the method you have chosen is doing thousands of times more work than necessary (hint: how do you think that the inverse of a matrix is calculated in practice? - ans: by solving n systems of equations, which you are doing here when you could be solving only one).</div>


<div><br></div><div>Dirk and I wrote about 7 different methods of solving least squares problems in our vignette on RcppEigen.  None of those methods involve taking the inverse of an n by n matrix.</div><div><br></div><div>


R and Rcpp and whatever other programming technologies come along will never be a "special sauce" that takes the place of thinking about what you are trying to do in a computation.</div><div><br></div><div>I could explain about using matrix decompositions, especially the Cholesky and QR decompositions, to solve such problems but you have already ignored all the other suggestions to think about the problem you are addressing and decompose it into manageable chunks so I have no expectation that you would pay attention to what I would write.</div>


<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div class="h5"><div></div><div> </div>

<div class="gmail_quote">

<div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">What exactly do these timings show?  A single call you your function?<br>


How many calls?<br>

<br></blockquote></div><div>Here I called my function for 100 times.<br> </div><div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Building on Romain's point: -- a portion of your function's runtime is<br>

in memory allocation<br>

(and you have a lot of allocations here).<br>

If you're calling your function thousands or millions of times, then<br>

it might pay to closely<br>

examine your memory allocation strategies and figure out what's<br>

temporary, for example.<br>

It looks like you're already using  copy_aux_mem = false in a number<br>

of places, but you're<br>

allocating a lot of objects -- of approx what size?<br>

<br>

For example, wouldn't this work just as well with one less allocation?<br>

<div>arma::vec kk = t;<br>

</div><div>arma::uvec q1 = arma::find(arma::abs(tp)<h);<br>

</div>kk.elem(q1) = ((1-arma::pow(tp.elem(q1)/h,2))/h)*0.75;<br>

// done with q1.  let's reuse it.<br>

q1 = arma::find(arma::abs(tp)>=h);<br>

// was q2<br>

kk.elem(q1).zeros();<br>

<br>

You could potentially allocate memory for temporary working space in<br>

R, grab it with copy_aux_mem = false, write your temp results there,<br>

and reuse these objects in subsequent function calls.  It doesn't make<br>

sense to go to this trouble, though, if your core algorithm consumes<br>

the bulk of runtime.<br>

<br>

Have you looked on the armadillo notes r.e. inv?  Matrix inversion has<br>

O(>n^2).  You may be aided by pencil-and-paper math here.<br>

<a href="http://arma.sourceforge.net/docs.html#inv" target="_blank">http://arma.sourceforge.net/docs.html#inv</a><br>

<br></blockquote></div></div><div>Here my matrix for inverse is only 4 by 4, so I think it's ok.<br> </div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


best,<br>

Christian<br>

<div><div><br>

> Dear All,<br>

> I have tried out the first example by using RcppArmadillo, but I am not<br>

> sure whether the code is efficient or not. And I did the comparison of the<br>

> computation time.<br>

><br>

> 1) R code using for loop in R: 87.22s<br>

> 2) R code using apply: 77.86s<br>

> 3) RcppArmadillo by using for loop in C++: 53.102s<br>

> 4) RcppArmadillo together with apply in R: 47.310s<br>

><br>

> It is kind of not so big increase. I am wondering whether I used an<br>

> inefficient way for the C++ coding:<br>

<br>

<br>

<br>

</div></div><span><font color="#888888">--<br>

A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!<br>

</font></span></blockquote></div></div><br>

<br></div></div><div class="im">_______________________________________________<br>

Rcpp-devel mailing list<br>

<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org" target="_blank">Rcpp-devel@lists.r-forge.r-project.org</a><br>

<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br></div></blockquote></div><br></div>

</blockquote></div><br>