# [Rcpp-devel] How much speedup for matrix operations?

Paul Johnson pauljohn32 at gmail.com
Sun Nov 10 02:54:53 CET 2013

```Hi

If you have the time to test, you will find that some things that you think
will be faster are actually slower, while some seemingly unimportant things
will give huge acceleration.

tcrossprod, and after you do that, my guess it can go faster using Rcpp & a
BLAS like openBLAS.

I wish you'd try these and see how they affect performance. Then let us
know what you find out.  If you find anything clear, I'll add to my

1. Avoid repeated costly calculations.  Please examine the use of exp(A).
and 1/A in this part

E <- (exp(A) * (1 - 1 / A) + 1 / A) / (exp(A) - 1)

If A is a matrix by then, isn't exp a very slow (and imprecise:
http://blogs.mathworks.com/cleve/2012/07/23/a-balancing-act-for-the-matrix-exponential/)
operation, isn't it?
You do it twice on the same matrix.

Did you know that DIVISION is much slower than multiplication in a modern
CPU? Surprised me to learn that last year. Can't you re-arrange this so
that 1/A is not calculated repeatedly?

delta <- (t(X) %*% E - t(X2) %*% E2)
W <- W + delta

This allocates a big bloc "delta" that you don't need to do.

W <- W + (t(X) %*% E - t(X2) %*% E2)

Please write back what you find out. I'm always eager to have clear "do
this, don't do that" examples in the classroom.

In my blog, look at item 3. That was a big shocker to me.

pj

On Wed, Nov 6, 2013 at 12:04 PM, Xavier Robin <xavier at cbs.dtu.dk> wrote:

> On 11/6/13 6:38 PM, Romain Francois wrote:
>
>> This very much depends on the code but there is a good chance that
>> RcppArmadillo will generate code making less data copies, etc ...
>>
>> Hard to say without seeing the code.
>>
>> Romain
>>
>>  Most of the code (or at least the slow, highly repeated parts) look like:
>
>      A <- t(c + t(W) %*% X)
>>     E <- (exp(A) * (1 - 1 / A) + 1 / A) / (exp(A) - 1)
>>     E[abs(A) < sqrt(.Machine\$double.eps) * 2 ] <- 0.5
>>
>>     B <- t(b + W %*% t(E))
>>     X2 <- 1 / (1 + exp(-B))
>>
>>     A2 <- t(c + t(W) %*% X2)
>>     E2 <- (exp(A2) * (1 - 1 / A2) + 1 / A2) / (exp(A2) - 1)
>>     E2[abs(A2) < sqrt(.Machine\$double.eps) * 2 ] <- 0.5
>>
>>     delta <- (t(X) %*% E - t(X2) %*% E2)
>>     W <- W + delta
>>
>
> Where b and c are vectors, W and X matrices. All this is encapsulated in a
> function, that is called a few thousand times in a for loop, with some
> sanity checks. (But it didn't appear to have much impact on the speed... if
> I remove the matrix operations so it does nothing, it executes nearly
> instantly). I understand from Dirk and Douglas that it probably isn't going
> to make a huge difference, though (not by orders).
>
>
> Thanks,
> Xavier
>
> --
> Xavier Robin, PhD
> Cellular Signal Integration Group (C-SIG) - http://www.lindinglab.org
>

--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20131109/5f7bf36a/attachment.html>
```