[Rcpp-devel] [R] fast rowCumsums wanted for calculating the cdf

Fri Oct 15 15:05:24 CEST 2010

Hi,

This would probably deserve some abstraction, we had C++ versions of 
apply in our TODO list for some time, but here is a shot:

require( Rcpp )
require( inline )

f.Rcpp <- cxxfunction( signature( x = "matrix" ), '

     NumericMatrix input( x ) ;
     NumericMatrix output  = clone<NumericMatrix>( input ) ;

     int nr = input.nrow(), nc = input.ncol() ;
     NumericVector tmp( nr );
     for( int i=0; i<nc; i++){
         tmp = tmp + input.column(i) ;
         NumericMatrix::Column target( output, i ) ;
         std::copy( tmp.begin(), tmp.end(), target.begin() ) ;
     }
     return output ;

', plugin = "Rcpp" )

f.R <- function( x ){
     t(apply(probs, 1, cumsum)) #SLOOOW!
}

  probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites
stopifnot( all.equal( f.R( probs ), f.Rcpp( probs ) ) )

require( rbenchmark )
probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise 
probabilites

bm <- benchmark(
     f.R( probs ),
     f.Rcpp( probs ),
     columns=c("test", "elapsed", "relative", "user.self", "sys.self"),
     order="relative",
     replications=10)
print( bm )

I get this on my iMac with R 2.12.0 and Rcpp as just released to CRAN.

$ Rscript cumsum.R
            test elapsed relative user.self sys.self
2 f.Rcpp(probs)   4.614  1.00000     4.375    0.239
1    f.R(probs)  68.645 14.87755    67.765    0.877

When we implement "apply" in C++, we will probably leverage loop 
unrolling to achieve greater performance.

Romain

Le 15/10/10 14:34, Douglas Bates a écrit :
> Although I know there is another message in this thread I am replying
> to this message to be able to include the whole discussion to this
> point.
>
> Gregor mentioned the possibility of extending the compiled code for
> cumsum so that it would handle the matrix case.  The work by Dirk
> Eddelbuettel and Romain Francois on developing the Rcpp package make
> it surprisingly easy to create compiled code for this task.  I am
> copying the Rcpp-devel list on this in case one of the readers of that
> list has time to create a sample implementation before I can get to
> it.  It's midterm season and I have to tackle the stack of blue books
> on my desk before doing fun things like writing code.
>
> On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley<jwiley.psych at gmail.com>  wrote:
>> Hi,
>>
>> You might look at Reduce().  It seems faster.  I converted the matrix
>> to a list in an incredibly sloppy way (which you should not emulate)
>> because I cannot think of  the simple way.
>>
>>> probs<- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilites
>>> F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>>> F[,1]<- probs[,1,drop=TRUE];
>>> add<- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>>>
>>>
>>> system.time(F.slow<- t(apply(probs, 1, cumsum)))
>>    user  system elapsed
>>   36.758   0.416  42.464
>>>
>>> system.time(for (cc in 2:ncol(F)) {
>> +  F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>> + })
>>    user  system elapsed
>>   0.980   0.196   1.328
>>>
>>> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
>>    user  system elapsed
>>   0.420   0.072   0.539
>>>
>>
>> .539 seconds for 10 vectors each with 1 million elements does not seem
>> bad.  For 30000 each, on my system it took .014 seconds, which for
>> 200,000 iterations, I estimated as:
>>
>>> (200000*.014)/60/60
>> [1] 0.7777778
>>
>> (and this is on a stone age system with a single core processor and
>> only 756MB of RAM)
>>
>> It should not be difficult to get the list back to a matrix.
>>
>> Cheers,
>>
>> Josh
>>
>>
>>
>> On Thu, Oct 14, 2010 at 11:51 PM, Gregor<mailinglist at gmx.at>  wrote:
>>> Dear all,
>>>
>>> Maybe the "easiest" solution: Is there anything that speaks against generalizing
>>> cumsum from base to cope with matrices (as is done in matlab)? E.g.:
>>>
>>> "cumsum(Matrix, 1)"
>>> equivalent to
>>> "apply(Matrix, 1, cumsum)"
>>>
>>> The main advantage could be optimized code if the Matrix is extreme nonsquare
>>> (e.g. 100,000x10), but the summation is done over the short side (in this case 10).
>>> apply would practically yield a loop over 100,000 elements, and vectorization w.r.t.
>>> the long side (loop over 10 elements) provides considerable efficiency gains.
>>>
>>> Many regards,
>>> Gregor
>>>
>>>
>>>
>>>
>>> On Tue, 12 Oct 2010 10:24:53 +0200
>>> Gregor<mailinglist at gmx.at>  wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am struggling with a (currently) cost-intensive problem: calculating the
>>>> (non-normalized) cumulative distribution function, given the (non-normalized)
>>>> probabilities. something like:
>>>>
>>>> probs<- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites
>>>> F<- t(apply(probs, 1, cumsum)) #SLOOOW!
>>>>
>>>> One (already faster, but for sure not ideal) solution - thanks to Henrik Bengtsson:
>>>>
>>>> F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>>>> F[,1]<- probs[,1,drop=TRUE];
>>>> for (cc in 2:ncol(F)) {
>>>>    F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>>>> }
>>>>
>>>> In my case, probs is a (30,000 x 10) matrix, and i need to iterate this step around
>>>> 200,000 times, so speed is crucial. I currently can make sure to have no NAs, but
>>>> in order to extend matrixStats, this could be a nontrivial issue.
>>>>
>>>> Any ideas for speeding up this - probably routine - task?
>>>>
>>>> Thanks in advance,
>>>> Gregor

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/b8wOqW : LondonR Rcpp slides
|- http://bit.ly/cCmbgg : Rcpp 0.8.6
`- http://bit.ly/bzoWrs : Rcpp svn revision 2000