[Rcpp-devel] [R] fast rowCumsums wanted for calculating the cdf
Romain Francois
romain at r-enthusiasts.com
Fri Oct 15 15:05:24 CEST 2010
Hi,
This would probably deserve some abstraction, we had C++ versions of
apply in our TODO list for some time, but here is a shot:
require( Rcpp )
require( inline )
f.Rcpp <- cxxfunction( signature( x = "matrix" ), '
NumericMatrix input( x ) ;
NumericMatrix output = clone<NumericMatrix>( input ) ;
int nr = input.nrow(), nc = input.ncol() ;
NumericVector tmp( nr );
for( int i=0; i<nc; i++){
tmp = tmp + input.column(i) ;
NumericMatrix::Column target( output, i ) ;
std::copy( tmp.begin(), tmp.end(), target.begin() ) ;
}
return output ;
', plugin = "Rcpp" )
f.R <- function( x ){
t(apply(probs, 1, cumsum)) #SLOOOW!
}
probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites
stopifnot( all.equal( f.R( probs ), f.Rcpp( probs ) ) )
require( rbenchmark )
probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise
probabilites
bm <- benchmark(
f.R( probs ),
f.Rcpp( probs ),
columns=c("test", "elapsed", "relative", "user.self", "sys.self"),
order="relative",
replications=10)
print( bm )
I get this on my iMac with R 2.12.0 and Rcpp as just released to CRAN.
$ Rscript cumsum.R
test elapsed relative user.self sys.self
2 f.Rcpp(probs) 4.614 1.00000 4.375 0.239
1 f.R(probs) 68.645 14.87755 67.765 0.877
When we implement "apply" in C++, we will probably leverage loop
unrolling to achieve greater performance.
Romain
Le 15/10/10 14:34, Douglas Bates a écrit :
> Although I know there is another message in this thread I am replying
> to this message to be able to include the whole discussion to this
> point.
>
> Gregor mentioned the possibility of extending the compiled code for
> cumsum so that it would handle the matrix case. The work by Dirk
> Eddelbuettel and Romain Francois on developing the Rcpp package make
> it surprisingly easy to create compiled code for this task. I am
> copying the Rcpp-devel list on this in case one of the readers of that
> list has time to create a sample implementation before I can get to
> it. It's midterm season and I have to tackle the stack of blue books
> on my desk before doing fun things like writing code.
>
> On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley<jwiley.psych at gmail.com> wrote:
>> Hi,
>>
>> You might look at Reduce(). It seems faster. I converted the matrix
>> to a list in an incredibly sloppy way (which you should not emulate)
>> because I cannot think of the simple way.
>>
>>> probs<- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilites
>>> F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>>> F[,1]<- probs[,1,drop=TRUE];
>>> add<- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>>>
>>>
>>> system.time(F.slow<- t(apply(probs, 1, cumsum)))
>> user system elapsed
>> 36.758 0.416 42.464
>>>
>>> system.time(for (cc in 2:ncol(F)) {
>> + F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>> + })
>> user system elapsed
>> 0.980 0.196 1.328
>>>
>>> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
>> user system elapsed
>> 0.420 0.072 0.539
>>>
>>
>> .539 seconds for 10 vectors each with 1 million elements does not seem
>> bad. For 30000 each, on my system it took .014 seconds, which for
>> 200,000 iterations, I estimated as:
>>
>>> (200000*.014)/60/60
>> [1] 0.7777778
>>
>> (and this is on a stone age system with a single core processor and
>> only 756MB of RAM)
>>
>> It should not be difficult to get the list back to a matrix.
>>
>> Cheers,
>>
>> Josh
>>
>>
>>
>> On Thu, Oct 14, 2010 at 11:51 PM, Gregor<mailinglist at gmx.at> wrote:
>>> Dear all,
>>>
>>> Maybe the "easiest" solution: Is there anything that speaks against generalizing
>>> cumsum from base to cope with matrices (as is done in matlab)? E.g.:
>>>
>>> "cumsum(Matrix, 1)"
>>> equivalent to
>>> "apply(Matrix, 1, cumsum)"
>>>
>>> The main advantage could be optimized code if the Matrix is extreme nonsquare
>>> (e.g. 100,000x10), but the summation is done over the short side (in this case 10).
>>> apply would practically yield a loop over 100,000 elements, and vectorization w.r.t.
>>> the long side (loop over 10 elements) provides considerable efficiency gains.
>>>
>>> Many regards,
>>> Gregor
>>>
>>>
>>>
>>>
>>> On Tue, 12 Oct 2010 10:24:53 +0200
>>> Gregor<mailinglist at gmx.at> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am struggling with a (currently) cost-intensive problem: calculating the
>>>> (non-normalized) cumulative distribution function, given the (non-normalized)
>>>> probabilities. something like:
>>>>
>>>> probs<- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites
>>>> F<- t(apply(probs, 1, cumsum)) #SLOOOW!
>>>>
>>>> One (already faster, but for sure not ideal) solution - thanks to Henrik Bengtsson:
>>>>
>>>> F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>>>> F[,1]<- probs[,1,drop=TRUE];
>>>> for (cc in 2:ncol(F)) {
>>>> F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>>>> }
>>>>
>>>> In my case, probs is a (30,000 x 10) matrix, and i need to iterate this step around
>>>> 200,000 times, so speed is crucial. I currently can make sure to have no NAs, but
>>>> in order to extend matrixStats, this could be a nontrivial issue.
>>>>
>>>> Any ideas for speeding up this - probably routine - task?
>>>>
>>>> Thanks in advance,
>>>> Gregor
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/b8wOqW : LondonR Rcpp slides
|- http://bit.ly/cCmbgg : Rcpp 0.8.6
`- http://bit.ly/bzoWrs : Rcpp svn revision 2000
More information about the Rcpp-devel
mailing list