[Rcpp-devel] calling R function triggers tryCatch() ???

Mon Oct 14 13:33:26 CEST 2013

Hello,

I've added some of my spices in 
https://gist.github.com/romainfrancois/6974012

rowApply3 creates a NumericVector of the appropriate dimensions and 
assign data to it in the loop. Since the dimensions match, the 
assignment operator will not change the underlying SEXP so the call made 
by `Language call(FUN, row);` does not need to be updated by the mean of 
the proxy as in rowApply2. In theory, this should lead to less 
allocations so better performance. In rowApply2 we allocated a vector at 
each iteration of the loop. Now we don't see much of a performance 
boost. This might be related to the implementation of MatrixRow.

Now, the second potential source of performance improvement is with 
using mean_ instead of mean :

mean_ <- function(.) .Internal(mean(.))

mean gives you a lot of nice things like dispatching and additional 
arguments to handle trimming, etc ... but you have to pay for them even 
when you don't use them.

rowApply4 uses less syntactic sugar, so it is less nice to look at. The 
idea was to see what was the cost of MatrixRow.

Here are the results of the benchmarks on my machine on the 30x50 example:

Unit: microseconds
                 expr   min    lq median    uq    max neval
          rowMeans(M)  10.7  14.4   15.1  15.7   21.1   100
   apply(M, 1L, mean) 350.6 362.0  371.2 382.7 2225.8   100
   rowApply0(M, mean) 635.8 655.2  672.3 716.8 2595.1   100
   rowApply1(M, mean) 225.1 233.8  239.7 249.8 2212.1   100
   rowApply2(M, mean) 221.3 229.4  236.6 260.2  329.0   100
   rowApply3(M, mean) 217.0 223.9  229.1 247.7 2077.2   100
  rowApply3(M, mean_)  34.3  38.2   39.3  41.1   47.0   100
  rowApply4(M, mean_)  34.1  36.7   38.0  39.3   47.1   100

rowMeans is better because:
- it does not involve calling the interpreter in the loop. even with 
fast_eval, this has cost.
- all is done internally with a single loop looking like this pseudo code:

NumericVector means( ncol ) ;
for( int j=0, k=0; j=ncol; j++ )
    for(int i=0; i<nrow; i++)
        means[i] += x[k] ;
means /= ncol ;

However rowMeans can only do means. Our rowApply can apply any R function.

One strategy we are using dplyrRcpp (soon to be merged into dplyr) is 
recognize known patterns and then decide whether we want to use the R 
interpreter or some optimized code. Here we would recognize "mean" and 
not use the R interpreter at all.

This is a nice game as it involves non standard evaluation, etc ... so 
that sort of goes beyond the scope of this thread, but using the 
internal dplyr tools, we can definitely make a fast flexible rowApply 
that would call optimized code when recognizing known patterns and fall 
back to the R interpreter otherwise.

Romain

Le 11/10/13 16:27, Hadley Wickham a écrit :
> Interestingly, that shows that rowApply2 is actually slightly _slower_
> than rowApply1.
>
> Hadley
>
> On Fri, Oct 11, 2013 at 9:24 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>> FYI, I recommend using microbenchmark which uses a much higher
>> precision timer so you can see the variability as well as the mean
>> times. Also note that you can use /*** R */ to include R code that's
>> automatically run when benchmarking:
>> https://gist.github.com/hadley/6935459
>>
>> That gives:
>>
>>> microbenchmark(rowMeans(M),
>> +   apply(M, 1L, mean),
>> +   rowApply0(M, mean),
>> +   rowApply1(M, mean),
>> +   rowApply2(M, mean))
>> Unit: microseconds
>>                 expr   min    lq median    uq   max neval
>>          rowMeans(M)  5.31  7.11   7.54  7.84  13.9   100
>>   apply(M, 1L, mean) 46.23 49.46  50.78 53.06 136.9   100
>>   rowApply0(M, mean) 50.87 53.81  55.36 57.54  85.3   100
>>   rowApply1(M, mean) 18.92 20.14  21.20 22.44  36.9   100
>>   rowApply2(M, mean) 18.73 20.18  21.27 22.40  34.0   100
>>
>>> M <- matrix(rnorm(1500L), nrow=30L);
>>
>>> microbenchmark(rowMeans(M),
>> +   apply(M, 1L, mean),
>> +   rowApply0(M, mean),
>> +   rowApply1(M, mean),
>> +   rowApply2(M, mean))
>> Unit: microseconds
>>                 expr    min  lq median    uq    max neval
>>          rowMeans(M)   9.86  12     13  14.1   21.9   100
>>   apply(M, 1L, mean) 265.28 288    308 349.8  475.2   100
>>   rowApply0(M, mean) 506.54 541    575 604.9 5469.3   100
>>   rowApply1(M, mean) 174.37 185    206 232.4  334.0   100
>>   rowApply2(M, mean) 171.55 183    223 245.8 4244.7   100
>>
>> On Fri, Oct 4, 2013 at 11:36 AM, Thomas Tse <tommy_228_228 at yahoo.com.hk> wrote:
>>> The next test compares the speeds:
>>>
>>> // [[Rcpp::export]]
>>> NumericVector rowApply0(NumericMatrix& x, const Function& FUN)
>>> {
>>>    int n = x.nrow();
>>>    NumericVector result = no_init(n);
>>>
>>>    for (int r = 0; r < n; r++) {
>>>      result[r] = as<double>(FUN(x(r, _) ) );
>>>    }
>>>    return result;
>>> }
>>>
>>> // [[Rcpp::export]]
>>> NumericVector rowApply1(NumericMatrix& x, const Function& FUN)
>>> {
>>>    int n = x.nrow();
>>>    NumericVector result = no_init(n);
>>>
>>>    for (int r = 0; r < n; r++) {
>>>      Language call(FUN, x(r, _)) ;
>>>      result[r] = as<double>(call.fast_eval() );
>>>    }
>>>    return result;
>>> }
>>>
>>> // [[Rcpp::export]]
>>> NumericVector rowApply2(NumericMatrix& x, const Function& FUN)
>>> {
>>>    int n = x.nrow();
>>>    NumericVector result = no_init(n);
>>>
>>>    Language call(FUN, R_NilValue);
>>>    Language::Proxy proxy(call, 1);
>>>    for (int r = 0; r < n; r++) {
>>>      proxy = x(r, _) ;
>>>      result[r] = as<double>(call.fast_eval() );
>>>    }
>>>    return result;
>>> }
>>>
>>>> M <- matrix(rnorm(15L), nrow=3L);
>>>> identical(rowMeans(M), apply(M, 1L, mean));
>>> [1] TRUE
>>>> identical(rowMeans(M), rowApply0(M, mean));
>>> [1] TRUE
>>>> identical(rowMeans(M), rowApply1(M, mean));
>>> [1] TRUE
>>>> identical(rowMeans(M), rowApply2(M, mean));
>>> [1] TRUE
>>>> benchmark(rowMeans(M),
>>> +           apply(M, 1L, mean),
>>> +           rowApply0(M, mean),
>>> +           rowApply1(M, mean),
>>> +           rowApply2(M, mean),
>>> +           replications=300000L);
>>>                  test replications elapsed relative user.self sys.self
>>> user.child sys.child
>>> 3 rowApply0(M, mean)       300000   23.26    8.551     23.14        0
>>> NA        NA
>>> 2  apply(M, 1, mean)       300000   21.30    7.831     21.13        0
>>> NA        NA
>>> 4 rowApply1(M, mean)       300000   10.23    3.761     10.22        0
>>> NA        NA
>>> 5 rowApply2(M, mean)       300000    9.87    3.629      9.82        0
>>> NA        NA
>>> 1        rowMeans(M)       300000    2.72    1.000      2.72        0
>>> NA        NA
>>>
>>>> M <- matrix(rnorm(1500L), nrow=30L);
>>>> benchmark(rowMeans(M),
>>> +           apply(M, 1L, mean),
>>> +           rowApply0(M, mean),
>>> +           rowApply1(M, mean),
>>> +           rowApply2(M, mean),
>>> +           replications=30000L);
>>>                  test replications elapsed relative user.self sys.self
>>> user.child sys.child
>>> 3 rowApply0(M, mean)        30000   21.32   48.455     21.30        0
>>> NA        NA
>>> 2  apply(M, 1, mean)        30000   11.17   25.386     11.17        0
>>> NA        NA
>>> 4 rowApply1(M, mean)        30000    8.24   18.727      8.19        0
>>> NA        NA
>>> 5 rowApply2(M, mean)        30000    8.10   18.409      8.08        0
>>> NA        NA
>>> 1        rowMeans(M)        30000    0.44    1.000      0.43        0
>>> NA        NA
>>>
>>>> M <- matrix(rnorm(150000L), nrow=300L);
>>>> benchmark(rowMeans(M),
>>> +           apply(M, 1L, mean),
>>> +           rowApply0(M, mean),
>>> +           rowApply1(M, mean),
>>> +           rowApply2(M, mean),
>>> +           replications=3000L);
>>>                  test replications elapsed relative user.self sys.self
>>> user.child sys.child
>>> 3 rowApply0(M, mean)         3000   26.65   18.379     26.41     0.01
>>> NA        NA
>>> 2  apply(M, 1, mean)         3000   19.76   13.628     19.71     0.00
>>> NA        NA
>>> 4 rowApply1(M, mean)         3000   12.79    8.821     12.78     0.00
>>> NA        NA
>>> 5 rowApply2(M, mean)         3000   12.69    8.752     12.68     0.00
>>> NA        NA
>>> 1        rowMeans(M)         3000    1.45    1.000      1.45     0.00
>>> NA        NA
>>>
>>> again, as Romain suggested, implementation rowApply2 is faster than
>>> rowApply1, and rowApply0 (which triggers tryCatch) is even slower than R's
>>> apply.
>>>
>>> Thanks Romain !
>>>
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>>
>>
>> --
>> Chief Scientist, RStudio
>> http://had.co.nz/
>
>
>

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30