[Rcpp-devel] Short blog post on loop speed in R

Douglas Bates bates at stat.wisc.edu
Sun Jun 19 20:20:07 CEST 2011


Interesting post, Christian.  It turns out that the apply family can
get better performance than the explicit loop but, as you have seen,
no using aaply.  The choices in this case are sapply or, slightly
faster, vapply from the base package.  I also tried using cmpfun from
the compiler package but vapply is already very tight so  compiling
that call doesn't change the timing.

As for the use of Rcpp, the STL containers and algorithms can be
exploited to an even greater extent.  If you are willing to jump
through the hoops of defining a struct that inherits from
std::binary_function then you can shorten the Rcpp-based code
considerably.  See the enclosed.

On Sat, Jun 18, 2011 at 10:54 PM, Christian Gunning <xian at unm.edu> wrote:
> Dear all,
>
> I haven't been able to keep up with the list this month (I'm looking
> forward to catching up on the Module developments), but I just posted
> this, which might be of interest to some.
>
> http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
>
> If anyone has a few slides that might be good for introducing Rcpp to
> a scientist-centric, non-CS audience, I'd love to hear.
>
> best,
> Christian
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rbenchmark)
> library(compiler)
> library(plyr)
> library(inline)
> library(Rcpp)
> 
> ## use aaply -- the simplest code
> f0 <- function(x, y)  aaply(x, 1, function(x) rbinom(1, x, y[1]))
> 
> ## rewrite the above as an explicit loop
> f1 <- function(x, y) {
+     nrep <- length(x)
+     ii   <- 0L;                         # loop index
+     ret  <- numeric(nrep)               # result vector
+     while(ii < nrep) {
+         ii <- ii + 1L; 
+         ret[ii] <- rbinom(1, x[ii], y[1])
+     }
+     ret
+ }
> 
> ## other versions of apply
> f2 <- function(x, y) sapply(x, rbinom, n = 1, prob = y[1])
> f3 <- function(x, y) vapply(x, rbinom, 1, n = 1, prob = y[1])
> f4 <- cmpfun(f3)
> 
> ## Using Rcpp, inline and STL algorithms and containers
> 
> ## use the std::binary_function templated struct
> inc <-  '
+ struct Rb : std::binary_function<double, double, double> {
+     double operator() (double size, double prob) const { return ::Rf_rbinom(size, prob); }
+ };
+ '
> ## define source code as a character vector, pass to inline
> src <-  ' 
+     NumericVector sz(x);
+     RNGScope scope;
+ 
+     return NumericVector::import_transform(sz.begin(), sz.end(), std::bind2nd(Rb(), as<double>(y)));
+ '
> 
> ## compile the function, inspect the process with verbose=TRUE
> f5 <- cxxfunction(signature(x='numeric', y='numeric'), src, 'Rcpp', inc, verbose=TRUE)
 >> setting environment variables: 
PKG_LIBS =  -L/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/lib -lRcpp -Wl,-rpath,/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/lib

 >> LinkingTo : Rcpp
CLINK_CPPFLAGS =  -I"/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/include" 

 >> Program source :

   1 : 
   2 : // includes from the plugin
   3 : 
   4 : #include <Rcpp.h>
   5 : 
   6 : 
   7 : #ifndef BEGIN_RCPP
   8 : #define BEGIN_RCPP
   9 : #endif
  10 : 
  11 : #ifndef END_RCPP
  12 : #define END_RCPP
  13 : #endif
  14 : 
  15 : using namespace Rcpp;
  16 : 
  17 : 
  18 : // user includes
  19 : 
  20 : struct Rb : std::binary_function<double, double, double> {
  21 :     double operator() (double size, double prob) const { return ::Rf_rbinom(size, prob); }
  22 : };
  23 : 
  24 : 
  25 : // declarations
  26 : extern "C" {
  27 : SEXP file697a8a45( SEXP x, SEXP y) ;
  28 : }
  29 : 
  30 : // definition
  31 : 
  32 : SEXP file697a8a45( SEXP x, SEXP y ){
  33 : BEGIN_RCPP
  34 :  
  35 :     NumericVector sz(x);
  36 :     RNGScope scope;
  37 : 
  38 :     return NumericVector::import_transform(sz.begin(), sz.end(), std::bind2nd(Rb(), as<double>(y)));
  39 : 
  40 : END_RCPP
  41 : }
  42 : 
  43 : 
Compilation argument:
 /usr/lib64/R/bin/R CMD SHLIB file697a8a45.cpp 2> file697a8a45.cpp.err.txt 
ccache g++-4.5 -I/usr/share/R/include   -I"/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/include"   -fpic  -g -O3 -pipe -Wall -pedantic -c file697a8a45.cpp -o file697a8a45.o
g++ -shared -o file697a8a45.so file697a8a45.o -L/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/lib -lRcpp -Wl,-rpath,/home/bates/R/x86_64-unknown-linux-gnu-library/2.13/Rcpp/lib -L/usr/lib64/R/lib -lR
>                             
> 
> ## Input vector
> set.seed(1)
> bb <- rbinom(1e5, 20, 0.5)
> xtabs( ~ bb)
bb
    1     2     3     4     5     6     7     8     9    10    11    12    13 
    5    16   110   471  1524  3702  7575 11942 15894 17527 15998 12001  7425 
   14    15    16    17    18 
 3676  1533   488    99    14 
> 
> ## check that results are consistent
> set.seed(20); system.time(r0 <- f0(bb, 0.1))
   user  system elapsed 
 42.860   0.160  43.116 
> set.seed(20); system.time(r1 <- f1(bb, 0.1))
   user  system elapsed 
  2.080   0.000   2.081 
> all.equal(unname(r0), r1)
[1] TRUE
> set.seed(20); system.time(r2 <- f2(bb, 0.1))
   user  system elapsed 
  1.080   0.000   1.082 
> all.equal(r1, r2)
[1] TRUE
> set.seed(20); system.time(r3 <- f3(bb, 0.1))
   user  system elapsed 
  0.880   0.000   0.882 
> all.equal(r2, r3)
[1] TRUE
> set.seed(20); system.time(r4 <- f4(bb, 0.1))
   user  system elapsed 
  0.870   0.000   0.885 
> all.equal(r3, r4)
[1] TRUE
> set.seed(20); system.time(r5 <- f5(bb, 0.1))
   user  system elapsed 
  0.030   0.000   0.027 
> all.equal(r4, r5)
[1] TRUE
> 
> ## omit f0 from the benchmark comparison as it is far too slow
> benchmark(f1 = f1(bb, 0.1), f2 = f2(bb, 0.1), f3 = f3(bb, 0.1), f4 = f4(bb, 0.1), f5 = f5(bb, 0.1),
+           columns =
+           c("test", "elapsed", "relative", "user.self", "sys.self"),
+           order="user.self", replications=20)
  test elapsed relative user.self sys.self
5   f5   0.596  1.00000      0.60     0.00
3   f3  18.616 31.23490     18.52     0.05
4   f4  18.820 31.57718     18.71     0.04
2   f2  27.263 45.74329     27.17     0.04
1   f1  42.952 72.06711     42.83     0.03
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] compiler  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Rcpp_0.9.4     inline_0.3.8   plyr_1.5.1     rbenchmark_0.3

loaded via a namespace (and not attached):
[1] tools_2.13.0
> 
> proc.time()
   user  system elapsed 
168.500   1.440 171.935 


More information about the Rcpp-devel mailing list