[Rcpp-commits] r2179 - papers/rjournal

Sat Sep 25 21:59:14 CEST 2010

Author: romain
Date: 2010-09-25 21:59:13 +0200 (Sat, 25 Sep 2010)
New Revision: 2179

Modified:
   papers/rjournal/EddelbuettelFrancois.tex
Log:
having a go at the performance section

Modified: papers/rjournal/EddelbuettelFrancois.tex
===================================================================

--- papers/rjournal/EddelbuettelFrancois.tex	2010-09-25 18:08:34 UTC (rev 2178)
+++ papers/rjournal/EddelbuettelFrancois.tex	2010-09-25 19:59:13 UTC (rev 2179)
@@ -756,22 +756,26 @@
 
 \section{Performance comparison}
 
-In this section, we illustrate how C++ features may well come with a price
-in terms of performance. However, as users of \pkg{Rcpp}, we do not need to
-compromise performance for ease of use.
+% In this section, we illustrate how C++ features may well come with a price
+% in terms of performance. However, as users of \pkg{Rcpp}, we do not need to
+% compromise performance for ease of use.
 
+In this section, we present several ways to leverage \pkg{Rcpp} to 
+rewrite the convolution example from \cite{R:exts}. 
+
 As part of the redesign of \pkg{Rcpp}, data copy is kept to the
 absolute minimum. The \code{RObject} class and all its derived
 classes are just a container for a \code{SEXP}. We let R perform
 all memory management and access data though the macros or functions
-offered by the standard R API. In contrast, some data structures
-of the classic \pkg{Rcpp} interface such as the templated 
-\code{RcppVector} used containers offered by the standard template
-library to hold the data, requiring explicit copies of the data 
-from R to C++ and back.
+offered by the standard R API. 
+% In contrast, some data structures
+% of the classic \pkg{Rcpp} interface such as the templated 
+% \code{RcppVector} used containers offered by the standard template
+% library to hold the data, requiring explicit copies of the data 
+% from R to C++ and back.
 
-Here we illustrate how to take advantage of \code{Rcpp} to get
-the best of both worlds. 
+% Here we illustrate how to take advantage of \code{Rcpp} to get
+% the best of both worlds. 
 
 The implementation of the \code{operator[]} is designed as 
 efficiently as possible, using both inlining and caching, 
@@ -779,43 +783,72 @@
 reference C implementation described in \cite{R:exts}.
 % [dirk]  well not according to our newest tests
 
+\pkg{Rcpp} follows design principles from the STL, and classes such 
+as \code{NumericVector} expose iterators that can be used for 
+iterative scans of the data. Algorithms using iterators are 
+usually more efficient than those that operate on objects using the 
+\code{operator[]}. The following version illustrates the use of the
+\code{NumericVector::iterator}. 
 
-In order to achieve maximum efficiency, the reference implementation
-extracts the underlying array pointer \code{double*} and works 
-with pointer arithmetic, which is a built-in operation as opposed to 
-calling the \code{operator[]} on a user-defined class which has to 
-pay the price of object encapsulation.
+% In order to achieve maximum efficiency, the reference implementation
+% extracts the underlying array pointer \code{double*} and works 
+% with pointer arithmetic, which is a built-in operation as opposed to 
+% calling the \code{operator[]} on a user-defined class which has to 
+% pay the price of object encapsulation.
+% 
+% Modelled after containers of the C++ STL,
+% the \code{NumericVector} class provides two member functions \code{begin}
+% and \code{end} that can use used to retrieve respectively 
+% the pointer to the first and past-to-end elements of the underlying array.
+% We can revisit the code to take advantage of this feature : 
 
-Modelled after containers of the C++ STL,
-the \code{NumericVector} class provides two member functions \code{begin}
-and \code{end} that can use used to retrieve respectively 
-the pointer to the first and past-to-end elements of the underlying array.
-We can revisit the code to take advantage of this feature : 
-
 \begin{example}
 #include <Rcpp.h>
 
 RcppExport SEXP convolve4cpp(SEXP a, SEXP b)\{
-    Rcpp::NumericVector xa(a);
-    Rcpp::NumericVector xb(b);
-    int n_xa = xa.size();
-    int n_xb = xb.size();
+    Rcpp::NumericVector xa(a), xb(b);
+    int n_xa = xa.size(), n_xb = xb.size();
     Rcpp::NumericVector xab(n_xa + n_xb - 1);
     
-    double* pa = xa.begin();
-    double* pb = xb.begin();
-    double* pab = xab.begin();
-    int i,j=0; 
-    for (i = 0; i < n_xa; i++)
-        for (j = 0; j < n_xb; j++) 
-            pab[i + j] += pa[i] * pb[j];
+    typedef Rcpp::NumericVector::iterator vec_iterator ;
+    vec_iterator ia = xa.begin(), ib = xb.begin();
+    vec_iterator iab = xab.begin();
+    for (int i = 0; i < n_xa; i++)
+        for (int j = 0; j < n_xb; j++) 
+            iab[i + j] += ia[i] * ib[j];
 
     return xab;
 \}
 \end{example}
 
-We have benchmarked the various implementations by averaging over 1000 calls of each
-function with \code{a} and \code{b} containing 100 elements
+One focus of recent development in \pkg{Rcpp} is Rcpp sugar, which
+aims to provide R-like syntax in C++. A discussion of Rcpp sugar is 
+beyond the scope of this article, but for illustration purposes we have included
+another version of the convolution algorithm based on Rcpp sugar. 
+
+\begin{example}
+RcppExport SEXP convolve11cpp(SEXP a, SEXP b) \{
+    NumericVector xa(a); int n_xa = xa.size() ;
+    NumericVector xb(b); int n_xb = xb.size() ;
+    NumericVector xab(n_xa + n_xb - 1,0.0);
+    
+    Range r( 0, n_xb-1 );
+    for(int i=0; i<n_xa; i++, r++)
+        xab[ r ] += nona(xa[i]) * nona(xb) ;
+    return xab ;
+\}
+\end{example}
+
+Rcpp sugar allows manipulation of entire subsets of vectors at once, thanks to 
+the \code{Range} class. Rcpp sugar uses techniques such as expression templates, 
+lazy evaluation and loop unrolling to generate very efficient code. 
+The \code{nona} template function marks its argument to indicate that it does 
+not contain any missing value --- an assumption made implicitly by other versions ---
+allowing sugar to compute the individual operations without dealing with 
+missing values. 
+
+We have benchmarked the various implementations by averaging over 5000 calls 
+of each function with \code{a} and \code{b} containing 500 elements
 each.\footnote{The code for this example is contained in the directory
   \code{inst/examples/ConvolveBenchmarks} in the \pkg{Rcpp} package.} The timings
 are summarized in the table below:
@@ -828,10 +861,11 @@
         Implementation & Time in   & Relative \\ 
                        &  millisec  & to R API \\ 
         \cmidrule(r){2-3}
-        R API (as benchmark) & 32 & \\
-        \code{RcppVector<double>} & 354 & 11.1 \\
-        \code{NumericVector::operator[]} & 52 & 1.6 \\
-        \code{NumericVector::begin} & 33 &  1.0 \\
+        R API (as benchmark) & 255 & \\
+        \code{RcppVector<double>} & 354 & 13.74 \\
+        \code{NumericVector::operator[]} & 640 & 2.51 \\
+        \code{NumericVector::iterator} & 248 & 0.97 \\
+        Rcpp sugar & 168 & 0.66 \\
         \bottomrule
       \end{tabular}
     \end{small}
@@ -839,19 +873,12 @@
   \end{center}
 \end{table}
 
-% [dirk]   so what do we want to show here?   I like our new table, I
-%          particularly like the difference between R API "naive" (which does
-%          pretty badly !!) and the highly optimised one.  We do look good.
-%          So we toss Classic, and I guess we also toss Sugar for now?
-% [romain] things have changed now. we definitely want the nona version of sugar
-%          I'm not convinced about showing the naive version of R API
+The first implementation, written in C and using the traditional R API,
+provides our base case. It takes advantage of pointer 
+arithmetic and does not pay the price of C++'s object encapsulation or 
+operator overloading. 
 
-The first implementation, using the traditional R API, unsurprisingly 
-appears to be the most efficient. It takes advantage of pointer 
-arithmetics and does not pay the price of object encapsulation. This provides
-our base case.
-
-The second implementation---from the classic \pkg{Rcpp} API---is
+The second implementation---from the (deprecated) classic \pkg{Rcpp} API---is
 clearly behind in terms of efficiency. The difference is mainly 
 caused by the many unnecessary copies that the \code{RcppVector<double>}
 class performs. First, both objects (\code{a} and \code{b})
@@ -860,20 +887,20 @@
 (\code{xab}) that is filled using the \code{operator()} which checks
 at each access that the index is suitable for the object. Finally, \code{xab}
 is converted back to an R object. 
-% [dirk]  nuke this paragraph, and test?
+% [dirk]   : nuke this paragraph, and test?
+% [romain] : I don't want to show its code, but keeping it for reference perhaps
 
 The third implementation---using the more efficient new \pkg{Rcpp} API---is
 already orders of magnitude faster than the preceding solution. Yet it
 illustrates the price of object encapsulation and of calling an overloaded
 \code{operator[]} as opposed to using pointer arithmetics.
 
-Finally, the last implementation comes very close to the base case and shows
-the code using the new API can essentially as fast as the R API base case
-while being easier to write. 
+The fourth implementation uses iterators rather than indexing. It appears slightly
+more efficient than the base case, mainly because initialization of the values
+leverages the \code{std::fill} algorithm from the STL.
 
-% [dirk] TODO Should we talk about sugar? 
-% [dirk] TODO Should we talk about modules?
-% [romain] let's do another paper, or start working on the book
+Finally, the last implementation uses Rcpp sugar and performs significantly 
+better than the base case. Loop unrolling is responsible for the speedup. 
 
 \section{Summary}
 
@@ -895,9 +922,9 @@
 standard template library and its containers and algorithms. The
 \code{wrap()} and \code{as()} template functions are extensible by design and
 can be used either explicitly or implicitly throughout the API.
-By using only thin wrappers around \code{SEXP} objects, 
-the footprint of the \code{Rcpp} API is very lightweight, and does not 
-induces a significant performance price. 
+By using only thin wrappers around \code{SEXP} objects and adopting C++
+idioms such as iterators, the footprint of the \code{Rcpp} API 
+is very lightweight, and does not induce a significant performance price. 
 
 The \code{Rcpp} API offers opportunities to dramatically reduce the complexity 
 of code, which should improve code readability, maintainability and reuse.
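
The contrast between the indexed and iterator-based convolution loops discussed in the performance section can be sketched outside of Rcpp. The following is a minimal standalone illustration, assuming plain std::vector<double> in place of Rcpp::NumericVector. The function names convolve_index and convolve_iter are hypothetical and are not part of the Rcpp package or its benchmark directory.

```cpp
#include <cstddef>
#include <vector>

// Index-based convolution: same loop structure as the examples in the diff,
// but on std::vector<double> (an assumption made to keep this self-contained).
std::vector<double> convolve_index(const std::vector<double>& a,
                                   const std::vector<double>& b) {
    if (a.empty() || b.empty()) return std::vector<double>();
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            ab[i + j] += a[i] * b[j];
    return ab;
}

// Iterator-based convolution: the output iterator advances by one slot for
// each element of a, and an inner iterator walks the window of length b.size().
std::vector<double> convolve_iter(const std::vector<double>& a,
                                  const std::vector<double>& b) {
    if (a.empty() || b.empty()) return std::vector<double>();
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);
    std::vector<double>::iterator window = ab.begin();
    for (std::vector<double>::const_iterator ia = a.begin();
         ia != a.end(); ++ia, ++window) {
        std::vector<double>::iterator out = window;
        for (std::vector<double>::const_iterator ib = b.begin();
             ib != b.end(); ++ib, ++out)
            *out += (*ia) * (*ib);
    }
    return ab;
}
```

Both functions compute the same result. The sketch says nothing about relative timings, which depend on the compiler and on the container being wrapped.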