[Rcpp-commits] r2144 - papers/rjournal

Thu Sep 23 15:39:10 CEST 2010

Author: romain
Date: 2010-09-23 15:39:09 +0200 (Thu, 23 Sep 2010)
New Revision: 2144

Modified:
   papers/rjournal/EddelbuettelFrancois.tex
Log:
first pass after revising the exception section

Modified: papers/rjournal/EddelbuettelFrancois.tex
===================================================================

--- papers/rjournal/EddelbuettelFrancois.tex	2010-09-22 11:09:28 UTC (rev 2143)
+++ papers/rjournal/EddelbuettelFrancois.tex	2010-09-23 13:39:09 UTC (rev 2144)
@@ -36,6 +36,7 @@
 %            as well be mentioned. We have nothing to hide
 %            please see the next paragraph where I now talk about classic
 %            but also note deprecated and link to the recommended new API
+% [romain] : alright
 
 %The current version of 
 The \pkg{Rcpp} package combines two distinct
@@ -91,6 +92,7 @@
 %          I gave it a spin above (and sorry about the reindent)
 %          [ minutes later ] 
 %          ok, one 'new' is gone above as the corresponding section title is gone
+% [romain] fine
 
 \subsection{Comparison}
 
@@ -125,6 +127,7 @@
 %
 The \pkg{cxxPack} package \citep{cran:cxxPack} builds on top of
 \pkg{Rcpp} and adds a small collection of diverse functions.
+% [romain] So what ? Is this the mention you want to remove ? Go right ahead !
 
 %DE: Removed per editor  
 %A critical comparison of these packages that addresses relevant aspects such
@@ -141,10 +144,11 @@
 \subsection{Rcpp Use Cases}  % or some such
 \label{sec:classic_rcpp}
 
-The core focus of \pkg{Rcpp}---particularly for the earlier API described in
-this section---has always been on allowing the programmer to add C++-based
-functions. We use this term in the standard mathematical sense of providing
-results (output) given a set of parameters or data (input). This was
+The core focus of \pkg{Rcpp} has always been on allowing the 
+programmer to add C++-based functions. 
+We use this term in the standard mathematical sense of providing
+results (output) given a set of parameters or data (input). 
+This was
 facilitated from the earliest releases using C++ classes for receiving
 various types of R objects, converting them to C++ objects and allowing the
 programmer to return the results to R with relative ease. 
@@ -160,69 +164,9 @@
 and parameters are passed via \pkg{Rcpp} to a function set-up to call code
 from an external library.  
 
-TODO: Wrap this this so that it ties in better with what follows
+% TODO: Wrap this so that it ties in better with what follows
+% [romain] : should this section be merged with the next one? It looks odd on its own.
 
-% An illustration can be provided using the time-tested example of a
-% convolution of two vectors. This example is shown in sections 5.2 (for the
-% \code{.C()} interface) and 5.9 (for the \code{.Call()} interface) of 'Writing
-% R Extensions' \citep{R:exts}. We have rewritten it here using classes of the
-% classic \pkg{Rcpp} API:
-
-% \begin{example}
-% #include <Rcpp.h>
-
-% RcppExport SEXP convolve2cpp(SEXP a,SEXP b) \{
-%   RcppVector<double> xa(a);
-%   RcppVector<double> xb(b);
-%   int nab = xa.size() + xb.size() - 1;
-
-%   RcppVector<double> xab(nab);
-%   for (int i = 0; i < nab; i++) xab(i) = 0.0;
-
-%   for (int i = 0; i < xa.size(); i++)
-%     for (int j = 0; j < xb.size(); j++) 
-%        xab(i + j) += xa(i) * xb(j);
-
-%   RcppResultSet rs;
-%   rs.add("ab", xab);
-%   return rs.getReturnList();
-% \}
-% \end{example}
-
-% We can highlight several aspects. First, only a single header file
-% \code{Rcpp.h} is needed to use the \pkg{Rcpp} API.  Second, given two
-% \code{SEXP} types, a third is returned.  Third, both inputs are converted to
-% templated.
-% \footnote{C++ templates allow functions or classes to be written
-%   somewhat independently from the template parameter. The actual class is
-%   instantiated by the compiler by replacing occurrences of the templated
-%   parameter(s). A simple example would be a templated function
-%   \texttt{abs(T)} which returns the negative of the template argument $T$
-%   when $T<0$ and $T$ otherwise. While the source code is written with a
-%   `templated' type $T$, the compiler will create a concrete instance using an
-%   \texttt{int} or \texttt{double} type dependent on the context is which the
-%   code is called.}  
-% C++ vector types, here a standard \code{double} type is
-% used to create a vector of doubles from the template type.  Fourth, the
-% usefulness of these classes can be seen when we query the vectors directly
-% for their size---using the \code{size()} member function---in order to
-% reserve a new result type of appropriate length whereas use based on C arrays
-% would have required additional parameters for the length of vectors $a$ and
-% $b$, leaving open the possibility of mismatches between the actual length and
-% the length reported by the programmer.  Fifth, the computation itself is
-% straightforward embedded looping just as in the original examples in the
-% 'Writing R Extensions' manual \citep{R:exts}.  Sixth, a return type
-% (\code{RcppResultSet}) is prepared as a named object which is then converted
-% to a list object that is returned.  We should note that the
-% \code{RcppResultSet} supports the return of numerous (named) objects which
-% can also be of different types.
-
-% We argue that this usage is already much easier to read, write and debug than the
-% C macro-based approach supported by R itself. Possible performance issues and
-% other potential limitations will be discussed throughout the article and
-% reviewed at the end.
-
-%\section{New \pkg{Rcpp} API}
 \section{The \pkg{Rcpp} API}
 \label{sec:new_rcpp}
 
@@ -262,10 +206,16 @@
 conversions below).  Fourth, the
 usefulness of these classes can be seen when we query the vectors directly
 for their size---using the \code{size()} member function---in order to
-reserve a new result type of appropriate length whereas use based on C arrays
-would have required additional parameters for the length of vectors $a$ and
-$b$, leaving open the possibility of mismatches between the actual length and
-the length reported by the programmer.  Fifth, the computation itself is
+reserve a new result type of appropriate length 
+% whereas use based on C arrays
+% would have required additional parameters for the length of vectors $a$ and
+% $b$, leaving open the possibility of mismatches between the actual length and
+% the length reported by the programmer.
+% [romain] : hmmm. There is no need for extra parameters if you use .Call
+%            with the R API. I don't think the point is valid.
+and when we use \code{operator[]} to extract and set individual elements
+of the vector. 
+Fifth, the computation itself is
 straightforward embedded looping just as in the original examples in the
 'Writing R Extensions' manual \citep{R:exts}.  Sixth, the return conversion
 is also automatic from the \code{NumericVector} to the \code{SEXP} type.
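The steps listed above (querying the lengths with \code{size()}, creating a result of the right length, and reading and writing elements with \code{operator[]}) can be mirrored outside of R. The following is a minimal standalone C++ sketch in which \code{std::vector<double>} stands in for \code{Rcpp::NumericVector}; this substitution is an assumption made so that the code compiles without R or \pkg{Rcpp}, and the function name \code{convolve} is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Standalone sketch: std::vector<double> stands in for Rcpp::NumericVector
// (an assumption made so the example compiles without R or Rcpp).
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};  // avoid size_t underflow below

    // The inputs know their own length, so the result can be created with
    // the right size; its elements are zero-initialised, like numeric(n) in R.
    std::vector<double> ab(a.size() + b.size() - 1);

    // operator[] is used both to read (a[i], b[j]) and to write (ab[i + j]).
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            ab[i + j] += a[i] * b[j];
    return ab;
}
```

Called with \code{\{1, 2, 3\}} and \code{\{1, 2, 3, 4\}}, the sketch returns \code{\{1, 4, 10, 16, 17, 12\}}, the same values that the R session with \code{fun(1:3, 1:4)} prints.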
@@ -330,8 +280,8 @@
 member functions to manage objects in the associated environment. 
 Similarly, classes related to vectors (\code{IntegerVector}, \code{NumericVector}, 
 \code{RawVector}, \code{LogicalVector}, \code{CharacterVector}, 
-\code{GenericVector} and \code{ExpressionVector}) expose functionality
-to extract and set values from the vectors.
+\code{GenericVector} (also known as \code{List}) and \code{ExpressionVector}) 
+expose functionality to extract and set values from the vectors.
 
 The following sub-sections present typical uses of \pkg{Rcpp} classes in
 comparison with the same code expressed using functions of the R API.
@@ -377,14 +327,29 @@
 the first and second elements of the vector as \code{NumericVector} overloads
 the \code{operator[]}.
 
-With the most recent compilers (e.g. GNU g++ >= 4.4) which already implement
-parts of the next C++ standard (C++0x) currently being drafted, the preceding
-code may even be reduced to this:
+% With the most recent compilers (e.g. GNU g++ >= 4.4) which already implement
+% parts of the next C++ standard (C++0x) currently being drafted, the preceding
+% code may even be reduced to this:
+% 
+% \begin{example}
+% Rcpp::NumericVector ab = \{123.45, 67.89\};
+% \end{example}
+% [romain] I'm trading this for the use of create, as this always works 
+%          so that we don't confuse readers because if you have gcc 4.4
+%          you don't get this automatically, you have to enable it, etc ...
 
+The snippet can also be written more concisely using the \code{create}
+static member function of the \code{NumericVector} class: 
+
 \begin{example}
-Rcpp::NumericVector ab = \{123.45, 67.89\};
+Rcpp::NumericVector ab = 
+    Rcpp::NumericVector::create( 123.45, 67.89 );
 \end{example}
 
+Although the copy constructor of the \code{NumericVector} class is used
+here, it does not copy the underlying array: only the \code{SEXP}, a pointer
+to the R object, is copied. 
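To make these copy semantics concrete, here is a hypothetical standalone C++ class whose copy constructor copies only a pointer, so that two instances share one array. The name \code{NumericHandle} is invented for illustration, and \code{std::shared_ptr} merely stands in for R's own management of the memory behind a \code{SEXP}; this is not how \pkg{Rcpp} implements \code{NumericVector}.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a vector class that, like Rcpp::NumericVector,
// only holds a pointer to storage owned elsewhere (in R, the SEXP).
// std::shared_ptr is an assumption standing in for R's memory management.
class NumericHandle {
public:
    explicit NumericHandle(std::size_t n)
        : data_(std::make_shared<std::vector<double>>(n)) {}

    // The implicitly generated copy constructor copies data_ (the pointer),
    // not the vector it points to: copies share one underlying array.

    double& operator[](std::size_t i) { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    bool shares_storage_with(const NumericHandle& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<double>> data_;
};
```

In this sketch, writing through a copy is visible through the original, which is the behaviour to keep in mind whenever such objects are copied; a deep copy has to be requested explicitly.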
+
 \subsection{Character vectors}
 
 A second example deals with character vectors and emulates this R code
@@ -439,7 +404,7 @@
 object and converts this object into a \code{SEXP}, which is what R expects. 
 Currently wrappable types are :
 \begin{itemize}
-\item primitive types, \code{int}, \code{double}, ... which are converted 
+\item primitive types: \code{int}, \code{double}, ... which are converted 
 into the corresponding atomic R vectors;
 \item \code{std::string} which are converted to R atomic character vectors;
 \item STL containers such as \code{std::vector<T>} or \code{std::list<T>}, 
@@ -449,16 +414,14 @@
 the type \code{T} is wrappable;
 \item any type that implements implicit conversion to \code{SEXP} through the 
 \code{operator SEXP()};
-\item any type for which the \code{wrap} template is partially or fully 
-specialized.
+\item any type for which the \code{wrap} template is % partially or  [romain] partially is not true anymore
+fully specialized.
 \end{itemize}
-%One example for the specialisation of the templated \code{wrap} function is
-%provided in \pkg{RInside} \citep{cran:rinside} by \code{vector< vector<
-%  double > >} and \code{vector< vector< int > >} which are used for
-%representing numeric matrices.
 
 Wrappability of an object type is resolved at compile time using 
-modern techniques of template meta programming and class traits.
+modern techniques of template metaprogramming and class traits. The 
+\code{Rcpp-extending} vignette discusses in depth how to extend \code{wrap}
+to third-party types, and the \pkg{RcppArmadillo} package features several examples.
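The idea of resolving wrappability at compile time can be illustrated with a hypothetical, self-contained model. In the sketch below, \code{wrap\_traits}, \code{wrap} and \code{Point} are invented names, and instead of building a \code{SEXP} the conversion returns a string in R syntax describing the object that would be created. It is a sketch of the traits technique only, not the actual \pkg{Rcpp} machinery described in the vignette.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical traits class: a type is "wrappable" if wrap_traits is
// specialised for it. The primary template is only declared, so wrapping
// an unsupported type is rejected at compile time, not at run time.
template <typename T> struct wrap_traits;

template <> struct wrap_traits<int> {
    static std::string convert(int x) { return std::to_string(x) + "L"; }
};

template <> struct wrap_traits<std::string> {
    static std::string convert(const std::string& s) { return "\"" + s + "\""; }
};

// Partial specialisation: std::vector<T> is wrappable whenever T is,
// which is what makes composition possible.
template <typename T> struct wrap_traits<std::vector<T>> {
    static std::string convert(const std::vector<T>& v) {
        std::string out = "c(";
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (i > 0) out += ", ";
            out += wrap_traits<T>::convert(v[i]);
        }
        return out + ")";
    }
};

// Entry point: dispatch is decided entirely by the static type T.
template <typename T> std::string wrap(const T& x) {
    return wrap_traits<T>::convert(x);
}

// A "third-party" type made wrappable by one full specialisation.
struct Point { int x, y; };

template <> struct wrap_traits<Point> {
    static std::string convert(const Point& p) {
        return "c(x = " + std::to_string(p.x) + "L, y = " +
               std::to_string(p.y) + "L)";
    }
};
```

With this design, \code{wrap(std::vector<int>\{1, 2\})} yields \code{c(1L, 2L)}, while a call such as \code{wrap(3.5)} fails to compile because no \code{wrap\_traits<double>} exists in the sketch.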
 
 The following code snippet illustrates that the design allows
 composition:
@@ -490,7 +453,7 @@
 \code{Rcpp::as} template whose signature is:
 \begin{example}
 template <typename T> 
-T as(SEXP x);
+T as(SEXP x) throw(not_compatible) ;
 \end{example}
 
 It offers less flexibility and currently
@@ -534,16 +497,19 @@
 \end{example}
 
 In the first part of the example, the code extracts a 
-\code{std::vector<double>} from the global environment. This is 
-achieved by the templated \code{operator[]} of \code{Environment}
-that first extracts the requested object from the environment as a \code{SEXP}, 
-and then outsources to \code{Rcpp::as} the creation of the 
-requested type. 
+\code{std::vector<double>} from the global environment. To achieve this, 
+the \code{operator[]} of \code{Environment} uses the proxy pattern to distinguish 
+between left-hand side (LHS) and right-hand side (RHS) use. 
+% [TODO] : reference (meyers more effective C++ I think?)
+The output of the operator is an instance of the nested class
+\code{Environment::Binding}, which defines a templated implicit conversion 
+operator that allows a \code{Binding} to be assigned to any type that 
+\code{Rcpp::as} is able to handle. 
 
-In the second part of the example, the \code{operator[]} 
-delegates to \code{wrap} the production of an R object based on the 
-type that is passed in (\code{std::map<std::string,std::string>}), 
-and then assigns the object to the requested name.
+In the second part of the example, LHS use of the \code{Binding} instance is 
+implemented through its assignment operator, which is also templated and uses
+\code{Rcpp::wrap} to perform the conversion to a \code{SEXP} that can be 
+assigned to the requested symbol in the global environment. 
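The proxy pattern itself can be shown in a few lines of standalone C++. In this hypothetical sketch, \code{Environment} and \code{Binding} only mirror the names used in the text; values are stored as strings, and the stream-based conversions stand in for \code{Rcpp::as} (read) and \code{Rcpp::wrap} (write). It illustrates the technique, not the \pkg{Rcpp} implementation.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical environment: names bound to values stored as strings,
// standing in for SEXPs.
class Environment {
public:
    // Proxy returned by operator[]: it remembers the environment and the
    // name, and decides what to do depending on how it is used.
    class Binding {
    public:
        Binding(Environment& env, std::string name)
            : env_(env), name_(std::move(name)) {}

        // RHS use: templated implicit conversion reads the stored value and
        // converts it to the type the caller asked for (the role of as<T>).
        template <typename T>
        operator T() const {
            std::istringstream in(env_.values_.at(name_));
            T out{};
            in >> out;
            return out;
        }

        // LHS use: templated assignment converts the value (the role of
        // wrap) and binds it to the requested name.
        template <typename T>
        Binding& operator=(const T& value) {
            std::ostringstream out;
            out << value;
            env_.values_[name_] = out.str();
            return *this;
        }

    private:
        Environment& env_;
        std::string name_;
    };

    // Returning a proxy lets the same expression appear on either side
    // of an assignment.
    Binding operator[](const std::string& name) { return Binding(*this, name); }

private:
    std::map<std::string, std::string> values_;
};
```

Here \code{env["x"] = 42} goes through the templated assignment operator, while \code{int x = env["x"]} goes through the templated conversion operator, which is the LHS/RHS split described above.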
 
 The same mechanism is used throughout the API. Examples include access/modification
 of object attributes, slots, elements of generic vectors (lists), 
@@ -627,8 +593,8 @@
 that is easier to read, write and maintain. More examples are available as
 part of the documentation included in the \pkg{Rcpp} package, as well as
 among its over one hundred and ninety unit tests.
+% TODO: bump this up to the current test count
 
-
 \section{Using code `inline'}
 \label{sec:inline}
 
@@ -668,8 +634,7 @@
 \pkg{inline} as shown below.  The function body is provided by the character
 variable \code{src}, the function header is defined by the argument
 \code{signature}---and we only need to enable \code{plugin="Rcpp"} to obtain a
-new function \code{fun} based on the C++ code in \code{src} where we also
-switched from the classic \pkg{Rcpp} API to the new one:
+new function \code{fun} based on the C++ code in \code{src}: 
 
 \begin{example}
 > src <- '
@@ -686,17 +651,23 @@
 > fun <- cxxfunction( 
 + \ \ \ \	signature(a="numeric", b="numeric"), 
 + \ \ \ \	src, plugin="Rcpp")
+> fun( 1:3, 1:4 )
+[1]  1  4 10 16 17 12
 \end{example}
 
-The main difference to the previous solution is that the input parameters are
-directly passed to types \code{Rcpp::NumericVector}, and that the return
-vector is automatically converted to a \code{SEXP} type through implicit
-conversion. Also in this version, the vector \code{xab} is not 
-initialized because the constructor already performs initialization
-to match the behavior of the R function \code{numeric}.
+% The main difference to the previous solution is that the input parameters are
+% directly passed to types \code{Rcpp::NumericVector}, and that the return
+% vector is automatically converted to a \code{SEXP} type through implicit
+% conversion. 
+% Also in this version, the vector \code{xab} is not 
+% initialized because the constructor already performs initialization
+% to match the behavior of the R function \code{numeric}.
 
 \section{Using STL algorithms}
 
+% [romain] hmmmm. we do now have sapply and lapply. I think we should mention
+%                 them here.
+
 % This is taken from :
 % http://www.cplusplus.com/reference/algorithm/
 
@@ -714,7 +685,6 @@
 ellipsis (\code{...}).} version of \code{lapply}
 using the \code{transform} algorithm from the STL. 
 
-% [Romain] does the code need comments ?
 \begin{example}
 > src <- '
 +   Rcpp::List input(data); 
@@ -759,38 +729,76 @@
 
 \subsection{C++ exceptions in R}
 
-The traditional way of dealing with C++ exceptions in R is to
-catch them through explicit try/catch blocks and
-convert this exception into an R error manually. 
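+R signals errors through its condition mechanism, which is implemented in C
+using long jumps (\code{longjmp}), whereas C++ exceptions unwind the stack and
+run the destructors of local objects. Both mechanisms assume total control
+over the call stack and should not be mixed without extra precaution: an R
+error raised from C++ code jumps over C++ frames without running their
+destructors, and a C++ exception that is not caught before control returns
+to R typically terminates the program. \pkg{Rcpp} contains facilities to
+combine both systems so that a C++ exception is caught and recycled into the
+R condition mechanism.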
 
-In C++, when an application throws an exception that is not caught, 
-a special function (called the terminate handler) is invoked. This typically causes 
-the program to abort. \pkg{Rcpp} takes advantage of this mechanism
-and installs its own terminate handler which translates C++
-exceptions into R conditions. The following code gives an illustration. 
+\pkg{Rcpp} defines the \code{BEGIN\_RCPP} and \code{END\_RCPP} macros that should 
+be used to bracket code that might throw C++ exceptions. 
 
 \begin{example}
-> fun <- cxxfunction(signature(x = "integer"), '
-+  int dx = as<int>(x);
-+   if( dx > 10 ) 
-+      throw std::range_error("too big") ;
-+   return wrap(dx*dx);
-+ ', plugin="Rcpp", 
-+  includes = "using namespace Rcpp;" )
-> tryCatch( fun(12), 
-+ "std::range_error" = function(e){
-+    writeLines( conditionMessage(e) )
-+ } )
-too big
+RcppExport SEXP fun( SEXP x )\{
+BEGIN_RCPP
+    int dx = Rcpp::as<int>(x);
+    if( dx > 10 ) 
+        throw std::range_error("too big") ;
+    return Rcpp::wrap( dx * dx) ; 
+END_RCPP
+\}
 \end{example}
 
+The macros exist only to avoid code repetition; they expand to 
+simple try/catch blocks: 
+
+\begin{example}
+RcppExport SEXP fun( SEXP x )\{
+    try\{
+        int dx = Rcpp::as<int>(x);
+        if( dx > 10 ) 
+            throw std::range_error("too big") ;
+        return Rcpp::wrap( dx * dx) ; 
+    \} catch( std::exception& __ex__ )\{ 
+        forward_exception_to_r( __ex__ ) ;
+    \} catch(...)\{ 
+        ::Rf_error( "c++ exception (unknown reason)" ) ;
+    \}
+\}
+\end{example}
+
+Using \code{BEGIN\_RCPP} and \code{END\_RCPP} (or their expanded form)
+guarantees that the stack is first unwound in terms of C++ exceptions, so that
+the destructors of local objects run, before the problem is converted to the
+standard R error management system (\code{Rf\_error}).
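This ordering guarantee can be observed in a standalone C++ sketch with the same shape as the expanded macros. \code{BEGIN\_GUARD}, \code{END\_GUARD}, \code{report\_error}, \code{Resource} and \code{square\_small} are hypothetical names; \code{report\_error} only records a message, whereas \code{Rf\_error} would long-jump back to R and never return.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical trace used to observe the order of events.
std::vector<std::string> events;

// Hypothetical stand-in for Rf_error: R's version would long-jump back to
// the R top level; this one only records the message.
void report_error(const std::string& msg) { events.push_back("error: " + msg); }

// Hypothetical macros shaped like BEGIN_RCPP / END_RCPP.
#define BEGIN_GUARD try {
#define END_GUARD                                        \
    } catch (const std::exception& ex) {                 \
        report_error(ex.what());                         \
    } catch (...) {                                      \
        report_error("c++ exception (unknown reason)");  \
    }

// A local object whose destructor must run before the error is reported.
struct Resource {
    ~Resource() { events.push_back("resource released"); }
};

int square_small(int dx) {
    BEGIN_GUARD
        Resource r;
        if (dx > 10) throw std::range_error("too big");
        return dx * dx;
    END_GUARD
    return -1;  // only reached after an error was reported
}
```

Calling \code{square\_small(12)} records \code{"resource released"} before \code{"error: too big"}: the C++ stack has been unwound by the time the handler runs.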
+
+The \code{forward\_exception\_to\_r} function uses run-time type information to 
+extract the class of the C++ exception and its message, so that 
+dedicated handlers can be installed on the R side. 
+
+\begin{example}
+> f <- function(x) .Call( "fun", x )
+> tryCatch( f( 12 ), 
++    "std::range_error" = function(e) \{
++       conditionMessage( e )
++    \} )
+[1] "too big"
+> tryCatch( f( 12 ), 
++    "std::range_error" = function(e) \{
++       class( e )
++    \} )
+[1] "std::range_error" "C++Error"
+[3] "error"            "condition" 
+\end{example}
+
 \subsection{R error in C++}
 
 R currently does not offer C-level mechanisms to deal with errors. To 
 overcome this problem, \pkg{Rcpp} uses the \code{Rcpp::Evaluator}
 class to evaluate an expression with an R-level \code{tryCatch}
 block. The error, if any, that occurs while evaluating the 
-function is then translated into an C++ exception. 
+function is then translated into a C++ exception that can be dealt with using 
+regular C++ try/catch syntax.
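The translation step can be sketched in standalone C++. In this hypothetical model, \code{evaluate}, \code{EvalResult} and \code{eval\_error} are invented names and not the \code{Rcpp::Evaluator} interface. The callable plays the part of an expression evaluated inside an R-level \code{tryCatch}: it reports failure as data rather than jumping out of the C++ code, and \code{evaluate} turns that report into an exception.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type for errors raised during evaluation.
struct eval_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical outcome of evaluating an expression under a tryCatch:
// the error, if any, is captured as a message instead of being signalled.
struct EvalResult {
    bool ok;
    double value;
    std::string message;
};

// Run the expression, then translate a captured error into a C++ exception
// that callers can handle with ordinary try/catch.
double evaluate(const std::function<EvalResult()>& expr) {
    EvalResult r = expr();
    if (!r.ok) throw eval_error(r.message);
    return r.value;
}
```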
 
 \section{Performance comparison}
 
@@ -809,9 +817,7 @@
 from R to C++ and back.
 
 Here we illustrate how to take advantage of \code{Rcpp} to get
-the best of both worlds. The classic \pkg{Rcpp} translation of the convolve example from
-\cite{R:exts} appears twice above where the second example showed the use
-with the new API.
+the best of both worlds. 
 
 The implementation of the \code{operator[]} is designed as 
 efficiently as possible, using both inlining and caching,