[Rcpp-commits] r571 - papers/rjournal

Fri Feb 5 03:30:07 CET 2010

Author: edd
Date: 2010-02-05 03:30:04 +0100 (Fri, 05 Feb 2010)
New Revision: 571

Modified:
   papers/rjournal/EddelbuettelFrancois.tex
Log:
another round of changes


Modified: papers/rjournal/EddelbuettelFrancois.tex
===================================================================

--- papers/rjournal/EddelbuettelFrancois.tex	2010-02-04 17:04:18 UTC (rev 570)
+++ papers/rjournal/EddelbuettelFrancois.tex	2010-02-05 02:30:04 UTC (rev 571)
@@ -116,6 +116,8 @@
 % [Romain:] Why 'at least initial'
 % [Dirk:] For 'Classic Rcpp'
 % [Romain:] I'd argue it is still the case with the new api
+% [Dirk:] Conceded in last rewrite: 'has always been'  
+%         (and I think we can nuke the comments)
 The core focus of \pkg{Rcpp}---particularly for the earlier API described in
 this section---has always been on allowing the programmer to add C++-based
 functions. We use this term in the standard mathematical sense of providing
@@ -195,75 +197,6 @@
 other potential limitations will be discussed throughout the article and
 reviewed at the end.
 
-\section{Using code `inline'}
-
-Extending R with compiled code also needs to address how to reliably compile,
-link and load the code.  While using a package is preferable in the long run,
-it may be to heavy a framework for quick explorations.  An alternative is
-provided by the \pkg{inline} package \citep{cran:inline} which compiles,
-links and loads a C, C++ or Fortran function---directly from the R prompt
-using a simple function \code{cfunction}.  It was recently extended to work
-with \pkg{Rcpp} by allowing for the use of additional header files and
-libraries. This works particularly well with the \pkg{Rcpp} package where
-headers and the library are automatically found if the appropriate option
-\code{Rcpp} to \texttt{cfunction} is set to true.
-
-% [Romain] : the next paragraph is very confusing
-% [Dirk] Is this better?
-% [Romain] Not sure. It seems to be only readable backwards. what about a 
-%          separate section before 'inline code' just about this
-% 
-%          it might also be useful to show a quick example of inlining
-%          c++ code, for example say that we use it for our unit tests
-%          and show an example unit test
-% [Dirk] Done in last round
-% [Romain] But this shows the old api !!! and the same code as above so that 
-%          people get to see it twice. I'd prefer moving these bits after 
-%          the new Rcpp api section and show new api code inlined
-The use of \pkg{inline} is possible as \pkg{Rcpp} can be installed and
-updated just like any other R package using \textsl{e.g.} the
-\code{install.packages()} function for initial installation as well as
-\code{update.packages()} for upgrades.  So even though R / C++ interfacing
-would otherwise require source code, the \pkg{Rcpp} library is always provided
-ready for use as a pre-built library through the CRAN package mechanism.
-
-The library and header files provided by \pkg{Rcpp} for use by other packages
-are installed along with the \pkg{Rcpp} package making it possible for
-\pkg{Rcpp} to provide the appropriate \code{-I} and \code{-L} switches needed
-for compilation and linking.  So internally, \pkg{inline} makes uses of the
-two functions \code{Rcpp:::CxxFlags()} and \code{Rcpp:::LdFlags()} that
-provide this information (and which are also used by \code{Makefiles} of
-other packages).  Here, however, all this is done behind the scenes and the
-user need not worry about compiler or linker options or settings.
-
-The convolution example provided above now can be rewritten for use by
-\pkg{inline} as shown here.  The function body is provided by character
-variable \code{src}, the function header is defined by the argument
-\code{signature}---and we only need to enable \code{Rcpp=TRUE} to obtain a
-new function \code{fun} based on the C++ code in \code{src}:
-\begin{example}
-src <- '
-  RcppVector<double> xa(a);
-  RcppVector<double> xb(b);
-  int nab = xa.size() + xb.size() - 1;
-
-  RcppVector<double> xab(nab);
-  for (int i = 0; i < nab; i++) xab(i) = 0.0;
-
-  for (int i = 0; i < xa.size(); i++)
-    for (int j = 0; j < xb.size(); j++)
-       xab(i + j) += xa(i) * xb(j);
-
-  RcppResultSet rs;
-  rs.add("ab", xab);
-  return rs.getReturnList();
-';
-fun <- cfunction(signature(a="numeric", 
-                           b="numeric"),
-                 src, Rcpp=TRUE)
-\end{example}
-
-
 \section{New \pkg{Rcpp} API}
 \label{sec:new_rcpp}
 
@@ -328,10 +261,10 @@
 \code{GenericVector} and \code{ExpressionVector}) expose functionality
 to extract and set values from the vectors, etc ...
 
-The following sub sections present typical uses Rcpp classes in
+The following sub-sections present typical uses of Rcpp classes in
 comparison with the same code expressed using functions of the R api.
 
-\subsection{numeric vector}
+\subsection{Numeric vectors}  % [Dirk] I think we need upper case
 
 The following code snippet is extracted from Writing R extensions
 \citep{R:exts}. It creates a \code{numeric} vector of two elements 
@@ -347,16 +280,25 @@
 
 Although this is one of the simplest examples in Writing R extensions, 
 it seems verbose and it is not trivial at first sight what is happening.
-\begin{itemize}
-\item \code{allocVector} is used to allocate memory. We must supply to it 
-the type of data (\code{REALSXP}) and the number of elements.
-\item once allocated, the \code{ab} object must be protected from
-garbage collection. Since the garbage collector can happen at any time, 
-not protecting an object means its memory might be reclaimed before we are
-finished with it.
-\item The \code{REAL} macro returns a pointer to the beginning of the 
-actual array; its indexing is does not resemble either R or C++.
-\end{itemize}
+%\begin{itemize}
+%\item \code{allocVector} is used to allocate memory. We must supply to it 
+%the type of data (\code{REALSXP}) and the number of elements.
+%\item once allocated, the \code{ab} object must be protected from
+%garbage collection. Since the garbage collector can happen at any time, 
+%not protecting an object means its memory might be reclaimed before we are
+%finished with it.
+%\item The \code{REAL} macro returns a pointer to the beginning of the 
+%actual array; its indexing is does not resemble either R or C++.
+%\end{itemize}
+% [Dirk] More compact without enumerate list?
+\code{allocVector} is used to allocate memory; we must also supply it with
+the type of data (\code{REALSXP}) and the number of elements.  Once
+allocated, the \code{ab} object must be protected from garbage
+collection. Since the garbage collector can run at any time, not
+protecting an object means its memory might be reclaimed before we are
+finished with it. Lastly, the \code{REAL} macro returns a pointer to the
+beginning of the actual array; its indexing does not resemble either R or
+C++.
 
 Using the \code{Rcpp::NumericVector} class, the code can be rewritten: 
 
@@ -366,17 +308,25 @@
 ab[1] = 67.89;
 \end{example}
 
-The code contains much less idiomatic decorations. Here are the steps involved: 
-\begin{itemize}
-\item The \code{NumericVector} constructor is given the number
-of elements the vector contains (2), this hides a call to the 
-\code{allocVector} we saw previously. 
-\item Also hidden is protection of the 
-object from garbage collection, which is a behavior that \code{NumericVector}
-inherits from \code{RObject}
-\item values are assigned to the first and second elements of the vector. 
-This is achieved \code{NumericVector} overloads the \code{operator[]}.
-\end{itemize}
+% The code contains much less idiomatic decorations. Here are the steps involved: 
+% \begin{itemize}
+% \item The \code{NumericVector} constructor is given the number
+% of elements the vector contains (2), this hides a call to the 
+% \code{allocVector} we saw previously. 
+% \item Also hidden is protection of the 
+% object from garbage collection, which is a behavior that \code{NumericVector}
+% inherits from \code{RObject}
+% \item values are assigned to the first and second elements of the vector. 
+% This is achieved \code{NumericVector} overloads the \code{operator[]}.
+% \end{itemize}
+% [Dirk] Idem: no bullets 
+The code contains fewer idiomatic decorations. The \code{NumericVector}
+constructor is given the number of elements the vector contains (2); this
+hides a call to the \code{allocVector} we saw previously. Also hidden is
+protection of the object from garbage collection, which is a behavior that
+\code{NumericVector} inherits from \code{RObject}.  Values are assigned to
+the first and second elements of the vector because \code{NumericVector}
+overloads \code{operator[]}.
 
 With the most recent compilers (e.g. G++ >= 4.4) which already implement
 parts of the forthcoming C++ standard (C++0x), the preceding code may even be
@@ -386,7 +336,7 @@
 Rcpp::NumericVector ab = \{123.45, 67.89\};
 \end{example}
 
-\subsection{character vectors}
+\subsection{Character vectors}
 
 A second example deals with character vectors and emulates this R code
 
@@ -419,7 +369,7 @@
 CharacterVector ab = \{"foo","bar"\};
 \end{example}
 
-\section{Data interchange between R and C++}
+\section{R and C++ data interchange} % [Dirk] Reorder to fit on 1 line
 
 In addition to classes, the \pkg{Rcpp} package contains two additional
 functions to perform conversion of C++ objects to R objects and back. 
@@ -431,15 +381,15 @@
 currently handle these C++ types: 
 \begin{itemize}
 \item primitive types, \code{int}, \code{double}, ... are converted 
-into R vectors of the appropriate type
-\item \code{std::string} are converted to R character vectors
-\item STL-like containers, e.g \code{std::vector<T>}, \code{std::list<T>}, 
-are wrappable as long as the type they contain (T) is wrappable. 
-\item STL-like maps, e.g. \code{std::map<std::string,T>}, 
-which uses \code{std::string} for their keys, are wrappable as long as 
-the type \code{T} is wrappable
-\item any type that implements implicit conversion to \code{SEXP}, through the 
-\code{operator SEXP()} are wrappable
+into R vectors of the appropriate type;
+\item \code{std::string} objects are converted to R character vectors;
+\item STL containers such as \code{std::vector<T>} or \code{std::list<T>}
+are wrappable as long as the template type \code{T} that they contain is wrappable;
+\item STL maps that use \code{std::string} for keys
+(e.g. \code{std::map<std::string,T>}) are also wrappable as long as
+the type \code{T} is wrappable;
+\item any type that implements implicit conversion to \code{SEXP} through
+\code{operator SEXP()} is wrappable.
 \end{itemize}
 
 In addition, the \code{wrap} template may be partially or fully specialized by
@@ -460,7 +410,7 @@
 m1["foo"] = 1 ; m1["bar"] = 2 ;
 
 std::map< std::string, int > m2 ;
-m2["foo"] = 1 ; m2["bar"] = 2 ; m2["bling"] = 3 ;
+m2["foo"] = 1; m2["bar"] = 2; m2["bling"] = 3;
 
 v.push_back( m1) ;
 v.push_back( m2) ;
@@ -520,7 +470,7 @@
 \code{std::map<std::string,std::string>} into an R object, a named
 character vector in this case.
 
-\section{other examples}
+\section{Other examples}
 
 The last example shows how to use \pkg{Rcpp} to emulate the R code below.
 
@@ -593,6 +543,78 @@
 as well as the many examples that the package contains as part of 
 its unit tests. 
 
+\section{Using code `inline'}
+\label{sec:inline}
+
+Extending R with compiled code also needs to address how to reliably compile,
+link and load the code.  While using a package is preferable in the long run,
+it may be too heavy a framework for quick explorations.  An alternative is
+provided by the \pkg{inline} package \citep{cran:inline} which compiles,
+links and loads a C, C++ or Fortran function directly from the R prompt
+using a simple function \code{cfunction}.  It was recently extended to work
+with \pkg{Rcpp} by allowing for the use of additional header files and
+libraries. This works particularly well with the \pkg{Rcpp} package where
+headers and the library are automatically found if the appropriate option
+\code{Rcpp} to \code{cfunction} is set to \code{TRUE}.
+
+% [Romain] : the next paragraph is very confusing
+% [Dirk] Is this better?
+% [Romain] Not sure. It seems to be only readable backwards. what about a 
+%          separate section before 'inline code' just about this
+% 
+%          it might also be useful to show a quick example of inlining
+%          c++ code, for example say that we use it for our unit tests
+%          and show an example unit test
+% [Dirk] Done in last round
+% [Romain] But this shows the old api !!! and the same code as above so that 
+%          people get to see it twice. I'd prefer moving these bits after 
+%          the new Rcpp api section and show new api code inlined
+% [Dirk]  Agreed -- will move to past 'New Rcpp API'
+The use of \pkg{inline} is possible as \pkg{Rcpp} can be installed and
+updated just like any other R package using \textsl{e.g.} the
+\code{install.packages()} function for initial installation as well as
+\code{update.packages()} for upgrades.  So even though R / C++ interfacing
+would otherwise require source code, the \pkg{Rcpp} library is always provided
+ready for use as a pre-built library through the CRAN package mechanism.
+
+The library and header files provided by \pkg{Rcpp} for use by other packages
+are installed along with the \pkg{Rcpp} package, making it possible for
+\pkg{Rcpp} to provide the appropriate \code{-I} and \code{-L} switches needed
+for compilation and linking.  So internally, \pkg{inline} makes use of the
+two functions \code{Rcpp:::CxxFlags()} and \code{Rcpp:::LdFlags()} that
+provide this information (and which are also used by \code{Makefiles} of
+other packages).  Here, however, all this is done behind the scenes and the
+user need not worry about compiler or linker options or settings.
+
+The convolution example provided above can now be rewritten for use by
+\pkg{inline} as shown here.  The function body is provided by the character
+variable \code{src}, the function header is defined by the argument
+\code{signature}, and we only need to enable \code{Rcpp=TRUE} to obtain a
+new function \code{fun} based on the C++ code in \code{src}, where we also
+switched from the classic \pkg{Rcpp} API to the new one:
+\begin{example}
+src <- '
+  Rcpp::NumericVector xa(a);
+  Rcpp::NumericVector xb(b);
+  int n_xa = xa.size(), n_xb = xb.size();
+  int nab = n_xa + n_xb - 1;
+  Rcpp::NumericVector xab(nab);
+  for (int i = 0; i < nab; i++) xab[i] = 0.0;
+  for (int i = 0; i < n_xa; i++)
+    for (int j = 0; j < n_xb; j++)
+       xab[i + j] += xa[i] * xb[j];
+  return xab;
+';
+fun <- cfunction(signature(a="numeric", 
+                           b="numeric"),
+                 src, Rcpp=TRUE)
+\end{example}
+
+The main difference from the previous solution is that the input parameters
+are passed directly to \code{Rcpp::NumericVector} objects, and that the return
+vector is automatically converted to a \code{SEXP} type through implicit
+conversion.
+
 \section{Performance/Limitations}
 
 In this section, we illustrate that C++ features come with a price
@@ -611,38 +633,37 @@
 
 In this section, we illustrate how to take advantage of \code{Rcpp} to get
 the best of it. The classic Rcpp translation of the convolve example from
-\cite{R:exts} appears in section~\ref{sec:classic_rcpp}.  With the new API,
-the code can be written as shown below. The main difference is that the input
-parameters are directly passed to types \code{Rcpp::NumericVector}, and that
-the return vector is automatically converted to a \code{SEXP} type through 
-implicit conversion.
+\cite{R:exts} appears in sections~\ref{sec:classic_rcpp} and
+\ref{sec:inline} where the latter example showed the use with the new API.
+%
+% [Dirk] Showing this example is now a little redundant as we just showed it
+%        for inline.  Shall we nuke it?
+% \begin{example}
+% #include <Rcpp.h>
 
-\begin{example}
-#include <Rcpp.h>
+% RcppExport SEXP convolve3cpp(SEXP a, SEXP b)\{
+%     Rcpp::NumericVector xa(a);
+%     Rcpp::NumericVector xb(b);
+%     int n_xa = xa.size() ;
+%     int n_xb = xb.size() ;
+%     int nab = n_xa + n_xb - 1;
+%     Rcpp::NumericVector xab(nab);
 
-RcppExport SEXP convolve3cpp(SEXP a, SEXP b)\{
-    Rcpp::NumericVector xa(a);
-    Rcpp::NumericVector xb(b);
-    int n_xa = xa.size() ;
-    int n_xb = xb.size() ;
-    int nab = n_xa + n_xb - 1;
-    Rcpp::NumericVector xab(nab);
+%     for (int i = 0; i < nab; i++) xab[i] = 0.0;
+%     for (int i = 0; i < n_xa; i++)
+%         for (int j = 0; j < n_xb; j++) 
+%             xab[i + j] += xa[i] * xb[j];
 
-    for (int i = 0; i < nab; i++) xab[i] = 0.0;
-    for (int i = 0; i < n_xa; i++)
-        for (int j = 0; j < n_xb; j++) 
-            xab[i + j] += xa[i] * xa[j];
-
-    return xab ;
-\}
-\end{example}
-
+%     return xab ;
+% \}
+% \end{example}
+%
 The \code{operator[]} is implemented as 
 efficiently as possible, using inlining and caching, 
-but the implementation above is however less efficient than the 
+but this implementation is still less efficient than the 
 reference C implementation described in \cite{R:exts}. 
 
-In order to achieve maximulm effociency, the reference implementation
+In order to achieve maximum efficiency, the reference implementation
 extracts the underlying array pointer (\code{double*}) and works 
 with pointer arithmetic, which is a built-in operation as opposed to 
 calling the \code{operator[]} on a user-defined class which has to