[Rcpp-commits] r520 - papers/rjournal

noreply at r-forge.r-project.org
Sat Jan 30 20:06:54 CET 2010


Author: edd
Date: 2010-01-30 20:06:54 +0100 (Sat, 30 Jan 2010)
New Revision: 520

Modified:
   papers/rjournal/EddelbuettelFrancois.tex
Log:
some more spit, polish and presumably the usual array of typos


Modified: papers/rjournal/EddelbuettelFrancois.tex
===================================================================
--- papers/rjournal/EddelbuettelFrancois.tex	2010-01-30 18:13:55 UTC (rev 519)
+++ papers/rjournal/EddelbuettelFrancois.tex	2010-01-30 19:06:54 UTC (rev 520)
@@ -1,3 +1,5 @@
+%% Emacs please consider this:  -*- mode: latex; TeX-master: "RJwrapper.tex"; -*-
+
 \title{Mesh R and C++ with Rcpp}
 \author{by Dirk Eddelbuettel and Romain Fran\c{c}ois}
 
@@ -7,8 +9,16 @@
 
 \section{Introduction}
 
-\subsection{Overview}
+TBD, probably close to last
 
+One idea: call 'classic Rcpp' a \textsl{vertical} approach as it is chiefly
+concerned with getting data from R to C++ and back from C++ to R. On the
+other hand, 'new Rcpp' is more \textsl{horizontal as well as vertical} as it
+also significantly eases access from C++ itself (and to the C++
+representation of R objects).  Does that make sense?
+
+\section{Overview}
+
 % [Romain:] the overview is really messy and probably 
 % needs a complete rewrite when all other sections are finished
 % [Dirk:] Agreed, see Gelman piece. We need more meat on the bones first.
@@ -42,14 +52,15 @@
 \pkg{Rcpp} takes advantage of some features of the forthcoming \code{C++0x} 
 standard, already supported by recent versions of the GCC.
 
-\subsection{Context}
+\subsection{Historical Context}
 
-\pkg{Rcpp} was first released in 2005 as a contribution
-to the \pkg{RQuantLib} package \citep{eddelbuettelkhan09:rquantlib}.
-\pkg{Rcpp} was then released as a package of the same name in early 2006,
-quickly followed by several releases before being renamed to
-\pkg{RcppTemplate}. More releases followed during 2006 under the new name,
-but no releases or updates were made during 2007, 2008 or most of 2009.
+\pkg{Rcpp} first appeared in 2005 as a contribution to the \pkg{RQuantLib}
+package \citep{eddelbuettelkhan09:rquantlib} before being released as a CRAN
+package in early 2006. Several releases followed in quick succession; all of
+these were under the name \pkg{Rcpp}. The package was then renamed to
+\pkg{RcppTemplate} and several more releases followed during 2006 under the
+new name.  However, no new releases or updates were made during 2007, 2008
+and most of 2009.
 
 Given the continued use of the package, it was revived using the former name
 \pkg{Rcpp}. New releases started in November 2008 which include an improved
@@ -58,25 +69,24 @@
 `classic \pkg{Rcpp}' interface (described in the next section)
 which will be provided for the foreseeable future.
 
-Yet C++ coding standards continued to evolved. So, in late 2009 the codebase
-was significantly extended and numerous new features were added.  Several of
-these are described below following section
-This constitutes the `enhanced \pkg{Rcpp}' interface which we 
-also intend to support going forward.
+Yet C++ coding standards continued to evolve. So, starting in late 2009 the
+codebase was significantly extended and numerous new features were added.
+Several of these are described below in the section on the `New
+\pkg{Rcpp}' interface which we also intend to support going forward.
 
 \subsection{Comparison}
 
-Integration of C++ and R has been addressed by several authors starting with
-\cite{batesdebroy01:cppclasses}. \cite{javagailemanly07:r_cpp}, in an
-unpublished paper, express several ideas that are close to some of our
-approaches, though not yet fully fleshed out.
+Integration of C++ and R has been addressed by several authors; the earliest
+published reference is probably \cite{batesdebroy01:cppclasses}.
+An unpublished paper by \cite{javagailemanly07:r_cpp} expresses several ideas
+that are close to some of our approaches, though not yet fully fleshed out.
 %
 The \pkg{Rserve} package \citep{cran:Rserve} was another early approach,
-going back to 2002. On the server side, \pkg{Rserve} translates
-R data structures into a binary serialization format and uses TCP/IP
-for transfer. On the client side, objects are reconstructed as instances
-of C++ classes that emulate the structure of R objects.
-%
+going back to 2002. On the server side, \pkg{Rserve} translates R data
+structures into a binary serialization format and uses TCP/IP for
+transfer. On the client side, objects are reconstructed as instances of Java
+or C++ classes that emulate the structure of R objects. 
+
 The packages \pkg{rcppbind} \citep{liang08:rcppbind}, \pkg{RAbstraction}
 \citep{armstrong09:RAbstraction} and \pkg{RObjects}
 \citep{armstrong09:RObjects} are all implemented using C++ templates.
@@ -108,12 +118,23 @@
 % [Romain:] I'd argue it is still the case with the new api
 The core focus of \pkg{Rcpp}---particularly for the earlier API described in
 this section---has always been on allowing the programmer to add C++-based
-functions where we use this term in the standard mathematical sense of
-providing results (output) given a set of parameters or data (input). This
-was facilitated from the earliest releases using C++ classes for receiving
+functions. We use this term in the standard mathematical sense of providing
+results (output) given a set of parameters or data (input). This was
+facilitated from the earliest releases using C++ classes for receiving
 various types of R objects, converting them to C++ objects and allowing the
-programmer to return the results to R with relative use.
+programmer to return the results to R with relative ease.
 
+This API therefore supports two typical use cases. First, one can think of
+replacing existing R code with equivalent C++ code in order to reap
+performance gains.  This case can be conceptually easy as there may not be
+(build- or run-time) dependencies on other C or C++ libraries.  It typically
+involves setting up data and parameters---the right-hand side components of a
+function call---before making the call in order to provide the result that is
+to be assigned to the left-hand side. Second, \pkg{Rcpp} facilitates calling
+functions provided by other libraries. The use resembles the first case: data
+and parameters are passed via \pkg{Rcpp} to a function set up to call code
+from an external library.
+
 An illustration can be provided using the time-tested example of a
 convolution of two vectors. This example is shown in sections 5.2 (for the
 \code{.C()} interface) and 5.9 (for the \code{.Call()} interface) of 'Writing
@@ -141,41 +162,49 @@
 \}
 \end{example}
 
-We can highlight several aspects. First, only one header file is needed.
-Second, given two \code{SEXP} types---the bread-and-butter of all internal R
-programming---a third is returned.  Third, both inputs are converted to C++
-vector types that are \textsl{templated} (which means that such a vector
-template can use used to create vectors of different base types). Here a
-standard \code{double} type is used to create a vector of doubles from the
-template type.
+We can highlight several aspects. First, only a single header file
+\code{Rcpp.h} is needed to use the \pkg{Rcpp} API.  Second, given two
+\code{SEXP} types---the bread-and-butter of all internal R programming---a
+third is returned.  Third, both inputs are converted to C++ vector types that
+are \textsl{templated} (which means that such a vector template can be used
+to create vectors of different base types). Here a standard \code{double}
+type is used to create a vector of doubles from the template type.
 % [Romain:] I think the previous sentence is confusing, one might think
 % that the same vector can hold int and double
 % [Dirk:] Better?
 % [Romain:] I think so, maybe the (...) should be a footnote
+% [Dirk:] Sorry, which '(...)' ?
 Fourth, the usefulness of these classes can be seen when we query the
 vectors directly for their size---using the \code{size} member function---in
-order to reserved a new result type of appropriate length.  Fifth, the
+order to reserve a new result type of appropriate length whereas use based
+on C arrays would have required additional parameters for the length of
+vectors $a$ and $b$, leaving open the possibility of mismatches between the
+actual length and the length reported by the programmer.  Fifth, the
 computation itself is straightforward embedded looping just as in the
 original examples in the 'Writing R Extensions' manual \citep{R:exts}.
 Sixth, a return type (\code{RcppResultSet}) is then prepared as a named
 object (something that should be familiar to R programmers) which is then
-converted to a list object that is returned.
+converted to a list object that is returned.  We should note that the
+\code{RcppResultSet} permits the return of numerous (named) objects which can
+also be of different types.
 
 We argue that this usage is already easier to read, write and debug than the
 C macro-based approach supported by R itself. Possible performance issues and
 other potential limitations will be discussed throughout the article and
 reviewed at the end.
 
-\section{inline code}
+\section{Using code `inline'}
 
-Extending R with compiled code also needs to address how to reliably load the
-code.  While this can be achieved directly using \code{dyn.load}, using a
-package is preferable in the long run.  Another option is
+Extending R with compiled code also needs to address how to reliably compile,
+link and load the code.  While using a package is preferable in the long run,
+it may be too heavy a framework for quick explorations.  An alternative is
 provided by the \pkg{inline} package \citep{cran:inline} which compiles,
-links and loads a C or C++ function---directly from the R prompt.  It was
-recently extended to work with \pkg{Rcpp} by allowing for additional header
-files and libraries, and in particularly those used by the \pkg{Rcpp} package
-which are automatically located and used.
+links and loads a C, C++ or Fortran function---directly from the R prompt
+using a simple function \code{cfunction}.  It was recently extended to work
+with \pkg{Rcpp} by allowing for the use of additional header files and
+libraries. This works particularly well with the \pkg{Rcpp} package where
+headers and the library are automatically found if the appropriate option
+\code{Rcpp} to \texttt{cfunction} is set to true.
 
 % [Romain] : the next paragraph is very confusing
 % [Dirk] Is this better?
@@ -185,20 +214,50 @@
 %          it might also be useful to show a quick example of inlining
 %          c++ code, for example say that we use it for our unit tests
 %          and show an example unit test
-The use of \pkg{inline} is possible as \pkg{Rcpp} can be used and updated
-just like any other R package. Even though it provides a library and header
-files for other packages to use, it can be installed via
-\code{install.packages()} just like other CRAN packages. Similarly, new
-versions can be obtained via \code{update.packages()}.  What makes \pkg{Rcpp}
-useful for other packages for their interfacing of R and C++ is that it is
-provided as a dynamic library.\footnote{Windows users however only obtain a
-  static library though this could be changed.} The location of this library,
-and the associated compiler and header arguments can be queried directly from
-the installed package using the functions \code{Rcpp:::CxxFlags()} and
-\code{Rcpp:::LdFlags()}.  So even though R / C++ interfacing would otherwise
-require source code, the Rcpp library is always provided ready for use as a
-pre-built library through the CRAN package mechanism.
+The use of \pkg{inline} is possible as \pkg{Rcpp} can be installed and
+updated just like any other R package using \textsl{e.g.} the
+\code{install.packages()} function for initial installation as well as
+\code{update.packages()} for upgrades.  So even though R / C++ interfacing
+would otherwise require source code, the \pkg{Rcpp} library is always provided
+ready for use as a pre-built library through the CRAN package mechanism.
 
+The library and header files provided by \pkg{Rcpp} for use by other packages
+are installed along with the \pkg{Rcpp} package making it possible for
+\pkg{Rcpp} to provide the appropriate \code{-I} and \code{-L} switches needed
+for compilation and linking.  So internally, \pkg{inline} makes use of the
+two functions \code{Rcpp:::CxxFlags()} and \code{Rcpp:::LdFlags()} that
+provide this information (and which are also used by \code{Makefiles} of
+other packages).  Here, however, all this is done behind the scenes and the
+user need not worry about compiler or linker options or settings.
+
+The convolution example provided above now can be rewritten for use by
+\pkg{inline} as shown here.  The function body is provided by character
+variable \code{src}, the function header is defined by the argument
+\code{signature}---and we only need to enable \code{Rcpp=TRUE} to obtain a
+new function \code{fun} based on the C++ code in \code{src}:
+\begin{example}
+src <- '
+  RcppVector<double> xa(a);
+  RcppVector<double> xb(b);
+  int nab = xa.size() + xb.size() - 1;
+
+  RcppVector<double> xab(nab);
+  for (int i = 0; i < nab; i++) xab(i) = 0.0;
+
+  for (int i = 0; i < xa.size(); i++)
+    for (int j = 0; j < xb.size(); j++)
+       xab(i + j) += xa(i) * xb(j);
+
+  RcppResultSet rs;
+  rs.add("ab", xab);
+  return rs.getReturnList();
+';
+fun <- cfunction(signature(a="numeric", 
+                           b="numeric"),
+                 src, Rcpp=TRUE)
+\end{example}
+
+
 \section{New \pkg{Rcpp} API}
 \label{sec:new_rcpp}
 
@@ -241,6 +300,7 @@
 
 The \code{RObject} class also defines a set of member functions that
 can be used on any R object, regardless of its type.
+% [Dirk]: Do we need the table if we shorten the paper?
 
 \begin{center}
 \begin{small}
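
Editorial note on the convolution example in the diff above: the loop structure
can be tried without R or Rcpp installed. The following is only a minimal
sketch under stated assumptions: it swaps std::vector<double> in for
RcppVector<double>, uses operator[] instead of the operator() indexing seen in
the patch, and returns the vector directly instead of going through
RcppResultSet. The function name convolve is hypothetical and is not part of
the Rcpp API.

```cpp
#include <cassert>   // only needed for the check described below
#include <cstddef>
#include <vector>

// Hypothetical stand-alone helper, not part of Rcpp: discrete convolution
// of two vectors, mirroring the nested loop in the patch.
std::vector<double> convolve(const std::vector<double>& a,
                             const std::vector<double>& b) {
    // Guard against empty input so that size() + size() - 1 cannot wrap.
    if (a.empty() || b.empty()) return {};

    // The result has length(a) + length(b) - 1 elements, initialised to 0.
    std::vector<double> ab(a.size() + b.size() - 1, 0.0);

    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            ab[i + j] += a[i] * b[j];

    return ab;
}
```

As in the patch, the vectors carry their own lengths, so no separate length
arguments are needed. A quick check: convolving (1, 2) with (1, 3) gives
(1, 5, 6), which can be confirmed with an assert on the returned vector.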


