[Rcpp-commits] r568 - papers/rjournal
noreply at r-forge.r-project.org
Thu Feb 4 18:01:25 CET 2010
Author: romain
Date: 2010-02-04 18:01:24 +0100 (Thu, 04 Feb 2010)
New Revision: 568
Modified:
papers/rjournal/EddelbuettelFrancois.tex
Log:
adapt wrap section to current wrap, rework RObject section, etc ...
Modified: papers/rjournal/EddelbuettelFrancois.tex
===================================================================
--- papers/rjournal/EddelbuettelFrancois.tex 2010-02-04 12:44:01 UTC (rev 567)
+++ papers/rjournal/EddelbuettelFrancois.tex 2010-02-04 17:01:24 UTC (rev 568)
@@ -217,6 +217,9 @@
% c++ code, for example say that we use it for our unit tests
% and show an example unit test
% [Dirk] Done in last round
+% [Romain] But this shows the old api !!! and the same code as above so that
+% people get to see it twice. I'd prefer moving these bits after
+% the new Rcpp api section and show new api code inlined
The use of \pkg{inline} is possible as \pkg{Rcpp} can be installed and
updated just like any other R package using \textsl{e.g.} the
\code{install.packages()} function for initial installation as well as
@@ -269,136 +272,64 @@
based on the usage experience of several years of Rcpp deployment, as well as
current C++ design approaches.
-% we should include key design aspects here.
-% what are they ?
-% - thin wrappers : an RObject only contains a SEXP, no copy
-% - RAII
-% - member functions define the extent of what is possible to do with an
-% object, instead of the catch all SEXP
-% - easy translation between R and c++ types
-% - need to talk about implicit conversion somewhere
-%
-% [Dirk] Sounds great -- give a go!
+\subsection{Rcpp Class hierarchy}
+The \code{Rcpp::RObject} class is the base class of the new \pkg{Rcpp} API.
+An instance of the \code{RObject} class encapsulates an R object
+(\code{SEXP}), exposes methods that are appropriate for all types
+of objects and transparently manages garbage collection.
-\subsection{The RObject class}
+The most important aspect of the \code{RObject} class is that it is
+a very thin wrapper around the \code{SEXP} it encapsulates: the
+\code{SEXP} is indeed the only data member of an \code{RObject}.
-% [Romain] this needs cleaning
-Here, the \code{RObject} class is the base class of all
-objects in the extended API of the \pkg{Rcpp} package. An \code{RObject} has only one
-data member, the protected \code{SEXP} it encapsulates. The \code{RObject}
-treats the \code{SEXP} as a resource, following the RAII (resource
-acquisition is initialization) pattern. As long as the \code{RObject}
-instance is alive, its underlying \code{SEXP} remains protected from garbage
-collection. When the \code{RObject} goes out of scope (either via a function
-return or through an exception), it removes the protection so that if the \code{SEXP} is not
-otherwise protected when it becomes subject to garbage collection.
+The \code{RObject} class takes advantage of the explicit life cycle of
+C++ objects to manage the garbage collection of R objects. The
+\code{RObject} effectively treats its underlying \code{SEXP} as
+a resource. The constructor of the \code{RObject} class takes
+the necessary measures to guarantee that the underlying \code{SEXP}
+is protected from the garbage collector, and the destructor
+assumes the responsibility of withdrawing that protection.
-% [Dirk]: Shorten and make a footnote?
-% [Romain]: yes, but the whole section needs cleaning anyway
-Garbage collection is only mentioned here to illustrate the basic design
-of the \code{RObject} class, the user of \pkg{Rcpp} need not to concern
-himself/herself with such matters and can instead focus on the problem
-that he/she is solving.
+By assuming the entire responsibility of garbage collection, \pkg{Rcpp}
+relieves the programmer from writing boilerplate code to manage
+the protection stack with the \code{PROTECT} and \code{UNPROTECT} macros.
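The resource-management idea described above can be sketched without any R headers. The block below is only an illustration of the pattern under stated assumptions: `ProtectionStack`, `Guard` and `depth_inside_scope` are hypothetical names invented for this sketch, and the toy stack merely stands in for R's protection mechanism. It does not reproduce how \pkg{Rcpp} protects objects internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a protection stack; this is NOT the R API.
struct ProtectionStack {
    std::vector<int> ids;   // identifiers of the currently protected objects
    void protect(int id) { ids.push_back(id); }
    void unprotect() {
        assert(!ids.empty());   // releasing more than was protected is a bug
        ids.pop_back();
    }
    std::size_t depth() const { return ids.size(); }
};

// RAII guard (hypothetical): protects in the constructor and
// withdraws the protection in the destructor.
class Guard {
public:
    Guard(ProtectionStack& stack, int id) : stack_(stack) { stack_.protect(id); }
    ~Guard() { stack_.unprotect(); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
private:
    ProtectionStack& stack_;
};

// Returns the stack depth observed while two guards are alive.
// Both guards are released automatically when the function returns,
// whether by a normal return or by an exception.
std::size_t depth_inside_scope(ProtectionStack& stack) {
    Guard a(stack, 1);
    Guard b(stack, 2);
    return stack.depth();
}
```

The point of the design is that the release step cannot be forgotten: no explicit call matching each protection has to be written by the programmer.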
-The \code{RObject} class also defines a set of member functions that
-can be used on any R object, regardless of its type.
-% [Dirk]: Do we need the table if we shorten the paper?
-% [Romain]: Probably not. Noth that interesting anyway.
+The \code{RObject} class defines a set of member functions that
+can be used on any R object, regardless of its type. The member
+functions \code{isNULL}, \code{isObject} and \code{isS4} can be
+used to query properties of the object.
-\begin{center}
-\begin{small}
-\begin{tabular}{cc}
-method & action \\
-\hline
-\code{isNULL} & is the object \code{NULL}\\
-\hline
-\code{attributeNames} & the names of its attributes\\
-\code{hasAttribute} & does it have a given attribute\\
-\code{attr} & retrieve or set an attribute \\
-\hline
-\code{isS4} & is it an S4 object \\
-\code{hasSlot} & if S4, does it have the given slot\\
-\code{slot} & retrieve a given slot \\
-\hline
-\end{tabular}
-\end{small}
-\end{center}
+Regarding attributes, the member function
+\code{attributeNames} can be used to retrieve the names of the attributes,
+\code{hasAttribute} can be used to query the existence of an attribute and
+\code{attr} can be used to either get the current value of an
+attribute, or set the value to some other object.
+Similarly, the member functions \code{hasSlot} and \code{slot}
+can be used to manage slots of an S4 object. These functions throw
+C++ exceptions when used on objects that are not S4 objects.
+
+% example of using attr or slot ?
+% mention proxy pattern ?
+
\subsection{Derived classes}
Internally, an R object must have one type amongst the set of
predefined types, commonly referred to as SEXP types. R internals
-\citep{R:ints} documents the various types. \pkg{Rcpp} associates
-a C++ class for most SEXP types.
+\citep{R:ints} documents these various types.
+\pkg{Rcpp} associates a dedicated C++ class with most SEXP types.
-% [Romain] I don't like this table anymore
-% including also the description of each SEXP type would make it better
-% but it then takes too much space
-%
-% maybe we need some sort of UML like diagram
-%
-% [Dirk] To be honest I never liked it much either. Good go into an
-% Appendix, or we just pick a few key combinations and describe them in
-% text.
-%
-% [Romain] Please be honest, I'd rather have the comment from you than
-% from the reviewer. the text after will need some cleaning also then
-% [Dirk] I'd say cut. There is too much 'low-level' stuff here. I see the
-% paper as trying to interest a non-C/C++ programmer in trying Rcpp,
-% This scares children and grown me alike. Better for the 'long
-% paper' on all the juicy details.
-% But we need better context. How can we hash out what a concise and
-% and convincing section on 'New API' should look like? Show how
-% easy the code, and make a gentle mention of some of the key C++
-% technologies? I am open to any idea.
-\begin{center}
-\begin{small}
-\begin{tabular}{ccc}
-SEXP type & \pkg{Rcpp} class \\
-\hline
-\code{NILSXP} & \\
-\code{SYMSXP} & \code{Symbol} \\
-\code{LISTSXP} & \code{Pairlist} \\
-\code{CLOSXP} & \code{Function} \\
-\code{ENVSXP} & \code{Environment} \\
-\code{PROMSXP} & \code{Promise} \\
-\code{LANGSXP} & \code{Language} \\
-\code{SPECIALSXP} & \code{Function} \\
-\code{BUILTINSXP} & \code{Function} \\
-\code{CHARSXP} & \\
-\code{LGLSXP} & \code{LogicalVector} \\
-\code{INTSXP} & \code{IntegerVector} \\
-\code{REALSXP} & \code{NumericVector} \\
-\code{CPLXSXP} & \code{ComplexVector}\\
-\code{STRSXP} & \code{CharacterVector} \\
-\code{DOTSXP} & \code{Pairlist} \\
-\code{ANYSXP} & \\
-\code{VECSXP} & \code{List} \\
-\code{EXPRSXP} & \code{ExpressionVector}\\
-\code{BCODESXP} & \\
-\code{EXTPTRSXP} & \code{XPtr<T>}\\
-\code{WEAKREFSXP} & \code{WeakReference}\\
-\code{RAWSXP} & \code{RawVector}\\
-\code{S4SXP} & \\
-\hline
-\end{tabular}
-\end{small}
-\end{center}
-
-Some types do not have their own C++ class. \code{NILSXP} and
-\code{S4SXP} have their functionality covered by the \code{RObject}
-class; \code{ANYSXP} is just a placeholder to facilitate S4 dispatch
-(and no object in R has this type); and \code{BCODESXP} is not currently
-used.
-
Each class contains functionality that is relevant to the R object
-that it encapsulates. For example \code{Environment} contains
-member methods to query the list of objects in the associated environment,
-classes with the \code{Vector} overload the \code{operator[]} in order
-to extract/modify values at the given position in the vector, ...
+that it encapsulates. For example \code{Rcpp::Environment} contains
+member functions to manage objects in the associated environment.
+Classes related to vectors (\code{IntegerVector}, \code{NumericVector},
+\code{RawVector}, \code{LogicalVector}, \code{CharacterVector},
+\code{GenericVector} and \code{ExpressionVector}) expose functionality
+to extract and set values in the vectors.
-The rest of this section presents example uses of \pkg{Rcpp} classes.
+The following subsections present typical uses of \pkg{Rcpp} classes in
+comparison with the same code expressed using functions of the R API.
\subsection{numeric vector}
@@ -427,9 +358,8 @@
actual array; its indexing does not resemble either R or C++.
\end{itemize}
-Using the \code{Rcpp::NumericVector}, the code can be rewritten:
+Using the \code{Rcpp::NumericVector} class, the code can be rewritten:
-
\begin{example}
Rcpp::NumericVector ab(2) ;
ab[0] = 123.45;
@@ -489,106 +419,106 @@
CharacterVector ab = \{"foo","bar"\};
\end{example}
+\section{Data interchange between R and C++}
-\section{wrap and as}
+In addition to classes, the \pkg{Rcpp} package contains two
+functions to perform conversion of C++ objects to R objects and back.
-Besides classes, the \pkg{Rcpp} package also contains utilities allowing
-conversion from R objects to C++ types and vice-versa. Through
-polymorphism, the \code{wrap} set of functions can be used to wrap
-some data structure into an \code{RObject} instance.
-
-In total, the \pkg{Rcpp} defines 23 different \code{wrap}
-functions, including :
+The C++ to R conversion is performed by the \code{Rcpp::wrap} templated
+function. It uses advanced template metaprogramming techniques
+to convert a wide and extensible set of types and classes to the
+most appropriate type of R object. \code{wrap} will
+currently handle these C++ types:
\begin{itemize}
-\item SEXP
-\item primitive types : \code{bool}, \code{int}, \code{double},
-\code{size\_t}, \code{unsigned char} (byte), \code{std::string} and
-\code{char*}
-\item STL vectors of these types: \code{vecor<int>},
-\code{vector<double>}, \code{vector<bool>}, \code{vector<unsigned char>},
-\code{vector<string>}
-\item STL sets : \code{set<int>}, \code{set<double>}, \code{set<unsigned char>},
-\code{set<string>}
-\item initializer lists (only available in G++ 4.4 or later).
+\item primitive types (\code{int}, \code{double}, ...) are converted
+into R vectors of the appropriate type
+\item \code{std::string} objects are converted to R character vectors
+\item STL-like containers, e.g. \code{std::vector<T>}, \code{std::list<T>},
+are wrappable as long as the type they contain (\code{T}) is wrappable
+\item STL-like maps, e.g. \code{std::map<std::string,T>},
+which use \code{std::string} for their keys, are wrappable as long as
+the type \code{T} is wrappable
+\item any type that implements implicit conversion to \code{SEXP}, through the
+\code{operator SEXP()}, is wrappable
\end{itemize}
-Each type is wrapped in the most sensible class, e.g. \code{vector<double>}
-is wrapped into an \pkg{NumericVector} object, which in turns encapsulates
-a numeric vector (a \code{SEXP} of type \code{REALSXP}).
-Here are a few examples of \code{wrap} calls:
+In addition, the \code{wrap} template may be partially or fully specialized by
+third party code to extend its capabilities. The design allows composition,
+so for example objects of the class
+\code{std::vector< std::map<std::string,int> >} are wrappable. This is
+because \code{int} is wrappable (as a primitive type), consequently
+\code{std::map<std::string,int>} is wrappable (as an STL-like map of
+wrappable types keyed by strings), and therefore
+\code{std::vector< std::map<std::string,int> >} is wrappable (as an
+STL-like container of wrappable objects). The example code below
+illustrates this:
\begin{example}
-LogicalVector x1 = wrap( false );
-IntegerVector x2 = wrap( 1 ) ;
+std::vector< std::map<std::string,int> > v ;
-vector<double> v ;
-v.push_back(0.0); v.push_back( 1.0 );
-NumericVector x3 = wrap( v ) ;
+std::map< std::string, int > m1 ;
+m1["foo"] = 1 ; m1["bar"] = 2 ;
-// initializer list (only on GCC >= 4.4)
-LogicalVector x4 = wrap( \{ false, true\} );
-CharacterVector x5 = wrap( \{"foo", "bar"\} );
+std::map< std::string, int > m2 ;
+m2["foo"] = 1 ; m2["bar"] = 2 ; m2["bling"] = 3 ;
+
+v.push_back( m1) ;
+v.push_back( m2) ;
+
+wrap( v ) ;
\end{example}
-Similarly, converting an R object to a C++ standard type is implemented
-by variations on the \code{as} template function. In this case, we must
-use the angle brackets to specify which version of as we want to use.
+The code creates a list of two named vectors, equal to the list that
+can be created by the following R code:
\begin{example}
-bool x = as<bool>(x) ;
-double x = as<double>(x) ;
-vector<int> x = as< vector<int> >(x) ;
+list( c( bar = 2L, foo = 1L) , c( bar = 2L, bling = 3L, foo = 1L) )
\end{example}
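The composition rule described above (a container is wrappable when its contents are) can be sketched as a compile-time trait. This is only a minimal illustration: `is_wrappable`, `Nested` and `nested_example_is_wrappable` are hypothetical names introduced here, and the sketch makes no claim about how \pkg{Rcpp} actually implements the dispatch inside \code{wrap}.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical trait (not part of Rcpp): can this sketch convert a T?
// By default the answer is no.
template <typename T> struct is_wrappable : std::false_type {};

// Primitive types and strings are wrappable.
template <> struct is_wrappable<int> : std::true_type {};
template <> struct is_wrappable<double> : std::true_type {};
template <> struct is_wrappable<bool> : std::true_type {};
template <> struct is_wrappable<std::string> : std::true_type {};

// A vector is wrappable exactly when its element type is.
template <typename T>
struct is_wrappable<std::vector<T> > : is_wrappable<T> {};

// A map keyed by strings is wrappable exactly when its mapped type is.
template <typename T>
struct is_wrappable<std::map<std::string, T> > : is_wrappable<T> {};

// The nested type used in the example above.
typedef std::vector<std::map<std::string, int> > Nested;

// Resolved entirely at compile time through the two partial specializations.
bool nested_example_is_wrappable() { return is_wrappable<Nested>::value; }
```

Because each partial specialization defers to the trait of the contained type, arbitrarily nested combinations are resolved by the compiler without any additional code.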
-\section{external pointers}
+The reverse conversion is implemented by variations of the
+\code{Rcpp::as} template. \code{as} offers less flexibility and currently
+handles conversion of R objects into primitive types (\code{bool}, \code{int},
+\code{std::string}, ...), STL vectors of primitive types (\code{std::vector<bool>},
+\code{std::vector<double>}, ...) and arbitrary types that offer
+a constructor that takes a \code{SEXP}. In addition \code{as} can
+be fully or partially specialized to manage conversion of R data
+structures to third party types.
-In addition to primitive data types, R can handle arbitrary pointers
-by encapsulating the pointer in a special R object, the external
-pointer. \cite{R:exts} documents the available API R has to offer to
-deal with external pointers.
+The converters offered by \code{wrap} and \code{as} provide a very
+useful framework to implement the logic of the code in terms of C++
+data structures and then explicitly convert data back to R.
-\pkg{Rcpp} takes advantage of C++ templates and smart pointers and
-defines the templated class \code{XPtr} that acts as a smart
-pointer to the underlying C++ object.
+The converters are also used implicitly in various places in the
+\pkg{Rcpp} API. Consider the following code that uses the
+\code{Rcpp::Environment} class to interchange data between C++ and R.
-Assuming we get from R an external pointer to a \code{std::vector<int>}
-c++ object, we can manipulate it as such using the \code{XPtr} class:
-
\begin{example}
-// xp is an external pointer
-// to a std::vector<int>
-XPtr< std::vector<int> > p(xp) ;
-p->push\_back(1) ;
-p->push\_back(2) ;
-p->size() ;
-\end{example}
+// assuming the global environment contains
+// a variable 'x' that is a numeric vector
+Rcpp::Environment global = Rcpp::Environment::global_env() ;
-The \code{XPtr} class directly derives from the \code{RObject} class.
-Thanks to its template parameter and overloading of the \code{->}
-and \code{*} operators, objects of the \code{XPtr<Foo>} generated
-class look and feel like raw pointers (\code{Foo*}).
+// extract a std::vector<double> from the global environment
+std::vector<double> vx = global["x"] ;
-Making an external pointer from a raw pointer is equally easy using
-another constructor.
+// create a map<string,string>
+std::map<std::string,std::string> map ;
+map["foo"] = "oof" ;
+map["bar"] = "rab" ;
-\begin{example}
-std::vector<int> *pv = new std::vector<int> ;
-XPtr< std::vector<int> > p(pv,true) ;
+// push the STL map to the global environment
+global["y"] = map ;
\end{example}
-The creation of the instance of the \code{XPtr< std::vector<int> >}
-smart extenal pointer to a \code{std::vector<int>} hides the
-R API that is typically used for external pointers, including registration
-of a finalizer to be executed to free the memory of the vector when the
-external pointer goes out of scope.
+In the first part of the example, \code{as} is used implicitly to convert
+the object \code{x} from the global environment into an instance
+of the \code{std::vector<double>} class. In the second part of the example,
+\code{wrap} is used implicitly to convert the object of class
+\code{std::map<std::string,std::string>} into an R object, a named
+character vector in this case.
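One way such implicit conversions can be arranged is a proxy object returned by \code{operator[]}, with a templated assignment operator for the C++ to R direction and a templated conversion operator for the reverse. The sketch below illustrates that technique only, under loud assumptions: `Value`, `toy_wrap`, `toy_as` and `ToyEnvironment` are hypothetical stand-ins written for this illustration, and a plain `std::map` replaces the R environment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an R value: simply a vector of doubles.
typedef std::vector<double> Value;

// Toy converters playing the roles of wrap and as (not Rcpp's functions).
inline Value toy_wrap(double x) { return Value(1, x); }
inline Value toy_wrap(const std::vector<double>& v) { return v; }

template <typename T> T toy_as(const Value& v);
template <> inline double toy_as<double>(const Value& v) { return v.at(0); }
template <> inline std::vector<double>
toy_as<std::vector<double> >(const Value& v) { return v; }

class ToyEnvironment {
public:
    // Proxy for one named binding of the environment.
    class Proxy {
    public:
        Proxy(ToyEnvironment& env, const std::string& name)
            : env_(env), name_(name) {}

        // Assignment converts the C++ object first (the "wrap" direction).
        template <typename T> Proxy& operator=(const T& rhs) {
            env_.store_[name_] = toy_wrap(rhs);
            return *this;
        }

        // Conversion operator converts back (the "as" direction).
        // Throws std::out_of_range when the binding does not exist.
        template <typename T> operator T() const {
            return toy_as<T>(env_.store_.at(name_));
        }
    private:
        ToyEnvironment& env_;
        std::string name_;
    };

    Proxy operator[](const std::string& name) { return Proxy(*this, name); }

private:
    std::map<std::string, Value> store_;
};
```

With these definitions, `env["x"] = v;` stores a converted copy of `v`, and `std::vector<double> out = env["x"];` triggers the conversion operator, so neither converter is named at the call site.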
\section{other examples}
The last example shows how to use \pkg{Rcpp} to emulate the R code below.
-For more examples, the reader is invited to
-refer to the comprehensive documentation included in \pkg{Rcpp}
-as well as the many examples that the package contains as part of
-its unit tests.
\begin{example}
> rnorm( 10L, sd = 100.0 )
@@ -605,9 +535,8 @@
We first pull out the \code{rnorm} function from the environment
called \samp{package:stats} in the search path, then call the function
-using syntax similar to calling the function in R. The \code{Named}
-class is an utility class that helps emulating the use of
-named arguments.
+using syntax similar to calling the function in R. The \code{Rcpp::Named}
+class is a utility class that is used to emulate named arguments.
The second version shows the use of the \code{Language} class, which
manages calls (LANGSXP).
@@ -618,15 +547,11 @@
\end{example}
In this version, we first create a call to the symbol "rnorm" and
-evaluate the call in the global environment, this is similar to the
-R code :
+evaluate the call in the global environment. In both cases, \code{wrap}
+is used implicitly to convert \code{10} and \code{100}
+into R integer vectors.
-\begin{example}
-> eval( call( "rnorm", 10L, sd = 100 ) )
-\end{example}
-
-Using the R API, the first example, using the actual
-\code{rnorm} function,
+Using the R API, the first example, using the actual \code{rnorm} function,
translates to :
\begin{example}
@@ -644,8 +569,9 @@
return res ;
\end{example}
-and the second example, using the \samp{rnorm} symbol, and therefore
-involving implicit lookup in hte search path, can be written as:
+and the second example, using the \samp{rnorm} symbol --- and therefore
+involving potentially expensive implicit lookup in the search path ---
+can be written as:
\begin{example}
SEXP call = PROTECT(
@@ -658,6 +584,10 @@
return res ;
\end{example}
+For more examples, the reader is invited to
+refer to the documentation included in \pkg{Rcpp}
+as well as the many examples that the package contains as part of
+its unit tests.
\section{Performance/Limitations}
@@ -680,7 +610,8 @@
\cite{R:exts} appears in section~\ref{sec:classic_rcpp}. With the new API,
the code can be written as shown below. The main difference is that the input
parameters are directly passed to types \code{Rcpp::NumericVector}, and that
-the return vector is automatically converted to a \code{SEXP} type.
+the return vector is automatically converted to a \code{SEXP} type through
+implicit conversion.
\begin{example}
#include <Rcpp.h>
@@ -702,31 +633,22 @@
\}
\end{example}
-Seemingly, this code is as efficient as it can be.
-However, when considering the implementation of the \code{operator[]}
-for the \code{NumericVector} class:
+The \code{operator[]} is implemented as
+efficiently as possible, using inlining and caching,
+but the code above is nevertheless less efficient than the
+reference C implementation described in \cite{R:exts}.
-% FIXME: not the case anymore, this has been optimized by caching the
-% pointer inside the NumericVector. This needs update
+In order to achieve maximum efficiency, the reference implementation
+extracts the underlying array pointer (a \code{double*}) and works
+with pointer arithmetic, which is a built-in operation, as opposed to
+calling the \code{operator[]} on a user-defined class, which has to
+pay the price of object encapsulation.
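The two access styles contrasted above can be put side by side in a small self-contained sketch. `ToyNumericVector`, `sum_by_index` and `sum_by_pointer` are hypothetical names used only for this illustration; the class is a plain C++ stand-in and not the \code{Rcpp::NumericVector} implementation. The sketch shows that both styles compute the same result and makes no claim about their relative speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thin wrapper over a contiguous array of doubles.
class ToyNumericVector {
public:
    explicit ToyNumericVector(std::size_t n)
        : data_(n, 0.0), start_(n ? &data_[0] : 0) {}
    ToyNumericVector(const ToyNumericVector&) = delete;
    ToyNumericVector& operator=(const ToyNumericVector&) = delete;

    // Element access goes through a member function of the wrapper,
    // which reads the cached pointer.
    double& operator[](std::size_t i) { return start_[i]; }
    std::size_t size() const { return data_.size(); }

    // Pointers to the first and one-past-the-end elements.
    double* begin() { return start_; }
    double* end() { return start_ + data_.size(); }

private:
    std::vector<double> data_;
    double* start_;   // cached pointer to the underlying array
};

// Style 1: index through the wrapper on every access.
double sum_by_index(ToyNumericVector& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Style 2: extract the raw pointers once, then use pointer arithmetic.
double sum_by_pointer(ToyNumericVector& v) {
    double total = 0.0;
    for (double* p = v.begin(); p != v.end(); ++p) total += *p;
    return total;
}
```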
-\begin{example}
-inline double& operator[]( const int& i ) {
- return REAL(m_sexp)[i];
-}
-\end{example}
-
-Each call to the \code{operator[]} on a \code{NumericVector}
-calls the \code{REAL} macro of the R API to retrieve the pointer to the
-underlying array of \code{double}. The code in \cite{R:exts} is much
-more parsimonious with exactly only 3 calls to the \code{REAL} macro,
-delegating extraction to pointer arithmetics which are usually much more
-efficient.
-
-The \code{NumericVector} class provides two member functions \code{begin}
+Modelled after containers of the standard template library,
+the \code{NumericVector} class provides two member functions \code{begin}
and \code{end} that can be used to retrieve respectively
-the pointer to the first element and to the element after the last element
-of the underlying array. We can revisit the code to take advantage
-of \code{begin} :
+the pointers to the first and one-past-the-end elements of the underlying array.
+We can revisit the code to take advantage of this feature :
\begin{example}
#include <Rcpp.h>
@@ -768,6 +690,11 @@
\end{tabular}
\end{center}
+% need to comment the results, give reasons why the RcppVector<double> is
+% 10 times less efficient than the reference, show that 55-36 is the price for
+% encapsulation and say that the difference between 34 and 36 is not
+% significant
+
\section{Summary}
% The \code{Rcpp} package provides comprehensive set of C++