[Rprotobuf-commits] r701 - papers/rjournal

noreply at r-forge.r-project.org
Sat Jan 4 02:06:19 CET 2014


Author: edd
Date: 2014-01-04 02:06:19 +0100 (Sat, 04 Jan 2014)
New Revision: 701

Added:
   papers/rjournal/eddelbuettel-stokely.Rnw
   papers/rjournal/eddelbuettel-stokely.bib
Removed:
   papers/rjournal/eddelbuettel-francois-stokely.Rnw
   papers/rjournal/eddelbuettel-francois-stokely.bib
Modified:
   papers/rjournal/Makefile
   papers/rjournal/RJwrapper.tex
Log:
renaming

Modified: papers/rjournal/Makefile
===================================================================
--- papers/rjournal/Makefile	2014-01-04 01:01:54 UTC (rev 700)
+++ papers/rjournal/Makefile	2014-01-04 01:06:19 UTC (rev 701)
@@ -9,8 +9,8 @@
 	rm -fr RJwrapper.blg
 	rm -fr RJwrapper.brf
 
-RJwrapper.pdf: RJwrapper.tex eddelbuettel-francois-stokely.Rnw RJournal.sty
-	R CMD Sweave eddelbuettel-francois-stokely.Rnw
+RJwrapper.pdf: RJwrapper.tex eddelbuettel-stokely.Rnw RJournal.sty
+	R CMD Sweave eddelbuettel-stokely.Rnw
 	pdflatex RJwrapper.tex
 	bibtex RJwrapper
 	pdflatex RJwrapper.tex

Modified: papers/rjournal/RJwrapper.tex
===================================================================
--- papers/rjournal/RJwrapper.tex	2014-01-04 01:01:54 UTC (rev 700)
+++ papers/rjournal/RJwrapper.tex	2014-01-04 01:06:19 UTC (rev 701)
@@ -19,7 +19,7 @@
 
 %% replace RJtemplate with your article
 \begin{article}
-  \input{eddelbuettel-francois-stokely}
+  \input{eddelbuettel-stokely}
 
 \address{Dirk Eddelbuettel\\
   Debian and R Projects\\

Deleted: papers/rjournal/eddelbuettel-francois-stokely.Rnw
===================================================================
--- papers/rjournal/eddelbuettel-francois-stokely.Rnw	2014-01-04 01:01:54 UTC (rev 700)
+++ papers/rjournal/eddelbuettel-francois-stokely.Rnw	2014-01-04 01:06:19 UTC (rev 701)
@@ -1,1335 +0,0 @@
-% !TeX root = RJwrapper.tex
-% We don't want a left margin for Sinput or Soutput for our table 1.
-%\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=0em}
-%\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=0em}
-%\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
-% Setting the topsep to 0 reduces spacing from input to output and
-% improves table 1.
-\fvset{listparameters={\setlength{\topsep}{0pt}}}
-\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}
-
-\title{RProtoBuf: Efficient Cross-Language Data Serialization in R}
-\author{by Dirk Eddelbuettel and Murray Stokely}
-
-%% DE: I tend to have wider option(width=...) so this
-%%     guarantees better line breaks
-<<echo=FALSE,print=FALSE>>=
-options(width=65, prompt="R> ", digits=4)
-@
-
-\maketitle
-
-\abstract{Modern data collection and analysis pipelines often involve
- a sophisticated mix of applications written in general-purpose and
- specialized programming languages.  Protocol Buffers are a popular
- method of serializing structured data between applications, independent
- of programming language or operating system.  The
- \CRANpkg{RProtoBuf} package provides a complete interface between this
- library and the R environment for statistical computing.
- %TODO(ms) keep it less than 150 words.
-}
-
-%TODO(de) 'protocol buffers' or 'Protocol Buffers' ?
-
-\section{Introduction}
-
-Modern data collection and analysis pipelines are increasingly being
-built using collections of components to better manage software
-complexity through reusability, modularity, and fault
-isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
-Data analysis patterns such as Split-Apply-Combine
-\citep{wickham2011split} explicitly break up large problems into
-manageable pieces.  These patterns frequently employ different
-programming languages for the different phases of data analysis
-(collection, cleaning, analysis, post-processing, and presentation)
-in order to take advantage of the unique combination of performance,
-speed of development, and library support offered by different
-environments.  Each stage of the data analysis pipeline may involve
-storing intermediate results in a file or sending them over the
-network.
-% DE: Nice!
-
-Given these requirements, how do we safely share intermediate results
-between different applications, possibly written in different
-languages, possibly running on different computer systems, and
-possibly spanning different operating systems?  Programming
-languages such as R, Julia, Java, and Python include built-in
-serialization support, but these formats are tied to the specific
-% DE: need to define serialization?
-programming language in use and thus lock the user into a single
-environment.  CSV files can be read and written by many applications
-and so are often used for exporting tabular data.  However, CSV files
-have a number of disadvantages, such as being limited to tabular
-datasets, a lack of type safety, an inefficient text representation
-and parsing, and ambiguities in the format involving special
-characters.  JSON is another widely supported format, used mostly on
-the web, that removes many of these disadvantages, but it too is
-slow to parse and does not distinguish between integer and
-floating-point types.  Because the schema information
-is not kept separately, multiple JSON messages of the same type
-needlessly duplicate the field names with each message.
-%
-%
-%
-A number of binary formats based on JSON have been proposed that
-reduce the parsing cost and improve the efficiency.  MessagePack
-\citep{msgpackR} and BSON \citep{rmongodb} both have R interfaces, but
-these formats lack a separate schema for the serialized data and thus
-still duplicate field names with each message sent over the network or
-stored in a file.  Such formats also lack support for versioning when
-data storage needs evolve over time, or when changes in application
-logic and requirements dictate updates to the message format.
-% DE: Need to talk about XML ?
-
-Once the data serialization needs of an application become complex
-enough, developers typically benefit from the use of an
-\emph{interface description language}, or \emph{IDL}.  IDLs like
-Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro provide a compact,
-well-documented schema for cross-language data structures and
-efficient binary interchange formats.  The schema can be used to
-generate model classes for statically typed programming languages such
-as C++ and Java, or can be used with reflection for dynamically typed
-programming languages.  Since the schema is provided separately from
-the encoded data, the data can be encoded more compactly, and stored
-more cheaply, than with simple ``schema-less'' binary interchange
-formats.
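-To make the size argument concrete, the following base-R sketch
-hand-encodes a two-field record (\texttt{name = "Dirk"}, \texttt{id = 1})
-the way the Protocol Buffers wire format does: each field is introduced
-by a key that combines its tag number with a wire type, integers are
-written as base-128 varints, and strings are length-prefixed.  The
-helpers \code{varint} and \code{encode\_key} are hypothetical, handle
-only non-negative integers, and merely stand in for the C++ library.

```r
# Hypothetical helper: encode a non-negative integer as a base-128 varint
# (7 payload bits per byte, high bit set on every byte except the last).
varint <- function(n) {
  out <- raw(0)
  repeat {
    byte <- n %% 128
    n <- n %/% 128
    if (n > 0) byte <- byte + 128   # continuation bit
    out <- c(out, as.raw(byte))
    if (n == 0) break
  }
  out
}

# Hypothetical helper: a field key is (tag number << 3) | wire type,
# where wire type 0 = varint and 2 = length-delimited.
encode_key <- function(tag, wire_type) varint(tag * 8 + wire_type)

name <- charToRaw("Dirk")
payload <- c(encode_key(1, 2), varint(length(name)), name,  # field 1: name
             encode_key(2, 0), varint(1))                    # field 2: id
payload
length(payload)

# The same record as JSON text carries the field names in every message.
json <- '{"name":"Dirk","id":1}'
nchar(json, type = "bytes")
```

-For this record the binary payload is 8 bytes, while the JSON text
-needs 22 bytes, 10 of them spent on the quoted field names that a
-separate schema makes redundant.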
-
-% TODO(mstokely): Take a more conversational tone here asking
-% questions and motivating protocol buffers?
-
-% TODO(mstokely): If we go to JSS, include a larger paragraph here
-% referencing each numbered section.  I don't like these generally,
-% but its useful for this paper I think because we have a boring bit
-% in the middle (full class/method details) and interesting
-% applications at the end.
-This article introduces Google's Protocol Buffers through
-an easy-to-use R package, \CRANpkg{RProtoBuf}.  After describing the
-basics of protocol buffers and \CRANpkg{RProtoBuf}, we illustrate
-several common use cases for protocol buffers in data analysis.
-
-\section{Protocol Buffers}
-
-FIXME Introductory section which may include references in parentheses
-\citep{R}, or cite a reference such as \citet{R} in the text.
-
-% This content is good.  Maybe use and cite?
-% http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
-
-
-%% TODO(de,ms)  What follows is oooooold and was lifted from the webpage
-%%              Rewrite?
-Protocol Buffers can be described as a modern, language-neutral, platform-neutral,
-extensible mechanism for sharing and storing structured data.  Since their
-introduction, Protocol Buffers have been widely adopted in industry with
-applications as varied as database-internal messaging (Drizzle), % DE: citation?
-Sony PlayStation, Twitter, Google Search, Hadoop, and OpenStreetMap.  While
-% TODO(DE): This either needs a citation, or remove the name drop
-traditional IDLs have at times been criticized for code bloat and
-complexity, Protocol Buffers are based on a simple list and records
-model that is comparatively flexible and simple to use.
-
-Some of the key features provided by Protocol Buffers for data analysis
-include:
-
-\begin{itemize}
-\item \emph{Portable}:  Allows users to send and receive data between
-  applications or different computers.
-\item \emph{Efficient}:  Data is serialized into a compact binary
-  representation for transmission or storage.
-\item \emph{Extensible}:  New fields can be added to Protocol Buffer schemas
-  in a forward-compatible way that does not break older applications.
-\item \emph{Stable}:  Protocol Buffers have been in wide use for over a
-  decade.
-\end{itemize}
-
-Figure~\ref{fig:protobuf-distributed-usecase} illustrates an example
-communication workflow with protocol buffers and an interactive R
-session.  A common use case is populating an RPC request protocol
-buffer in R that is then serialized and sent over the network to a
-remote server.  The server deserializes the message, acts on
-the request, and responds with a new protocol buffer over the network.  The key
-difference from, say, a request to an Rserve instance is that the remote server
-need not know anything about the R language.
-
-%Protocol buffers are a language-neutral, platform-neutral, extensible
-%way of serializing structured data for use in communications
-%protocols, data storage, and more.
-
-%Protocol Buffers offer key features such as an efficient data interchange
-%format that is both language- and operating system-agnostic yet uses a
-%lightweight and highly performant encoding, object serialization and
-%de-serialization as well data and configuration management. Protocol
-%buffers are also forward compatible: updates to the \texttt{proto}
-%files do not break programs built against the previous specification.
-
-%While benchmarks are not available, Google states on the project page that in
-%comparison to XML, protocol buffers are at the same time \textsl{simpler},
-%between three to ten times \textsl{smaller}, between twenty and one hundred
-%times \textsl{faster}, as well as less ambiguous and easier to program.
-
-Many sources compare data serialization formats and show protocol
-buffers comparing very favorably to the alternatives; see, for
-example, \citet{Sumaray:2012:CDS:2184751.2184810}.
-
-%The flexibility of the reflection-based API is particularly well
-%suited for interactive data analysis.
-
-% XXX Design tradeoffs: reflection vs proto compiler
-
-For added speed and efficiency, the C++, Java, and Python bindings to
-Protocol Buffers are used with a compiler that translates a protocol
-buffer schema description file (ending in \texttt{.proto}) into
-language-specific classes that can be used to create, read, write and
-manipulate protocol buffer messages.  The R interface, in contrast,
-uses a reflection-based API that is particularly well suited for
-interactive data analysis.  All messages in R have a single class
-structure, but different accessor methods are created at runtime based
-on the named fields of the specified message type.
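-A minimal sketch of this single-class design, assuming \CRANpkg{RProtoBuf}
-is installed and its bundled \texttt{addressbook.proto} has been loaded:
-messages of two different types share the S4 class \texttt{Message} and
-differ in the \texttt{type} slot that the runtime uses to resolve field
-names.  The phone number string is a made-up example value.

```r
library(RProtoBuf)

# Two messages of different types from the bundled addressbook.proto.
person <- new(tutorial.Person, name = "Dirk", id = 1)
phone  <- new(tutorial.Person.PhoneNumber, number = "555-0100")

# Both objects have the same S4 class ...
class(person)
class(phone)

# ... and differ in the fully qualified type recorded in the object,
# from which field accessors such as $name and $number are resolved.
person@type
phone@type
person$name
phone$number
```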
-
-% In other words, given the 'proto'
-%description file, code is automatically generated for the chosen
-%target language(s). The project page contains a tutorial for each of
-%these officially supported languages:
-%\url{http://code.google.com/apis/protocolbuffers/docs/tutorials.html}
-
-%The protocol buffers code is released under an open-source (BSD) license. The
-%protocol buffer project (\url{http://code.google.com/p/protobuf/})
-%contains a C++ library and a set of runtime libraries and compilers for
-%C++, Java and Python.
-
-%With these languages, the workflow follows standard practice of so-called
-%Interface Description Languages (IDL)
-%(c.f. \href{http://en.wikipedia.org/wiki/Interface_description_language}{Wikipedia
-%  on IDL}).  This consists of compiling a protocol buffer description file
-%(ending in \texttt{.proto}) into language specific classes that can be used
-
-%Besides the officially supported C++, Java and Python implementations, several projects have been
-%created to support protocol buffers for many languages. The list of known
-%languages to support protocol buffers is compiled as part of the
-%project page: \url{http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns}
-
-\begin{figure}[t]
-\begin{center}
-\includegraphics[width=\textwidth]{protobuf-distributed-system-crop.pdf}
-\end{center}
-\caption{Example protocol buffer communication workflow with an interactive R session}
-\label{fig:protobuf-distributed-usecase}
-\end{figure}
-
-\section{Basic Usage: Messages and Descriptors}
-
-This section describes how to use the R API to create and manipulate
-protocol buffer messages in R, and how to read and write the
-binary \emph{payload} of the messages to files and arbitrary binary
-R connections.
-
-The two fundamental building blocks of Protocol Buffers are Messages
-and Descriptors.  Messages provide a common abstract encapsulation of
-structured data fields of the type specified in a Message Descriptor.
-Message Descriptors are defined in \texttt{.proto} files and define a
-schema for a particular named class of messages.
-
-Table~\ref{tab:proto} shows an example \texttt{.proto} file which
-defines the \texttt{tutorial.Person} type.  The R code in the right
-column shows an example of creating a new message of this type and
-populating its fields.
-
-% Commented out because we said this earlier.
-%This separation
-%between schema and the message objects is in contrast to
-%more verbose formats like JSON, and when combined with the efficient
-%binary representation of any Message object explains a large part of
-%the performance and storage-space advantage offered by Protocol
-%Buffers. TODO(ms): we already said some of this above.  clean up.
-
-% lifted from protobuf page:
-%With Protocol Buffers you define how you want your data to be
-%structured once, and then you can read or write structured data to and
-%from a variety of data streams using a variety of different
-%languages.  The definition
-
-%% TODO(de) Can we make this not break the width of the page?
-\noindent
-\begin{table}
-\begin{tabular}{@{\hskip .01\textwidth}p{.40\textwidth}@{\hskip .015\textwidth}|@{\hskip .015\textwidth}p{0.55\textwidth}@{\hskip .01\textwidth}}
-\hline
-Schema: \texttt{addressbook.proto} & Example R Session\\
-\hline
-\begin{minipage}{.35\textwidth}
-\vspace{2mm}
-\begin{example}
-package tutorial;
-message Person {
- required string name = 1;
- required int32 id = 2;
- optional string email = 3;
- enum PhoneType {
-   MOBILE = 0; HOME = 1;
-   WORK = 2;
- }
- message PhoneNumber {
-   required string number = 1;
-   optional PhoneType type = 2;
- }
- repeated PhoneNumber phone = 4;
-}
-\end{example}
-\vspace{2mm}
-\end{minipage} & \begin{minipage}{.45\textwidth}
-<<echo=TRUE>>=
-library(RProtoBuf)
-person <- new(tutorial.Person, id=1, name="Dirk")
-person
-person$name
-person$name <- "Romain"
-cat(as.character(person))
-serialize(person, NULL)
-@
-\end{minipage} \\
-\hline
-\end{tabular}
-\caption{The schema representation from a \texttt{.proto} file for the
-  \texttt{tutorial.Person} class (left) and simple R code for creating
-  an object of this class and accessing its fields (right).}
-\label{tab:proto}
-\end{table}
-
-%This section may contain a figure such as Figure~\ref{figure:rlogo}.
-%
-%\begin{figure}[htbp]
-%  \centering
-%  \includegraphics{Rlogo}
-%  \caption{The logo of R.}
-%  \label{figure:rlogo}
-%\end{figure}
-
-\subsection{Importing Message Descriptors from .proto files}
-
-%The three basic abstractions of \CRANpkg{RProtoBuf} are Messages,
-%which encapsulate a data structure, Descriptors, which define the
-%schema used by one or more messages, and DescriptorPools, which
-%provide access to descriptors.
-
-Before we can create a new Protocol Buffer Message or parse a
-serialized stream of bytes as a Message, we must read in the message
-type specification from a \texttt{.proto} file.
-
-New \texttt{.proto} files are imported with the \code{readProtoFiles}
-function, which can import a single file, all files in a directory, or
-all \texttt{.proto} files provided by another R package.
-
-The \texttt{.proto} file syntax for defining the structure of protocol
-buffer data is described comprehensively on Google Code:
-\url{http://code.google.com/apis/protocolbuffers/docs/proto.html}.
-
-Once the proto files are imported, all message descriptors
-are available in the R search path in the \texttt{RProtoBuf:DescriptorPool}
-special environment.  The underlying mechanism used here is
-described in more detail in Section~\ref{sec-lookup}.
-
-<<>>=
-ls( "RProtoBuf:DescriptorPool" )
-@
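-As a sketch of importing a schema that is not bundled with the package,
-the following writes a small \texttt{.proto} file to a temporary
-location and reads it with \code{readProtoFiles}.  The package name
-\texttt{sketch} and the message \texttt{Point} are hypothetical; we
-assume the new descriptor then resolves by its fully qualified name,
-as \texttt{tutorial.Person} does.

```r
library(RProtoBuf)

# Hypothetical schema written to a temporary file for illustration.
proto_file <- tempfile(fileext = ".proto")
writeLines(c(
  'syntax = "proto2";',
  'package sketch;',
  'message Point {',
  '  required int32 x = 1;',
  '  required int32 y = 2;',
  '}'
), proto_file)

# Import the file; its descriptors join the same pool as tutorial.Person.
readProtoFiles(proto_file)

pt <- new(sketch.Point, x = 3, y = 4)
pt$x
pt$isInitialized()
```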
-
-%\subsection{Importing proto files}
-%In contrast to the other languages (Java, C++, Python) that are officially
-%supported by Google, the implementation used by the \texttt{RProtoBuf}
-%package does not rely on the \texttt{protoc} compiler (with the exception of
-%the two functions discussed in the previous section). This means that no
-%initial step of statically compiling the proto file into C++ code that is
-%then accessed by R code is necessary. Instead, \texttt{proto} files are
-%parsed and processed \textsl{at runtime} by the protobuf C++ library---which
-%is much more appropriate for a dynamic language.
-
-\subsection{Creating a message}
-
-New messages are created with the \texttt{new} function which accepts
-a Message Descriptor and optionally a list of ``name = value'' pairs
-to set in the message.
-%The objects contained in the special environment are
-%descriptors for their associated message types. Descriptors will be
-%discussed in detail in another part of this document, but for the
-%purpose of this section, descriptors are just used with the \texttt{new}
-%function to create messages.
-
-<<>>=
-p1 <- new( tutorial.Person )
-p <- new( tutorial.Person, name = "Romain", id = 1 )
-@
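-Required fields do not have to be supplied to \texttt{new}.  The sketch
-below, which recreates two messages under new, illustrative names, uses
-the \texttt{isInitialized} method listed in
-Table~\ref{Message-methods-table} to check whether all required fields
-are set.

```r
library(RProtoBuf)

incomplete <- new(tutorial.Person)                         # name, id unset
complete   <- new(tutorial.Person, name = "Romain", id = 1)

was_initialized <- incomplete$isInitialized()   # FALSE: required fields missing
complete$isInitialized()                        # TRUE

# Filling in the required fields makes the first message complete.
incomplete$name <- "Dirk"
incomplete$id <- 2
incomplete$isInitialized()
```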
-
-\subsection{Access and modify fields of a message}
-
-Once the message is created, its fields can be queried
-and modified using the dollar operator of R, making protocol
-buffer messages seem like lists.
-
-<<>>=
-p$name
-p$id
-p$email <- "francoisromain at free.fr"
-@
-
-However, unlike with R lists, no partial matching is performed:
-the full field name must be given.
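-For comparison, this base-R snippet shows the partial matching that
-ordinary lists perform with \verb|$|, which message field access
-deliberately does not replicate.  The list \texttt{l} is illustrative only.

```r
# An ordinary R list: `$` partially matches names, `[[` does not by default.
l <- list(name = "Romain", id = 1)
l$na          # partial match: returns "Romain"
l[["na"]]     # exact matching only: returns NULL
```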
-
-The \verb|[[| operator can also be used to query and set fields
-of a message, supplying either their name or their tag number:
-
-<<>>=
-p[["name"]] <- "Romain Francois"
-p[[ 2 ]] <- 3
-p[[ "email" ]]
-@
-
-Protocol buffers include a 64-bit integer type, but R lacks native
-64-bit integer support.  A workaround is available and described in
-Section~\ref{sec:int64} for working with large integer values.
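-The underlying limitation is easy to see in base R: native integers are
-32-bit, and doubles carry a 53-bit significand, so some 64-bit values
-cannot be represented exactly.  This sketch only illustrates the
-problem; it is not the workaround described in Section~\ref{sec:int64}.

```r
# R integers are 32-bit signed.
.Machine$integer.max

# Doubles have a 53-bit significand, so 2^53 + 1 rounds back to 2^53.
big <- 2^53
(big + 1) == big
sprintf("%.0f", big + 1)   # prints the rounded value, not ...993
```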
-
-% TODO(mstokely): Document extensions here.
-% There are none in addressbook.proto though.
-
-\subsection{Display messages}
-
-Protocol buffer messages and descriptors implement \texttt{show}
-methods that provide basic information about the message:
-
-<<>>=
-p
-@
-
-For additional information, such as for debugging purposes,
-the \texttt{as.character} method provides a more complete ASCII
-representation of the contents of a message.
-
-<<>>=
-writeLines( as.character( p ) )
-@
-
-\subsection{Serializing messages}
-
-The main focus of protocol buffer messages, however, is
-efficiency, so messages are transported as a compact sequence
-of bytes.  The \texttt{serialize} method is implemented for
-protocol buffer messages to serialize a message into a sequence of
-bytes that represents the message.
-%(raw vector in R speech) that represents the message.
-
-<<>>=
-serialize( p, NULL )
-@
-
-The same method can also be used to serialize messages to files:
-
-<<>>=
-tf1 <- tempfile()
-serialize( p, tf1 )
-readBin( tf1, raw(0), 500 )
-@
-
-Or to arbitrary binary connections:
-
-<<>>=
-tf2 <- tempfile()
-con <- file( tf2, open = "wb" )
-serialize( p, con )
-close( con )
-readBin( tf2, raw(0), 500 )
-@
-
-\texttt{serialize} can also be used in a more traditional
-object-oriented fashion using the dollar operator:
-
-<<>>=
-# serialize to a file
-p$serialize( tf1 )
-# serialize to a binary connection
-con <- file( tf2, open = "wb" )
-p$serialize( con )
-close( con )
-@
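-A quick consistency check, assuming \CRANpkg{RProtoBuf} is loaded: the
-raw vector obtained by serializing to \code{NULL} and the bytes written
-to a file are identical, and their length matches the \texttt{bytesize}
-method from Table~\ref{Message-methods-table}.  The object name
-\texttt{person} is illustrative.

```r
library(RProtoBuf)

person <- new(tutorial.Person, name = "Romain", id = 1)

payload <- serialize(person, NULL)       # raw vector in memory
tf <- tempfile()
serialize(person, tf)                    # same message written to a file
from_file <- readBin(tf, raw(0), n = 1000)

identical(payload, from_file)
length(payload) == person$bytesize()
```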
-
-
-\subsection{Parsing messages}
-
-The \texttt{RProtoBuf} package defines the \texttt{read} and
-\texttt{readASCII} functions to read messages from files, raw vectors,
-or arbitrary connections.  \texttt{read} expects to read the message
-payload from binary files or connections and \texttt{readASCII} parses
-the human-readable ASCII output that is created with
-\code{as.character}.
-
-The binary representation of the message (often called the payload)
-does not contain information that can be used to dynamically
-infer the message type, so we have to provide this information
-to the \texttt{read} function in the form of a descriptor:
-
-<<>>=
-msg <- read( tutorial.Person, tf1 )
-writeLines( as.character( msg ) )
-@
-
-The \texttt{input} argument of \texttt{read} can also be a binary
-readable R connection, such as a binary file connection:
-
-<<>>=
-con <- file( tf2, open = "rb" )
-message <- read( tutorial.Person, con )
-close( con )
-writeLines( as.character( message ) )
-@
-
-Finally, the raw vector payload of the message can be parsed directly:
-
-<<>>=
-# reading the raw vector payload of the message
-payload <- readBin( tf1, raw(0), 5000 )
-message <- read( tutorial.Person, payload )
-@
-
-
-\texttt{read} can also be used as a pseudo-method of the descriptor
-object:
-
-<<>>=
-# reading from a file
-message <- tutorial.Person$read( tf1 )
-# reading from a binary connection
-con <- file( tf2, open = "rb" )
-message <- tutorial.Person$read( con )
-close( con )
-# read from the payload
-message <- tutorial.Person$read( payload )
-@
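-The following sketch, assuming the bundled \texttt{tutorial.Person}
-descriptor, checks both round trips: binary via \code{serialize} and
-\code{read}, and text via \code{as.character} and \code{readASCII}.
-Comparing the \code{as.character} output is used here as a simple,
-assumed-sufficient equality test for two messages.

```r
library(RProtoBuf)

person <- new(tutorial.Person, name = "Romain", id = 1)

# Binary round trip: the payload carries no type information, so the
# descriptor must be supplied to read().
from_binary <- read(tutorial.Person, serialize(person, NULL))

# Text round trip: write the ASCII form, then parse it with readASCII().
txt <- tempfile(fileext = ".txt")
writeLines(as.character(person), txt)
from_ascii <- readASCII(tutorial.Person, txt)

identical(as.character(from_binary), as.character(person))
identical(as.character(from_ascii), as.character(person))
```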
-
-
-\section{Under the hood: S4 Classes, Methods, and Pseudo Methods}
-
-The \CRANpkg{RProtoBuf} package uses the S4 system to store
-information about descriptors and messages.  Using the S4 system
-allows the \texttt{RProtoBuf} package to dispatch methods that are not
-generic in the S3 sense, such as \texttt{new} and
-\texttt{serialize}.
-
-Each R object stores an external pointer to an object managed by
-the \texttt{protobuf} C++ library.
-The \CRANpkg{Rcpp} package \citep{eddelbuettel2011rcpp} is used to
-facilitate the integration of the R and C++ code for these objects.
-
-% Message, Descriptor, FieldDescriptor, EnumDescriptor,
-% FileDescriptor, EnumValueDescriptor
-%
-% grep RPB_FUNC * | grep -v define|wc -l
-% 84
-% grep RPB_ * | grep -v RPB_FUNCTION | grep METHOD|wc -l
-% 33
-
-There are over 100 C++ functions that provide the glue code between
-R and the member functions of the six primary Message and Descriptor classes
-in the protobuf library.  Wrapping each method individually allows us
-to add user friendly custom error handling, type coercion, and
-performance improvements at the cost of a more verbose
-implementation.  The RProtoBuf implementation in many ways motivated
-the development of Rcpp Modules \citep{eddelbuettel2010exposing},
-which provide a more concise way of wrapping C++ functions and classes
-in a single entity.
-
-The \texttt{RProtoBuf} package combines the \emph{R-typical} dispatch
-of the form \verb|method( object, arguments)| and the more traditional
-object-oriented notation \verb|object$method(arguments)|.
-Additionally, \texttt{RProtoBuf} implements methods for the \texttt{.DollarNames} S3 generic function
-(defined in the \texttt{utils} package) for all classes to enable tab
-completion.  Completion possibilities include pseudo-method names for all
-classes, plus dynamic dispatch on names or types specific to a given object.
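-The dispatch pattern can be illustrated without the C++ layer.  The
-following base-R sketch defines a hypothetical S4 class
-\texttt{ToyMessage} (it is not part of \CRANpkg{RProtoBuf}) whose
-\verb|$| method returns a closure for a pseudo-method name and a field
-value otherwise, plus a \texttt{.DollarNames} method for tab completion.
-The generic \code{fieldCount} is likewise made up for the example.

```r
library(methods)

# Hypothetical stand-in for a message class; not RProtoBuf's implementation.
setClass("ToyMessage", representation(fields = "list"))

# R-typical dispatch: method(object, arguments)
setGeneric("fieldCount", function(object, field) standardGeneric("fieldCount"))
setMethod("fieldCount", "ToyMessage", function(object, field) {
  length(object@fields[[field, exact = TRUE]])
})

# Object-oriented dispatch: object$method(arguments). `$` returns a closure
# bound to the object for a pseudo-method name, and a field value otherwise.
setMethod("$", "ToyMessage", function(x, name) {
  if (identical(name, "fieldCount")) {
    return(function(field) fieldCount(x, field))
  }
  x@fields[[name, exact = TRUE]]   # exact match only, no partial matching
})

# Tab completion: an S3 method for utils::.DollarNames listing both
# pseudo-method names and the field names of this particular object.
.DollarNames.ToyMessage <- function(x, pattern = "") {
  grep(pattern, c("fieldCount", names(x@fields)), value = TRUE)
}

m <- new("ToyMessage",
         fields = list(name = "Romain", phone = c("555-0100", "555-0199")))
fieldCount(m, "phone")        # R-typical form
m$fieldCount("phone")         # pseudo-method form
m$name
utils::.DollarNames(m, "^n")
```

-Returning a closure from \verb|$| is what lets \verb|m$fieldCount("phone")|
-and \verb|fieldCount(m, "phone")| share a single implementation.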
-
-% TODO(ms): Add column check box for doing dynamic dispatch based on type.
-\begin{table}[h]
-\centering
-\begin{tabular}{|l|c|c|l|}
-\hline
-\textbf{Class} & \textbf{Slots} & \textbf{Methods} & \textbf{Dynamic Dispatch}\\
-\hline
-\hline
-Message & 2 & 20 & yes (field names)\\
-\hline
-Descriptor & 2 & 16 & yes (field names, enum types, nested types)\\
-\hline
-FieldDescriptor & 4 & 18 & no\\
-\hline
-EnumDescriptor & 4 & 11 & yes (enum constant names)\\
-\hline
-FileDescriptor & 3 & 6 & yes (message/field definitions)\\
-\hline
-EnumValueDescriptor & 3 & 6 & no\\
-\hline
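-\end{tabular}
-\caption{Overview of the S4 classes in \texttt{RProtoBuf}: number of
-  slots and methods, and whether the \texttt{\$} operator dispatches
-  dynamically on names specific to each object.}
-\end{table}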
-
-\subsection{Messages}
-
-The \texttt{Message} S4 class represents Protocol Buffer Messages and
-is the core abstraction of \CRANpkg{RProtoBuf}. Each \texttt{Message}
-contains a pointer to a \texttt{Descriptor} which defines the schema
-of the data defined in the Message, as well as a number of
-\texttt{FieldDescriptors} for the individual fields of the message.  A
-complete list of the slots and methods for \texttt{Messages}
-is available in Table~\ref{Message-methods-table}.
-
-\begin{table}[h]
-\centering
-\begin{small}
-\begin{tabular}{l|p{10cm}}
-\hline
-\textbf{Slot} & \textbf{Description} \\
-\hline
-\texttt{pointer} & External pointer to the \texttt{Message} object of the C++ proto library. Documentation for the
-\texttt{Message} class is available from the protocol buffer project page:
-\url{http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.message.html#Message} \\
-\hline
-\texttt{type} & Fully qualified name of the message. For example a \texttt{Person} message
-has its \texttt{type} slot set to \texttt{tutorial.Person} \\[.3cm]
-\hline
-\textbf{Method} & \textbf{Description} \\
-\hline
-\texttt{has} & Indicates if a message has a given field.   \\
-\texttt{clone} & Creates a clone of the message \\
-\texttt{isInitialized} & Indicates if a message has all its required fields set\\
-\texttt{serialize} & serialize a message to a file, binary connection, or raw vector\\
-\texttt{clear} & Clear one or several fields of a message, or the entire message\\
-\texttt{size} & The number of elements in a message field\\
-\texttt{bytesize} & The number of bytes the message would take once serialized\\
-\hline
-\texttt{swap} & swap elements of a repeated field of a message\\
-\texttt{set} & set elements of a repeated field\\
-\texttt{fetch} & fetch elements of a repeated field\\
-\texttt{setExtension} & set an extension of a message\\
-\texttt{getExtension} & get the value of an extension of a message\\
-\texttt{add} & add elements to a repeated field \\
-\hline
-\texttt{str} & the R structure of the message\\
-\texttt{as.character} & character representation of a message\\
-\texttt{toString} & character representation of a message (same as \texttt{as.character}) \\
-\texttt{as.list} & converts message to a named R list\\
-\texttt{update} & updates several fields of a message at once\\
-\texttt{descriptor} & get the descriptor of the message type of this message\\
-\texttt{fileDescriptor} & get the file descriptor of this message's descriptor\\
-\hline
-\end{tabular}
-\end{small}
-\caption{\label{Message-methods-table}Description of slots and methods for the \texttt{Message} S4 class}
-\end{table}
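-A short sketch exercising several of these methods on the repeated
-\texttt{phone} field and the optional \texttt{email} field, assuming the
-bundled \texttt{tutorial.Person} schema; the phone numbers and e-mail
-address are placeholder values.

```r
library(RProtoBuf)

person <- new(tutorial.Person, name = "Romain", id = 1)

# Repeated field: append two PhoneNumber messages, then count them.
person$add("phone", new(tutorial.Person.PhoneNumber, number = "555-0100"))
person$add("phone", new(tutorial.Person.PhoneNumber, number = "555-0199"))
person$size("phone")

# Optional field: has() reports whether it is set; clear() unsets it.
person$email <- "romain@example.org"
had_email <- person$has("email")
person$clear("email")
person$has("email")
```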
-
-\subsection{Descriptors}
-
-Descriptors describe the type of a Message.  This includes what fields
-a message contains and what the types of those fields are.  Message
-descriptors are represented in R with the \emph{Descriptor} S4
-class. The class contains the slots \texttt{pointer} and
-\texttt{type}.  Similarly to messages, the \verb|$| operator can be
-used to retrieve descriptors that are contained in the descriptor, or
-invoke pseudo-methods.
-
-When \CRANpkg{RProtoBuf} is first loaded it calls
-\texttt{readProtoFiles} to read in an example \texttt{.proto} file
-included with the package.  The \texttt{tutorial.Person} descriptor
-and any other descriptors defined in loaded \texttt{.proto} files are
-then available on the search path.
-
-<<>>=
-# field descriptor
-tutorial.Person$email
-
-# enum descriptor
-tutorial.Person$PhoneType
-
-# nested type descriptor
-tutorial.Person$PhoneNumber
-# same as
-tutorial.Person.PhoneNumber
-@
-
-Table~\ref{Descriptor-methods-table} provides a complete list of the
-slots and available methods for Descriptors.
-
-\begin{table}[h]
-\centering
-\begin{small}
-\begin{tabular}{l|p{10cm}}
-\hline
-\textbf{Slot} & \textbf{Description} \\
-\hline
-\texttt{pointer} & External pointer to the \texttt{Descriptor} object of the C++ proto library. Documentation for the
-\texttt{Descriptor} class is available from the protocol buffer project page:
-\url{http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.descriptor.html#Descriptor} \\
-\hline
-\texttt{type} & Fully qualified path of the message type. \\[.3cm]
-\hline
-\textbf{Method} & \textbf{Description} \\
-\hline
-\texttt{new} & Creates a prototype of a message described by this descriptor.\\
-\texttt{read} & Reads a message from a file or binary connection.\\
-\texttt{readASCII} & Read a message in ASCII format from a file or
-text connection.\\
-\hline
-\texttt{name} & Retrieve the name of the message type associated with
-this descriptor.\\
-\texttt{as.character} & character representation of a descriptor\\
-\texttt{toString} & character representation of a descriptor (same as \texttt{as.character}) \\
-\texttt{as.list} & return a named
-list of the field, enum, and nested descriptors included in this descriptor.\\
-\texttt{asMessage} & return DescriptorProto message. \\
-\hline
-\texttt{fileDescriptor} & Retrieve the file descriptor of this
-descriptor.\\
-\texttt{containing\_type} & Retrieve the descriptor describing the message type containing this descriptor.\\
-\texttt{field\_count} & Return the number of fields in this descriptor.\\
-\texttt{field} & Return the descriptor for the specified field in this descriptor.\\
-\texttt{nested\_type\_count} & The number of nested types in this descriptor.\\
-\texttt{nested\_type} & Return the descriptor for the specified nested
-type in this descriptor.\\
-\texttt{enum\_type\_count} & The number of enum types in this descriptor.\\
-\texttt{enum\_type} & Return the descriptor for the specified enum
-type in this descriptor.\\
-\hline
-\end{tabular}
-\end{small}
-\caption{\label{Descriptor-methods-table}Description of slots and methods for the \texttt{Descriptor} S4 class}
-\end{table}
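-As a sketch, the counting methods can be checked against the schema in
-Table~\ref{tab:proto}, which declares four fields, one nested message
-type (\texttt{PhoneNumber}) and one enum (\texttt{PhoneType}).  This
-assumes the bundled \texttt{addressbook.proto} matches the schema shown
-there.

```r
library(RProtoBuf)

tutorial.Person$field_count()          # name, id, email, phone
tutorial.Person$nested_type_count()    # PhoneNumber
tutorial.Person$enum_type_count()      # PhoneType

# A nested type's descriptor points back to its containing type.
tutorial.Person.PhoneNumber$containing_type()
```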
-
-\subsection{Field Descriptors}
-\label{subsec-field-descriptor}
-
-The class \emph{FieldDescriptor} represents field
-descriptors in R. This is a wrapper S4 class around the
-\texttt{google::protobuf::FieldDescriptor} C++ class.
-Table~\ref{fielddescriptor-methods-table} describes the methods
-defined for the \texttt{FieldDescriptor} class.
-
-\begin{table}[h]
-\centering
-\begin{small}
-\begin{tabular}{l|p{10cm}}
-\hline
-\textbf{Slot} & \textbf{Description} \\
-\hline
-\texttt{pointer} & External pointer to the \texttt{FieldDescriptor} C++ variable \\
-\hline
-\texttt{name} & Simple name of the field \\
-\hline
-\texttt{full\_name} & Fully qualified name of the field \\
-\hline
-\texttt{type} & Name of the message type where the field is declared \\[.3cm]
-\hline
-\textbf{Method} & \textbf{Description} \\
-\hline
-\texttt{as.character} & Character representation of a descriptor\\
-\texttt{toString} & Character
-representation of a descriptor (same as \texttt{as.character}) \\
-\texttt{asMessage} & Return FieldDescriptorProto message. \\
-\texttt{name} & Return the name of the field descriptor.\\
-\texttt{fileDescriptor} & Return the fileDescriptor where this field is defined.\\
-\texttt{containing\_type} & Return the containing descriptor of this field.\\
-\texttt{is\_extension} & Return TRUE if this field is an extension.\\
-\texttt{number} & Gets the declared tag number of the field.\\
-\texttt{type} & Gets the type of the field.\\
-\texttt{cpp\_type} & Gets the C++ type of the field.\\
-\texttt{label} & Gets the label of a field (optional, required, or repeated).\\
-\texttt{is\_repeated} & Return TRUE if this field is repeated.\\
-\texttt{is\_required} & Return TRUE if this field is required.\\
-\texttt{is\_optional} & Return TRUE if this field is optional.\\
-\texttt{has\_default\_value} & Return TRUE if this field has a default value.\\
-\texttt{default\_value} & Return the default value.\\
-\texttt{message\_type} & Return the message type if this is a message type field.\\
-\texttt{enum\_type} & Return the enum type if this is an enum type field.\\
-\hline
-\end{tabular}
-\end{small}
-\caption{\label{fielddescriptor-methods-table}Description of slots and
-  methods for the \texttt{FieldDescriptor} S4 class}
-\end{table}
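-A brief sketch of field introspection via pseudo-methods, using fields
-of \texttt{tutorial.Person} whose tag numbers and labels can be read off
-Table~\ref{tab:proto}.  The \texttt{name} field is avoided here to
-sidestep any ambiguity with the \texttt{name} method listed in
-Table~\ref{Descriptor-methods-table}.

```r
library(RProtoBuf)

email <- tutorial.Person$email    # FieldDescriptor for the email field
email$number()                    # declared tag number
email$is_optional()

tutorial.Person$id$is_required()
tutorial.Person$phone$is_repeated()
```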
-
-% TODO(ms): Useful distinction to make -- FieldDescriptor does not do
-% separate '$' dispatch like Messages, Descriptors, and
-% EnumDescriptors do.  Should it?
-
-\subsection{Enum Descriptors}
-\label{subsec-enum-descriptor}
-
-The class \emph{EnumDescriptor} is an R wrapper
-class around the C++ class \texttt{google::protobuf::EnumDescriptor}.
-Table~\ref{enumdescriptor-methods-table} describes the methods
-defined for the \texttt{EnumDescriptor} class.
-
-The \verb|$| operator can be used to retrieve the value of enum
-constants contained in the EnumDescriptor, or to invoke
-pseudo-methods.
-
-<<>>=
-tutorial.Person$PhoneType
-tutorial.Person$PhoneType$WORK
-@
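-Because \texttt{as.list} on an enum descriptor yields a named integer
-vector, mapping between constant names and values reduces to ordinary
-vector indexing, as in this sketch (assuming the bundled schema, where
-\texttt{MOBILE}, \texttt{HOME} and \texttt{WORK} are 0, 1 and 2).

```r
library(RProtoBuf)

# Named integer vector of enum constants (unlist() is a no-op safeguard).
phone_types <- unlist(as.list(tutorial.Person$PhoneType))
phone_types

# Reverse lookup: the constant name for the value 2.
names(phone_types)[phone_types == 2]
```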
-
-\begin{table}[h]
-\centering
-\begin{small}
-\begin{tabular}{l|p{10cm}}
-\hline
-\textbf{Slot} & \textbf{Description} \\
-\hline
-\texttt{pointer} & External pointer to the \texttt{EnumDescriptor} C++ variable \\
-\hline
-\texttt{name} & Simple name of the enum \\
-\hline
-\texttt{full\_name} & Fully qualified name of the enum \\
-\hline
-\texttt{type} & Name of the message type where the enum is declared \\[.3cm]
-\hline
-\textbf{Method} & \textbf{Description} \\
-\hline
-\texttt{as.list} & return a named
-integer vector with the values of the enum and their names.\\
-\texttt{as.character} & character representation of a descriptor\\
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/rprotobuf -r 701

