[Rprotobuf-commits] r698 - papers/rjournal

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Fri Jan 3 23:56:15 CET 2014


Author: murray
Date: 2014-01-03 23:56:15 +0100 (Fri, 03 Jan 2014)
New Revision: 698

Modified:
   papers/rjournal/eddelbuettel-francois-stokely.Rnw
Log:
Improve section 2 on protocol buffers.



Modified: papers/rjournal/eddelbuettel-francois-stokely.Rnw
===================================================================
--- papers/rjournal/eddelbuettel-francois-stokely.Rnw	2014-01-03 21:46:56 UTC (rev 697)
+++ papers/rjournal/eddelbuettel-francois-stokely.Rnw	2014-01-03 22:56:15 UTC (rev 698)
@@ -51,7 +51,7 @@
 Given these requirements, how do we safely share intermediate results
 between different applications, possibly written in different
 languages, and possibly running on different computers?  Programming
-languages such as R, Java, Julia, and Python include built-in
+languages such as R, Julia, Java, and Python include built-in
 serialization support, but these formats are tied to the specific
 programming language in use and thus lock the user into a single
 environment.  CSV files can be read and written by many applications
@@ -79,7 +79,7 @@
 Once the data serialization needs of an application become complex
 enough, developers typically benefit from the use of an
 \emph{interface description language}, or \emph{IDL}.  IDLs like
-Google's Protocol Buffers, Apache Thrift, and Apache Avro provide a compact
+Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro provide a compact,
 well-documented schema for cross-language data structures and
 efficient binary interchange formats.  The schema can be used to
 generate model classes for statically typed programming languages such
@@ -92,79 +92,113 @@
 % TODO(mstokely): Take a more conversational tone here asking
 % questions and motivating protocol buffers?
 
+% TODO(mstokely): If we go to JSS, include a larger paragraph here
+% referencing each numbered section.  I don't like these generally,
+% but it's useful for this paper I think because we have a boring bit
+% in the middle (full class/method details) and interesting
+% applications at the end.
 This article describes the basics of Google's Protocol Buffers through
 an easy-to-use R package, \CRANpkg{RProtoBuf}.  After describing the
 basics of protocol buffers and \CRANpkg{RProtoBuf}, we illustrate
 several common use cases for protocol buffers in data analysis.
 
+\section{Protocol Buffers}
 
-\section{Protocol Buffers}
+Introductory section which may include references in parentheses
+\citep{R}, or cite a reference such as \citet{R} in the text.
+
 % This content is good.  Maybe use and cite?
 % http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
 
-Protocol Buffers are a widely used modern language-neutral, platform-neutral, extensible mechanism for sharing structured data.
 
+%% TODO(de,ms)  What follows is oooooold and was lifted from the webpage
+%%              Rewrite?
+Protocol Buffers are a modern language-neutral, platform-neutral,
+extensible mechanism for sharing and storing structured data.  They
+have been widely adopted in industry with applications as varied as Sony
+PlayStation, Twitter, Google Search, Hadoop, and OpenStreetMap.  While
+traditional IDLs were previously characterized by bloat and
+complexity, Protocol Buffers is based on a simple list and records
+model that is flexible and easy to use.  Some of the key features
+provided by Protocol Buffers for data analysis include:
 
-one of the more popular examples of the modern 
+\begin{itemize}
+\item \emph{Portable}:  Allows users to send and receive data between
+  applications or different computers.
+\item \emph{Efficient}:  Data is serialized into a compact binary
+  representation for transmission or storage.
+\item \emph{Extensible}:  New fields can be added to Protocol Buffer schemas
+  in a forward-compatible way that does not break older applications.
+\item \emph{Stable}:  Protocol Buffers have been in wide use for over a
+  decade.
+\end{itemize}
 
+Figure~\ref{fig:protobuf-distributed-usecase} illustrates an example
+communication workflow with protocol buffers and an interactive R
+session.  Common use cases include populating a request RPC protocol
+buffer in R that is then serialized and sent over the network to a
+remote server.  The server would then deserialize the message, act on
+the request, and respond with a new protocol buffer over the network.
 
-XXX Related work on IDLs (greatly expanded )
+%Protocol buffers are a language-neutral, platform-neutral, extensible
+%way of serializing structured data for use in communications
+%protocols, data storage, and more.
 
-XXX Design tradeoffs: reflection vs proto compiler
 
 
-% TODO(ms) Also talk about versioning and why its useful.
+%Protocol Buffers offer key features such as an efficient data interchange
+%format that is both language- and operating system-agnostic yet uses a
+%lightweight and highly performant encoding, object serialization and
+%de-serialization as well as data and configuration management. Protocol
+%buffers are also forward compatible: updates to the \texttt{proto}
+%files do not break programs built against the previous specification.
 
-%BSON, msgpack, Thrift, and Protocol Buffers take this latter approach,
-%with the
+%While benchmarks are not available, Google states on the project page that in
+%comparison to XML, protocol buffers are at the same time \textsl{simpler},
+%between three to ten times \textsl{smaller}, between twenty and one hundred
+%times \textsl{faster}, as well as less ambiguous and easier to program.
 
-% There are references comparing these we should use here.
+Several comparisons of data serialization formats show protocol
+buffers performing favorably against the alternatives; see, for example,
+\citet{Sumaray:2012:CDS:2184751.2184810}.
 
-TODO Also mention Thrift and msgpack and the references comparing some
-of these tradeoffs.
+%The flexibility of the reflection-based API is particularly well
+%suited for interactive data analysis.
 
-Introductory section which may include references in parentheses
-\citep{R}, or cite a reference such as \citet{R} in the text.
+% XXX Design tradeoffs: reflection vs proto compiler
 
-%% TODO(de,ms)  What follows is oooooold and was lifted from the webpage
-%%              Rewrite?
-Protocol buffers are a language-neutral, platform-neutral, extensible
-way of serializing structured data for use in communications
-protocols, data storage, and more.
+For added speed and efficiency, the C++, Java, and Python bindings to
+Protocol Buffers are used with a compiler that translates a protocol
+buffer schema description file (ending in \texttt{.proto}) into
+language-specific classes that can be used to create, read, write and
+manipulate protocol buffer messages.  The R interface, in contrast,
+uses a reflection-based API that is particularly well suited for
+interactive data analysis.  All messages in R have a single class
+structure, but different accessor methods are created at runtime based
+on the named fields of the specified message type.
 
-Protocol Buffers offer key features such as an efficient data interchange
-format that is both language- and operating system-agnostic yet uses a
-lightweight and highly performant encoding, object serialization and
-de-serialization as well data and configuration management. Protocol
-buffers are also forward compatible: updates to the \texttt{proto}
-files do not break programs built against the previous specification.
+% In other words, given the 'proto'
+%description file, code is automatically generated for the chosen
+%target language(s). The project page contains a tutorial for each of
+%these officially supported languages:
+%\url{http://code.google.com/apis/protocolbuffers/docs/tutorials.html}
 
-While benchmarks are not available, Google states on the project page that in
-comparison to XML, protocol buffers are at the same time \textsl{simpler},
-between three to ten times \textsl{smaller}, between twenty and one hundred
-times \textsl{faster}, as well as less ambiguous and easier to program.
+%The protocol buffers code is released under an open-source (BSD) license. The
+%protocol buffer project (\url{http://code.google.com/p/protobuf/})
+%contains a C++ library and a set of runtime libraries and compilers for
+%C++, Java and Python.
 
-The protocol buffers code is released under an open-source (BSD) license. The
-protocol buffer project (\url{http://code.google.com/p/protobuf/})
-contains a C++ library and a set of runtime libraries and compilers for
-C++, Java and Python.
+%With these languages, the workflow follows standard practice of so-called
+%Interface Description Languages (IDL)
+%(c.f. \href{http://en.wikipedia.org/wiki/Interface_description_language}{Wikipedia
+%  on IDL}).  This consists of compiling a protocol buffer description file
+%(ending in \texttt{.proto}) into language specific classes that can be used
 
-With these languages, the workflow follows standard practice of so-called
-Interface Description Languages (IDL)
-(c.f. \href{http://en.wikipedia.org/wiki/Interface_description_language}{Wikipedia
-  on IDL}).  This consists of compiling a protocol buffer description file
-(ending in \texttt{.proto}) into language specific classes that can be used
-to create, read, write and manipulate protocol buffer messages. In other
-words, given the 'proto' description file, code is automatically generated
-for the chosen target language(s). The project page contains a tutorial for
-each of these officially supported languages:
-\url{http://code.google.com/apis/protocolbuffers/docs/tutorials.html}
+%Besides the officially supported C++, Java and Python implementations, several projects have been
+%created to support protocol buffers for many languages. The list of known
+%languages to support protocol buffers is compiled as part of the
+%project page: \url{http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns}
 
-Besides the officially supported C++, Java and Python implementations, several projects have been
-created to support protocol buffers for many languages. The list of known
-languages to support protocol buffers is compiled as part of the
-project page: \url{http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns}
-
 \begin{figure}[t]
 \begin{center}
 \includegraphics[width=\textwidth]{protobuf-distributed-system-crop.pdf}
@@ -184,18 +218,21 @@
 and Descriptors.  Messages provide a common abstract encapsulation of
 structured data fields of the type specified in a Message Descriptor.
 Message Descriptors are defined in \texttt{.proto} files and define a
-schema for a particular named class of messages.  This separation
-between schema and the message objects is in contrast to
-more verbose formats like JSON, and when combined with the efficient
-binary representation of any Message object explains a large part of
-the performance and storage-space advantage offered by Protocol
-Buffers. TODO(ms): we already said some of this above.  clean up.
+schema for a particular named class of messages.
 
 Table~\ref{tab:proto} shows an example \texttt{.proto} file which
 defines the \texttt{tutorial.Person} type.  The R code in the right
 column shows an example of creating a new message of this type and
 populating its fields.
 
+% Commented out because we said this earlier.
+%This separation
+%between schema and the message objects is in contrast to
+%more verbose formats like JSON, and when combined with the efficient
+%binary representation of any Message object explains a large part of
+%the performance and storage-space advantage offered by Protocol
+%Buffers. TODO(ms): we already said some of this above.  clean up.
+
 % lifted from protobuf page:
 %With Protocol Buffers you define how you want your data to be
 %structured once, and then you can read or write structured data to and
@@ -1262,12 +1299,8 @@
 
 \section{Summary}
 
-TODO(ms): random citations to work in:
+% RProtoBuf has been used.
 
-Many sources compare data serialization formats and show protocol
-buffers very favorably to the alternatives, such
-as \citep{Sumaray:2012:CDS:2184751.2184810}
-
 %It's pretty useful.  Murray to see if he can get approval to talk a
 %tiny bit about how much it's used at Google.
 


