[Rprotobuf-commits] r802 - papers/jss

noreply at r-forge.r-project.org
Tue Jan 21 05:00:54 CET 2014


Author: murray
Date: 2014-01-21 05:00:53 +0100 (Tue, 21 Jan 2014)
New Revision: 802

Modified:
   papers/jss/article.Rnw
   papers/jss/article.bib
Log:
Add a short sentence to define serialization early in the intro,
addressing a todo that multiple people had mentioned.  Reference the
C++ FAQ for lack of a better reference for now.

Revert the second to last paragraph of the introduction to my earlier
version that was instead moved to section 2.  Remove one of the more
technical sentences to address Jeroen's observation that it was a bit
too technical for the intro (e.g. "reflection" and dynamically typed
languages were a bit much).

Most sentences of the deleted paragraph were false, as discussed in
email.



Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw	2014-01-21 03:18:23 UTC (rev 801)
+++ papers/jss/article.Rnw	2014-01-21 04:00:53 UTC (rev 802)
@@ -35,6 +35,10 @@
 \CRANpkg{RProtoBuf} package provides a complete interface between this
 library and the R environment for statistical computing.
 %TODO(ms) keep it less than 150 words.
+% Maybe add Jeroen's sentence:
+% They offer a unique combination of features, performance, and maturity that seems
+% particularly well suited for data-driven applications and numerical
+% computing.
 }
 \Keywords{r, protocol buffers, serialization, cross-platform}
 \Plainkeywords{r, protocol buffers, serialization, cross-platform} %% without formatting
@@ -137,12 +141,14 @@
 stored in a file or sent over the network for further processing. 
 % JO Perhaps also mention that serialization is needed for distributed
 % systems to make systems scale up?
-% MS: yes perhaps somewhere near here we could define serialization
-% and describe this.
 
 Given these requirements, how do we safely and efficiently share intermediate results
 between different applications, possibly written in different
 languages, and possibly running on different computer systems?
+In computer programming, \emph{serialization} is the process of
+translating data structures, variables, and session state into a
+format that can be stored or transmitted and then reconstructed in the
+original form later \citep{clinec++}.
 % Reverted to my original above, because the replacement below puts me
 % to sleep:
 %Such systems require reliable and efficient exchange of intermediate
@@ -191,52 +197,55 @@
 versioning when data storage needs evolve over time, or when
 application logic and requirement changes dictate updates to the
 message format.
+
+Once the data serialization needs of an application become complex
+enough, developers typically benefit from the use of an
+\emph{interface description language}, or \emph{IDL}.  IDLs like
+Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro
+provide a compact well-documented schema for cross-language data
+structures and efficient binary interchange formats.  Since the schema
+is provided separately from the encoded data, the data can be
+efficiently encoded to minimize storage costs when
+compared with simple ``schema-less'' binary interchange formats.
+Many sources compare data serialization formats
+and show Protocol Buffers compare very favorably to the alternatives; see
+\citet{Sumaray:2012:CDS:2184751.2184810} for one such comparison.
+
+% Too technical, move to section 2.
+% The schema can be used to generate model classes for statically-typed programming languages
+%such as C++ and Java, or can be used with reflection for dynamically-typed programming
+%languages.
+
+% TODO(mstokely): Will need to define reflection if we use it here.
+% Maybe in the next section since it's not as key as 'serialization'
+% which we already defined.
  
 %\paragraph*{Enter Protocol Buffers:}
-In 2008, and following several years of internal use, Google released an open
-source version of Protocol Buffers. It provides data  
-interchange format that was designed and used for their internal infrastructure.
-Google officially provides high-quality parsing libraries for \texttt{Java}, 
-\texttt{C++} and \texttt{Python}, and community-developed open source implementations
-are available for many other languages. 
-Protocol Buffers take a quite different approach from many other popular formats.
-They offer a unique combination of features, performance, and maturity that seems
-particulary well suited for data-driven applications and numerical computing.
-Protocol Buffers are a binary format that natively supports all common primitive types
-found in modern programming languages. A key advantage is that numeric values
-are serialized exactly the same way as they are stored in memory. There is
-no loss of precision, no overhead, and parsing messages is very efficient: the system can 
-simply copy bytes to memory without any further processing. 
-But the most powerful feature of Protocol Buffers is that it decouples the content
-from the structure using a schema, very similar to a database. This further increases
-performance by eliminating redundancy, while at the same time providing foundations
-for defining an \emph{Interface Description Language}, or \emph{IDL}.
-Many sources compare data serialization formats and show Protocol Buffers compare 
-very favorably to the alternatives; see \citet{Sumaray:2012:CDS:2184751.2184810} 
-for one such comparison.
+
+% In 2008, and following several years of internal use, Google released an open
+% source version of Protocol Buffers. It provides data  
+% interchange format that was designed and used for their internal infrastructure.
+% Google officially provides high-quality parsing libraries for \texttt{Java}, 
+% \texttt{C++} and \texttt{Python}, and community-developed open source implementations
+% are available for many other languages. 
+% Protocol Buffers take a quite different approach from many other popular formats.
+
+% TODO(mstokely): Good sentence from Jeroen, add it here or sec 2.
+% They offer a unique combination of features, performance, and maturity that seems
+% particularly well suited for data-driven applications and numerical
+% computing.
+
 % TODO(DE): Mention "future proof" forward compatibility of schemata
 
-%  The schema can be used to
-%generate model classes for statically-typed programming languages such
-%as C++ and Java, or can be used with reflection for dynamically-typed
-%programming languages.  Since the schema is provided separately from
-%the encoded data, the data can be efficiently encoded to minimize
-%storage costs of the stored data when compared with simple
-%``schema-less'' binary interchange formats.
-%Many sources compare data serialization formats and show Protocol
-%Buffers very 
 
-
 % TODO(mstokely): Take a more conversational tone here asking
 % questions and motivating protocol buffers?
 
-% TODO(mstokely): If we go to JSS, include a larger paragraph here
-% referencing each numbered section.  I don't like these generally,
-% but its useful for this paper I think because we have a boring bit
-% in the middle (full class/method details) and interesting
-% applications at the end.
+% NOTE(mstokely): I don't like these roadmap paragraphs in general,
+% but it seems useful here because we have a boring bit in the middle
+% (full class/method details) and interesting applications at the end.
 
-This paper describes an R interface to Protocol Buffer, 
+This paper describes an R interface to Protocol Buffers,
 and is organized as follows. Section~\ref{sec:protobuf}
 provides a general overview of Protocol Buffers.
 Section~\ref{sec:rprotobuf-basic} describes the interactive R interface

Modified: papers/jss/article.bib
===================================================================
--- papers/jss/article.bib	2014-01-21 03:18:23 UTC (rev 801)
+++ papers/jss/article.bib	2014-01-21 04:00:53 UTC (rev 802)
@@ -37,6 +37,12 @@
 volume = "19",
 year = "2013"
 }
+@article{clinec++,
+  title={C++ faq},
+  author={Cline, Marshall},
+  journal={Also available as http://www. parashift. com/c++-faq-lite/index. html},
+  year = "2013"
+}
 @Manual{RJSONIO,
   title = {RJSONIO: Serialize R objects to JSON, JavaScript Object Notation},
   author = {Duncan Temple Lang},


