[Rprotobuf-commits] r700 - papers/rjournal
noreply at r-forge.r-project.org
Sat Jan 4 02:01:55 CET 2014
Author: edd
Date: 2014-01-04 02:01:54 +0100 (Sat, 04 Jan 2014)
New Revision: 700
Modified:
papers/rjournal/eddelbuettel-francois-stokely.Rnw
papers/rjournal/eddelbuettel-francois-stokely.bib
Log:
one incomplete round of comments
Modified: papers/rjournal/eddelbuettel-francois-stokely.Rnw
===================================================================
--- papers/rjournal/eddelbuettel-francois-stokely.Rnw 2014-01-04 00:13:08 UTC (rev 699)
+++ papers/rjournal/eddelbuettel-francois-stokely.Rnw 2014-01-04 01:01:54 UTC (rev 700)
@@ -24,8 +24,8 @@
specialized programming languages. Protocol Buffers are a popular
method of serializing structured data between applications---while remaining
independent of programming languages or operating systems. The
- \CRANpkg{RProtoBuf} package provides a complete interface to this
- library.
+ \CRANpkg{RProtoBuf} package provides a complete interface between this
+ library and the R environment for statistical computing.
%TODO(ms) keep it less than 150 words.
}
@@ -36,7 +36,7 @@
Modern data collection and analysis pipelines are increasingly being
built using collections of components to better manage software
complexity through reusability, modularity, and fault
-isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
+isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
Data analysis patterns such as Split-Apply-Combine
\citep{wickham2011split} explicitly break up large problems into
manageable pieces. These patterns are frequently employed with
@@ -47,12 +47,15 @@
different environments. Each stage of the data
analysis pipeline may involve storing intermediate results in a
file or sending them over the network.
+% DE: Nice!
Given these requirements, how do we safely share intermediate results
between different applications, possibly written in different
-languages, and possibly running on different computers? Programming
+languages, and possibly running on different computer systems, possibly
+spanning different operating systems? Programming
languages such as R, Julia, Java, and Python include built-in
serialization support, but these formats are tied to the specific
+% DE: need to define serialization?
programming language in use and thus lock the user into a single
environment. CSV files can be read and written by many applications
and so are often used for exporting tabular data. However, CSV files
@@ -74,7 +77,9 @@
these formats lack a separate schema for the serialized data and thus
still duplicate field names with each message sent over the network or
stored in a file. Such formats also lack support for versioning when
-data storage needs evolve over time.
+data storage needs evolve over time, or when application logic and
+requirement changes dictate updates to the message format.
+% DE: Need to talk about XML ?
Once the data serialization needs of an application become complex
enough, developers typically benefit from the use of an
@@ -82,8 +87,8 @@
Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro provide a compact
well-documented schema for cross-language data structures and
efficient binary interchange formats. The schema can be used to
-generate model classes for statically typed programming languages such
-as C++ and Java, or can be used with reflection for dynamically typed
+generate model classes for statically-typed programming languages such
+as C++ and Java, or can be used with reflection for dynamically-typed
programming languages. Since the schema is provided separately from
the encoded data, the data can be efficiently encoded to minimize
storage costs of the stored data when compared with simple
@@ -104,7 +109,7 @@
\section{Protocol Buffers}
-Introductory section which may include references in parentheses
+FIXME Introductory section which may include references in parentheses
\citep{R}, or cite a reference such as \citet{R} in the text.
% This content is good. Maybe use and cite?
@@ -113,15 +118,19 @@
%% TODO(de,ms) What follows is oooooold and was lifted from the webpage
%% Rewrite?
-Protocol Buffers are a modern language-neutral, platform-neutral,
-extensible mechanism for sharing and storing structured data. They
-have been widely adopted in industry with applications as varied as Sony
-Playstations, Twitter, Google Search, Hadoop, and Open Street Map. While
-traditional IDLs were previously characterized by bloat and
-complexity, Protocol Buffers is based on a simple list and records
-model that is flexible and easy to use. Some of the key features
-provided by Protocol Buffers for data analysis include:
+Protocol Buffers can be described as a modern, language-neutral, platform-neutral,
+extensible mechanism for sharing and storing structured data. Since their
+introduction, Protocol Buffers have been widely adopted in industry with
+applications as varied as database-internal messaging (Drizzle), % DE: citation?
+Sony Playstations, Twitter, Google Search, Hadoop, and Open Street Map. While
+% TODO(DE): This either needs a citation, or remove the name drop
+traditional IDLs have at times been criticized for code bloat and
+complexity, Protocol Buffers are based on a simple list and records
+model that is comparatively flexible and simple to use.
+Some of the key features provided by Protocol Buffers for data analysis
+include:
+
\begin{itemize}
\item \emph{Portable}: Allows users to send and receive data between
applications or different computers.
@@ -138,14 +147,14 @@
session. Common use cases include populating a request RPC protocol
buffer in R that is then serialized and sent over the network to a
remote server. The server would then deserialize the message, act on
-the request, and respond with a new protocol buffer over the network.
+the request, and respond with a new protocol buffer over the network. The key
+difference to, say, a request to an Rserve instance is that the remote server
+may not even know the R language.
%Protocol buffers are a language-neutral, platform-neutral, extensible
%way of serializing structured data for use in communications
%protocols, data storage, and more.
-
-
%Protocol Buffers offer key features such as an efficient data interchange
%format that is both language- and operating system-agnostic yet uses a
%lightweight and highly performant encoding, object serialization and
@@ -160,7 +169,7 @@
Many sources compare data serialization formats and show protocol
buffers very favorably to the alternatives, such
-as \citep{Sumaray:2012:CDS:2184751.2184810}
+as \citet{Sumaray:2012:CDS:2184751.2184810}.
%The flexibility of the reflection-based API is particularly well
%suited for interactive data analysis.
Modified: papers/rjournal/eddelbuettel-francois-stokely.bib
===================================================================
--- papers/rjournal/eddelbuettel-francois-stokely.bib 2014-01-04 00:13:08 UTC (rev 699)
+++ papers/rjournal/eddelbuettel-francois-stokely.bib 2014-01-04 01:01:54 UTC (rev 700)
@@ -9,7 +9,7 @@
}
@Manual{msgpackR,
title = {msgpackR: A library to serialize or unserialize data in MessagePack format},
- author = {Mikiya TANIZAWA},
+ author = {Mikiya Tanizawa},
year = {2013},
note = {R package version 1.1},
url = {http://CRAN.R-project.org/package=msgpackR},