[Rprotobuf-commits] r800 - papers/jss

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Tue Jan 21 04:09:26 CET 2014


Author: murray
Date: 2014-01-21 04:09:22 +0100 (Tue, 21 Jan 2014)
New Revision: 800

Modified:
   papers/jss/article.Rnw
Log:
Revert Jeroen's two sentences about MessagePack and BSON back to my
original three sentences, and add back the citations to the relevant R
packages.

Jeroen's version is very dismissive of these formats as not being
widely used or compatible with the almighty JSON.

My version notes that these address real deficiencies of JSON for the
application domain we are talking about in this paper, and points out
the shortcomings they still have compared to protocol buffers.

The introduction flows naturally through the alternatives, each slightly
better than the one before it: starting with CSV, then XML, then
JSON, then binary JSON, then protocol buffers.  For this application
domain, binary JSON is strictly better than JSON, so any dismissive
comments should be directed the other way, at traditional
text JSON.

The XML section Jeroen and Dirk added is great, thanks.

We may still need one more sentence in the first paragraph making it
crystal clear what application domain / context is used for this
discussion of the alternatives.



Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw	2014-01-21 02:42:58 UTC (rev 799)
+++ papers/jss/article.Rnw	2014-01-21 03:09:22 UTC (rev 800)
@@ -112,9 +112,15 @@
 \maketitle
 
 %TODO(de) 'protocol buffers' or 'Protocol Buffers' ?
+% MS: Let's standardize on 'Protocol Buffers'?
 
 \section{Introduction} % TODO(DE) More sober: Friends don't let friends use CSV}
-
+% NOTE(MS): I really do think we can add back:
+% \section{Introduction: Friends Don't Let Friends Use CSV}
+% I didn't use proper Title Caps the first time around but really I
+% think it makes the paper more readable to have a tl;dr intro title
+% that is fun and engaging since this paper is still on the dry/boring
+% side.
 Modern data collection and analysis pipelines increasingly involve collections
 of decoupled components in order to better manage software complexity 
 through reusability, modularity, and fault isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
@@ -164,18 +170,30 @@
 availability of mature libraries and parsers). Because \texttt{XML} is text
 based and has no native notion of numeric types or arrays, it is usually not a
 very practical format to store numeric datasets as they appear in statistical
-applications.  A more modern, widely used format is \emph{JavaScript Object
+applications.
+%
+A more modern, widely used format is \emph{JavaScript Object
   Notation} (\texttt{JSON}), which is derived from the object literals of
 \proglang{JavaScript}, and used increasingly on the world wide web. \texttt{JSON} natively
 supports arrays and distinguishes 4 primitive types: numbers, strings,
 booleans and null. However, as it too is a text-based format, numbers are
-stored as human-readable decimal notation which is somewhat inefficient and
+stored as human-readable decimal notation which is inefficient and
 leads to loss of type (double versus integer) and precision. Several R packages
 implement functions to parse and generate \texttt{JSON} data from R objects.
-A number of \texttt{JSON} variants has been proposed, such as \texttt{BSON}
-and \texttt{MessagePack} which both add binary support. However, these
-derivatives are not compatible with existing JSON software, and have not seen
-wide adoption.
+
+A number of binary formats based on \texttt{JSON} have been proposed
+that reduce the parsing cost and improve efficiency.  \pkg{MessagePack}
+\citep{msgpackR} and \pkg{BSON} \citep{rmongodb} both have R
+interfaces, but these formats lack a separate schema for the serialized
+data and thus still duplicate field names with each message sent over
+the network or stored in a file.  Such formats also lack support for
+versioning when data storage needs evolve over time, or when
+application logic and requirement changes dictate updates to the
+message format.
+
+%and \texttt{MessagePack} which both add binary support. However, these
+%derivatives are not compatible with existing JSON software, and have not seen
+%wide adoption.
  
 %\paragraph*{Enter Protocol Buffers:}
 In 2008, and following several years of internal use, Google released an open


