[Rprotobuf-commits] r772 - papers/jss

Tue Jan 14 01:41:36 CET 2014

Author: murray
Date: 2014-01-14 01:41:35 +0100 (Tue, 14 Jan 2014)
New Revision: 772

Modified:
   papers/jss/article.Rnw
Log:
Edits to section 1 suggested by Karl.

1) Remove the Hadley Split-Apply-Combine reference for now, as it is
confusing and more narrowly R-only than the
multi-platform data analysis pipeline pattern otherwise being
discussed in the first paragraph.  We might be able to add it back
with suitable distinctions added.

2) Note that our goal is not just to safely transfer the data, but to
safely _and efficiently_ do so.

3) Rewrite the second-to-last paragraph a bit, and add a note
specifically at the end that this paper describes an R implementation
of protocol buffers.



Modified: papers/jss/article.Rnw
===================================================================

--- papers/jss/article.Rnw	2014-01-13 22:12:59 UTC (rev 771)
+++ papers/jss/article.Rnw	2014-01-14 00:41:35 UTC (rev 772)
@@ -119,9 +119,10 @@
 built using collections of components to better manage software
 complexity through reusability, modularity, and fault
 isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
-Data analysis patterns such as Split-Apply-Combine
-\citep{wickham2011split} explicitly break up large problems into
-manageable pieces.  These patterns are frequently employed with
+% This is really a different pattern not connected well here.
+%Data analysis patterns such as Split-Apply-Combine
+%\citep{wickham2011split} explicitly break up large problems into manageable pieces.  
+These pipelines are frequently built with
 different programming languages used for the different phases of data
 analysis -- collection, cleaning, modeling, analysis, post-processing, and
 presentation in order to take advantage of the unique combination of
@@ -130,7 +131,7 @@
 analysis pipeline may involve storing intermediate results in a
 file or sending them over the network.
 
-Given these requirements, how do we safely share intermediate results
+Given these requirements, how do we safely and efficiently share intermediate results
 between different applications, possibly written in different
 languages, and possibly running on different computer system, possibly
 spanning different operating systems?  Programming
@@ -173,17 +174,27 @@
 \emph{interface description language}, or \emph{IDL}.  IDLs like
 Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro provide a compact
 well-documented schema for cross-language data structures and
-efficient binary interchange formats.  The schema can be used to
-generate model classes for statically-typed programming languages such
-as C++ and Java, or can be used with reflection for dynamically-typed
-programming languages.  Since the schema is provided separately from
-the encoded data, the data can be efficiently encoded to minimize
-storage costs of the stored data when compared with simple
-``schema-less'' binary interchange formats.
-Many sources compare data serialization formats and show Protocol
-Buffers very favorably to the alternatives; see
+efficient binary interchange formats.
+Since the schema is provided separately from the encoded data, the data can be
+efficiently encoded to minimize storage costs when compared with simple
+``schema-less'' binary interchange formats. Many sources compare data serialization formats
+and show Protocol Buffers compare very favorably to the alternatives; see
 \citet{Sumaray:2012:CDS:2184751.2184810} for one such comparison.
+The schema can be used to generate classes for statically-typed programming languages
+such as C++ and Java, or can be used with reflection for dynamically-typed programming
+languages.
 
+%  The schema can be used to
+%generate model classes for statically-typed programming languages such
+%as C++ and Java, or can be used with reflection for dynamically-typed
+%programming languages.  Since the schema is provided separately from
+%the encoded data, the data can be efficiently encoded to minimize
+%storage costs of the stored data when compared with simple
+%``schema-less'' binary interchange formats.
+%Many sources compare data serialization formats and show Protocol
+%Buffers very 
+
+
 % TODO(mstokely): Take a more conversational tone here asking
 % questions and motivating protocol buffers?
 
@@ -193,6 +204,7 @@
 % in the middle (full class/method details) and interesting
 % applications at the end.
 
+This paper describes an R interface to Protocol Buffers.
 The rest of the paper is organized as follows. Section~\ref{sec:protobuf}
 provides a general overview of Protocol Buffers.
 Section~\ref{sec:rprotobuf-basic} describes the interactive R interface