[Rprotobuf-commits] r772 - papers/jss
noreply at r-forge.r-project.org
Tue Jan 14 01:41:36 CET 2014
Author: murray
Date: 2014-01-14 01:41:35 +0100 (Tue, 14 Jan 2014)
New Revision: 772
Modified:
papers/jss/article.Rnw
Log:
Edits to section 1 suggested by Karl.
1) Remove the Hadley Split-Apply-Combine reference for now, as it is
confusing and more narrowly R-specific than the multi-platform
data analysis pipeline pattern discussed in the rest of the first
paragraph. We might be able to add it back with suitable
distinctions.
2) Note that our goal is not just to safely transfer the data, but to
safely _and efficiently_ do so.
3) Rewrite the second-to-last paragraph a bit, and add a note at the
end stating explicitly that this paper describes an R implementation
of protocol buffers.
Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw 2014-01-13 22:12:59 UTC (rev 771)
+++ papers/jss/article.Rnw 2014-01-14 00:41:35 UTC (rev 772)
@@ -119,9 +119,10 @@
built using collections of components to better manage software
complexity through reusability, modularity, and fault
isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
-Data analysis patterns such as Split-Apply-Combine
-\citep{wickham2011split} explicitly break up large problems into
-manageable pieces. These patterns are frequently employed with
+% This is really a different pattern not connected well here.
+%Data analysis patterns such as Split-Apply-Combine
+%\citep{wickham2011split} explicitly break up large problems into manageable pieces.
+These pipelines are frequently built with
different programming languages used for the different phases of data
analysis -- collection, cleaning, modeling, analysis, post-processing, and
presentation in order to take advantage of the unique combination of
@@ -130,7 +131,7 @@
analysis pipeline may involve storing intermediate results in a
file or sending them over the network.
-Given these requirements, how do we safely share intermediate results
+Given these requirements, how do we safely and efficiently share intermediate results
between different applications, possibly written in different
languages, and possibly running on different computer system, possibly
spanning different operating systems? Programming
@@ -173,17 +174,27 @@
\emph{interface description language}, or \emph{IDL}. IDLs like
Protocol Buffers \citep{protobuf}, Apache Thrift, and Apache Avro provide a compact
well-documented schema for cross-language data structures and
-efficient binary interchange formats. The schema can be used to
-generate model classes for statically-typed programming languages such
-as C++ and Java, or can be used with reflection for dynamically-typed
-programming languages. Since the schema is provided separately from
-the encoded data, the data can be efficiently encoded to minimize
-storage costs of the stored data when compared with simple
-``schema-less'' binary interchange formats.
-Many sources compare data serialization formats and show Protocol
-Buffers very favorably to the alternatives; see
+efficient binary interchange formats.
+Since the schema is provided separately from the encoded data, the data can be
+efficiently encoded to minimize storage costs of the stored data when compared with simple
+``schema-less'' binary interchange formats. Many sources compare data serialization formats
+and show Protocol Buffers compare very favorably to the alternatives; see
\citet{Sumaray:2012:CDS:2184751.2184810} for one such comparison.
+The schema can be used to generate classes for statically-typed programming languages
+such as C++ and Java, or can be used with reflection for dynamically-typed programming
+languages
+% The schema can be used to
+%generate model classes for statically-typed programming languages such
+%as C++ and Java, or can be used with reflection for dynamically-typed
+%programming languages. Since the schema is provided separately from
+%the encoded data, the data can be efficiently encoded to minimize
+%storage costs of the stored data when compared with simple
+%``schema-less'' binary interchange formats.
+%Many sources compare data serialization formats and show Protocol
+%Buffers very
+
+
% TODO(mstokely): Take a more conversational tone here asking
% questions and motivating protocol buffers?
@@ -193,6 +204,7 @@
% in the middle (full class/method details) and interesting
% applications at the end.
+This paper describes an R interface to protocol buffers.
The rest of the paper is organized as follows. Section~\ref{sec:protobuf}
provides a general overview of Protocol Buffers.
Section~\ref{sec:rprotobuf-basic} describes the interactive R interface