From noreply at r-forge.r-project.org Mon Dec 1 02:08:24 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 02:08:24 +0100 (CET) Subject: [Rprotobuf-commits] r919 - papers/jss Message-ID: <20141201010824.48124187867@r-forge.r-project.org> Author: jeroenooms Date: 2014-12-01 02:08:23 +0100 (Mon, 01 Dec 2014) New Revision: 919 Modified: papers/jss/article.Rnw papers/jss/article.bib Log: Update citations Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-11-30 22:42:55 UTC (rev 918) +++ papers/jss/article.Rnw 2014-12-01 01:08:23 UTC (rev 919) @@ -91,7 +91,7 @@ University of California\\ Los Angeles, CA, USA\\ E-mail: \email{jeroen.ooms at stat.ucla.edu}\\ - URL: \url{http://jeroenooms.github.io} + URL: \url{https://jeroenooms.github.io} } %% It is also possible to add a telephone and fax number %% before the e-mail in the following format: Modified: papers/jss/article.bib =================================================================== --- papers/jss/article.bib 2014-11-30 22:42:55 UTC (rev 918) +++ papers/jss/article.bib 2014-12-01 01:08:23 UTC (rev 919) @@ -117,12 +117,12 @@ url = {http://CRAN.R-project.org/package=rjson}, } - at Manual{jsonlite, - title = {jsonlite: A Smarter JSON Encoder/Decoder for R}, + at article{jsonlite, + title = {The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects}, + journal = {arXiv: Computation (stat.CO); Mathematical Software (cs.MS); Software Engineering (cs.SE)}, author = {Jeroen Ooms}, year = 2014, - note = {R package version 0.9.4}, - url = {http://github.com/jeroenooms/jsonlite#readme}, + url = {http://arxiv.org/abs/1403.2805}, } @Manual{rmongodb, @@ -457,13 +457,12 @@ url = {http://CRAN.R-project.org/package=httr}, } - at Manual{opencpu, - title = {OpenCPU System for Embedded Statistical Computation - and Reproducible Research}, + at article{opencpu, + journal = {arXiv: Computation (stat.CO); Mathematical Software (cs.MS); Software Engineering (cs.SE)}, + title = {The OpenCPU System: Towards a Universal Interface for Scientific Computing through Separation of Concerns}, author = {Jeroen Ooms}, - year = 2013, - note = {R package version 1.2.2}, - url = {http://www.opencpu.org}, + year = 2014, + url = {http://arxiv.org/abs/1406.4806}, } @article{shafranovich2005common, From noreply at r-forge.r-project.org Mon Dec 1 03:00:55 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 03:00:55 +0100 (CET) Subject: [Rprotobuf-commits] r920 - papers/jss Message-ID: <20141201020055.82D34183E72@r-forge.r-project.org> Author: jeroenooms Date: 2014-12-01 03:00:53 +0100 (Mon, 01 Dec 2014) New Revision: 920 Modified: papers/jss/article.Rnw Log: Use shorter ocpu URL Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-01 01:08:23 UTC (rev 919) +++ papers/jss/article.Rnw 2014-12-01 02:00:53 UTC (rev 920) @@ -1263,7 +1263,7 @@ client performs the following HTTP request: \begin{verbatim} - GET https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb + GET https://demo.ocpu.io/MASS/data/Animals/pb \end{verbatim} The postfix \code{/pb} in the URL tells the server to send this object in the form of a Protocol Buffer message. 
@@ -1286,7 +1286,7 @@ library("RProtoBuf") library("httr") -req <- GET('https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb') +req <- GET('https://demo.ocpu.io/MASS/data/Animals/pb') output <- unserialize_pb(req$content) identical(output, MASS::Animals) @@ -1311,7 +1311,7 @@ import urllib2 from rexp_pb2 import REXP -req = urllib2.Request('https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb') +req = urllib2.Request('https://demo.ocpu.io/MASS/data/Animals/pb') res = urllib2.urlopen(req) msg = REXP() @@ -1349,7 +1349,7 @@ payload <- serialize_pb(args, NULL) req <- POST ( - url = "https://public.opencpu.org/ocpu/library/stats/R/rnorm/pb", + url = "https://demo.ocpu.io/stats/R/rnorm/pb", body = payload, add_headers ( "Content-Type" = "application/x-protobuf" From noreply at r-forge.r-project.org Mon Dec 1 08:58:25 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 08:58:25 +0100 (CET) Subject: [Rprotobuf-commits] r921 - papers/jss Message-ID: <20141201075825.D1A561875B9@r-forge.r-project.org> Author: jeroenooms Date: 2014-12-01 08:58:25 +0100 (Mon, 01 Dec 2014) New Revision: 921 Modified: papers/jss/article.Rnw Log: Rewrite mapreduce introduction. Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-01 02:00:53 UTC (rev 920) +++ papers/jss/article.Rnw 2014-12-01 07:58:25 UTC (rev 921) @@ -1063,37 +1063,35 @@ \section{Application: Distributed data collection with MapReduce} \label{sec:mapreduce} -Protocol Buffers have been used extensively at Google for almost all -RPC protocols, and for storing structured information in a variety of -persistent storage systems since 2000 \citep{dean2009designs}. The -\pkg{RProtoBuf} package has been in widespread use by hundreds of -statisticians and software engineers at Google since 2010. This -section describes a simplified example of a common design pattern of -collecting a large structured data set in one language for later -analysis in \proglang{R}. +Protocol Buffers are used extensively at Google for almost all +RPC protocols, and to store structured information on a variety of +persistent storage systems \citep{dean2009designs}. Since the +initial release in 2010, hundreds of Google's statisticians and +software engineers use the \pkg{RProtoBuf} package on daily basis +to interact with these systems from within \proglang{R}. +The current section illustrates the power of Protocol Buffers to +collect and manage large structured data in one language +before analyzing it in \proglang{R}. Our example uses MapReduce +\citep{dean2008mapreduce}, which has emerged in the last +decade as a popular design pattern to facilitate parallel +processing of big data using distributed computing clusters. -Many large data sets in fields such as particle physics and information -processing are stored in binned or histogram form in order to reduce -the data storage requirements \citep{scott2009multivariate}. In the -last decade, the MapReduce programming model \citep{dean2008mapreduce} -has emerged as a popular design pattern that enables the processing of -very large data sets on large compute clusters. - -Many types of data analysis over large data sets may involve very rare +Big data sets in fields such as particle physics and information +processing are often stored in binned (histogram) form in order +to reduce storage requirements \citep{scott2009multivariate}. 
+Because analysis over such large data sets may involve very rare phenomenon or deal with highly skewed data sets or inflexible -raw data storage systems from which unbiased sampling is not feasible. -In such situations, MapReduce and binning may be combined as a +raw data storage systems, unbiased sampling is often not feasible. +In these situations, MapReduce and binning may be combined as a pre-processing step for a wide range of statistical and scientific analyses \citep{blocker2013}. There are two common patterns for generating histograms of large data -sets in a single pass with MapReduce. In the first method, each +sets in a single pass with MapReduce. In the first method, each mapper task generates a histogram over a subset of the data that it has been assigned, serializes this histogram and sends it to one or more reducer tasks which merge the intermediate histograms from the -mappers. - -In the second method, illustrated in +mappers. In the second method, illustrated in Figure~\ref{fig:mr-histogram-pattern1}, each mapper rounds a data point to a bucket width and outputs that bucket as a key and '1' as a value. Reducers then sum up all of the values with the same key and From noreply at r-forge.r-project.org Mon Dec 1 22:54:51 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 22:54:51 +0100 (CET) Subject: [Rprotobuf-commits] r922 - papers/jss Message-ID: <20141201215452.03F3B18788A@r-forge.r-project.org> Author: murray Date: 2014-12-01 22:54:51 +0100 (Mon, 01 Dec 2014) New Revision: 922 Modified: papers/jss/article.Rnw Log: Make the dotted y=x line in figure 2 dashed with a bigger width to make it more visible. Suggested by Steve Scott. Also add some commented out code to add line numbers to make review copies for folks that have offered to do a final review before our resubmit. Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-01 07:58:25 UTC (rev 921) +++ papers/jss/article.Rnw 2014-12-01 21:54:51 UTC (rev 922) @@ -3,6 +3,10 @@ \usepackage{listings} \usepackage[toc,page]{appendix} +% Line numbers for drafts. +%\usepackage[switch, modulo]{lineno} +%\linenumbers + %%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Spelling Standardization: % Protocol Buffers, not protocol buffers @@ -1010,7 +1014,7 @@ plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings") points(clean.df$savings.serialized.gz, clean.df$savings.rprotobuf.gz,pch=2, col="blue") # grey dotted diagonal -abline(a=0,b=1, col="grey",lty=3) +abline(a=0,b=1, col="grey",lty=2,lwd=3) # find point furthest off the X axis. clean.df$savings.diff <- clean.df$savings.serialized - clean.df$savings.rprotobuf @@ -1056,7 +1060,7 @@ \hline \end{tabular} \end{center} -\caption{(Top) Relative space savings of Protocol Buffers and native \proglang{R} serialization over the raw object sizes of each of the \Sexpr{n} data sets in the \pkg{datasets} package. Points to the left of the dotted $y=x$ line represent datasets that are more efficiently encoded with Protocol Buffers. (Bottom) Absolute space savings of two outlier datasets and the aggregate performance of all datasets.} +\caption{(Top) Relative space savings of Protocol Buffers and native \proglang{R} serialization over the raw object sizes of each of the \Sexpr{n} data sets in the \pkg{datasets} package. 
Points to the left of the dashed $y=x$ line represent datasets that are more efficiently encoded with Protocol Buffers. (Bottom) Absolute space savings of two outlier datasets and the aggregate performance of all datasets.} \label{fig:compression} \end{figure} From noreply at r-forge.r-project.org Mon Dec 1 22:58:07 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 22:58:07 +0100 (CET) Subject: [Rprotobuf-commits] r923 - papers/jss Message-ID: <20141201215807.A725B18788A@r-forge.r-project.org> Author: murray Date: 2014-12-01 22:58:07 +0100 (Mon, 01 Dec 2014) New Revision: 923 Modified: papers/jss/article.Rnw Log: Add a missing article to Jeroen's nice rewording of this section. Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-01 21:54:51 UTC (rev 922) +++ papers/jss/article.Rnw 2014-12-01 21:58:07 UTC (rev 923) @@ -1069,9 +1069,9 @@ Protocol Buffers are used extensively at Google for almost all RPC protocols, and to store structured information on a variety of -persistent storage systems \citep{dean2009designs}. Since the +persistent storage systems \citep{dean2009designs}. Since the initial release in 2010, hundreds of Google's statisticians and -software engineers use the \pkg{RProtoBuf} package on daily basis +software engineers use the \pkg{RProtoBuf} package on a daily basis to interact with these systems from within \proglang{R}. The current section illustrates the power of Protocol Buffers to collect and manage large structured data in one language From noreply at r-forge.r-project.org Mon Dec 1 23:53:20 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 1 Dec 2014 23:53:20 +0100 (CET) Subject: [Rprotobuf-commits] r924 - in pkg: . R inst Message-ID: <20141201225320.994B6187870@r-forge.r-project.org> Author: murray Date: 2014-12-01 23:53:20 +0100 (Mon, 01 Dec 2014) New Revision: 924 Modified: pkg/ChangeLog pkg/R/serialize.R pkg/R/wrapper_ZeroCopyInputStream.R pkg/inst/NEWS.Rd Log: Address a FIXME in the code and comment from JSS referee about aboiding file.create to get absolute pathname of temporary file. Use normalizePath with mustWork=FALSE as suggested by Jeroen. Modified: pkg/ChangeLog =================================================================== --- pkg/ChangeLog 2014-12-01 21:58:07 UTC (rev 923) +++ pkg/ChangeLog 2014-12-01 22:53:20 UTC (rev 924) @@ -1,3 +1,9 @@ +2014-12-01 Murray Stokely + + * R/wrapper_ZeroCopyInputStream.R: Avoid file.create for getting + absolute path of a temporary file name (JSS reviewer feedback) + * R/serialize.R: Idem. 
+ 2014-11-26 Murray Stokely Address feedback from anonymous reviewer for JSS to make this Modified: pkg/R/serialize.R =================================================================== --- pkg/R/serialize.R 2014-12-01 21:58:07 UTC (rev 923) +++ pkg/R/serialize.R 2014-12-01 22:53:20 UTC (rev 924) @@ -14,10 +14,10 @@ if( is.character( connection ) ){ # pretend it is a file name if( !file.exists(connection) ){ - # FIXME: hack to grab the absolute path name - file.create( connection ) - file <- file_path_as_absolute(connection) - unlink( file ) + if( !file.exists( dirname(connection) ) ){ + stop( "directory does not exist" ) + } + file <- normalizePath(connection, mustWork=FALSE) } else{ file <- file_path_as_absolute(connection) } Modified: pkg/R/wrapper_ZeroCopyInputStream.R =================================================================== --- pkg/R/wrapper_ZeroCopyInputStream.R 2014-12-01 21:58:07 UTC (rev 923) +++ pkg/R/wrapper_ZeroCopyInputStream.R 2014-12-01 22:53:20 UTC (rev 924) @@ -128,9 +128,7 @@ if( !file.exists( dirname(filename) ) ){ stop( "directory does not exist" ) } - file.create( filename ) - filename <- file_path_as_absolute(filename) - unlink( filename ) + filename <- normalizePath(filename, mustWork=FALSE) } else{ filename <- file_path_as_absolute(filename) } Modified: pkg/inst/NEWS.Rd =================================================================== --- pkg/inst/NEWS.Rd 2014-12-01 21:58:07 UTC (rev 923) +++ pkg/inst/NEWS.Rd 2014-12-01 22:53:20 UTC (rev 924) @@ -2,7 +2,7 @@ \title{News for Package \pkg{RProtoBuf}} \newcommand{\cpkg}{\href{http://CRAN.R-project.org/package=#1}{\pkg{#1}}} -\section{Changes in RProtoBuf version 0.4.2 (2014-??-??)}{ +\section{Changes in RProtoBuf version 0.4.2 (2014-12-??)}{ \itemize{ \item Address changes suggested by anonymous reviewers for our Journal of Statistical Software submission. @@ -23,6 +23,8 @@ with \code{serialize_pb} and \code{unserialize_pb} to make it easy to serialize into a protocol buffer all 100+ of the built-in datasets with R. + \item Use \code{normalizePath} instead of creating a temporary + file with \code{file.create} when getting absolute path names. \item Add unit tests for all of the above. } From noreply at r-forge.r-project.org Tue Dec 2 01:40:57 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 2 Dec 2014 01:40:57 +0100 (CET) Subject: [Rprotobuf-commits] r925 - papers/jss Message-ID: <20141202004057.2E47E187863@r-forge.r-project.org> Author: murray Date: 2014-12-02 01:40:56 +0100 (Tue, 02 Dec 2014) New Revision: 925 Modified: papers/jss/article.Rnw Log: Grammatical improvements throughout the paper suggested by Tim Hesterberg. Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-01 22:53:20 UTC (rev 924) +++ papers/jss/article.Rnw 2014-12-02 00:40:56 UTC (rev 925) @@ -172,20 +172,20 @@ lacks type-safety, and has limited precision for numeric values. Moreover, ambiguities in the format itself frequently cause problems. For example, conventions on which characters is used as separator or decimal point vary by -country. \emph{Extensible Markup Language} (\code{XML}) is another +country. \emph{Extensible Markup Language} (\code{XML}) is a well-established and widely-supported format with the ability to define just about any arbitrarily complex schema \citep{nolan2013xml}. 
However, it pays for this complexity with comparatively large and verbose messages, and added -complexity at the parsing side (which are somewhat mitigated by the -availability of mature libraries and parsers). Because \code{XML} is +complexity at the parsing side (these problems are somewhat mitigated by the +availability of mature libraries and parsers). Because \code{XML} is text-based and has no native notion of numeric types or arrays, it usually not a very practical format to store numeric data sets as they appear in statistical applications. -A more modern format is \emph{JavaScript ObjectNotation} +A more modern format is \emph{JavaScript ObjectNotation} (\code{JSON}), which is derived from the object literals of -\proglang{JavaScript}, and already widely-used on the world wide web. +\proglang{JavaScript}, and already widely-used on the world wide web. Several \proglang{R} packages implement functions to parse and generate \code{JSON} data from \proglang{R} objects \citep{rjson,RJSONIO,jsonlite}. \code{JSON} natively supports arrays and four primitive types: numbers, strings, @@ -220,11 +220,11 @@ Section~\ref{sec:rprotobuf-basic} describes the interactive \proglang{R} interface provided by the \pkg{RProtoBuf} package, and introduces the two main abstractions: \emph{Messages} and \emph{Descriptors}. Section~\ref{sec:rprotobuf-classes} -details the implementation of the main S4 classes and methods. +details the implementation of the main S4 classes and methods. Section~\ref{sec:types} describes the challenges of type coercion between \proglang{R} and other languages. Section~\ref{sec:evaluation} introduces a -general \proglang{R} language schema for serializing arbitrary \proglang{R} objects and evaluates -it against the serialization capabilities built directly into \proglang{R}. Sections~\ref{sec:mapreduce} +general \proglang{R} language schema for serializing arbitrary \proglang{R} objects and compares it to +the serialization capabilities built directly into \proglang{R}. Sections~\ref{sec:mapreduce} and \ref{sec:opencpu} provide real-world use cases of \pkg{RProtoBuf} in MapReduce and web service environments, respectively, before Section~\ref{sec:summary} concludes. @@ -233,8 +233,8 @@ \label{sec:protobuf} Protocol Buffers are a modern, language-neutral, platform-neutral, -extensible mechanism for sharing and storing structured data. Some of -the key features provided by Protocol Buffers for data analysis are: +extensible mechanism for sharing and storing structured data. Key +features provided by Protocol Buffers for data analysis include: \begin{itemize} \item \emph{Portable}: Enable users to send and receive data between @@ -260,9 +260,9 @@ communication work flow with Protocol Buffers and an interactive \proglang{R} session. Common use cases include populating a request remote-procedure call (RPC) Protocol Buffer in \proglang{R} that is then serialized and sent over the network to a -remote server. The server would then deserialize the message, act on the -request, and respond with a new Protocol Buffer over the network. -The key difference to, say, a request to an \pkg{Rserve} +remote server. The server deserializes the message, acts on the +request, and responds with a new Protocol Buffer over the network. +The key difference to, say, a request to an \pkg{Rserve} \citep{Urbanek:2003:Rserve,CRAN:Rserve} instance is that the remote server may be implemented in any language. %, with no dependence on \proglang{R}. 
@@ -367,8 +367,8 @@ \subsection*{Importing message descriptors from \code{.proto} files} -To create or parse a Protocol Buffer Message, one must first read in -the message type specification from a \code{.proto} file. +To create or parse a Protocol Buffer Message, one must first read in +the message descriptor (\emph{message type}) from a \code{.proto} file. A small number of message types are imported when the package is first loaded, including the \code{tutorial.Person} type we saw in the last section. @@ -472,8 +472,8 @@ % \subsection{Serializing messages} -One of the primary benefits of Protocol Buffers is the efficient -binary wire-format representation. +A primary benefit of Protocol Buffers is an efficient +binary wire-format representation. The \code{serialize} method is implemented for Protocol Buffer messages to serialize a message into a sequence of bytes (raw vector) that represents the message. @@ -1098,8 +1098,8 @@ mappers. In the second method, illustrated in Figure~\ref{fig:mr-histogram-pattern1}, each mapper rounds a data point to a bucket width and outputs that bucket as a key and '1' as a -value. Reducers then sum up all of the values with the same key and -output to a data store. +value. Reducers count how many times each key occurs and outputs a +histogram to a data store. \begin{figure}[h!] \begin{center} @@ -1154,20 +1154,17 @@ \begin{Code} from histogram_pb2 import HistogramState; - hist = HistogramState() - hist.counts.extend([2, 6, 2, 4, 6]) hist.breaks.extend(range(6)) hist.name="Example Histogram Created in Python" - outfile = open("/tmp/hist.pb", "wb") outfile.write(hist.SerializeToString()) outfile.close() \end{Code} The Protocol Buffer created from this \proglang{Python} script can then be read into \proglang{R} and converted to a native -\proglang{R} histogram object for plotting. Line~1 in the listing below attaches the \pkg{HistogramTools} package which imports \pkg{RProtoBuf}. Line~2 then reads all of the \code{.proto} descriptor definitions provided by \pkg{HistogramTools} and adds them to the environment as described in Section~\ref{sec:rprotobuf-basic}. Line~3 parses the serialized protocol buffer using the \code{HistogramTools.HistogramState} schema. Line~8 converts the protocol buffer representation of the histogram to a native \proglang{R} histogram object with \code{as.histogram} and passes the result to \code{plot}. +\proglang{R} histogram object for plotting. Line~1 in the listing below attaches the \pkg{HistogramTools} package which imports \pkg{RProtoBuf}. Line~2 then reads all of the \code{.proto} descriptor definitions provided by \pkg{HistogramTools} and adds them to the environment as described in Section~\ref{sec:rprotobuf-basic}. Line~3 parses the serialized protocol buffer using the \code{HistogramTools.HistogramState} schema. The last line converts the protocol buffer representation of the histogram to a native \proglang{R} histogram object with \code{as.histogram} and passes the result to \code{plot}. % Here, the schema is read first, %then the (serialized) histogram is read into the variable \code{hist} which @@ -1220,7 +1217,7 @@ \label{sec:opencpu} The previous section described an application where data from a -program written in another language was output to persistent storage +program written in another language was saved to persistent storage and then read into \proglang{R} for further analysis. 
This section describes another common use case where Protocol Buffers are used as the interchange format for client-server communication. @@ -1232,7 +1229,7 @@ multimedia content. When designing systems where various components require exchange of specific data structures, we need something on top of the network protocol that prescribes how these structures are to be represented in -messages (buffers) on the network. Protocol Buffers solve exactly this +messages (buffers) on the network. Protocol Buffers solve this problem by providing a cross-platform method for serializing arbitrary structures into well defined messages, which can then be exchanged using any protocol. @@ -1312,10 +1309,8 @@ \begin{verbatim} import urllib2 from rexp_pb2 import REXP - req = urllib2.Request('https://demo.ocpu.io/MASS/data/Animals/pb') res = urllib2.urlopen(req) - msg = REXP() msg.ParseFromString(res.read()) print(msg) @@ -1394,7 +1389,7 @@ users of \pkg{RProtoBuf} using it to read data from and otherwise interact with distributed systems written in \proglang{C++}, \proglang{Java}, \proglang{Python}, and other languages. We hope that making Protocol Buffers available to the -\proglang{R} community will contribute towards better software integration +\proglang{R} community will contribute to better software integration and allow for building even more advanced applications and analysis pipelines with \proglang{R}. @@ -1465,7 +1460,7 @@ repeated REXP attrValue = 12; optional bytes languageValue = 13; optional bytes environmentValue = 14; - optional bytes functionValue = 14; + optional bytes functionValue = 15; } message STRING { optional string strval = 1; From noreply at r-forge.r-project.org Tue Dec 2 04:39:47 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 2 Dec 2014 04:39:47 +0100 (CET) Subject: [Rprotobuf-commits] r926 - in pkg: . inst man Message-ID: <20141202033947.B597D184C54@r-forge.r-project.org> Author: edd Date: 2014-12-02 04:39:46 +0100 (Tue, 02 Dec 2014) New Revision: 926 Modified: pkg/ChangeLog pkg/inst/NEWS.Rd pkg/man/Descriptor-class.Rd pkg/man/EnumDescriptor-class.Rd pkg/man/Message-class.Rd Log: minor fixes for documentation Modified: pkg/ChangeLog =================================================================== --- pkg/ChangeLog 2014-12-02 00:40:56 UTC (rev 925) +++ pkg/ChangeLog 2014-12-02 03:39:46 UTC (rev 926) @@ -1,3 +1,9 @@ +2014-12-01 Dirk Eddelbuettel + + * man/Message-class.Rd: Completed documentation + * man/Descriptor-class.Rd: Ditto + * man/EnumDescriptor-class.Rd: Ditto + 2014-12-01 Murray Stokely * R/wrapper_ZeroCopyInputStream.R: Avoid file.create for getting Modified: pkg/inst/NEWS.Rd =================================================================== --- pkg/inst/NEWS.Rd 2014-12-02 00:40:56 UTC (rev 925) +++ pkg/inst/NEWS.Rd 2014-12-02 03:39:46 UTC (rev 926) @@ -26,6 +26,7 @@ \item Use \code{normalizePath} instead of creating a temporary file with \code{file.create} when getting absolute path names. \item Add unit tests for all of the above. 
+ } } \section{Changes in RProtoBuf version 0.4.1 (2014-03-25)}{ Modified: pkg/man/Descriptor-class.Rd =================================================================== --- pkg/man/Descriptor-class.Rd 2014-12-02 00:40:56 UTC (rev 925) +++ pkg/man/Descriptor-class.Rd 2014-12-02 03:39:46 UTC (rev 926) @@ -15,6 +15,9 @@ \alias{field,Descriptor-method} \alias{nested_type,Descriptor-method} \alias{enum_type,Descriptor,ANY,ANY-method} +\alias{[[,Descriptor-method} +\alias{names,Descriptor-method} +\alias{length,Descriptor-method} \title{Class "Descriptor" } \description{ full descriptive information about a protocol buffer @@ -81,6 +84,9 @@ If \code{name} is used, the enum type will be retrieved using its name, with the \code{FindEnumTypeByName} C++ method } + \item{[[}{\code{signature(x = "Descriptor")}: extracts a field identified by its name or declared tag number} + \item{names}{\code{signature(x = "Descriptor")} : extracts names of this descriptor} + \item{length}{\code{signature(x = "Descriptor")} : extracts length of this descriptor} } } Modified: pkg/man/EnumDescriptor-class.Rd =================================================================== --- pkg/man/EnumDescriptor-class.Rd 2014-12-02 00:40:56 UTC (rev 925) +++ pkg/man/EnumDescriptor-class.Rd 2014-12-02 03:39:46 UTC (rev 926) @@ -18,6 +18,9 @@ \alias{value-methods} \alias{value,EnumDescriptor-method} +\alias{[[,EnumDescriptor-method} +\alias{names,EnumDescriptor-method} + \title{Class "EnumDescriptor" } \description{ R representation of an enum descriptor. This is a thin wrapper around the \code{EnumDescriptor} c++ class. } @@ -60,6 +63,8 @@ using the name of the constant, using the \code{FindValueByName} C++ method. } + \item{[[}{\code{signature(x = "EnumDescriptor")}: extracts field identified by its name or declared tag number} + \item{names}{\code{signature(x = "EnumDescriptor")} : extracts names of this enum} } } Modified: pkg/man/Message-class.Rd =================================================================== --- pkg/man/Message-class.Rd 2014-12-02 00:40:56 UTC (rev 925) +++ pkg/man/Message-class.Rd 2014-12-02 03:39:46 UTC (rev 926) @@ -11,6 +11,7 @@ \alias{show,Message-method} \alias{update,Message-method} \alias{length,Message-method} +\alias{names,Message-method} \alias{str,Message-method} \alias{toString,Message-method} \alias{identical,Message,Message-method} @@ -68,7 +69,9 @@ \item{==}{\code{signature(e1 = "Message", e2 = "Message")}: Same as \code{identical} } \item{!=}{\code{signature(e1 = "Message", e2 = "Message")}: Negation of \code{identical} } \item{all.equal}{\code{signature(e1 = "Message", e2 = "Message")}: Test near equality } - } + \item{names}{\code{signature(x = "Message")}: extracts the names of the message. } + + } } \references{ The \code{Message} class from the C++ proto library. From mstokely at google.com Tue Dec 2 04:42:10 2014 From: mstokely at google.com (Murray Stokely) Date: Mon, 1 Dec 2014 19:42:10 -0800 Subject: [Rprotobuf-commits] r926 - in pkg: . inst man In-Reply-To: <20141202033947.B597D184C54@r-forge.r-project.org> References: <20141202033947.B597D184C54@r-forge.r-project.org> Message-ID: Thanks, Dirk! 
- Murray

From noreply at r-forge.r-project.org Wed Dec 3 20:43:16 2014
From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org)
Date: Wed, 3 Dec 2014 20:43:16 +0100 (CET)
Subject: [Rprotobuf-commits] r927 - papers/jss
Message-ID: <20141203194316.7CE3D18444D@r-forge.r-project.org>

Author: murray
Date: 2014-12-03 20:43:16 +0100 (Wed, 03 Dec 2014)
New Revision: 927

Modified:
   papers/jss/article.Rnw
Log:
Improve the plot and point out 3 outliers now and explain them in the text.
Correct an error in the space savings definition. Change trivial example to
simple example.

Suggestions from: Andy Chu

Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw	2014-12-02 03:39:46 UTC (rev 926)
+++ papers/jss/article.Rnw	2014-12-03 19:43:16 UTC (rev 927)
@@ -972,20 +972,20 @@
 clean.df<-rbind(clean.df, all.df)
 @

-Figure~\ref{fig:compression} shows the space savings $\left(1 - \frac{\textrm{Uncompressed Size}}{\textrm{Compressed Size}}\right)$ for each of the data sets using each of these four methods. The associated table shows the exact data sizes for two outliers and the aggregate of all \Sexpr{n} data sets.
+Figure~\ref{fig:compression} shows the space savings $\left(1 - \frac{\textrm{Compressed Size}}{\textrm{Uncompressed Size}}\right)$ for each of the data sets using each of these four methods. The associated table shows the exact data sizes for some outliers and the aggregate of all \Sexpr{n} data sets.
 Note that Protocol Buffer serialization results in slightly
-smaller byte streams compared to native \proglang{R} serialization in most cases,
-but this difference disappears if the results are compressed with gzip.
+smaller byte streams compared to native \proglang{R} serialization in most cases (red dots),
+but this difference disappears if the results are compressed with gzip (blue triangles).
%Sizes are comparable but Protocol Buffers provide simple getters and setters %in multiple languages instead of requiring other programs to parse the \proglang{R} %serialization format. % \citep{serialization}. The \code{crimtab} dataset of anthropometry measurements of British -prisoners \citep{garson1900metric} -shows the greatest difference in the space savings when +prisoners \citep{garson1900metric} and the \code{airquality} dataset of air quality measurements in New York show the +greatest difference in the space savings when using Protocol Buffers compared to \proglang{R} native serialization. -This dataset is a 42x22 table of integers, most equal to 0. Small -integer values like this can be very efficiently encoded by the +The \code{crimtab} dataset is a 42x22 table of integers, most equal to 0, and the \code{airquality} dataset is a data frame of 154 observations of 1 numeric and 5 integer variables. In both data sets, the large number of small +integer values can be very efficiently encoded by the \emph{Varint} integer encoding scheme used by Protocol Buffers which use a variable number of bytes for each value. @@ -1008,10 +1008,16 @@ application-specific schema has been defined. The example in the next section satisfies both of these conditions. -\begin{figure}[t!] +\begin{figure}[hbt!] \begin{center} -<>= -plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings") +<>= +old.mar<-par("mar") +new.mar<-old.mar +new.mar[3]<-0 +new.mar[4]<-0 +my.cex<-1.3 +par("mar"=new.mar) +plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings", xlim=c(0,1),ylim=c(0,1),cex.lab=my.cex, cex.axis=my.cex) points(clean.df$savings.serialized.gz, clean.df$savings.rprotobuf.gz,pch=2, col="blue") # grey dotted diagonal abline(a=0,b=1, col="grey",lty=2,lwd=3) @@ -1023,17 +1029,27 @@ # The one to label. tmp.df <- clean.df[which(clean.df$savings.diff == min(clean.df$savings.diff)),] # This minimum means most to the left of our line, so pos=2 is label to the left -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2) -text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2) +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) +# Some gziped version +# text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2, cex=my.cex) + +# Second one is also an outlier +tmp.df <- clean.df[which(clean.df$savings.diff == sort(clean.df$savings.diff)[2]),] +# This minimum means most to the left of our line, so pos=2 is label to the left +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) +#text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=my.cex) + + tmp.df <- clean.df[which(clean.df$savings.diff == max(clean.df$savings.diff)),] # This minimum means most to the right of the diagonal, so pos=4 is label to the right -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4) -text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4) +# Only show the gziped one. 
+#text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4, cex=my.cex) +text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4, cex=my.cex) #outlier.dfs <- clean.df[c(which(clean.df$savings.diff == min(clean.df$savings.diff)), -legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue")) +legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue"), cex=my.cex) interesting.df <- clean.df[unique(c(which(clean.df$savings.diff == min(clean.df$savings.diff)), which(clean.df$savings.diff == max(clean.df$savings.diff)), @@ -1041,7 +1057,9 @@ which(clean.df$dataset == "TOTAL"))),c("dataset", "object.size", "serialized", "gzipped serialized", "RProtoBuf", "gzipped RProtoBuf", "savings.serialized", "savings.serialized.gz", "savings.rprotobuf", "savings.rprotobuf.gz")] # Print without .00 in xtable interesting.df$object.size <- as.integer(interesting.df$object.size) +par("mar"=old.mar) @ +\includegraphics[width=0.45\textwidth]{figures/fig-SER} % latex table generated in R 3.0.2 by xtable 1.7-0 package % Wed Nov 26 15:31:30 2014 @@ -1054,13 +1072,14 @@ & & default & gzipped & default & gzipped \\ \cmidrule(r){2-6} crimtab & 7,936 & 4,641 (41.5\%) & 713 (91.0\%) & 1,655 (79.2\%) & 576 (92.7\%)\\ + airquality & 5,496 & 4,551 (17.2\%) & 1,241 (77.4\%) & 2,874 (47.7\%) & 1,294 (76.5\%)\\ faithful & 5,136 & 4,543 (11.5\%) & 1,339 (73.9\%) & 4,936 (3.9\%) & 1,776 (65.5\%)\\ \hline All & 605,256 & 461,667 (24\%) & 138,937 (77\%) & 435,360 (28\%) & 142,134 (77\%)\\ \hline \end{tabular} \end{center} -\caption{(Top) Relative space savings of Protocol Buffers and native \proglang{R} serialization over the raw object sizes of each of the \Sexpr{n} data sets in the \pkg{datasets} package. Points to the left of the dashed $y=x$ line represent datasets that are more efficiently encoded with Protocol Buffers. (Bottom) Absolute space savings of two outlier datasets and the aggregate performance of all datasets.} +\caption{(Top) Relative space savings of Protocol Buffers and native \proglang{R} serialization over the raw object sizes of each of the \Sexpr{n} data sets in the \pkg{datasets} package. Points to the left of the dashed $y=x$ line represent datasets that are more efficiently encoded with Protocol Buffers. (Bottom) Absolute space savings of three outlier datasets and the aggregate performance of all datasets.} \label{fig:compression} \end{figure} @@ -1135,7 +1154,7 @@ written in other languages and only the resulting output histograms need to be manipulated in \proglang{R}. -\subsection*{A trivial single-machine example for Python to R serialization} +\subsection*{A simple single-machine example for Python to R serialization} To create HistogramState messages in Python for later consumption by \proglang{R}, we first compile the From noreply at r-forge.r-project.org Wed Dec 3 23:09:22 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Wed, 3 Dec 2014 23:09:22 +0100 (CET) Subject: [Rprotobuf-commits] r928 - / Message-ID: <20141203220922.C44551810FB@r-forge.r-project.org> Author: jeroenooms Date: 2014-12-03 23:09:22 +0100 (Wed, 03 Dec 2014) New Revision: 928 Added: .travis.yml Log: Add travis file. Added: .travis.yml =================================================================== --- .travis.yml (rev 0) +++ .travis.yml 2014-12-03 22:09:22 UTC (rev 928) @@ -0,0 +1,24 @@ +# Sample .travis.yml for R projects. 
+# +# See README.md for instructions, or for more configuration options, +# see the wiki: +# https://github.com/craigcitro/r-travis/wiki + +language: c + +before_install: + - sudo apt-get install libprotobuf-dev libprotoc-dev + - curl -OL http://raw.github.com/craigcitro/r-travis/master/scripts/travis-tool.sh + - chmod 755 ./travis-tool.sh + - ./travis-tool.sh bootstrap +install: + - ./travis-tool.sh install_deps +script: ./travis-tool.sh run_tests + +after_failure: + - ./travis-tool.sh dump_logs + +notifications: + email: + on_success: change + on_failure: change From noreply at r-forge.r-project.org Thu Dec 4 02:45:57 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Thu, 4 Dec 2014 02:45:57 +0100 (CET) Subject: [Rprotobuf-commits] r929 - in pkg: . inst/unitTests Message-ID: <20141204014557.A1023185D50@r-forge.r-project.org> Author: murray Date: 2014-12-04 02:45:57 +0100 (Thu, 04 Dec 2014) New Revision: 929 Modified: pkg/ChangeLog pkg/inst/unitTests/runit.int64.R Log: Save the options and restore them on.exit to make this test indempotent. This might be responsible for some unit test failures if R CMD CHECK now runs the same testSuite twice, which I don't think it does. Modified: pkg/ChangeLog =================================================================== --- pkg/ChangeLog 2014-12-03 22:09:22 UTC (rev 928) +++ pkg/ChangeLog 2014-12-04 01:45:57 UTC (rev 929) @@ -1,3 +1,8 @@ +2014-12-04 Murray Stokely + + * inst/unitTests/runit.int64.R: restore options on exit from this + function to make the test indempotent. + 2014-12-01 Dirk Eddelbuettel * man/Message-class.Rd: Completed documentation Modified: pkg/inst/unitTests/runit.int64.R =================================================================== --- pkg/inst/unitTests/runit.int64.R 2014-12-03 22:09:22 UTC (rev 928) +++ pkg/inst/unitTests/runit.int64.R 2014-12-04 01:45:57 UTC (rev 929) @@ -15,6 +15,10 @@ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. test.int64 <- function() { + # Preserve option. 
+ old.optval <- options("RProtoBuf.int64AsString") + on.exit(options(old.optval)) + if (!exists("protobuf_unittest.TestAllTypes", "RProtoBuf:DescriptorPool")) { unittest.proto.file <- system.file("unitTests", "data", From noreply at r-forge.r-project.org Mon Dec 15 02:10:08 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 15 Dec 2014 02:10:08 +0100 (CET) Subject: [Rprotobuf-commits] r930 - papers/jss Message-ID: <20141215011008.3385A1865E6@r-forge.r-project.org> Author: edd Date: 2014-12-15 02:10:07 +0100 (Mon, 15 Dec 2014) New Revision: 930 Modified: papers/jss/article.R papers/jss/article.Rnw Log: one "isn't" replaced with "is not"; one sentence reworked Modified: papers/jss/article.R =================================================================== --- papers/jss/article.R 2014-12-04 01:45:57 UTC (rev 929) +++ papers/jss/article.R 2014-12-15 01:10:07 UTC (rev 930) @@ -1,7 +1,7 @@ -### R code from vignette source 'article.Rnw' +### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/article.Rnw' ################################################### -### code chunk number 1: article.Rnw:125-131 +### code chunk number 1: article.Rnw:130-136 ################################################### ## cf http://www.jstatsoft.org/style#q12 options(prompt = "R> ", @@ -12,7 +12,7 @@ ################################################### -### code chunk number 2: article.Rnw:313-321 +### code chunk number 2: article.Rnw:318-326 ################################################### library("RProtoBuf") p <- new(tutorial.Person, id=1, @@ -25,20 +25,13 @@ ################################################### -### code chunk number 3: article.Rnw:376-377 +### code chunk number 3: article.Rnw:421-422 ################################################### -ls("RProtoBuf:DescriptorPool") - - -################################################### -### code chunk number 4: article.Rnw:391-393 -################################################### -p1 <- new(tutorial.Person) p <- new(tutorial.Person, name = "Murray", id = 1) ################################################### -### code chunk number 5: article.Rnw:402-405 +### code chunk number 4: article.Rnw:431-434 ################################################### p$name p$id @@ -46,7 +39,7 @@ ################################################### -### code chunk number 6: article.Rnw:413-416 +### code chunk number 5: article.Rnw:442-445 ################################################### p[["name"]] <- "Murray Stokely" p[[ 2 ]] <- 3 @@ -54,25 +47,25 @@ ################################################### -### code chunk number 7: article.Rnw:429-430 +### code chunk number 6: article.Rnw:461-462 ################################################### p ################################################### -### code chunk number 8: article.Rnw:437-438 +### code chunk number 7: article.Rnw:469-470 ################################################### writeLines(as.character(p)) ################################################### -### code chunk number 9: article.Rnw:451-452 +### code chunk number 8: article.Rnw:483-484 ################################################### serialize(p, NULL) ################################################### -### code chunk number 10: article.Rnw:457-460 +### code chunk number 9: article.Rnw:489-492 ################################################### tf1 <- tempfile() serialize(p, tf1) @@ -80,92 +73,42 @@ ################################################### -### code chunk number 11: 
article.Rnw:465-470 +### code chunk number 10: article.Rnw:538-540 ################################################### -tf2 <- tempfile() -con <- file(tf2, open = "wb") -serialize(p, con) -close(con) -readBin(tf2, raw(0), 500) - - -################################################### -### code chunk number 12: article.Rnw:476-480 -################################################### -p$serialize(tf1) -con <- file(tf2, open = "wb") -p$serialize(con) -close(con) - - -################################################### -### code chunk number 13: article.Rnw:500-502 -################################################### msg <- read(tutorial.Person, tf1) writeLines(as.character(msg)) ################################################### -### code chunk number 14: article.Rnw:508-512 +### code chunk number 11: article.Rnw:660-661 ################################################### -con <- file(tf2, open = "rb") -message <- read(tutorial.Person, con) -close(con) -writeLines(as.character(message)) - - -################################################### -### code chunk number 15: article.Rnw:517-519 -################################################### -payload <- readBin(tf1, raw(0), 5000) -message <- read(tutorial.Person, payload) - - -################################################### -### code chunk number 16: article.Rnw:526-531 -################################################### -message <- tutorial.Person$read(tf1) -con <- file(tf2, open = "rb") -message <- tutorial.Person$read(con) -close(con) -message <- tutorial.Person$read(payload) - - -################################################### -### code chunk number 17: article.Rnw:610-611 -################################################### new(tutorial.Person) ################################################### -### code chunk number 18: article.Rnw:675-682 +### code chunk number 12: article.Rnw:685-690 ################################################### tutorial.Person$email +tutorial.Person$email$is_required() +tutorial.Person$email$type() +tutorial.Person$email$as.character() +class(tutorial.Person$email) -tutorial.Person$PhoneType -tutorial.Person$PhoneNumber - -tutorial.Person.PhoneNumber - - ################################################### -### code chunk number 19: article.Rnw:798-800 +### code chunk number 13: article.Rnw:702-709 ################################################### tutorial.Person$PhoneType tutorial.Person$PhoneType$WORK - - -################################################### -### code chunk number 20: article.Rnw:849-852 -################################################### +class(tutorial.Person$PhoneType) tutorial.Person$PhoneType$value(1) tutorial.Person$PhoneType$value(name="HOME") tutorial.Person$PhoneType$value(number=1) +class(tutorial.Person$PhoneType$value(1)) ################################################### -### code chunk number 21: article.Rnw:921-924 +### code chunk number 14: article.Rnw:719-722 ################################################### f <- tutorial.Person$fileDescriptor() f @@ -173,7 +116,7 @@ ################################################### -### code chunk number 22: article.Rnw:987-990 +### code chunk number 15: article.Rnw:785-788 ################################################### if (!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { readProtoFiles(file="int64.proto") @@ -181,7 +124,7 @@ ################################################### -### code chunk number 23: article.Rnw:1012-1016 +### code chunk number 16: article.Rnw:810-814 
################################################### as.integer(2^31-1) as.integer(2^31 - 1) + as.integer(1) @@ -190,20 +133,20 @@ ################################################### -### code chunk number 24: article.Rnw:1028-1029 +### code chunk number 17: article.Rnw:826-827 ################################################### 2^53 == (2^53 + 1) ################################################### -### code chunk number 25: article.Rnw:1080-1082 +### code chunk number 18: article.Rnw:878-880 ################################################### msg <- serialize_pb(iris, NULL) identical(iris, unserialize_pb(msg)) ################################################### -### code chunk number 26: article.Rnw:1113-1116 +### code chunk number 19: article.Rnw:908-911 ################################################### datasets <- as.data.frame(data(package="datasets")$results) datasets$name <- sub("\\s+.*$", "", datasets$Item) @@ -211,26 +154,8 @@ ################################################### -### code chunk number 27: article.Rnw:1126-1127 +### code chunk number 20: article.Rnw:929-972 ################################################### -m <- sum(sapply(datasets$name, function(x) can_serialize_pb(get(x)))) - - -################################################### -### code chunk number 28: article.Rnw:1140-1147 -################################################### -attr(CO2, "formula") -msg <- serialize_pb(CO2, NULL) -object <- unserialize_pb(msg) -identical(CO2, object) -identical(class(CO2), class(object)) -identical(dim(CO2), dim(object)) -attr(object, "formula") - - -################################################### -### code chunk number 29: article.Rnw:1163-1182 -################################################### datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) @@ -249,42 +174,117 @@ "gzipped serialized"=datasets$R.serialize.size.gz, "RProtoBuf"=datasets$RProtoBuf.serialize.size, "gzipped RProtoBuf"=datasets$RProtoBuf.serialize.size.gz, + "ratio.serialized" = datasets$R.serialize.size / datasets$object.size, + "ratio.rprotobuf" = datasets$RProtoBuf.serialize.size / datasets$object.size, + "ratio.serialized.gz" = datasets$R.serialize.size.gz / datasets$object.size, + "ratio.rprotobuf.gz" = datasets$RProtoBuf.serialize.size.gz / datasets$object.size, + "savings.serialized" = 1-(datasets$R.serialize.size / datasets$object.size), + "savings.rprotobuf" = 1-(datasets$RProtoBuf.serialize.size / datasets$object.size), + "savings.serialized.gz" = 1-(datasets$R.serialize.size.gz / datasets$object.size), + "savings.rprotobuf.gz" = 1-(datasets$RProtoBuf.serialize.size.gz / datasets$object.size), check.names=FALSE) +all.df<-data.frame(dataset="TOTAL", object.size=sum(datasets$object.size), + "serialized"=sum(datasets$R.serialize.size), + "gzipped serialized"=sum(datasets$R.serialize.size.gz), + "RProtoBuf"=sum(datasets$RProtoBuf.serialize.size), + "gzipped RProtoBuf"=sum(datasets$RProtoBuf.serialize.size.gz), + "ratio.serialized" = sum(datasets$R.serialize.size) / sum(datasets$object.size), + "ratio.rprotobuf" = sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size), + "ratio.serialized.gz" = sum(datasets$R.serialize.size.gz) / sum(datasets$object.size), + "ratio.rprotobuf.gz" = sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size), + "savings.serialized" = 1-(sum(datasets$R.serialize.size) / 
sum(datasets$object.size)), + "savings.rprotobuf" = 1-(sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size)), + "savings.serialized.gz" = 1-(sum(datasets$R.serialize.size.gz) / sum(datasets$object.size)), + "savings.rprotobuf.gz" = 1-(sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size)), + check.names=FALSE) +clean.df<-rbind(clean.df, all.df) + ################################################### -### code chunk number 30: article.Rnw:1390-1395 +### code chunk number 21: SER ################################################### -require(RProtoBuf) +old.mar<-par("mar") +new.mar<-old.mar +new.mar[3]<-0 +new.mar[4]<-0 +my.cex<-1.3 +par("mar"=new.mar) +plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings", xlim=c(0,1),ylim=c(0,1),cex.lab=my.cex, cex.axis=my.cex) +points(clean.df$savings.serialized.gz, clean.df$savings.rprotobuf.gz,pch=2, col="blue") +# grey dotted diagonal +abline(a=0,b=1, col="grey",lty=2,lwd=3) + +# find point furthest off the X axis. +clean.df$savings.diff <- clean.df$savings.serialized - clean.df$savings.rprotobuf +clean.df$savings.diff.gz <- clean.df$savings.serialized.gz - clean.df$savings.rprotobuf.gz + +# The one to label. +tmp.df <- clean.df[which(clean.df$savings.diff == min(clean.df$savings.diff)),] +# This minimum means most to the left of our line, so pos=2 is label to the left +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) + +# Some gziped version +# text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2, cex=my.cex) + +# Second one is also an outlier +tmp.df <- clean.df[which(clean.df$savings.diff == sort(clean.df$savings.diff)[2]),] +# This minimum means most to the left of our line, so pos=2 is label to the left +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) +#text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=my.cex) + + +tmp.df <- clean.df[which(clean.df$savings.diff == max(clean.df$savings.diff)),] +# This minimum means most to the right of the diagonal, so pos=4 is label to the right +# Only show the gziped one. 
+#text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4, cex=my.cex) +text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4, cex=my.cex) + +#outlier.dfs <- clean.df[c(which(clean.df$savings.diff == min(clean.df$savings.diff)), + +legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue"), cex=my.cex) + +interesting.df <- clean.df[unique(c(which(clean.df$savings.diff == min(clean.df$savings.diff)), + which(clean.df$savings.diff == max(clean.df$savings.diff)), + which(clean.df$savings.diff.gz == max(clean.df$savings.diff.gz)), + which(clean.df$dataset == "TOTAL"))),c("dataset", "object.size", "serialized", "gzipped serialized", "RProtoBuf", "gzipped RProtoBuf", "savings.serialized", "savings.serialized.gz", "savings.rprotobuf", "savings.rprotobuf.gz")] +# Print without .00 in xtable +interesting.df$object.size <- as.integer(interesting.df$object.size) +par("mar"=old.mar) + + +################################################### +### code chunk number 22: article.Rnw:1211-1215 +################################################### require(HistogramTools) readProtoFiles(package="HistogramTools") hist <- HistogramTools.HistogramState$read("hist.pb") -plot(as.histogram(hist)) +plot(as.histogram(hist), main="") ################################################### -### code chunk number 31: article.Rnw:1463-1470 (eval = FALSE) +### code chunk number 23: article.Rnw:1303-1310 (eval = FALSE) ################################################### ## library("RProtoBuf") ## library("httr") ## -## req <- GET('https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb') +## req <- GET('https://demo.ocpu.io/MASS/data/Animals/pb') ## output <- unserialize_pb(req$content) ## ## identical(output, MASS::Animals) ################################################### -### code chunk number 32: article.Rnw:1529-1545 (eval = FALSE) +### code chunk number 24: article.Rnw:1360-1376 (eval = FALSE) ################################################### -## library("httr") +## library("httr") ## library("RProtoBuf") ## ## args <- list(n=42, mean=100) ## payload <- serialize_pb(args, NULL) ## ## req <- POST ( -## url = "https://public.opencpu.org/ocpu/library/stats/R/rnorm/pb", +## url = "https://demo.ocpu.io/stats/R/rnorm/pb", ## body = payload, ## add_headers ( ## "Content-Type" = "application/x-protobuf" @@ -296,7 +296,7 @@ ################################################### -### code chunk number 33: article.Rnw:1549-1552 (eval = FALSE) +### code chunk number 25: article.Rnw:1380-1383 (eval = FALSE) ################################################### ## fnargs <- unserialize_pb(inputmsg) ## val <- do.call(stats::rnorm, fnargs) Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2014-12-04 01:45:57 UTC (rev 929) +++ papers/jss/article.Rnw 2014-12-15 01:10:07 UTC (rev 930) @@ -233,8 +233,8 @@ \label{sec:protobuf} Protocol Buffers are a modern, language-neutral, platform-neutral, -extensible mechanism for sharing and storing structured data. Key -features provided by Protocol Buffers for data analysis include: +extensible mechanism for sharing and storing structured data. 
Some of their +features, particularly in the context of data analysis, are: \begin{itemize} \item \emph{Portable}: Enable users to send and receive data between @@ -388,7 +388,7 @@ parsed from \code{.proto} files and added to the global namespace.\footnote{Note that there is a significant performance overhead with this RObjectTable implementation. Because the table - is on the search path and isn't cacheable, lookups of symbols that + is on the search path and is not cacheable, lookups of symbols that are behind it in the search path cannot be added to the global object cache, and R must perform an expensive lookup through all of the attached environments and the protocol buffer definitions to find common From noreply at r-forge.r-project.org Mon Dec 15 04:01:41 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 15 Dec 2014 04:01:41 +0100 (CET) Subject: [Rprotobuf-commits] r931 - papers/jss Message-ID: <20141215030142.166D7184817@r-forge.r-project.org> Author: edd Date: 2014-12-15 04:01:40 +0100 (Mon, 15 Dec 2014) New Revision: 931 Added: papers/jss/JSS_1313_comments.txt papers/jss/response-to-reviewers.tex Log: added referee report; started a point-by-point reply --- which will need a lot of work still Added: papers/jss/JSS_1313_comments.txt =================================================================== --- papers/jss/JSS_1313_comments.txt (rev 0) +++ papers/jss/JSS_1313_comments.txt 2014-12-15 03:01:40 UTC (rev 931) @@ -0,0 +1,224 @@ +This submission is important, but needs some work on both the paper and +the software before it can be accepted. The authors should address the +concerns of the two reviewers (below). + + +Overall, I think this is a strong paper. Cross-language communication +is a challenging problem, and good solutions for R are important to +establish R as a well-behaved member of a data analysis pipeline. The +paper is well written, and I recommend that it be accepted subject to +the suggestions below. + +# More big picture, less details + +Overall, I think the paper provides too much detail on relatively +unimportant topics and not enough on the reasoning behind important +design decisions. I think you could comfortably reduce the paper by +5-10 pages, referring the interested reader to the documentation for +more detail. + +I'd recommend shrinking section 3 to ~2 pages, and removing the +subheadings. This section should quickly orient the reader to the +RProtobuf API so they understand the big picture before learning more +details in the subsequent sections. I'd recommend picking one OO style +and sticking to it in this section - two is confusing. + +Section 4 dives into the details without giving a good overview and +motivation. Why use S4 and not RC? How are the objects made mutable? +Why do you provide both generic function and message passing OO +styles? What does `$` do in this context? What the heck is a +pseudo-method? Spend more time on those big issues rather than +describing each class in detail. Reduce class descriptions to a +bulleted list giving a high-level overview, then encourage the reader +to refer to the documentation for further details. Similarly, Tables +3-5 belong in the documentation, not in a vignette/paper. + +Section 7 is weak. I think the important message is that RProtobuf is +being used in practice at large scale for for large data, and is +useful for communicating between R and Python. 
How can you make that
+message stronger while avoiding (for the purposes of this paper) the
+relatively unimportant details of the map-reduce setup?
+
+# R <-> Protobuf translation
+
+The discussion of R <-> Protobuf could be improved. Table 9 would be
+much simpler if instead of Message, you provided a "vectorised"
+Messages class (this would also make the interface more consistent and
+hence the package easier to use).
+
+Along these lines, I think it would make sense to combine sections 5
+and 6 and discuss translation challenges in both direction
+simultaneously. At the minimum, add the equivalent for Table 9 that
+shows how important R classes are converted to their protobuf
+equivalents.
+
+You should discuss how missing values are handled for strings and
+integers, and why enums are not equivalent to factors. I think you
+could make explicit how coercion of factors, dates, times and matrices
+occurs, and the implications of this on sharing data structures
+between programming languages. For example, how do you share date/time
+data between R and python using RProtoBuf?
+
+Table 10 is dying to be a plot, and a natural companion would be to
+show how long it takes to serialise data frames using both RProtoBuf
+and R's native serialisation. Is there a performance penalty to using
+protobufs?
+
+# RObjectTables magic
+
+The use of RObjectTables magic makes me uneasy. It doesn't seem like a
+good fit for an infrastructure package and it's not clear what
+advantages it has over explicitly loading a protobuf definition into
+an object.
+
+Using global state makes understanding code much harder. In Table 1,
+it's not obvious where `tutorial.Person` comes from. Is it loaded by
+default by RProtobuf? This need some explanation. In Section 7, what
+does `readProtoFiles()` do? Why does `RProtobuf` need to be attached
+as well as `HistogramTools`? This needs more explanation, and a
+comment on the implications of this approach on CRAN packages and
+namespaces.
+
+I'd prefer you eliminate this magic from the magic, but failing that,
+you need a good explanation of why.
+
+# Code comments
+
+* Using `file.create()` to determine the absolute path seems like a bad
+idea.
+
+
+# Minor niggles
+
+* Don't refer to the message passing style of OO as traditional.
+
+* In Section 3.4, if messages isn't a vectorised class, the default
+  print method should use `cat()` to eliminate the confusing `[1]`.
+
+* The REXP definition would have been better defined using an enum that
+  matches R's SEXPTYPE "enum". But I guess that ship has sailed.
+
+* Why does `serialize_pb(CO2, NULL)` fail silently? Shouldn't it at least
+  warn that the serialization is partial?
+
+
+-------------------------------------------------------
+-------------------------------------------------------
+
+
+
+The paper gives an overview of the RProtoBuf package which implements an
+R interface to the Protocol Buffers library for an efficient
+serialization of objects. The paper is well written and easy to read.
+Introductory code is clear and the package provides objects to play with
+immediately without the need to jump through hoops, making it appealing.
+The software implementation is executed well.
+
+There are, however, a few inconsistencies in the implementation and some
+issues with specific sections in the paper. In the following both issues
+will be addressed sequentially by their occurrence in the paper.
+
+
+p.4 illustrates the use of messages.
The class implements list-like +access via [[ and $, but strangely names() return NULL and length() +doesn't correspond to the number of fields leading to startling results like + + > p +[1] "message of type 'tutorial.Person' with 2 fields set" + > length(p) +[1] 2 + > p[[3]] +[1] "" + +The inconsistencies get even more bizarre with descriptors (p.9): + + > tutorial.Person$email +[1] "descriptor for field 'email' of type 'tutorial.Person' " + > tutorial.Person[["email"]] +Error in tutorial.Person[["email"]] : this S4 class is not subsettable + > names(tutorial.Person) +NULL + > length(tutorial.Person) +[1] 1 + +It appears that there is no way to find out the fields of a descriptor +directly (although the low-level object methods seem to be exposed as +$field_count() and $fields() - but that seems extremely cumbersome). +Again, implementing names() and subsetting may help here. + +Another inconsistency concerns the as.list() method which by design +coerces objects to lists (see ?as.list), but the implementation for +EnumDescriptor breaks that contract and returns a vector instead: + + > is.list(as.list(tutorial.Person$PhoneType)) +[1] FALSE + > str(as.list(tutorial.Person$PhoneType)) + Named int [1:3] 0 1 2 + - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" + +As with the other interfaces, names() returns NULL so it is again quite +difficult to perform even simple operations such as finding out the +values. It may be natural use some of the standard methods like names(), +levels() or similar. As with the previous cases, the lack of [[ support +makes it impossible to map named enum values to codes and vice-versa. + +In general, the package would benefit from one pass of checks to assess +the consistency of the API. Since the authors intend direct interaction +with the objects via basic standard R methods, the classes should behave +consistently. + +Finally, most classes implement coercion to characters, which is not +mentioned and is not quite intuitive for some objects. For example, one +may think that as.character() on a file descriptor returns let's say the +filename, but we get: + + > cat(as.character(tutorial.Person$fileDescriptor())) +syntax = "proto2"; + +package tutorial; + +option java_package = "com.example.tutorial"; +option java_outer_classname = "AddressBookProtos"; +[...] + +It is not necessary clear what java_package has to do with a file +descriptor in R. Depending on the intention here, it may be useful to +explain this feature. + +Other comments: + +p.17: "does not support ... function, language or environment. Such +objects have no native equivalent type in Protocol Buffers, and have +little meaning outside the context or R" +That is certainly false. Native mirror of environments are hash tables - +a very useful type indeed. Language objects are just lists, so there is +no reason to not include them - they can be useful to store expressions +that may not be necessary specific to R. Further on p. 18 your run into +the same problem that could be fixed so easily. + +The examples in sections 7 and 8 are somewhat weak. It does not seem +clear why one would wish to unleash the power of PB just to transfer +breaks and counts for plotting - even a simple ASCII file would do that +just fine. The main point in the example is presumably that there are +code generation methods for Hadoop based on PB IDL such that Hadoop can +be made aware of the data types, thus making a histogram a proper record +that won't be split, can be combined etc. 
-- yet that is not mentioned
+nor a way presented how that can be leveraged in practice. The Python
+example code simply uses a static example with constants to simulate the
+output of a reducer so it doesn't illustrate the point - the reader is
+left confused why something as trivial would require PB while a savvy
+reader is not able to replicate the illustrated process. Possibly
+explaining the benefits and providing more details on how one would
+write such a job would make it much more relevant.
+
+Section 8 is not very well motivated. It is much easier to use other
+formats for HTTP exchange - JSON is probably the most popular, but even
+CSV works in simple settings. PB is a much less common standard. The
+main advantage of PB is the performance over the alternatives, but HTTP
+services are not necessarily known for their high-throughput so why one
+would sacrifice interoperability by using PB (they are still more hassle
+and require special installations)? It would be useful if the reason
+could be made explicit here or a better example chosen.
+
+

Added: papers/jss/response-to-reviewers.tex
===================================================================
--- papers/jss/response-to-reviewers.tex	                        (rev 0)
+++ papers/jss/response-to-reviewers.tex	2014-12-15 03:01:40 UTC (rev 931)
@@ -0,0 +1,301 @@
+
+\documentclass[10pt]{article}
+\usepackage{url}
+\usepackage{vmargin}
+\setpapersize{USletter}
+% left top right bottom -- headheight headsep footheight footskip
+\setmarginsrb{1in}{1in}{1in}{0.5in}{0pt}{0mm}{10pt}{0.5in}
+\usepackage{charter}
+
+\setlength{\parskip}{1ex plus1ex minus1ex}
+\setlength{\parindent}{0pt}
+
+\newcommand{\proglang}[1]{\textsf{#1}}
+\newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}}
+
+\newcommand{\pointRaised}[2]{\smallskip %\hrule
+  \textsl{{\fontseries{b}\selectfont #1}: #2}\newline}
+\newcommand{\simplePointRaised}[1]{\bigskip \hrule\textsl{#1} }
+\newcommand{\reply}[1]{\textbf{Reply}:\ #1 \smallskip } %\hrule \smallskip}
+
+\begin{document}
+
+\author{Dirk Eddelbuettel\\Debian Project \and
+  Murray Stokely\\Google, Inc \and
+  Jeroen Ooms\\UCLA}
+\title{Submission JSS 1313: \\ Response to Reviewers' Comments}
+\maketitle
+\thispagestyle{empty}
+
+Thank you for reviewing our manuscript, and for giving us an opportunity to
+rewrite, extend and tighten both the paper and the underlying package.
+
+\smallskip
+We truly appreciate the comments and suggestions. Below, we have regrouped the sets
+of comments, and have provided detailed point-by-point replies.
+%
+We hope that this satisfies the request for changes necessary to proceed with
+the publication of the revised and updated manuscript, along with the revised
+and updated package (which was recently resubmitted to CRAN as version 0.4.2).
+
+\section*{Response to Reviewer \#1}
+
+\pointRaised{Comment 1}{Overall, I think this is a strong paper. Cross-language communication
+  is a challenging problem, and good solutions for R are important to
+  establish R as a well-behaved member of a data analysis pipeline. The
+  paper is well written, and I recommend that it be accepted subject to
+  the suggestions below.}
+\reply{Thank you. We are providing a point-by-point reply below.}
+
+\subsubsection*{More big picture, less details}
+
+\pointRaised{Comment 2}{Overall, I think the paper provides too much detail on
+  relatively unimportant topics and not enough on the reasoning behind
+  important design decisions. I think you could comfortably reduce the paper
I think you could comfortably reduce the paper + by 5-10 pages, referring the interested reader to the documentation for + more detail.} +\reply{The paper was rewritten throughout and is now much tighter at just 23 pages.} + +\pointRaised{Comment 3}{I'd recommend shrinking section 3 to ~2 pages, and removing the + subheadings. This section should quickly orient the reader to the + RProtobuf API so they understand the big picture before learning more + details in the subsequent sections. I'd recommend picking one OO style + and sticking to it in this section - two is confusing.} +\reply{We followed this recommendation and reduced section 3 to about 2 1/2 pages.} + +\pointRaised{Comment 3}{Section 4 dives into the details without giving a good overview and + motivation. Why use S4 and not RC? How are the objects made mutable? + Why do you provide both generic function and message passing OO + styles? What does \$ do in this context? What the heck is a + pseudo-method? Spend more time on those big issues rather than + describing each class in detail. Reduce class descriptions to a + bulleted list giving a high-level overview, then encourage the reader + to refer to the documentation for further details. Similarly, Tables + 3-5 belong in the documentation, not in a vignette/paper.} +\reply{Done. TO BE EXPANDED} + +\pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is + being used in practice at large scale for for large data, and is + useful for communicating between R and Python. How can you make that + message stronger while avoiding (for the purposes of this paper) the + relatively unimportant details of the map-reduce setup?} +\reply{TBD} + +\subsubsection*{R to/from Protobuf translation} + +\pointRaised{Comment 5}{The discussion of R to/from Protobuf could be improved. Table 9 would be + much simpler if instead of Message, you provided a "vectorised" + Messages class (this would also make the interface more consistent and + hence the package easier to use).} +\reply{TBD} + +\pointRaised{Comment 6}{Along these lines, I think it would make sense to combine sections 5 + and 6 and discuss translation challenges in both direction + simultaneously. At the minimum, add the equivalent for Table 9 that + shows how important R classes are converted to their protobuf + equivalents.} +\reply{TBD} + +\pointRaised{Comment 7}{You should discuss how missing values are handled for strings and + integers, and why enums are not equivalent to factors. I think you + could make explicit how coercion of factors, dates, times and matrices + occurs, and the implications of this on sharing data structures + between programming languages. For example, how do you share date/time + data between R and python using RProtoBuf?} +\reply{TBD} + +\pointRaised{Comment 8}{Table 10 is dying to be a plot, and a natural companion would be to + show how long it takes to serialise data frames using both RProtoBuf + and R's native serialisation. Is there a performance penalty to using + protobufs?} +\reply{TBD} + +\subsubsection*{RObjectTables magic} + +\pointRaised{Comment 9}{The use of RObjectTables magic makes me uneasy. It doesn't seem like a + good fit for an infrastructure package and it's not clear what + advantages it has over explicitly loading a protobuf definition into + an object.} +\reply{TBD} + +\pointRaised{Comment 10}{Using global state makes understanding code much harder. In Table 1, + it's not obvious where \texttt{tutorial.Person} comes from. 
Is it loaded by + default by RProtobuf? This need some explanation. In Section 7, what + does \texttt{readProtoFiles()} do? Why does \texttt{RProtobuf} need to be attached + as well as \texttt{HistogramTools}? This needs more explanation, and a + comment on the implications of this approach on CRAN packages and + namespaces.} +\reply{TBD} + +\pointRaised{Comment 11}{ + I'd prefer you eliminate this magic from the magic, but failing that, + you need a good explanation of why.} +\reply{TBD} + +\subsubsection*{Code comments} + +\pointRaised{Comment 12}{Using \texttt{file.create()} to determine the absolute path seems like a bad idea.} +\reply{TBD} + + +\subsubsection*{Minor niggles} + +\pointRaised{Comment 13}{Don't refer to the message passing style of OO as traditional.} +\reply{TBD} + +\pointRaised{Comment 14}{In Section 3.4, if messages isn't a vectorised class, the default + print method should use \texttt{cat()} to eliminate the confusing \texttt{[1]}.} +\reply{TBD} + +\pointRaised{Comment 15}{The REXP definition would have been better defined using an enum that + matches R's SEXPTYPE "enum". But I guess that ship has sailed.} +\reply{TBD} + +\pointRaised{Comment 16}{Why does \texttt{serialize\_pb(CO2, NULL)} fail silently? Shouldn't it at least + warn that the serialization is partial?} +\reply{TBD} + + +\section*{Response to Reviewer \#2} + +\pointRaised{Comment 1}{The paper gives an overview of the RProtoBuf package which implements an + R interface to the Protocol Buffers library for an efficient + serialization of objects. The paper is well written and easy to read. + Introductory code is clear and the package provides objects to play with + immediately without the need to jump through hoops, making it appealing. + The software implementation is executed well.} +\reply{Thank you.} + +\pointRaised{Comment 2}{There are, however, a few inconsistencies in the implementation and some + issues with specific sections in the paper. In the following both issues + will be addressed sequentially by their occurrence in the paper.} +\reply{TBD} + +\pointRaised{Comment 3}{p.4 illustrates the use of messages. The class implements list-like + access via \texttt{[[} and \$, but strangely \texttt{names()} return NULL and \texttt{length() } + doesn't correspond to the number of fields leading to startling results like +the following:} + +\begin{verbatim} + > p +[1] "message of type 'tutorial.Person' with 2 fields set" + > length(p) +[1] 2 + > p[[3]] +[1] "" +\end{verbatim} +\reply{TBD} + +\pointRaised{Comment 3 cont.}{The inconsistencies get even more bizarre with descriptors (p.9):} + +\begin{verbatim} + > tutorial.Person$email +[1] "descriptor for field 'email' of type 'tutorial.Person' " + > tutorial.Person[["email"]] +Error in tutorial.Person[["email"]] : this S4 class is not subsettable + > names(tutorial.Person) +NULL + > length(tutorial.Person) +[1] 1 +\end{verbatim} +\reply{TBD} + +\pointRaised{Comment 3 cont.}{It appears that there is no way to find out the fields of a descriptor + directly (although the low-level object methods seem to be exposed as + \texttt{\$field\_count()} and \texttt{\$fields()} - but that seems extremely cumbersome). 
+ Again, implementing names() and subsetting may help here.} +\reply{TBD} + +\pointRaised{Comment 4}{Another inconsistency concerns the \texttt{as.list()} method which by design + coerces objects to lists (see \texttt{?as.list}), but the implementation for + EnumDescriptor breaks that contract and returns a vector instead:} + +\begin{verbatim} + > is.list(as.list(tutorial.Person$PhoneType)) +[1] FALSE + > str(as.list(tutorial.Person$PhoneType)) + Named int [1:3] 0 1 2 + - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" +\end{verbatim} + +\pointRaised{Comment 4 cont}{As with the other interfaces, names() returns NULL so it is again quite + difficult to perform even simple operations such as finding out the + values. It may be natural use some of the standard methods like names(), + levels() or similar. As with the previous cases, the lack of [[ support + makes it impossible to map named enum values to codes and vice-versa.} +\reply{TBD} + +\pointRaised{Comment 5}{In general, the package would benefit from one pass of checks to assess + the consistency of the API. Since the authors intend direct interaction + with the objects via basic standard R methods, the classes should behave + consistently.} +\reply{TBD} + +\pointRaised{Comment 6}{Finally, most classes implement coercion to characters, which is not + mentioned and is not quite intuitive for some objects. For example, one + may think that as.character() on a file descriptor returns let's say the + filename, but we get:} + +\begin{verbatim} + > cat(as.character(tutorial.Person$fileDescriptor())) +syntax = "proto2"; + +package tutorial; + +option java_package = "com.example.tutorial"; +option java_outer_classname = "AddressBookProtos"; +[...] +\end{verbatim} +\reply{TBD} + +\pointRaised{Comment 7}{It is not necessary clear what java\_package has to do with a file + descriptor in R. Depending on the intention here, it may be useful to + explain this feature. +} +\reply{TBD} + +\subsubsection*{Other comments:} + +\pointRaised{Comment 8}{p.17: "does not support ... function, language or environment. Such + objects have no native equivalent type in Protocol Buffers, and have + little meaning outside the context or R" + That is certainly false. Native mirror of environments are hash tables - + a very useful type indeed. Language objects are just lists, so there is + no reason to not include them - they can be useful to store expressions + that may not be necessary specific to R. Further on p. 18 your run into + the same problem that could be fixed so easily.} +\reply{TBD} + +\pointRaised{Comment 9}{The examples in sections 7 and 8 are somewhat weak. It does not seem + clear why one would wish to unleash the power of PB just to transfer + breaks and counts for plotting - even a simple ASCII file would do that + just fine. The main point in the example is presumably that there are + code generation methods for Hadoop based on PB IDL such that Hadoop can + be made aware of the data types, thus making a histogram a proper record + that won't be split, can be combined etc. -- yet that is not mentioned + nor a way presented how that can be leveraged in practice. The Python + example code simply uses a static example with constants to simulate the + output of a reducer so it doesn't illustrate the point - the reader is + left confused why something as trivial would require PB while a savvy + reader is not able to replicate the illustrated process. 
Possibly + explaining the benefits and providing more details on how one would + write such a job would make it much more relevant.} +\reply{TBD} + + +\pointRaised{Comment 10}{Section 8 is not very well motivated. It is much easier to use other + formats for HTTP exchange - JSON is probably the most popular, but even + CSV works in simple settings. PB is a much less common standard. The + main advantage of PB is the performance over the alternatives, but HTTP + services are not necessarily known for their high-throughput so why one + would sacrifice interoperability by using PB (they are still more hassle + and require special installations)? It would be useful if the reason + could be made explicit here or a better example chosen.} +\reply{TBD} + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: From noreply at r-forge.r-project.org Mon Dec 15 19:52:34 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 15 Dec 2014 19:52:34 +0100 (CET) Subject: [Rprotobuf-commits] r932 - papers/jss Message-ID: <20141215185234.E8D351876E8@r-forge.r-project.org> Author: murray Date: 2014-12-15 19:52:34 +0100 (Mon, 15 Dec 2014) New Revision: 932 Modified: papers/jss/response-to-reviewers.tex Log: Add more point to point replies. Still working. I can mostly finish this up today. Modified: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2014-12-15 03:01:40 UTC (rev 931) +++ papers/jss/response-to-reviewers.tex 2014-12-15 18:52:34 UTC (rev 932) @@ -72,14 +72,20 @@ bulleted list giving a high-level overview, then encourage the reader to refer to the documentation for further details. Similarly, Tables 3-5 belong in the documentation, not in a vignette/paper.} -\reply{Done. TO BE EXPANDED} +\reply{Done. RProtoBuf was designed and implemented before RC were + available, and this is noted in a footnote now. Explanation of how + they are made mutable haas been added. Better explanation of the + two styles and '\$' as been added, while no longer using the + confusing term + 'pseudo-method' anywhere. Moved Tables 3-5 into the documentation + and out of the paper, as suggested.} \pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is being used in practice at large scale for for large data, and is useful for communicating between R and Python. How can you make that message stronger while avoiding (for the purposes of this paper) the relatively unimportant details of the map-reduce setup?} -\reply{TBD} +\reply{Done. Rewritten with more motivation taking into account this feedback.} \subsubsection*{R to/from Protobuf translation} @@ -87,15 +93,29 @@ much simpler if instead of Message, you provided a "vectorised" Messages class (this would also make the interface more consistent and hence the package easier to use).} -\reply{TBD} +\reply{This is an area for future work and is a space explored in + another package called Motobuf by other authors.} \pointRaised{Comment 6}{Along these lines, I think it would make sense to combine sections 5 and 6 and discuss translation challenges in both direction simultaneously. 
At the minimum, add the equivalent for Table 9 that shows how important R classes are converted to their protobuf equivalents.} -\reply{TBD} +\reply{We have updated these sections to make it clearer that the main + distinction is between schema-based datastructures (section 5) and + schema-less use where a catch-all .proto is used (section 6). + Neither section is meant to focus on only a single direction of the + conversion, but how conversion works when you have a schema or not. + How important R classes are converted to their protobuf equivalents + isn't super useful as a C++, Java, or Python program is unlikely to + want to read in an R data.frame exactly as it is defined. Much more + likely is an application-specific message format is defined between the + two services, such as the HistogramTools example in the next section. + Much more detail has been added to an interesting part of section 6 -- + which datasets exactly are better served with RProtoBuf than + base::serialize and why?} + \pointRaised{Comment 7}{You should discuss how missing values are handled for strings and integers, and why enums are not equivalent to factors. I think you could make explicit how coercion of factors, dates, times and matrices @@ -108,7 +128,16 @@ show how long it takes to serialise data frames using both RProtoBuf and R's native serialisation. Is there a performance penalty to using protobufs?} -\reply{TBD} +\reply{Table 10 has been replaced with a plot, the outliers are + labeled, and the text now includes some interesting explanation + about the outliers. Page 4 explains that the R implementation of + protocol buffers uses reflection to make operations slower but makes + it more convenient for interactive data analysis. None of the + built-in datasets are large enough for performance to really come up + as an issue, and for any serialization method examples could be + found that significantly favor one over another, so we don't think + there will be benefit to adding anything here. +} \subsubsection*{RObjectTables magic} @@ -116,7 +145,8 @@ good fit for an infrastructure package and it's not clear what advantages it has over explicitly loading a protobuf definition into an object.} -\reply{TBD} +\reply{More information about the advantages and disadvantages of this + approach have been added.} \pointRaised{Comment 10}{Using global state makes understanding code much harder. In Table 1, it's not obvious where \texttt{tutorial.Person} comes from. Is it loaded by @@ -125,19 +155,23 @@ as well as \texttt{HistogramTools}? 
This needs more explanation, and a comment on the implications of this approach on CRAN packages and namespaces.} -\reply{TBD} +\reply{We followed this recommendation and added explanation for how +\texttt{tutorial.Person} is loaded, specifically : \emph{A small number of message types are imported when the +package is first loaded, including the tutorial.Person type we saw in +the last section.} We removed the superfluous attach of \texttt{RProtoBuf}.} \pointRaised{Comment 11}{ I'd prefer you eliminate this magic from the magic, but failing that, you need a good explanation of why.} -\reply{TBD} +\reply{We've added more explanation about this.} \subsubsection*{Code comments} \pointRaised{Comment 12}{Using \texttt{file.create()} to determine the absolute path seems like a bad idea.} -\reply{TBD} +\reply{We followed this recommendation and removed two instances of + \texttt{file.create()} for this purpose with calls to + \texttt{normalizePath} with \texttt{mustWork=FALSE}.} - \subsubsection*{Minor niggles} \pointRaised{Comment 13}{Don't refer to the message passing style of OO as traditional.} From noreply at r-forge.r-project.org Mon Dec 15 22:46:52 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 15 Dec 2014 22:46:52 +0100 (CET) Subject: [Rprotobuf-commits] r933 - papers/jss Message-ID: <20141215214652.17AEA1859B0@r-forge.r-project.org> Author: murray Date: 2014-12-15 22:46:51 +0100 (Mon, 15 Dec 2014) New Revision: 933 Modified: papers/jss/response-to-reviewers.tex Log: More point-by-point responses. Modified: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2014-12-15 18:52:34 UTC (rev 932) +++ papers/jss/response-to-reviewers.tex 2014-12-15 21:46:51 UTC (rev 933) @@ -158,7 +158,8 @@ \reply{We followed this recommendation and added explanation for how \texttt{tutorial.Person} is loaded, specifically : \emph{A small number of message types are imported when the package is first loaded, including the tutorial.Person type we saw in -the last section.} We removed the superfluous attach of \texttt{RProtoBuf}.} +the last section.} Thank you also for spotting the superfluous attach +of \texttt{RProtoBuf}, it has been removed from the example.} \pointRaised{Comment 11}{ I'd prefer you eliminate this magic from the magic, but failing that, @@ -175,21 +176,26 @@ \subsubsection*{Minor niggles} \pointRaised{Comment 13}{Don't refer to the message passing style of OO as traditional.} -\reply{TBD} +\reply{Done, we don't refer to this style as traditional anywhere in + the manuscript anymore.} \pointRaised{Comment 14}{In Section 3.4, if messages isn't a vectorised class, the default print method should use \texttt{cat()} to eliminate the confusing \texttt{[1]}.} -\reply{TBD} +\reply{Done} \pointRaised{Comment 15}{The REXP definition would have been better defined using an enum that matches R's SEXPTYPE "enum". But I guess that ship has sailed.} -\reply{TBD} +\reply{Acknowledged. We chose to maintain compatibility with RHIPE here. The main +use of RProtoBuf is not with rexp.proto however -- it with +application-specific schemas in .proto files for sending data between +applications. Users that want to do something very R-specific are +welcome to use their own \texttt{.proto} files with an enum to represent R SEXPTYPEs.} \pointRaised{Comment 16}{Why does \texttt{serialize\_pb(CO2, NULL)} fail silently? 
Shouldn't it at least warn that the serialization is partial?} -\reply{TBD} +\reply{Fixed, \texttt{serialize\_pb} now works for all built-in datatypes in R + and no longer fails silently if it encounters something it can't serialize.} - \section*{Response to Reviewer \#2} \pointRaised{Comment 1}{The paper gives an overview of the RProtoBuf package which implements an @@ -203,7 +209,8 @@ \pointRaised{Comment 2}{There are, however, a few inconsistencies in the implementation and some issues with specific sections in the paper. In the following both issues will be addressed sequentially by their occurrence in the paper.} -\reply{TBD} +\reply{These and others have been identified and addressed. Thank you + for taking the time to enumerate these issues.} \pointRaised{Comment 3}{p.4 illustrates the use of messages. The class implements list-like access via \texttt{[[} and \$, but strangely \texttt{names()} return NULL and \texttt{length() } @@ -218,7 +225,21 @@ > p[[3]] [1] "" \end{verbatim} -\reply{TBD} +\reply{We've corrected the list-like accessor, fixed \texttt{length()} to + correspond to the number of set fields, and added \texttt{names()}:} +\begin{verbatim} +> p +message of type 'tutorial.Person' with 0 fields set +> length(p) +[1] 0 +> p[[3]] +[1] "" +> p$id <- 1 +> length(p) +[1] 1 +> names(p) +[1] "name" "id" "email" "phone" +\end{verbatim} \pointRaised{Comment 3 cont.}{The inconsistencies get even more bizarre with descriptors (p.9):} @@ -232,13 +253,31 @@ > length(tutorial.Person) [1] 1 \end{verbatim} -\reply{TBD} +\reply{We agree, and have addressed this inconsistency. Thank you:} +\begin{verbatim} +> tutorial.Person$email +descriptor for field 'email' of type 'tutorial.Person' +> tutorial.Person[["email"]] +descriptor for field 'email' of type 'tutorial.Person' +> names(tutorial.Person) +[1] "name" "id" "email" "phone" "PhoneNumber" +[6] "PhoneType" +> length(tutorial.Person) +[1] 6 +\end{verbatim} \pointRaised{Comment 3 cont.}{It appears that there is no way to find out the fields of a descriptor directly (although the low-level object methods seem to be exposed as \texttt{\$field\_count()} and \texttt{\$fields()} - but that seems extremely cumbersome). Again, implementing names() and subsetting may help here.} -\reply{TBD} +\reply{\texttt{names} and subsetting implemented. Thank you for the + suggestion.:} +\begin{verbatim} +> tutorial.Person[[1]] +descriptor for field 'name' of type 'tutorial.Person' +> tutorial.Person[[2]] +descriptor for field 'id' of type 'tutorial.Person' +\end{verbatim} \pointRaised{Comment 4}{Another inconsistency concerns the \texttt{as.list()} method which by design coerces objects to lists (see \texttt{?as.list}), but the implementation for @@ -252,18 +291,36 @@ - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" \end{verbatim} +\reply{Fixed, thank you. New output:} +\begin{verbatim} +> is.list(as.list(tutorial.Person$PhoneType)) +[1] TRUE +> str(as.list(tutorial.Person$PhoneType)) +List of 3 + $ MOBILE: int 0 + $ HOME : int 1 + $ WORK : int 2 +\end{verbatim} + \pointRaised{Comment 4 cont}{As with the other interfaces, names() returns NULL so it is again quite difficult to perform even simple operations such as finding out the values. It may be natural use some of the standard methods like names(), levels() or similar. As with the previous cases, the lack of [[ support makes it impossible to map named enum values to codes and vice-versa.} -\reply{TBD} +\reply{Fixed, thank you. 
New output:}
+\begin{verbatim}
+> names(tutorial.Person$PhoneType)
+[1] "MOBILE" "HOME" "WORK"
+> tutorial.Person$PhoneType[["HOME"]]
+[1] 1
+\end{verbatim}
 
 \pointRaised{Comment 5}{In general, the package would benefit from one pass of checks to assess
   the consistency of the API. Since the authors intend direct interaction
   with the objects via basic standard R methods, the classes should behave
   consistently.}
-\reply{TBD}
+\reply{We made several passes, correcting issues as documented in
+  \texttt{ChangeLog} and now present in our latest 0.4.2 release on CRAN.}
 
 \pointRaised{Comment 6}{Finally, most classes implement coercion to characters, which is not
   mentioned and is not quite intuitive for some objects. For example, one
@@ -280,13 +337,29 @@
 option java_outer_classname = "AddressBookProtos";
 [...]
 \end{verbatim}
-\reply{TBD}
+\reply{In choosing the debug output for a file descriptor we agree
+  that \texttt{filename} is a reasonable thing to expect, but we also
+  think that the contents of the \texttt{.proto} file is also
+  reasonable, and also more useful. We document this in
+  ``FileDescriptor-class'', the vignette, and other sources.
+  \texttt{@filename} is one of the slots of the FileDescriptor class
+  and so very easy to find. The contents of the \texttt{.proto} are
+  not as easily accessible in a slot, however, and so we find it much
+  more useful to be output with \texttt{as.character()}.}
 
 \pointRaised{Comment 7}{It is not necessary clear what java\_package has to do with a file
   descriptor in R. Depending on the intention here, it may be useful to
   explain this feature.
 }
-\reply{TBD}
+\reply{This snippet has been removed as part of the general move of
+  less relevant details to the package documentation, but for
+  reference the \texttt{.proto} file syntax is defined in the Protocol Buffers
+  language guide which is referenced earlier. It is a cross-platform
+  library and so this syntax specifies some parameters when Java code
+  is used to access the structures defined in this file. No such
+  special syntax is required in the \texttt{.proto} files for R
+  language code and so this line about java\_package was not relevant
+  or needed in any way for RProtoBuf and is documented elsewhere.}
 
 \subsubsection*{Other comments:}
 
@@ -298,7 +371,14 @@
   no reason to not include them - they can be useful to store expressions
   that may not be necessary specific to R. Further on p. 18 your run into
   the same problem that could be fixed so easily.}
-\reply{TBD}
+\reply{You are right. Environments are more than just hash
+  tables because they include other configuration parameters that are
+  necessary to serialize as well to make sure
+  serialization/unserialization is idempotent, but we agree it is
+  cleaner for both the package and the exposition in the paper to just make
+  sure we serialize everything. We can now fall back to
+  \texttt{base::serialize} and storing the bits in a rawString type of
+  RProtoBuf to make the R schema-less serialization more complete.}
 
 \pointRaised{Comment 9}{The examples in sections 7 and 8 are somewhat weak.
It does not seem clear why one would wish to unleash the power of PB just to transfer From noreply at r-forge.r-project.org Tue Dec 16 02:18:04 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 16 Dec 2014 02:18:04 +0100 (CET) Subject: [Rprotobuf-commits] r934 - papers/jss Message-ID: <20141216011804.9FAC3187794@r-forge.r-project.org> Author: murray Date: 2014-12-16 02:18:04 +0100 (Tue, 16 Dec 2014) New Revision: 934 Modified: papers/jss/response-to-reviewers.tex Log: Address remaining points in referee feedback. Modified: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2014-12-15 21:46:51 UTC (rev 933) +++ papers/jss/response-to-reviewers.tex 2014-12-16 01:18:04 UTC (rev 934) @@ -54,14 +54,17 @@ important design decisions. I think you could comfortably reduce the paper by 5-10 pages, referring the interested reader to the documentation for more detail.} -\reply{The paper was rewritten throughout and is now much tighter at just 23 pages.} +\reply{The paper is now 6-pages much tighter at just 23 pages. + Sections 3 - 8 (all but sec 1 introduction, sec 2 protocol buffers, + and sec 9 conclusion have been rewritten to address the specific and + general feedback in these reviews)} \pointRaised{Comment 3}{I'd recommend shrinking section 3 to ~2 pages, and removing the subheadings. This section should quickly orient the reader to the RProtobuf API so they understand the big picture before learning more details in the subsequent sections. I'd recommend picking one OO style and sticking to it in this section - two is confusing.} -\reply{We followed this recommendation and reduced section 3 to about 2 1/2 pages.} +\reply{We followed this recommendation and reduced section 3 to about $2\frac{1}{2}$ pages.} \pointRaised{Comment 3}{Section 4 dives into the details without giving a good overview and motivation. Why use S4 and not RC? How are the objects made mutable? @@ -74,10 +77,10 @@ 3-5 belong in the documentation, not in a vignette/paper.} \reply{Done. RProtoBuf was designed and implemented before RC were available, and this is noted in a footnote now. Explanation of how - they are made mutable haas been added. Better explanation of the - two styles and '\$' as been added, while no longer using the + they are made mutable has been added. Better explanation of the + two styles and '\$' as been added. We are no longer using the confusing term - 'pseudo-method' anywhere. Moved Tables 3-5 into the documentation + 'pseudo-method' anywhere. We moved Tables 3-5 into the documentation and out of the paper, as suggested.} \pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is @@ -93,15 +96,27 @@ much simpler if instead of Message, you provided a "vectorised" Messages class (this would also make the interface more consistent and hence the package easier to use).} -\reply{This is an area for future work and is a space explored in - another package called Motobuf by other authors.} +\reply{This is a good observation that only became clear to us after + significant usage of \texttt{RProtoBuf}. Providing a full ``vectorized'' Messages class would require slicing + operators that let you quickly extract a given field from each + element of the message vector in order to be really useful. 
This + would require significant amounts of C++ code for efficient + manipulation on the order of data.table or other similar large C++ R + packages on CRAN. There is another package called Motobuf by other authors + that takes this approach but in practice, at Google at least, the + ease-of-use provided by the simple Message interface of RProtoBuf + has won with users. It is still future work to keep the simple + interactive interface of RProtoBuf with the vectorized efficiency of + Motobuf. For now, users typically do their slicing of vectors like + this through a distributed database (NewSQL is the term of the day?) + like Dremel or other system and then just get the response Protocol + Buffers in return to the request.} \pointRaised{Comment 6}{Along these lines, I think it would make sense to combine sections 5 and 6 and discuss translation challenges in both direction simultaneously. At the minimum, add the equivalent for Table 9 that shows how important R classes are converted to their protobuf equivalents.} - \reply{We have updated these sections to make it clearer that the main distinction is between schema-based datastructures (section 5) and schema-less use where a catch-all .proto is used (section 6). @@ -122,7 +137,13 @@ occurs, and the implications of this on sharing data structures between programming languages. For example, how do you share date/time data between R and python using RProtoBuf?} -\reply{TBD} +\reply{All of these details are application-specific, whereas + RProtoBuf is an infrastructure package. Distributed systems define + their own interfaces, with their own date/time fields, usually as + int64s of fractional seconds since the unix epoch for the systems I + have worked on. An example is given for Histograms in the next + section. Factors could be represented as repeated enums in protocol + buffers, certainly, if that is how one wanted to define a schema.} \pointRaised{Comment 8}{Table 10 is dying to be a plot, and a natural companion would be to show how long it takes to serialise data frames using both RProtoBuf @@ -135,9 +156,8 @@ it more convenient for interactive data analysis. None of the built-in datasets are large enough for performance to really come up as an issue, and for any serialization method examples could be - found that significantly favor one over another, so we don't think - there will be benefit to adding anything here. -} + found that significantly favor one over another in runtime, so we + don't think there will be benefit to adding anything here. } \subsubsection*{RObjectTables magic} @@ -181,13 +201,13 @@ \pointRaised{Comment 14}{In Section 3.4, if messages isn't a vectorised class, the default print method should use \texttt{cat()} to eliminate the confusing \texttt{[1]}.} -\reply{Done} +\reply{Done, thanks.} \pointRaised{Comment 15}{The REXP definition would have been better defined using an enum that matches R's SEXPTYPE "enum". But I guess that ship has sailed.} \reply{Acknowledged. We chose to maintain compatibility with RHIPE here. The main -use of RProtoBuf is not with rexp.proto however -- it with -application-specific schemas in .proto files for sending data between +use of RProtoBuf is not with \texttt{rexp.proto} however -- it with +application-specific schemas in \texttt{.proto} files for sending data between applications. 
Users that want to do something very R-specific are welcome to use their own \texttt{.proto} files with an enum to represent R SEXPTYPEs.} @@ -324,7 +344,7 @@ \pointRaised{Comment 6}{Finally, most classes implement coercion to characters, which is not mentioned and is not quite intuitive for some objects. For example, one - may think that as.character() on a file descriptor returns let's say the + may think that \texttt{as.character()} on a file descriptor returns let's say the filename, but we get:} \begin{verbatim} @@ -337,10 +357,12 @@ option java_outer_classname = "AddressBookProtos"; [...] \end{verbatim} -\reply{In choosing the debug output for a file descriptor we agree +\reply{The behavior is documented in the package documentation but + seemed like a minor detail not important for an already-long paper. + In choosing the debug output for a file descriptor we agree that \texttt{filename} is a reasonable thing to expect, but we also think that the contents of the \texttt{.proto} file is also - reasonable, and also more useful. We document this in + reasonable, but more useful. We document this in ``FileDescriptor-class'', the vignette, and other sources. \texttt{@filename} is one of the slots of the FileDescriptor class and so very easy to find. The contents of the \texttt{.proto} are @@ -394,9 +416,17 @@ reader is not able to replicate the illustrated process. Possibly explaining the benefits and providing more details on how one would write such a job would make it much more relevant.} -\reply{TBD} +\reply{Yes, we added more detail about the advantages of using a + proper data type for the histograms in this example that you mentioned here -- the + ability to write combiners, prevent arbitrary splitting of the + records, etc that can greatly improve performance. We agree with + the other reviewer that we don't want to get bogged down in details + about a particular MapReduce implementation (such as Hadoop) and so + now we specifically mention that goal here. + I think we make a better connection now between the + abstract MapReduce example given, and then the simpler Python + example code with a static example.} - \pointRaised{Comment 10}{Section 8 is not very well motivated. It is much easier to use other formats for HTTP exchange - JSON is probably the most popular, but even CSV works in simple settings. PB is a much less common standard. The @@ -405,7 +435,17 @@ would sacrifice interoperability by using PB (they are still more hassle and require special installations)? It would be useful if the reason could be made explicit here or a better example chosen.} -\reply{TBD} +\reply{This section has been reworded to make it shorter and more + crisp, with fewer extraneous details about OpenCPU. +Protocol + Buffers is an efficient protocol used between distributed systems at + many of the world's largest internet companies (Twitter, Sony, + Google, etc.) but the design and implementation of a large + enterprise-scale distributed system with a complex RPC system and + serialization needs is well beyond the scope of what we can add to a + paper about RProtoBuf. 
We chose this example because it is a much + more accessible example that any reader can use to easily + send/receive RPCs and parse the results with RProtoBuf.} \end{document} From noreply at r-forge.r-project.org Wed Dec 17 00:02:00 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Wed, 17 Dec 2014 00:02:00 +0100 (CET) Subject: [Rprotobuf-commits] r935 - papers/jss Message-ID: <20141216230200.7B3021877C7@r-forge.r-project.org> Author: edd Date: 2014-12-17 00:02:00 +0100 (Wed, 17 Dec 2014) New Revision: 935 Added: papers/jss/article-submitted-2014-03.pdf Log: initial submission Added: papers/jss/article-submitted-2014-03.pdf =================================================================== (Binary files differ) Property changes on: papers/jss/article-submitted-2014-03.pdf ___________________________________________________________________ Added: svn:mime-type + application/octet-stream From noreply at r-forge.r-project.org Wed Dec 17 03:04:32 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Wed, 17 Dec 2014 03:04:32 +0100 (CET) Subject: [Rprotobuf-commits] r936 - papers/jss Message-ID: <20141217020432.5B3EA1878AB@r-forge.r-project.org> Author: edd Date: 2014-12-17 03:04:25 +0100 (Wed, 17 Dec 2014) New Revision: 936 Modified: papers/jss/response-to-reviewers.tex Log: Halfway done another pass. This is coming together very well too. Modified: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2014-12-16 23:02:00 UTC (rev 935) +++ papers/jss/response-to-reviewers.tex 2014-12-17 02:04:25 UTC (rev 936) @@ -54,17 +54,18 @@ important design decisions. I think you could comfortably reduce the paper by 5-10 pages, referring the interested reader to the documentation for more detail.} -\reply{The paper is now 6-pages much tighter at just 23 pages. - Sections 3 - 8 (all but sec 1 introduction, sec 2 protocol buffers, - and sec 9 conclusion have been rewritten to address the specific and - general feedback in these reviews)} +\reply{The paper is now six pages shorter at just 23 pages. + Sections 3 - 8 (all but Section 1 (``Introduction''), Section 2 (``Protocol Buffers''), + and Section 9 (``Conclusion'') have been thoroughly rewritten to address the specific and + general feedback in these reviews.} \pointRaised{Comment 3}{I'd recommend shrinking section 3 to ~2 pages, and removing the subheadings. This section should quickly orient the reader to the RProtobuf API so they understand the big picture before learning more details in the subsequent sections. I'd recommend picking one OO style and sticking to it in this section - two is confusing.} -\reply{We followed this recommendation and reduced section 3 to about $2\frac{1}{2}$ pages.} +\reply{We followed this recommendation, reduced section 3 to about + $2\frac{1}{2}$ pages, removed the subheadings and tightened the exposition.} \pointRaised{Comment 3}{Section 4 dives into the details without giving a good overview and motivation. Why use S4 and not RC? How are the objects made mutable? @@ -76,12 +77,11 @@ to refer to the documentation for further details. Similarly, Tables 3-5 belong in the documentation, not in a vignette/paper.} \reply{Done. RProtoBuf was designed and implemented before RC were - available, and this is noted in a footnote now. Explanation of how + available, and this is now noted explicitly in a new footnote. 
Explanation of how they are made mutable has been added. Better explanation of the two styles and '\$' as been added. We are no longer using the - confusing term - 'pseudo-method' anywhere. We moved Tables 3-5 into the documentation - and out of the paper, as suggested.} + confusing term 'pseudo-method' anywhere. We also moved Tables 3-5 into the + documentation and out of the paper, as suggested.} \pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is being used in practice at large scale for for large data, and is @@ -103,8 +103,8 @@ would require significant amounts of C++ code for efficient manipulation on the order of data.table or other similar large C++ R packages on CRAN. There is another package called Motobuf by other authors - that takes this approach but in practice, at Google at least, the - ease-of-use provided by the simple Message interface of RProtoBuf + that takes this approach but in practice (at least for the several hundred + users at Google), the ease-of-use provided by the simple Message interface of RProtoBuf has won with users. It is still future work to keep the simple interactive interface of RProtoBuf with the vectorized efficiency of Motobuf. For now, users typically do their slicing of vectors like @@ -117,9 +117,9 @@ simultaneously. At the minimum, add the equivalent for Table 9 that shows how important R classes are converted to their protobuf equivalents.} -\reply{We have updated these sections to make it clearer that the main - distinction is between schema-based datastructures (section 5) and - schema-less use where a catch-all .proto is used (section 6). +\reply{Done. We have updated these sections to make it clearer that the main + distinction is between schema-based datastructures (Section 5) and + schema-less use where a catch-all \texttt{.proto} is used (Section 6). Neither section is meant to focus on only a single direction of the conversion, but how conversion works when you have a schema or not. How important R classes are converted to their protobuf equivalents @@ -129,7 +129,7 @@ two services, such as the HistogramTools example in the next section. Much more detail has been added to an interesting part of section 6 -- which datasets exactly are better served with RProtoBuf than - base::serialize and why?} + \texttt{base::serialize} and why?} \pointRaised{Comment 7}{You should discuss how missing values are handled for strings and integers, and why enums are not equivalent to factors. I think you @@ -140,19 +140,19 @@ \reply{All of these details are application-specific, whereas RProtoBuf is an infrastructure package. Distributed systems define their own interfaces, with their own date/time fields, usually as - int64s of fractional seconds since the unix epoch for the systems I + a double of fractional seconds since the unix epoch for the systems I have worked on. An example is given for Histograms in the next - section. Factors could be represented as repeated enums in protocol - buffers, certainly, if that is how one wanted to define a schema.} + section. Factors could be represented as repeated enums in Protocol + Buffers, certainly, if that is how one wanted to define a schema.} \pointRaised{Comment 8}{Table 10 is dying to be a plot, and a natural companion would be to show how long it takes to serialise data frames using both RProtoBuf and R's native serialisation. Is there a performance penalty to using protobufs?} -\reply{Table 10 has been replaced with a plot, the outliers are +\reply{Done. 
Table 10 has been replaced with a plot, the outliers are
   labeled, and the text now includes some interesting explanation
   about the outliers. Page 4 explains that the R implementation of
-  protocol buffers uses reflection to make operations slower but makes
+  Protocol Buffers uses reflection, which makes operations slower but
   more convenient for interactive data analysis. None of the
   built-in datasets are large enough for performance to really come
   up as an issue, and for any serialization method examples could be
@@ -165,7 +165,7 @@
   good fit for an infrastructure package and it's not clear what
   advantages it has over explicitly loading a protobuf definition
   into an object.}
-\reply{More information about the advantages and disadvantages of this
+\reply{Done. More information about the advantages and disadvantages of this
   approach have been added.}
 
 \pointRaised{Comment 10}{Using global state makes understanding code much harder. In Table 1,
@@ -175,28 +175,28 @@
 as well as \texttt{HistogramTools}? This needs more explanation, and a
 comment on the implications of this approach on CRAN packages and
 namespaces.}
-\reply{We followed this recommendation and added explanation for how
-\texttt{tutorial.Person} is loaded, specifically : \emph{A small number of message types are imported when the
-package is first loaded, including the tutorial.Person type we saw in
-the last section.} Thank you also for spotting the superfluous attach
-of \texttt{RProtoBuf}, it has been removed from the example.}
+\reply{Done. We followed this recommendation and added explanation for how
+  \texttt{tutorial.Person} is loaded, specifically: \emph{A small number of message types are imported when the
+  package is first loaded, including the tutorial.Person type we saw in
+  the last section.} Thank you also for spotting the superfluous attach
+  of \texttt{RProtoBuf}; it has been removed from the example.}
 
 \pointRaised{Comment 11}{ I'd prefer you eliminate this magic from the magic, but failing that,
 you need a good explanation of why.}
-\reply{We've added more explanation about this.}
+\reply{Done. We've added more explanation about this.}
 
 \subsubsection*{Code comments}
 \pointRaised{Comment 12}{Using \texttt{file.create()} to determine the absolute path seems like
 a bad idea.}
-\reply{We followed this recommendation and removed two instances of
+\reply{Done. We followed this recommendation and replaced two instances of
   \texttt{file.create()} for this purpose with calls to
   \texttt{normalizePath} with \texttt{mustWork=FALSE}.}
 
 \subsubsection*{Minor niggles}
 
 \pointRaised{Comment 13}{Don't refer to the message passing style of OO as traditional.}
-\reply{Done, we don't refer to this style as traditional anywhere in
+\reply{Done. We don't refer to this style as traditional anywhere in
 the manuscript anymore.}
 
 \pointRaised{Comment 14}{In Section 3.4, if messages isn't a vectorised class, the default
@@ -213,7 +213,7 @@
 
 \pointRaised{Comment 16}{Why does \texttt{serialize\_pb(CO2, NULL)} fail silently? Shouldn't it
 at least warn that the serialization is partial?}
-\reply{Fixed, \texttt{serialize\_pb} now works for all built-in datatypes in R
+\reply{Done. 
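As a brief illustration (a sketch assuming a current release of the
  package, not output reproduced from its tests), the round trip on the
  dataset the reviewer mentions can be checked as follows:
\begin{verbatim}
library("RProtoBuf")
msg <- serialize_pb(CO2, NULL)        # previously dropped attributes silently
identical(unserialize_pb(msg), CO2)   # expected to be TRUE after the fix
\end{verbatim}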
We fixed this and \texttt{serialize\_pb} now works for all built-in datatypes in R and no longer fails silently if it encounters something it can't serialize.} \section*{Response to Reviewer \#2} From noreply at r-forge.r-project.org Wed Dec 17 03:34:24 2014 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Wed, 17 Dec 2014 03:34:24 +0100 (CET) Subject: [Rprotobuf-commits] r937 - papers/jss Message-ID: <20141217023425.0822C1878E8@r-forge.r-project.org> Author: edd Date: 2014-12-17 03:34:24 +0100 (Wed, 17 Dec 2014) New Revision: 937 Modified: papers/jss/response-to-reviewers.tex Log: a few more edits Modified: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2014-12-17 02:04:25 UTC (rev 936) +++ papers/jss/response-to-reviewers.tex 2014-12-17 02:34:24 UTC (rev 937) @@ -229,7 +229,7 @@ \pointRaised{Comment 2}{There are, however, a few inconsistencies in the implementation and some issues with specific sections in the paper. In the following both issues will be addressed sequentially by their occurrence in the paper.} -\reply{These and others have been identified and addressed. Thank you +\reply{Done. These and others have been identified and addressed. Thank you for taking the time to enumerate these issues.} \pointRaised{Comment 3}{p.4 illustrates the use of messages. The class implements list-like @@ -245,7 +245,7 @@ > p[[3]] [1] "" \end{verbatim} -\reply{We've corrected the list-like accessor, fixed \texttt{length()} to +\reply{Done. We have corrected the list-like accessor, fixed \texttt{length()} to correspond to the number of set fields, and added \texttt{names()}:} \begin{verbatim} > p @@ -273,7 +273,8 @@ > length(tutorial.Person) [1] 1 \end{verbatim} -\reply{We agree, and have addressed this inconsistency. Thank you:} +\reply{Done. We agree, and have addressed this inconsistency. Thank you for + catching this.} \begin{verbatim} > tutorial.Person$email descriptor for field 'email' of type 'tutorial.Person' @@ -290,8 +291,8 @@ directly (although the low-level object methods seem to be exposed as \texttt{\$field\_count()} and \texttt{\$fields()} - but that seems extremely cumbersome). Again, implementing names() and subsetting may help here.} -\reply{\texttt{names} and subsetting implemented. Thank you for the - suggestion.:} +\reply{Done. We have implemented \texttt{names} and subsetting. Thank you for the + suggestion.} \begin{verbatim} > tutorial.Person[[1]] descriptor for field 'name' of type 'tutorial.Person' @@ -311,7 +312,7 @@ - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" \end{verbatim} -\reply{Fixed, thank you. New output:} +\reply{Done, thank you. New output below:} \begin{verbatim} > is.list(as.list(tutorial.Person$PhoneType)) [1] TRUE @@ -327,7 +328,7 @@ values. It may be natural use some of the standard methods like names(), levels() or similar. As with the previous cases, the lack of [[ support makes it impossible to map named enum values to codes and vice-versa.} -\reply{Fixed, thank you. New output:} +\reply{Done, thank you. New output:} \begin{verbatim} > names(tutorial.Person$PhoneType) [1] "MOBILE" "HOME" "WORK" @@ -339,7 +340,7 @@ the consistency of the API. 
Since the authors intend direct
 interaction with the objects via basic standard R methods, the
 classes should behave consistently.}
-\reply{We made several passes, correcting issues as documented in
+\reply{We made several passes, correcting issues as documented in the
   \texttt{ChangeLog} and now present in our latest 0.4.2 release on CRAN.}
 
 \pointRaised{Comment 6}{Finally, most classes implement coercion to characters, which is not
@@ -362,7 +363,7 @@
   In choosing the debug output for a file descriptor we agree that
   \texttt{filename} is a reasonable thing to expect, but we also
   think that the contents of the \texttt{.proto} file is also
-  reasonable, but more useful. We document this in
+  reasonable, but more useful. We document this in the help for
   ``FileDescriptor-class'', the vignette, and other sources.
   \texttt{@filename} is one of the slots of the FileDescriptor class
   and so very easy to find. The contents of the \texttt{.proto} are
@@ -373,7 +374,7 @@
   descriptor in R. Depending on the intention here, it may be useful
   to explain this feature. }
-\reply{This snippet has been removed as part of the general move of
+\reply{Done. This snippet has been removed as part of the general move of
   less relevant details to the package documentation, but for reference
   the \texttt{.proto} file syntax is defined in the Protocol Buffers
   language guide which is referenced earlier. It is a cross platform
@@ -393,13 +394,13 @@
 no reason to not include them - they can be useful to store
 expressions that may not be necessary specific to R. Further on p. 18
 your run into the same problem that could be fixed so easily.}
-\reply{You are right. Environments are more than just hash
+\reply{Acknowledged. Environments are more than just hash
   tables because they include other configuration parameters that are
   necessary to serialize as well to make sure
   serialization/unserialization is idempotent, but we agree it is
   cleaner for both the package and the exposition in the paper to just
   make sure we serialize everything. We can now fall back to
-  \texttt{base::serialize} and storing the bits in a rawString type of
+  \texttt{base::serialize()} and store the bits in a rawString type of
   RProtoBuf to make the R schema-less serialization more complete.}
 
 \pointRaised{Comment 9}{The examples in sections 7 and 8 are somewhat weak. It does not seem
@@ -435,9 +436,8 @@
 would sacrifice interoperability by using PB (they are still more
 hassle and require special installations)? It would be useful if the
 reason could be made explicit here or a better example chosen.}
-\reply{This section has been reworded to make it shorter and more
-  crisp, with fewer extraneous details about OpenCPU.
-Protocol
+\reply{Done. This section has been reworded to make it shorter and more
+  crisp, with fewer extraneous details about OpenCPU. Protocol
 Buffers is an efficient protocol used between distributed systems at
 many of the world's largest internet companies (Twitter, Sony,
 Google, etc.) but the design and implementation of a large