[Rprotobuf-commits] r855 - papers/jss

noreply at r-forge.r-project.org
Sun Jan 26 23:29:04 CET 2014


Author: edd
Date: 2014-01-26 23:29:03 +0100 (Sun, 26 Jan 2014)
New Revision: 855

Modified:
   papers/jss/article.Rnw
Log:
one spelling fix, lots of 'data set' instead of 'dataset', one 'work flow'


Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw	2014-01-26 21:47:30 UTC (rev 854)
+++ papers/jss/article.Rnw	2014-01-26 22:29:03 UTC (rev 855)
@@ -136,9 +136,9 @@
 of decoupled components in order to better manage software complexity 
 through reusability, modularity, and fault isolation \citep{Wegiel:2010:CTT:1932682.1869479}.
 These pipelines are frequently built using different programming 
-languages for the different phases of data analysis -- collection,
+languages for the different phases of data analysis --- collection,
 cleaning, modeling, analysis, post-processing, and
-presentation -- in order to take advantage of the unique combination of
+presentation --- in order to take advantage of the unique combination of
 performance, speed of development, and library support offered by
 different environments and languages.  Each stage of such a data
 analysis pipeline may produce intermediate results that need to be
@@ -171,7 +171,7 @@
 complexity at the parsing side (which are somewhat mitigated by the
 availability of mature libraries and parsers). Because \texttt{XML} is 
 text-based and has no native notion of numeric types or arrays, it is usually not a
-very practical format to store numeric datasets as they appear in statistical
+very practical format to store numeric data sets as they appear in statistical
 applications.
 
 
@@ -214,7 +214,7 @@
 Section~\ref{sec:types} describes the challenges of type coercion
 between \proglang{R} and other languages.  Section~\ref{sec:evaluation} introduces a
 general \proglang{R} language schema for serializing arbitrary \proglang{R} objects and evaluates
-it against the serialization capbilities built directly into \proglang{R}.  Sections~\ref{sec:mapreduce}
+it against the serialization capabilities built directly into \proglang{R}.  Sections~\ref{sec:mapreduce}
 and \ref{sec:opencpu} provide real-world use cases of \CRANpkg{RProtoBuf}
 in MapReduce and web service environments, respectively, before
 Section~\ref{sec:summary} concludes.
@@ -231,7 +231,7 @@
   applications as well as different computers or operating systems.
 \item \emph{Efficient}:  Data is serialized into a compact binary
   representation for transmission or storage.
-\item \emph{Extensible}:  New fields can be added to Protocol Buffer Schemas
+\item \emph{Extensible}:  New fields can be added to Protocol Buffer schemas
   in a forward-compatible way that does not break older applications.
 \item \emph{Stable}:  Protocol Buffers have been in wide use for over a
   decade.
@@ -246,7 +246,7 @@
 \end{figure}
 
 Figure~\ref{fig:protobuf-distributed-usecase} illustrates an example
-communication workflow with Protocol Buffers and an interactive \proglang{R} session.
+communication work flow with Protocol Buffers and an interactive \proglang{R} session.
 Common use cases include populating a remote-procedure call (RPC) request
 Protocol Buffer in \proglang{R} that is then serialized and sent over the network to a
 remote server.  The server would then deserialize the message, act on the
@@ -1097,7 +1097,7 @@
 \subsection[Evaluation: Converting R data sets]{Evaluation: Converting \proglang{R} data sets}
 
 To illustrate how this method works, we attempt to convert all of the built-in 
-datasets from \proglang{R} into this serialized Protocol Buffer representation.
+data sets from \proglang{R} into this serialized Protocol Buffer representation.
 
 <<echo=TRUE>>=
 datasets <- as.data.frame(data(package="datasets")$results)
@@ -1105,8 +1105,8 @@
 n <- nrow(datasets)
 @
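As a minimal sketch of this conversion (assuming \CRANpkg{RProtoBuf} is attached and using the built-in \texttt{cars} data frame), a single data set can be checked and round-tripped as follows:

<<eval=FALSE>>=
library("RProtoBuf")
# Check whether the object can be represented as an rexp.proto message.
can_serialize_pb(cars)
# Serialize to a raw vector holding the Protocol Buffer payload, then read it back.
pb <- serialize_pb(cars, NULL)
restored <- unserialize_pb(pb)
identical(restored, cars)
@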
 
-There are \Sexpr{n} standard data sets included in the base-r \pkg{datasets}
-package. These datasets include data frames, matrices, time series, tables lists,
+There are \Sexpr{n} standard data sets included in the \pkg{datasets}
+package included with \proglang{R}. These data sets include data frames, matrices, time series, tables, lists,
 and some more exotic data classes. The \texttt{can\_serialize\_pb} method is 
 used to determine which of those can fully be converted to the \texttt{rexp.proto}
 Protocol Buffer representation. This method simply checks if any of the values or
@@ -1118,7 +1118,7 @@
 
 \Sexpr{m} data sets can be converted to Protocol Buffers
 without loss of information (\Sexpr{format(100*m/n,digits=1)}\%). Upon closer
-inspection, all other datasets are objects of class \texttt{nfnGroupedData}.
+inspection, all other data sets are objects of class \texttt{nfnGroupedData}.
 This class represents a special type of data frame that has some additional 
 attributes (such as a \emph{formula} object) used by the \pkg{nlme} package.
 Because formulas are \proglang{R} \emph{language} objects, they have little meaning to
@@ -1171,10 +1171,10 @@
                        check.names=FALSE)
 @
 
-Table~\ref{tab:compression} shows the sizes of 50 sample \proglang{R} datasets as
+Table~\ref{tab:compression} shows the sizes of 50 sample \proglang{R} data sets as
 returned by object.size() compared to the serialized sizes.
 %The summary compression sizes are listed below, and a full table for a
-%sample of 50 datasets is included on the next page.  
+%sample of 50 data sets is included on the next page.  
 Note that Protocol Buffer serialization results in slightly
 smaller byte streams compared to native \proglang{R} serialization in most cases,
 but this difference disappears if the results are compressed with gzip.
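A single-object version of this comparison can be sketched as follows (assuming \CRANpkg{RProtoBuf} is attached; \texttt{memCompress} is used here to approximate the gzip-compressed sizes):

<<eval=FALSE>>=
library("RProtoBuf")
x <- datasets::mtcars
object.size(x)                                       # in-memory size reported by R
length(serialize(x, NULL))                           # bytes used by native R serialization
length(serialize_pb(x, NULL))                        # bytes used by Protocol Buffer serialization
length(memCompress(serialize(x, NULL), "gzip"))      # gzip-compressed native serialization
length(memCompress(serialize_pb(x, NULL), "gzip"))   # gzip-compressed Protocol Buffer payload
@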
@@ -1260,7 +1260,7 @@
 \end{tabular}
 }
 \caption{Serialization sizes for default serialization in \proglang{R} and
-  \CRANpkg{RProtoBuf} for 50 \proglang{R} datasets.}
+  \CRANpkg{RProtoBuf} for 50 \proglang{R} data sets.}
 \label{tab:compression}
 \end{center}
 \end{table}
@@ -1430,7 +1430,7 @@
 \subsection[HTTP GET: Retrieving an R object]{HTTP GET: Retrieving an \proglang{R} object}
 
 The \texttt{HTTP GET} method is used to read a resource from OpenCPU. For example,
-to access the dataset \texttt{Animals} from the package \texttt{MASS}, a 
+to access the data set \texttt{Animals} from the package \texttt{MASS}, a 
 client performs the following HTTP request:
 
 \begin{verbatim}
@@ -1446,10 +1446,10 @@
 
 Because both HTTP and Protocol Buffers have libraries available for many 
 languages, clients can be implemented in just a few lines of code. Below
-is example code for both \proglang{R} and Python that retrieves a dataset from \proglang{R} with 
+is example code for both \proglang{R} and Python that retrieves a data set from \proglang{R} with 
 OpenCPU using a protobuf message. In \proglang{R}, we use the HTTP client from 
 the \texttt{httr} package \citep{httr}. In this example we
-download a dataset which is part of the base \proglang{R} distribution, so we can
+download a data set which is part of the base \proglang{R} distribution, so we can
 verify that the object was transferred without loss of information.
 
 <<eval=FALSE>>=
@@ -1469,7 +1469,7 @@
 well be done without Protocol Buffers. The main advantage of using an inter-operable format 
 is that we can actually access \proglang{R} objects from within another
 programming language. For example, in a very similar fashion we can retrieve the same
-dataset in a Python client. To parse messages in Python, we first compile the 
+data set in a Python client. To parse messages in Python, we first compile the 
 \texttt{rexp.proto} descriptor into a Python module using the \texttt{protoc} compiler:
 
 \begin{verbatim}
@@ -1494,7 +1494,7 @@
 msg.ParseFromString(res.read())
 print(msg)
 \end{verbatim}
-The \texttt{msg} object contains all data from the Animals dataset. From here we
+The \texttt{msg} object contains all data from the Animals data set. From here we
 can easily extract the desired fields for further use in Python.
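An analogous \proglang{R}-side retrieval takes only a few lines as well; the following is a minimal sketch, and the OpenCPU endpoint shown is illustrative rather than a fixed address:

<<eval=FALSE>>=
library("httr")
library("RProtoBuf")
# Illustrative endpoint: substitute the address of the OpenCPU server actually used.
req <- GET("https://public.opencpu.org/ocpu/library/MASS/data/Animals/pb")
animals <- unserialize_pb(content(req, as = "raw"))
identical(animals, MASS::Animals)
@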
 
 
@@ -1565,7 +1565,7 @@
 %Protocol Buffers is itself not a protocol.
 %Forward-compatibility is one of the features. No need to re-iterate those 
 The Protocol Buffers standard and library offer a unique combination of features, 
-performance, and maturity, that seems particulary well suited for data-driven 
+performance, and maturity that seems particularly well suited for data-driven 
 applications and numerical computing.
 
 The \CRANpkg{RProtoBuf} package builds on the Protocol Buffers \proglang{C++} library, 


