[Rprotobuf-commits] r828 - papers/jss
noreply at r-forge.r-project.org
Thu Jan 23 01:46:41 CET 2014
Author: murray
Date: 2014-01-23 01:46:41 +0100 (Thu, 23 Jan 2014)
New Revision: 828
Modified:
papers/jss/article.Rnw
Log:
Add more \proglangs, we now have \proglang{R} at least 119 times in
this document, which might be a bit much.
Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw 2014-01-23 00:39:21 UTC (rev 827)
+++ papers/jss/article.Rnw 2014-01-23 00:46:41 UTC (rev 828)
@@ -191,7 +191,7 @@
A number of binary formats based on \texttt{JSON} have been proposed
that reduce the parsing cost and improve efficiency. \pkg{MessagePack}
-and \pkg{BSON} both have R
+and \pkg{BSON} both have \proglang{R}
interfaces \citep{msgpackR,rmongodb}, but these formats lack a separate schema for the serialized
data and thus still duplicate field names with each message sent over
the network or stored in a file. Such formats also lack support for
@@ -258,7 +258,7 @@
package. Section~\ref{sec:types} describes the challenges of type coercion
between \proglang{R} and other languages. Section~\ref{sec:evaluation} introduces a
general \proglang{R} language schema for serializing arbitrary \proglang{R} objects and evaluates
-it against the serialization capabilities built directly into R. Sections~\ref{sec:mapreduce}
+it against the serialization capabilities built directly into \proglang{R}. Sections~\ref{sec:mapreduce}
and \ref{sec:opencpu} provide real-world use cases of \CRANpkg{RProtoBuf}
in MapReduce and web service environments, respectively, before
Section~\ref{sec:summary} concludes.
@@ -312,9 +312,9 @@
Protocol Buffer in \proglang{R} that is then serialized and sent over the network to a
remote server. The server would then deserialize the message, act on the
request, and respond with a new Protocol Buffer over the network.
-The key difference to, say, a request to an Rserve instance is that
+The key difference to, say, a request to an \pkg{Rserve} instance is that
the remote server may be implemented in any language, with no
-dependence on R.
+dependence on \proglang{R}.
While traditional IDLs have at times been criticized for code bloat and
complexity, Protocol Buffers are based on a simple list and records
@@ -456,7 +456,7 @@
This section describes how to use the \proglang{R} API to create and manipulate
protocol buffer messages in \proglang{R}, and how to read and write the
binary representation of the message (often called the \emph{payload}) to files and arbitrary binary
-R connections.
+\proglang{R} connections.
The two fundamental building blocks of Protocol Buffers are \emph{Messages}
and \emph{Descriptors}. Messages provide a common abstract encapsulation of
structured data fields of the type specified in a Message Descriptor.
@@ -479,16 +479,6 @@
%languages. The definition
-
-%This section may contain a figure such as Figure~\ref{figure:rlogo}.
-%
-%\begin{figure}[htbp]
-% \centering
-% \includegraphics{Rlogo}
-% \caption{The logo of R.}
-% \label{figure:rlogo}
-%\end{figure}
-
\subsection[Importing Message Descriptors from .proto files]{Importing Message Descriptors from \texttt{.proto} files}
%The three basic abstractions of \CRANpkg{RProtoBuf} are Messages,
@@ -562,7 +552,7 @@
\subsection{Access and modify fields of a message}
Once the message is created, its fields can be queried
-and modified using the dollar operator of R, making protocol
+and modified using the dollar operator of \proglang{R}, making protocol
buffer messages seem like lists.
<<>>=
@@ -712,7 +702,7 @@
generic in the S3 sense, such as \texttt{new} and
\texttt{serialize}.
Table~\ref{class-summary-table} lists the six
-primary Message and Descriptor classes in RProtoBuf. Each \proglang{R} object
+primary Message and Descriptor classes in \CRANpkg{RProtoBuf}. Each \proglang{R} object
contains an external pointer to an object managed by the
\texttt{protobuf} \proglang{C++} library, and the \proglang{R} objects make calls into more
than 100 \proglang{C++} functions that provide the
@@ -765,7 +755,7 @@
functions with these S4 classes:
\begin{itemize}
\item The functional dispatch mechanism of the form
- \verb|method(object, arguments)| (common to R), and
+ \verb|method(object, arguments)| (common to \proglang{R}), and
\item The traditional object oriented notation
\verb|object$method(arguments)|.
\end{itemize}
@@ -905,7 +895,7 @@
\label{subsec-field-descriptor}
The class \emph{FieldDescriptor} represents field
-descriptors in R. This is a wrapper S4 class around the
+descriptors in \proglang{R}. This is a wrapper S4 class around the
\texttt{google::protobuf::FieldDescriptor} \proglang{C++} class.
Table~\ref{fielddescriptor-methods-table} describes the methods
defined for the \texttt{FieldDescriptor} class.
@@ -956,7 +946,7 @@
\subsection{Enum Descriptors}
\label{subsec-enum-descriptor}
-The class \emph{EnumDescriptor} represents enum descriptors in R.
+The class \emph{EnumDescriptor} represents enum descriptors in \proglang{R}.
This is a wrapper S4 class around the
\texttt{google::protobuf::EnumDescriptor} \proglang{C++} class.
Table~\ref{enumdescriptor-methods-table} describes the methods
@@ -1007,7 +997,7 @@
\subsection{File Descriptors}
\label{subsec-file-descriptor}
-The class \emph{FileDescriptor} represents file descriptors in R.
+The class \emph{FileDescriptor} represents file descriptors in \proglang{R}.
This is a wrapper S4 class around the
\texttt{google::protobuf::FileDescriptor} \proglang{C++} class.
Table~\ref{filedescriptor-methods-table} describes the methods
@@ -1052,7 +1042,7 @@
\label{subsec-enumvalue-descriptor}
The class \emph{EnumValueDescriptor} represents enumeration value
-descriptors in R. This is a wrapper S4 class around the
+descriptors in \proglang{R}. This is a wrapper S4 class around the
\texttt{google::protobuf::EnumValueDescriptor} \proglang{C++} class.
Table~\ref{EnumValueDescriptor-methods-table} describes the methods
defined for the \texttt{EnumValueDescriptor} class.
@@ -1141,7 +1131,7 @@
\subsection{Booleans}
-R booleans can accept three values: \texttt{TRUE}, \texttt{FALSE}, and
+\proglang{R} booleans can accept three values: \texttt{TRUE}, \texttt{FALSE}, and
\texttt{NA}. However, most other languages, including the Protocol
Buffer schema, only accept \texttt{TRUE} or \texttt{FALSE}. This means
that we simply cannot store \proglang{R} logical vectors that include all three
@@ -1175,9 +1165,9 @@
\subsection{Unsigned Integers}
-R lacks a native unsigned integer type. Values between $2^{31}$ and
+\proglang{R} lacks a native unsigned integer type. Values between $2^{31}$ and
$2^{32} - 1$ read from unsigned int Protocol Buffer fields must be
-stored as doubles in R.
+stored as doubles in \proglang{R}.
<<>>=
as.integer(2^31-1)
@@ -1189,7 +1179,7 @@
\subsection{64-bit integers}
\label{sec:int64}
-R also does not support the native 64-bit integer type. Numeric vectors
+\proglang{R} also does not support the native 64-bit integer type. Numeric vectors
with values $\geq 2^{31}$ can only be stored as doubles, which have
limited precision. Thereby \proglang{R} loses the ability to distinguish some
distinct integers:
@@ -1199,9 +1189,9 @@
@
However, most modern languages do have support for 64-bit integers,
-which becomes problematic when \pkg{RProtoBuf} is used to exchange data
+which becomes problematic when \CRANpkg{RProtoBuf} is used to exchange data
with a system that requires this integer type. To work around this,
-RProtoBuf allows users to get and set 64-bit integer values by specifying
+\CRANpkg{RProtoBuf} allows users to get and set 64-bit integer values by specifying
them as character strings.
If we try to set an int64 field in \proglang{R} to double values, we lose
@@ -1213,7 +1203,7 @@
length(unique(test$repeated_int64))
@
-But when the values are specified as character strings, RProtoBuf
+But when the values are specified as character strings, \CRANpkg{RProtoBuf}
will automatically coerce them into true 64-bit integer types
before storing them in the Protocol Buffer message:
@@ -1221,13 +1211,13 @@
test$repeated_int64 <- c("9007199254740992", "9007199254740993")
@
-When reading the value back into R, numeric types are returned by
+When reading the value back into \proglang{R}, numeric types are returned by
default, but when the full precision is required a character value
will be returned if the \texttt{RProtoBuf.int64AsString} option is set
to \texttt{TRUE}. The character values are useful because they can
-accurately be used as unique identifiers and can easily be passed to R
+accurately be used as unique identifiers and can easily be passed to \proglang{R}
packages such as \CRANpkg{int64} \citep{int64} or \CRANpkg{bit64}
-\citep{bit64} which represent 64-bit integers in R.
+\citep{bit64} which represent 64-bit integers in \proglang{R}.
<<>>=
options("RProtoBuf.int64AsString" = FALSE)
@@ -1250,7 +1240,7 @@
messages of a defined schema. This is useful when there are
pre-existing systems with defined schemas or significant software
components written in other languages that need to be accessed from
-within R.
+within \proglang{R}.
The package also provides methods for converting arbitrary \proglang{R} data structures into protocol
buffers and vice versa with a universal \proglang{R} object schema. The \texttt{serialize\_pb} and \texttt{unserialize\_pb}
@@ -1275,10 +1265,10 @@
The \texttt{rexp.proto} schema supports all main \proglang{R} storage types holding \emph{data}.
These include \texttt{NULL}, \texttt{list} and vectors of type \texttt{logical},
\texttt{character}, \texttt{double}, \texttt{integer} and \texttt{complex}. In addition,
-every type can contain a named set of attributes, as is the case in R. The \texttt{rexp.proto}
+every type can contain a named set of attributes, as is the case in \proglang{R}. The \texttt{rexp.proto}
schema does not support some of the special \proglang{R} specific storage types, such as \texttt{function},
\texttt{language} or \texttt{environment}. Such objects have no native equivalent
-type in Protocol Buffers, and have little meaning outside the context of R.
+type in Protocol Buffers, and have little meaning outside the context of \proglang{R}.
When serializing \proglang{R} objects using \texttt{serialize\_pb}, values or attributes of
unsupported types are skipped with a warning. If the user really wishes to serialize these
objects, they need to be converted into a supported type. For example, the user can use
@@ -1367,12 +1357,12 @@
%The summary compression sizes are listed below, and a full table for a
%sample of 50 datasets is included on the next page.
Sizes are comparable but Protocol Buffers provide simple getters and setters
-in multiple languages instead of requiring other programs to parse the R
+in multiple languages instead of requiring other programs to parse the \proglang{R}
serialization format. % \citep{serialization}.
One takeaway from this table is that the universal \proglang{R} object schema
included in \pkg{RProtoBuf} does not in general provide
any significant saving in file size compared to the normal serialization
-mechanism in R.
+mechanism in \proglang{R}.
% redundant: which is seen as equally compact.
The benefits of \pkg{RProtoBuf} accrue more naturally in applications where
multiple programming languages are involved, or when a more concise
@@ -1389,7 +1379,7 @@
\scalebox{0.9}{
\begin{tabular}{lrrrrr}
\toprule
- Data Set & object.size & \multicolumn{2}{c}{R Serialization} &
+ Data Set & object.size & \multicolumn{2}{c}{\proglang{R} Serialization} &
\multicolumn{2}{c}{RProtoBuf Serial.} \\
& & default & gzipped & default & gzipped \\
\cmidrule(r){2-6}
@@ -1513,10 +1503,10 @@
\end{example}
This HistogramState message type is designed to be helpful if some of
-the Map or Reduce tasks are written in R, or if those components are
+the Map or Reduce tasks are written in \proglang{R}, or if those components are
written in other languages and only the resulting output histograms
-need to be manipulated in R. For example, to create HistogramState
-messages in Python for later consumption by R, we first compile the
+need to be manipulated in \proglang{R}. For example, to create HistogramState
+messages in Python for later consumption by \proglang{R}, we first compile the
\texttt{histogram.proto} descriptor into a Python module using the
\texttt{protoc} compiler:
@@ -1547,7 +1537,7 @@
\end{Code}
The protocol buffer can then be read into \proglang{R} and converted to a native
-R histogram object for plotting:
+\proglang{R} histogram object for plotting:
\begin{Code}
library(RProtoBuf)
@@ -1638,7 +1628,7 @@
Because both HTTP and Protocol Buffers have libraries available for many
languages, clients can be implemented in just a few lines of code. Below
is example code for both \proglang{R} and Python that retrieves a dataset from \proglang{R} with
-OpenCPU using a protobuf message. In R, we use the HTTP client from
+OpenCPU using a protobuf message. In \proglang{R}, we use the HTTP client from
the \texttt{httr} package \citep{httr}. In this example we
download a dataset which is part of the base \proglang{R} distribution, so we can
verify that the object was transferred without loss of information.
@@ -1712,7 +1702,7 @@
\texttt{stats::rnorm(n=42, mean=100)}. The function arguments (in this
case \texttt{n} and \texttt{mean}) as well as the return value (a vector
with 42 random numbers) are transferred using a protobuf message. RPC in
-OpenCPU works like the \texttt{do.call} function in R, hence all arguments
+OpenCPU works like the \texttt{do.call} function in \proglang{R}, hence all arguments
are contained within a list.
<<eval=FALSE>>=
@@ -1818,7 +1808,7 @@
other languages.
The \pkg{RProtoBuf} package provides users with the ability to generate,
-parse and manipulate Protocol Buffer messages in R. It is our hope that this
+parse and manipulate Protocol Buffer messages in \proglang{R}. It is our hope that this
package will make Protocol Buffers more accessible to the \proglang{R} community, and
thereby make a small contribution towards better integration between \proglang{R} and
other software systems and applications.
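
Note on the 64-bit integer hunks above: the text passes the values 9007199254740992 and 9007199254740993 as character strings because a double cannot tell them apart. A minimal sketch of that effect, written in Python rather than R and using only built-in types (the variable names are illustrative and not part of RProtoBuf), might look like this:

```python
# Illustrative only: shows why two distinct 64-bit integers collapse
# when stored as IEEE 754 doubles, and why strings preserve them.
a = 9007199254740992  # 2**53
b = 9007199254740993  # 2**53 + 1, not exactly representable as a double

# Converting to double rounds b to the nearest representable value, 2**53.
as_doubles = {float(a), float(b)}

# Carrying the values as strings keeps every digit, so both survive.
as_strings = {str(a), str(b)}
restored = sorted(int(s) for s in as_strings)

print(len(as_doubles))  # number of distinct values after double conversion
print(restored)         # integers recovered from the string form
```

The same reasoning applies to any receiver that stores numbers as doubles, which is why the diffed text exchanges such values as strings.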