[Rprotobuf-commits] r627 - papers/rjournal

noreply at r-forge.r-project.org
Mon Dec 30 00:17:39 CET 2013


Author: edd
Date: 2013-12-30 00:17:34 +0100 (Mon, 30 Dec 2013)
New Revision: 627

Modified:
   papers/rjournal/eddelbuettel-francois-stokely.Rnw
Log:
some edits for typos etc


Modified: papers/rjournal/eddelbuettel-francois-stokely.Rnw
===================================================================
--- papers/rjournal/eddelbuettel-francois-stokely.Rnw	2013-12-28 19:08:34 UTC (rev 626)
+++ papers/rjournal/eddelbuettel-francois-stokely.Rnw	2013-12-29 23:17:34 UTC (rev 627)
@@ -11,6 +11,11 @@
 \title{RProtoBuf: Efficient Cross-Language Data Serialization in R}
 \author{by Dirk Eddelbuettel, Romain Fran\c{c}ois, and Murray Stokely}
 
+%% DE: I tend to have wider option(width=...) so this
+%%     guarantees better line breaks
+<<echo=FALSE,print=FALSE>>=
+options(width=65, prompt="R> ", digits=4)
+@
 
 \maketitle
 
@@ -34,11 +39,11 @@
 isolation \citep{Wegiel:2010:CTT:1932682.1869479}.  Different
 programming languages are often used for the different phases of data
 analysis -- collection, cleaning, analysis, post-processing, and
-presentation in order to take advantage of the unique combination of 
+presentation in order to take advantage of the unique combination of
 performance, speed of development, and library support offered by
 different environments.  Each stage of the data
 analysis pipeline may involve storing intermediate results in a
-file or sending them over the network.  Programming langauges such as
+file or sending them over the network.  Programming languages such as
 Java, Ruby, Python, and R include built-in serialization support, but
 these formats are tied to the specific programming language in use.
 % TODO(ms): and they often don't support versioning among other faults.
@@ -46,8 +51,8 @@
 often used for exporting tabular data.  However, CSV files have a
 number of disadvantages, such as a limitation of exporting only
 tabular datasets, lack of type-safety, inefficient text representation
-and parsing, and abiguities in the format involving special
-characters.  JSON is another widely supported format used mostly on
+and parsing, and ambiguities in the format involving special
+characters.  JSON is another widely-supported format used mostly on
 the web that removes many of these disadvantages, but it too suffers
 from being too slow to parse and also does not provide strong typing
 between integers and floating point.  Large numbers of JSON messages
@@ -95,6 +100,8 @@
 Introductory section which may include references in parentheses
 \citep{R}, or cite a reference such as \citet{R} in the text.
 
+%% TODO(de,ms)  What follows is oooooold and was lifted from the webpage
+%%              Rewrite?
 Protocol buffers are a language-neutral, platform-neutral, extensible
 way of serializing structured data for use in communications
 protocols, data storage, and more.
@@ -161,6 +168,7 @@
 %from a variety of data streams using a variety of different
 %languages.  The definition
 
+%% TODO(de) Can we make this not break the width of the page?
 \noindent
 \begin{table}
 \begin{tabular}{@{\hskip .01\textwidth}p{.40\textwidth}@{\hskip .015\textwidth}|@{\hskip .015\textwidth}p{0.55\textwidth}@{\hskip .01\textwidth}}
@@ -196,7 +204,7 @@
 person$name <- "Romain"
 cat(as.character(person))
 serialize(person, NULL)
-@ 
+@
 \end{minipage} \\
 \hline
 \end{tabular}
@@ -228,7 +236,7 @@
 
 New \texttt{.proto} files are imported with the \code{readProtoFiles}
 function, which can import a single file, all files in a directory, or
-all \texttt{.proto} files provided by another R package. 
+all \texttt{.proto} files provided by another R package.
 
 The \texttt{.proto} file syntax for defining the structure of protocol
 buffer data is described comprehensively on Google Code:
@@ -424,7 +432,7 @@
 
 Each R object stores an external pointer to an object managed by
 the \texttt{protobuf} C++ library.
-The \CRANpkg{Rcpp} \citep{eddelbuettel2011rcpp} package is used to
+The \CRANpkg{Rcpp} package \citep{eddelbuettel2011rcpp} is used to
 facilitate the integration of the R and C++ code for these objects.
 
 % Message, Descriptor, FieldDescriptor, EnumDescriptor,
@@ -768,7 +776,7 @@
 tutorial.Person$PhoneType$value(1)
 tutorial.Person$PhoneType$value(name="HOME")
 tutorial.Person$PhoneType$value(number=1)
-@ 
+@
 
 \begin{table}[h]
 \centering
@@ -879,7 +887,7 @@
 a$optional_bool <- NA
 <<echo=FALSE,eval=TRUE,print=TRUE>>=
 try(a$optional_bool <- NA,silent=TRUE)
-@ 
+@
 
 \subsection{64-bit integers}
 \label{sec:int64}
@@ -940,7 +948,7 @@
 
 <<echo=FALSE,print=FALSE>>=
 options("RProtoBuf.int64AsString" = FALSE)
-@ 
+@
 
 
 \section{Evaluation: data.frame to Protocol Buffer Serialization}
@@ -975,8 +983,8 @@
 be safely converted to a serialized protocol buffer representation.
 
 <<echo=TRUE>>=
-datasets$valid.proto <- sapply(datasets$load.name, function(x) can_serialize_pb(eval(as.name(x))))
-datasets <- subset(datasets, valid.proto==TRUE)
+#datasets$valid.proto <- sapply(datasets$load.name, function(x) can_serialize_pb(eval(as.name(x))))
+#datasets <- subset(datasets, valid.proto==TRUE)
 m <- nrow(datasets)
 @
 
@@ -1039,56 +1047,56 @@
 \multicolumn{2}{c}{RProtoBuf Serialization} \\
  & & Default & gzipped & Default & gzipped \\
   \hline
-uspop & 584.00 & 268 & 172 & 211 & 148 \\ 
-  Titanic & 1960.00 & 633 & 257 & 481 & 249 \\ 
-  volcano & 42656.00 & 42517 & 5226 & 42476 & 4232 \\ 
-  euro.cross & 2728.00 & 1319 & 910 & 1207 & 891 \\ 
-  attenu & 14568.00 & 8234 & 2165 & 7771 & 2336 \\ 
-  ToothGrowth & 2568.00 & 1486 & 349 & 1239 & 391 \\ 
-  lynx & 1344.00 & 1028 & 429 & 971 & 404 \\ 
-  nottem & 2352.00 & 2036 & 627 & 1979 & 641 \\ 
-  sleep & 2752.00 & 746 & 282 & 483 & 260 \\ 
-  co2 & 4176.00 & 3860 & 1473 & 3803 & 1453 \\ 
-  austres & 1144.00 & 828 & 439 & 771 & 410 \\ 
-  ability.cov & 1944.00 & 716 & 357 & 589 & 341 \\ 
-  EuStockMarkets & 60664.00 & 59785 & 21232 & 59674 & 19882 \\ 
-  treering & 64272.00 & 63956 & 17647 & 63900 & 17758 \\ 
-  freeny.x & 1944.00 & 1445 & 1311 & 1372 & 1289 \\ 
-  Puromycin & 2088.00 & 813 & 306 & 620 & 320 \\ 
-  warpbreaks & 2768.00 & 1231 & 310 & 811 & 343 \\ 
-  BOD & 1088.00 & 334 & 182 & 226 & 168 \\ 
-  sunspots & 22992.00 & 22676 & 6482 & 22620 & 6742 \\ 
-  beaver2 & 4184.00 & 3423 & 751 & 3468 & 840 \\ 
-  anscombe & 2424.00 & 991 & 375 & 884 & 352 \\ 
-  esoph & 5624.00 & 3111 & 548 & 2240 & 665 \\ 
-  PlantGrowth & 1680.00 & 646 & 303 & 459 & 314 \\ 
-  infert & 15848.00 & 14328 & 1172 & 13197 & 1404 \\ 
-  BJsales & 1632.00 & 1316 & 496 & 1259 & 465 \\ 
-  stackloss & 1688.00 & 917 & 293 & 844 & 283 \\ 
-  crimtab & 7936.00 & 4641 & 713 & 1655 & 576 \\ 
-  LifeCycleSavings & 6048.00 & 3014 & 1420 & 2825 & 1407 \\ 
-  Harman74.cor & 9144.00 & 6056 & 2045 & 5861 & 2070 \\ 
-  nhtemp & 912.00 & 596 & 240 & 539 & 223 \\ 
-  faithful & 5136.00 & 4543 & 1339 & 4936 & 1776 \\ 
-  freeny & 5296.00 & 2465 & 1518 & 2271 & 1507 \\ 
-  discoveries & 1232.00 & 916 & 199 & 859 & 180 \\ 
-  state.x77 & 7168.00 & 4251 & 1754 & 4068 & 1756 \\ 
-  pressure & 1096.00 & 498 & 277 & 427 & 273 \\ 
-  fdeaths & 1008.00 & 692 & 291 & 635 & 272 \\ 
-  euro & 976.00 & 264 & 186 & 202 & 161 \\ 
-  LakeHuron & 1216.00 & 900 & 420 & 843 & 404 \\ 
-  mtcars & 6736.00 & 3798 & 1204 & 3633 & 1206 \\ 
-  precip & 4992.00 & 1793 & 813 & 1615 & 815 \\ 
-  state.area & 440.00 & 422 & 246 & 405 & 235 \\ 
-  attitude & 3024.00 & 1990 & 544 & 1920 & 561 \\ 
-  randu & 10496.00 & 9794 & 8859 & 10441 & 9558 \\ 
-  state.name & 3088.00 & 844 & 408 & 724 & 415 \\ 
-  airquality & 5496.00 & 4551 & 1241 & 2874 & 1294 \\ 
-  airmiles & 624.00 & 308 & 170 & 251 & 148 \\ 
-  quakes & 33112.00 & 32246 & 9898 & 29063 & 11595 \\ 
-  islands & 3496.00 & 1232 & 563 & 1098 & 561 \\ 
-  OrchardSprays & 3600.00 & 2164 & 445 & 1897 & 483 \\ 
-  WWWusage & 1232.00 & 916 & 274 & 859 & 251 \\ 
+uspop & 584.00 & 268 & 172 & 211 & 148 \\
+  Titanic & 1960.00 & 633 & 257 & 481 & 249 \\
+  volcano & 42656.00 & 42517 & 5226 & 42476 & 4232 \\
+  euro.cross & 2728.00 & 1319 & 910 & 1207 & 891 \\
+  attenu & 14568.00 & 8234 & 2165 & 7771 & 2336 \\
+  ToothGrowth & 2568.00 & 1486 & 349 & 1239 & 391 \\
+  lynx & 1344.00 & 1028 & 429 & 971 & 404 \\
+  nottem & 2352.00 & 2036 & 627 & 1979 & 641 \\
+  sleep & 2752.00 & 746 & 282 & 483 & 260 \\
+  co2 & 4176.00 & 3860 & 1473 & 3803 & 1453 \\
+  austres & 1144.00 & 828 & 439 & 771 & 410 \\
+  ability.cov & 1944.00 & 716 & 357 & 589 & 341 \\
+  EuStockMarkets & 60664.00 & 59785 & 21232 & 59674 & 19882 \\
+  treering & 64272.00 & 63956 & 17647 & 63900 & 17758 \\
+  freeny.x & 1944.00 & 1445 & 1311 & 1372 & 1289 \\
+  Puromycin & 2088.00 & 813 & 306 & 620 & 320 \\
+  warpbreaks & 2768.00 & 1231 & 310 & 811 & 343 \\
+  BOD & 1088.00 & 334 & 182 & 226 & 168 \\
+  sunspots & 22992.00 & 22676 & 6482 & 22620 & 6742 \\
+  beaver2 & 4184.00 & 3423 & 751 & 3468 & 840 \\
+  anscombe & 2424.00 & 991 & 375 & 884 & 352 \\
+  esoph & 5624.00 & 3111 & 548 & 2240 & 665 \\
+  PlantGrowth & 1680.00 & 646 & 303 & 459 & 314 \\
+  infert & 15848.00 & 14328 & 1172 & 13197 & 1404 \\
+  BJsales & 1632.00 & 1316 & 496 & 1259 & 465 \\
+  stackloss & 1688.00 & 917 & 293 & 844 & 283 \\
+  crimtab & 7936.00 & 4641 & 713 & 1655 & 576 \\
+  LifeCycleSavings & 6048.00 & 3014 & 1420 & 2825 & 1407 \\
+  Harman74.cor & 9144.00 & 6056 & 2045 & 5861 & 2070 \\
+  nhtemp & 912.00 & 596 & 240 & 539 & 223 \\
+  faithful & 5136.00 & 4543 & 1339 & 4936 & 1776 \\
+  freeny & 5296.00 & 2465 & 1518 & 2271 & 1507 \\
+  discoveries & 1232.00 & 916 & 199 & 859 & 180 \\
+  state.x77 & 7168.00 & 4251 & 1754 & 4068 & 1756 \\
+  pressure & 1096.00 & 498 & 277 & 427 & 273 \\
+  fdeaths & 1008.00 & 692 & 291 & 635 & 272 \\
+  euro & 976.00 & 264 & 186 & 202 & 161 \\
+  LakeHuron & 1216.00 & 900 & 420 & 843 & 404 \\
+  mtcars & 6736.00 & 3798 & 1204 & 3633 & 1206 \\
+  precip & 4992.00 & 1793 & 813 & 1615 & 815 \\
+  state.area & 440.00 & 422 & 246 & 405 & 235 \\
+  attitude & 3024.00 & 1990 & 544 & 1920 & 561 \\
+  randu & 10496.00 & 9794 & 8859 & 10441 & 9558 \\
+  state.name & 3088.00 & 844 & 408 & 724 & 415 \\
+  airquality & 5496.00 & 4551 & 1241 & 2874 & 1294 \\
+  airmiles & 624.00 & 308 & 170 & 251 & 148 \\
+  quakes & 33112.00 & 32246 & 9898 & 29063 & 11595 \\
+  islands & 3496.00 & 1232 & 563 & 1098 & 561 \\
+  OrchardSprays & 3600.00 & 2164 & 445 & 1897 & 483 \\
+  WWWusage & 1232.00 & 916 & 274 & 859 & 251 \\
    \hline
 \end{tabular}
 }


