[Rprotobuf-commits] r908 - papers/jss
noreply at r-forge.r-project.org
Tue Nov 25 03:59:01 CET 2014
Author: murray
Date: 2014-11-25 03:59:00 +0100 (Tue, 25 Nov 2014)
New Revision: 908
Modified:
papers/jss/article.Rnw
Log:
Improve example in section 7 using some of the specific advantages
suggested by referee #2 and point out why we've given the user a
simplified example and how it differs from the real MapReduce context
where this would be more useful.
Modified: papers/jss/article.Rnw
===================================================================
--- papers/jss/article.Rnw 2014-11-25 02:39:23 UTC (rev 907)
+++ papers/jss/article.Rnw 2014-11-25 02:59:00 UTC (rev 908)
@@ -1142,7 +1142,11 @@
This HistogramState message type is designed to be helpful if some of
the Map or Reduce tasks are written in \proglang{R}, or if those components are
written in other languages and only the resulting output histograms
-need to be manipulated in \proglang{R}. For example, to create HistogramState
+need to be manipulated in \proglang{R}.
+
+\subsection*{A trivial single-machine example for Python to R serialization}
+
+To create HistogramState
messages in Python for later consumption by \proglang{R}, we first compile the
\code{histogram.proto} descriptor into a python module using the
\code{protoc} compiler:
@@ -1205,7 +1209,18 @@
@
\end{center}
-One of the authors has used this design pattern with large-scale \proglang{C++}
+This simple example uses a constant histogram generated in
+\proglang{Python} to illustrate the serialization concepts without
+requiring the reader to be familiar with the interface of any
+particular MapReduce implementation. In practice, using Protocol
+Buffers to pass histograms between another programming language and R
+would provide a much greater benefit in a distributed context.
+For example, a first-class data type to represent histograms would
+prevent individual histograms from being split up and would allow the
+use of combiners on Map workers to process large data sets more
+efficiently than simply passing around lists of counts and buckets.
+
+One of the authors has used this design pattern with \proglang{C++}
MapReduces over very large data sets to write out histogram protocol
buffers for several large-scale studies of distributed storage systems
\citep{sciencecloud,janus}.