[Rprotobuf-commits] r526 - in pkg: . vignettes/RProtoBuf
noreply at r-forge.r-project.org
Wed Sep 4 02:30:14 CEST 2013
Author: murray
Date: 2013-09-04 02:30:14 +0200 (Wed, 04 Sep 2013)
New Revision: 526
Modified:
pkg/ChangeLog
pkg/vignettes/RProtoBuf/RProtoBuf.Rnw
Log:
Add a new section on 64-bit issues, document the
RProtoBuf.int64AsString option, break out the 'other approaches'
section from the 'future work', and fix a few typos.
Modified: pkg/ChangeLog
===================================================================
--- pkg/ChangeLog 2013-09-03 23:03:15 UTC (rev 525)
+++ pkg/ChangeLog 2013-09-04 00:30:14 UTC (rev 526)
@@ -1,3 +1,10 @@
+2013-09-03 Murray Stokely <murray at FreeBSD.org>
+
+ * vignettes/RProtoBuf/RProtoBuf.Rnw: Add a new section on 64-bit
+ issues, document the RProtoBuf.int64AsString option, break out the
+ 'other approaches' section from the 'future work', and fix a few
+ typos.
+
2013-08-30 Dirk Eddelbuettel <edd at debian.org>
* NAMESPACE: Import 'file_path_as_absolute' from package tools, and
Modified: pkg/vignettes/RProtoBuf/RProtoBuf.Rnw
===================================================================
--- pkg/vignettes/RProtoBuf/RProtoBuf.Rnw 2013-09-03 23:03:15 UTC (rev 525)
+++ pkg/vignettes/RProtoBuf/RProtoBuf.Rnw 2013-09-04 00:30:14 UTC (rev 526)
@@ -157,7 +157,7 @@
\subsection{Access and modify fields of a message}
-Once the message created, its fields can be quiered
+Once the message is created, its fields can be queried
and modified using the dollar operator of R, making protocol
buffer messages seem like lists.
@@ -179,6 +179,10 @@
p[[ "email" ]]
@
+Protocol buffers include a 64-bit integer type, but R lacks native
+64-bit integer support. Section~\ref{sec:int64} describes a
+workaround for working with large integer values.
+
% TODO(mstokely): Document extensions here.
% There are none in addressbook.proto though.
@@ -1259,15 +1263,87 @@
implemented by the \texttt{RProtoBuf} package by calling an internal
method of the \texttt{protobuf} C++ library.
-\section{Plans for future releases}
+\section{64-bit integer issues}
+\label{sec:int64}
+R does not have native 64-bit integer support. Instead, R treats
+large integers as doubles, which represent integers exactly only up
+to $2^{53}$. Beyond that, distinct integers can compare as equal:
+
+<<>>=
+2^53 == (2^53 + 1)
+@
+
+Protocol Buffers are frequently used to pass data between different
+systems, however, and most of those systems support 64-bit integers.
+To work around this, RProtoBuf allows users to get and set 64-bit
+integer fields by passing their values as character strings.
+
+<<echo=FALSE,print=FALSE>>=
+if (!exists("protobuf_unittest.TestAllTypes",
+ "RProtoBuf:DescriptorPool")) {
+ unittest.proto.file <- system.file("unitTests", "data",
+ "unittest.proto",
+ package="RProtoBuf")
+ readProtoFiles(file=unittest.proto.file)
+}
+@
+
+If we try to set an int64 field in R to double values, we lose
+precision:
+
+<<>>=
+test <- new(protobuf_unittest.TestAllTypes)
+test$repeated_int64 <- c(2^53, 2^53+1)
+length(unique(test$repeated_int64))
+@
+
+However, we can specify the values as character strings so that the
+C++ library on which RProtoBuf is based can store a true 64-bit
+integer representation of the data.
+
+<<>>=
+test$repeated_int64 <- c("9007199254740992", "9007199254740993")
+@
+
+When reading the value back into R, numeric types are returned by
+default, which can lose precision. To preserve full precision, set
+the \texttt{RProtoBuf.int64AsString} option to \texttt{TRUE}, and
+int64 values are returned as character strings instead.
+
+<<>>=
+options("RProtoBuf.int64AsString" = FALSE)
+test$repeated_int64
+length(unique(test$repeated_int64))
+options("RProtoBuf.int64AsString" = TRUE)
+test$repeated_int64
+length(unique(test$repeated_int64))
+@
+
+<<echo=FALSE,print=FALSE>>=
+options("RProtoBuf.int64AsString" = FALSE)
+@
+
+\section{Other approaches}
+
Saptarshi Guha wrote another package that deals with integration
of protocol buffer messages with R, taking a different angle :
serializing any R object as a message, based on a single catch-all
-\texttt{proto} file. We plan to integrate this functionality into
-\texttt{RProtoBuf}. Saptarshi's package is available at
-\url{http://ml.stat.purdue.edu/rhipe/doc/html/ProtoBuffers.html}
+\texttt{proto} file. Saptarshi's package is available at
+\url{http://ml.stat.purdue.edu/rhipe/doc/html/ProtoBuffers.html}.
+Jeroen Ooms took a similar approach influenced by Saptarshi in his
+\texttt{RProtoBufUtils} package. Unlike Saptarshi's package,
+RProtoBufUtils depends on RProtoBuf for underlying message operations.
+This package is available at
+\url{https://github.com/jeroenooms/RProtoBufUtils}.
+
+% Phillip Yelland wrote another implementation, currently proprietary,
+% that has significant speed advantages when querying fields from a
+% large number of protocol buffers, but is less user friendly for the
+% basic cases documented here.
+
+\section{Plans for future releases}
Protocol buffers have a mechanism for remote procedure calls (rpc)
that is not yet used by \texttt{RProtoBuf}, but we may one day
take advantage of this by writing a protocol buffer message R server,
@@ -1293,4 +1369,3 @@
\end{document}
-
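
The precision loss that motivates the new 64-bit section can be reproduced in base R, without RProtoBuf loaded. The following is a minimal sketch under that assumption; `roundtrips_exactly` is a hypothetical helper introduced here for illustration and is not part of the package.

```r
# Doubles carry a 53-bit significand, so 2^53 + 1 has no exact
# double representation and collapses onto 2^53.
as_double <- c(2^53, 2^53 + 1)
distinct_doubles <- length(unique(as_double))

# The same two values kept as character strings stay distinct.
as_text <- c("9007199254740992", "9007199254740993")
distinct_strings <- length(unique(as_text))

# Hypothetical helper (not part of RProtoBuf): TRUE when a decimal
# string survives a round trip through a double unchanged.
roundtrips_exactly <- function(x) {
  sprintf("%.0f", as.numeric(x)) == x
}

print(distinct_doubles)
print(distinct_strings)
print(roundtrips_exactly(as_text))
```

A check like this could be used to decide when a value should be passed to an int64 field as a character string rather than as a numeric, which is the case the RProtoBuf.int64AsString option addresses on the read side.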