[Rprotobuf-commits] r553 - in papers: . rjournal
noreply at r-forge.r-project.org
Mon Dec 16 17:26:21 CET 2013
Author: murray
Date: 2013-12-16 17:26:21 +0100 (Mon, 16 Dec 2013)
New Revision: 553
Add boilerplate for an R Journal article. Hadley's ggmap one from the
last issue was nearly 20 pages, so I think we'll be fine length wise.
Added: papers/rjournal/Makefile
--- papers/rjournal/Makefile (rev 0)
+++ papers/rjournal/Makefile 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,15 @@
+all: clean RJwrapper.pdf
+clean:
+ rm -fr RJwrapper.pdf
+ rm -fr RJwrapper.out
+ rm -fr RJwrapper.aux
+ rm -fr RJwrapper.log
+ rm -fr RJwrapper.bbl
+ rm -fr RJwrapper.blg
+RJwrapper.pdf: RJwrapper.tex eddelbuettel-francois-stokely.tex RJournal.sty
+ pdflatex RJwrapper.tex
+ bibtex RJwrapper
+ pdflatex RJwrapper.tex
+ pdflatex RJwrapper.tex
Added: papers/rjournal/RJournal.sty
--- papers/rjournal/RJournal.sty (rev 0)
+++ papers/rjournal/RJournal.sty 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,335 @@
+% Package `RJournal' to use with LaTeX2e
+% Copyright (C) 2010 by the R Foundation
+% Copyright (C) 2013 by the R Journal
+% Originally written by Kurt Hornik and Friedrich Leisch with subsequent
+% edits by the editorial board
+\ProvidesPackage{RJournal}[2013/08/27 v0.13 RJournal package]
+% Overall page layout, fonts etc -----------------------------------------------
+% Issues of \emph{The R Journal} are created from the standard \LaTeX{}
+% document class \pkg{report}.
+ textwidth=14cm, top=1cm, bottom=1cm,
+ includehead,includefoot,centering,
+ footskip=1.5cm}
+\fancyhead[L]{\textsc{\RJ@sectionhead}}
+\fancyfoot[L]{The R Journal Vol. \RJ@volume/\RJ@number, \RJ@month}
+\fancyfoot[R]{ISSN 2073-4859}
+% We use the following fonts (all with T1 encoding):
+% rm & palatino
+% tt & inconsolata
+% sf & helvetica
+% math & palatino
+% Dark blue colour for all links
+ colorlinks,%
+ citecolor=link,%
+ filecolor=link,%
+ linkcolor=link,%
+ urlcolor=link
+% Give the text a little room to breathe
+% Issue and article metadata ---------------------------------------------------
+% Basic front matter information about the issue: volume, number, and
+% date.
+\newcommand{\volume}[1]{\def\RJ@volume{#1}}
+\newcommand{\volnumber}[1]{\def\RJ@number{#1}}
+\renewcommand{\month}[1]{\def\RJ@month{#1}}
+\renewcommand{\year}[1]{\def\RJ@year{#1}}
+% Individual articles correspond to
+% chapters, and are contained in |article| environments. This makes it
+% easy to have figures counted within articles and hence hyperlinked
+% correctly.
+% An article has an author, a title, and optionally a subtitle. We use
+% the obvious commands for specifying these. Articles will be put in certain
+% journal sections, named by \sectionhead.
+\newcommand {\sectionhead} [1]{\def\RJ@sectionhead{#1}}
+\renewcommand{\author} [1]{\def\RJ@author{#1}}
+\renewcommand{\title} [1]{\def\RJ@title{#1}}
+\newcommand {\subtitle} [1]{\def\RJ@subtitle{#1}}
+% Control appearance of titles: make slightly smaller than usual, and
+% suppress section numbering. See http://tex.stackexchange.com/questions/69749
+% for why we don't use \setcounter{secnumdepth}{-1}
+\titleformat{\section} {\normalfont\large\bfseries}{}{0em}{}
+\titlecontents{chapter} [0em]{}{}{}{\titlerule*[1em]{.}\contentspage}
+% Article layout ---------------------------------------------------------------
+% Environment |article| clears the article header information at its beginning.
+% We use |\FloatBarrier| from the placeins package to keep floats within
+% the article.
+% Refereed articles should have an abstract, so we redefine |\abstract| to
+% give the desired style
+\textbf{Abstract} #1
+% The real work is done by a redefined version of |\maketitle|. Note
+% that even though we do not want chapters (articles) numbered, we
+% need to increment the chapter counter, so that figures get correct
+% labelling.
+ \chapter{\RJ@title}\refstepcounter{chapter}
+ \ifx\empty\RJ@subtitle
+ \else
+ \noindent\textbf{\RJ@subtitle}
+ \par\nobreak\addvspace{\baselineskip}
+ \fi
+ \ifx\empty\RJ@author
+ \else
+ \noindent\textit{\RJ@author}
+ \par\nobreak\addvspace{\baselineskip}
+ \fi
+ \@afterindentfalse\@nobreaktrue\@afterheading
+% Now for some ugly redefinitions. We do not want articles to start a
+% new page. (Actually, we do, but this is handled via explicit
+% \newpage.)
+% The name@of@eq is a hack to get hyperlinks to equations to work
+% within each article, even though there may be multiple eq.(1)
+% \begin{macrocode}
+\renewcommand\chapter{\secdef\RJ@chapter\@schapter}
+ \hyphenpenalty=10000\exhyphenpenalty=10000\relax}
+\newcommand{\RJ@chapter}{%
+ \edef\name@of@eq{equation.\@arabic{\c@chapter}}%
+ \renewcommand{\@seccntformat}[1]{}%
+ \@startsection{chapter}{0}{0mm}{%
+ -2\baselineskip \@plus -\baselineskip \@minus -.2ex}{\p@}{%
+ \phantomsection\normalfont\huge\bfseries\raggedright}}
+% Book reviews should appear as sections in the text and in the pdf bookmarks,
+% however we wish them to appear as chapters in the TOC. Thus we define an
+% alternative to |\maketitle| for reviews.
+ \pdfbookmark[1]{#1}{#1}
+ \section*{#1}
+ \addtocontents{toc}{\protect\contentsline{chapter}{#1}{\thepage}{#1.1}}
+% We want bibliographies as starred sections within articles.
+% Equations, figures and tables are counted within articles, but we do
+% not show the article number. For equations it becomes a bit messy to avoid
+% having hyperref getting it wrong.
+% \numberwithin{equation}{chapter}
+\renewcommand{\theequation}{\@arabic\c@equation}
+\renewcommand{\thefigure}{\@arabic\c@figure}
+\renewcommand{\thetable}{\@arabic\c@table}
+% Issue layout -----------------------------------------------------------------
+% Need to provide our own version of |\tableofcontents|. We use the
+% tikz package to get the rounded rectangle. Notice that |\section*|
+% is really the same as |\chapter*|.
+ \vspace{1cm}
+ \section*{\contentsname}
+ { \@starttoc{toc} }
+ \thispagestyle{empty}
+ \hypersetup{
+ pdftitle={The R Journal Volume \RJ@volume/\RJ@number, \RJ@month \RJ@year},%
+ pdfauthor={R Foundation for Statistical Computing},%
+ }
+ \noindent
+ \begin{center}
+ \fontsize{50pt}{50pt}\selectfont
+ The \raisebox{-8pt}{\includegraphics[height=77pt]{Rlogo-4}}\hspace{10pt}
+ Journal
+ \end{center}
+ {\large \hfill Volume \RJ@volume/\RJ@number, \RJ@month{} \RJ@year \quad}
+ \rule{\textwidth}{1pt}
+ \begin{center}
+ {\Large A peer-reviewed, open-access publication of the \\
+ R Foundation for Statistical Computing}
+ \end{center}
+ % And finally, put in the TOC box. Note the way |tocdepth| is adjusted
+ % before and after producing the TOC: thus, we can ensure that only
+ % articles show up in the printed TOC, but that in the PDF version,
+ % bookmarks are created for sections and subsections as well (provided
+ % that the non-starred forms are used).
+ \setcounter{tocdepth}{0}
+ \tableofcontents
+ \setcounter{tocdepth}{2}
+ \clearpage
+% Text formatting --------------------------------------------------------------
+% Simple font selection is not good enough. For example, |\texttt{--}|
+% gives `\texttt{--}', i.e., an endash in typewriter font. Hence, we
+% need to turn off ligatures, which currently only happens for commands
+% |\code| and |\samp| and the ones derived from them. Hyphenation is
+% another issue; it should really be turned off inside |\samp|. And
+% most importantly, \LaTeX{} special characters are a nightmare. E.g.,
+% one needs |\~{}| to produce a tilde in a file name marked by |\file|.
+% Perhaps a few years ago, most users would have agreed that this may be
+% unfortunate but should not be changed to ensure consistency. But with
+% the advent of the WWW and the need for getting `|~|' and `|#|' into
+% URLs, commands which only treat the escape and grouping characters
+% specially have gained acceptance
+{{\normalfont\ttfamily\hyphenchar\font=-1 #1}}%
+% \acronym is effectively disabled since not used consistently
+{{\normalfont\fontseries{b}\selectfont #1}}%
+% Example environments ---------------------------------------------------------
+% Support for output from Sweave, and generic session style code
+% These used to have fontshape=sl for Sinput/Scode/Sin, but pslatex
+% won't use a condensed font in that case.
+% Mathematics ------------------------------------------------------------------
+% The implementation of |\operatorname| is similar to the mechanism
+% \LaTeXe{} uses for functions like sin and cos, and simpler than the
+% one of \AmSLaTeX{}. We use |\providecommand| for the definition in
+% order to keep the one of the \pkg{amstex} if this package has
+% already been loaded.
+% \begin{macrocode}
+ \mathop{\operator@font#1}\nolimits}
+ \mathop{\operator@font I\hspace{-1.5pt}P\hspace{.13pt}}}
+ \mathop{\operator@font I\hspace{-1.5pt}E\hspace{.13pt}}}
+% Figures ----------------------------------------------------------------------
+% Wide environments for figures and tables -------------------------------------
+% An easy way to make a figure span the full width of the page
+ \captionsetup{margin=2cm}
+ \captionsetup{margin=2cm}
Added: papers/rjournal/RJwrapper.brf
--- papers/rjournal/RJwrapper.brf (rev 0)
+++ papers/rjournal/RJwrapper.brf 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,2 @@
+\backcite {R}{{1}{2.1}{section.2.1}}
+\backcite {R}{{1}{2.1}{section.2.1}}
Added: papers/rjournal/RJwrapper.tex
--- papers/rjournal/RJwrapper.tex (rev 0)
+++ papers/rjournal/RJwrapper.tex 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,24 @@
+%% load any required packages here
+%% do not edit, for illustration only
+\sectionhead{Contributed research article}
+%% replace RJtemplate with your article
+ \input{eddelbuettel-francois-stokely}
Added: papers/rjournal/eddelbuettel-francois-stokely.bib
--- papers/rjournal/eddelbuettel-francois-stokely.bib (rev 0)
+++ papers/rjournal/eddelbuettel-francois-stokely.bib 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,168 @@
+@inproceedings{cantrill2004dynamic,
+ title={Dynamic Instrumentation of Production Systems.},
+ author={Cantrill, Bryan and Shapiro, Michael W and Leventhal, Adam H and others},
+ booktitle={USENIX Annual Technical Conference, General Track},
+ pages={15--28},
+ year={2004}
+@article{swain1991color,
+ title={Color indexing},
+ author={Swain, Michael J and Ballard, Dana H},
+ journal={International journal of computer vision},
+ volume={7},
+ number={1},
+ pages={11--32},
+ year={1991},
+ publisher={Springer}
+@article{rubner2000earth,
+ title={The earth mover's distance as a metric for image retrieval},
+ author={Rubner, Yossi and Tomasi, Carlo and Guibas, Leonidas J},
+ journal={International Journal of Computer Vision},
+ volume={40},
+ number={2},
+ pages={99--121},
+ year={2000},
+ publisher={Springer}
+@book{kullback1997information,
+ title={Information theory and statistics},
+ author={Kullback, Solomon},
+ year={1997},
+ publisher={Courier Dover Publications}
+@inproceedings{puzicha1997non,
+ title={Non-parametric similarity measures for unsupervised texture segmentation and image retrieval},
+ author={Puzicha, Jan and Hofmann, Thomas and Buhmann, Joachim M},
+ booktitle={Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on},
+ pages={267--272},
+ year={1997},
+ organization={IEEE}
+@inproceedings{fang1999computing,
+ title={Computing Iceberg Queries Efficiently.},
+ author={Fang, Min and Shivakumar, Narayanan and Garcia-Molina, Hector and Motwani, Rajeev and Ullman, Jeffrey D},
+ booktitle={International Conference on Very Large Databases (VLDB'98), New York, August 1998},
+ year={1999},
+ organization={Stanford InfoLab}
+@Manual{emdist,
+ title = {emdist: Earth Mover's Distance},
+ author = {Simon Urbanek and Yossi Rubner},
+ year = {2012},
+ note = {R package version 0.3-1},
+ url = {http://CRAN.R-project.org/package=emdist},
+@article{pearson1895contributions,
+ title={Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material},
+ author={Pearson, Karl},
+ journal={Philosophical Transactions of the Royal Society of London. A},
+ volume={186},
+ pages={343--414},
+ year={1895},
+ publisher={JSTOR}
+@Manual{rprotobuf,
+ title = {RProtoBuf: R Interface to the Protocol Buffers API},
+ author = {Romain Francois and Dirk Eddelbuettel and Murray Stokely},
+ note = {R package version 0.2.6},
+ year = {2012},
+ url = {http://cran.r-project.org/web/packages/RProtoBuf/index.html},
+@Manual{r,
+ title = {R: A Language and Environment for Statistical Computing},
+ author = {{R Core Team}},
+ organization = {R Foundation for Statistical Computing},
+ address = {Vienna, Austria},
+ year = {2012},
+ note = {{ISBN} 3-900051-07-0},
+ url = {http://www.R-project.org/},
+ }
+@article{dean2008mapreduce,
+ title={MapReduce: simplified data processing on large clusters},
+ author={Dean, Jeffrey and Ghemawat, Sanjay},
+ journal={Communications of the ACM},
+ volume={51},
+ number={1},
+ pages={107--113},
+ year={2008},
+ publisher={ACM}
+@article{bostock2011d3,
+ title={D$^3$ Data-Driven Documents},
+ author={Bostock, Michael and Ogievetsky, Vadim and Heer, Jeffrey},
+ journal={Visualization and Computer Graphics, IEEE Transactions on},
+ volume={17},
+ number={12},
+ pages={2301--2309},
+ year={2011},
+ publisher={IEEE}
+% celebrated article in this field. Also see the parallel paragraph.
+@article{Manku:1998:AMO:276305.276342,
+ author = {Manku, Gurmeet Singh and Rajagopalan, Sridhar and Lindsay, Bruce G.},
+ title = {Approximate medians and other quantiles in one pass and with limited memory},
+ journal = {SIGMOD Rec.},
+ issue_date = {June 1998},
+ volume = {27},
+ number = {2},
+ month = jun,
+ year = {1998},
+ issn = {0163-5808},
+ pages = {426--435},
+ numpages = {10},
+ url = {http://doi.acm.org/10.1145/276305.276342},
+ doi = {10.1145/276305.276342},
+ acmid = {276342},
+ publisher = {ACM},
+ address = {New York, NY, USA},
+% Has a section on protocol buffers
+@article{Pike:2005:IDP:1239655.1239658,
+ author = {Pike, Rob and Dorward, Sean and Griesemer, Robert and Quinlan, Sean},
+ title = {Interpreting the data: Parallel analysis with Sawzall},
+ journal = {Sci. Program.},
+ issue_date = {October 2005},
+ volume = {13},
+ number = {4},
+ month = oct,
+ year = {2005},
+ issn = {1058-9244},
+ pages = {277--298},
+ numpages = {22},
+ acmid = {1239658},
+ publisher = {IOS Press},
+ address = {Amsterdam, The Netherlands, The Netherlands},
+@Manual{protobuf,
+ title = {Protocol Buffers: Developer Guide},
+ author = {Google},
+ year = {2012},
+ url = {http://code.google.com/apis/protocolbuffers/docs/overview.html}
+@article{sturges1926choice,
+ title={The choice of a class interval},
+ author={Sturges, Herbert A},
+ journal={Journal of the American Statistical Association},
+ volume={21},
+ number={153},
+ pages={65--66},
+ year={1926}
+@article{scott1979optimal,
+ title={On optimal and data-based histograms},
+ author={Scott, David W},
+ journal={Biometrika},
+ volume={66},
+ number={3},
+ pages={605--610},
+ year={1979},
+ publisher={Biometrika Trust}
+@book{scott2009multivariate,
+ title={Multivariate density estimation: theory, practice, and visualization},
+ author={Scott, David W},
+ volume={383},
+ year={2009},
+ publisher={Wiley}
Added: papers/rjournal/eddelbuettel-francois-stokely.tex
--- papers/rjournal/eddelbuettel-francois-stokely.tex (rev 0)
+++ papers/rjournal/eddelbuettel-francois-stokely.tex 2013-12-16 16:26:21 UTC (rev 553)
@@ -0,0 +1,156 @@
+% !TeX root = RJwrapper.tex
+\title{RProtoBuf: Efficient Cross-Language Data Serialization in R}
+\author{by Dirk Eddelbuettel, Romain Fran\c{c}ois, and Murray Stokely}
+\abstract{Modern data collection and analysis pipelines often involve
+ a sophisticated mix of applications written in general purpose and
+ specialized programming languages. Protocol Buffers are a popular
+ method of serializing structured data between applications. The
+ \textbf{RProtoBuf} package provides a complete R interface to the
+ Protocol Buffers library.
+TODO keep it less than 150 words.
+Comparison with what people start with in R : CSV
+comparison with what is only slightly better: JSON
+Introductory section which may include references in parentheses
+\citep{R}, or cite a reference such as \citet{R} in the text.
+Protocol buffers are a language-neutral, platform-neutral, extensible
+way of serializing structured data for use in communications
+protocols, data storage, and more.
+Protocol Buffers offer key features such as an efficient data interchange
+format that is both language- and operating system-agnostic yet uses a
+lightweight and highly performant encoding, object serialization and
+de-serialization, as well as data and configuration management. Protocol
+buffers are also forward compatible: updates to the \texttt{proto}
+files do not break programs built against the previous specification.
+While benchmarks are not available, Google states on the project page that in
+comparison to XML, protocol buffers are at the same time \textsl{simpler},
+between three and ten times \textsl{smaller}, between twenty and one hundred
+times \textsl{faster}, as well as less ambiguous and easier to program.
+The protocol buffers code is released under an open-source (BSD) license. The
+protocol buffer project (\url{http://code.google.com/p/protobuf/})
+contains a C++ library and a set of runtime libraries and compilers for
+C++, Java and Python.
+With these languages, the workflow follows standard practice of so-called
+Interface Description Languages (IDL)
+(cf. \href{http://en.wikipedia.org/wiki/Interface_description_language}{Wikipedia
+ on IDL}). This consists of compiling a protocol buffer description file
+(ending in \texttt{.proto}) into language specific classes that can be used
+to create, read, write and manipulate protocol buffer messages. In other
+words, given the \texttt{proto} description file, code is automatically generated
+for the chosen target language(s). The project page contains a tutorial for
+each of these officially supported languages:
+Besides the officially supported C++, Java and Python implementations, several
+third-party projects add protocol buffer support for other languages. The
+project page maintains a list of these implementations:
+\url{http://code.google.com/p/protobuf/wiki/ThirdPartyAddOns}
+The protocol buffer project page contains a comprehensive
+description of the language: \url{http://code.google.com/apis/protocolbuffers/docs/proto.html}
+%This section may contain a figure such as Figure~\ref{figure:rlogo}.
+% \centering
+% \includegraphics{Rlogo}
+% \caption{The logo of R.}
+% \label{figure:rlogo}
+\section{Dynamic use: Protocol Buffers and R}
+This section describes how to use the R API to create and manipulate
+protocol buffer messages in R, and how to read and write the
+binary \emph{payload} of the messages to files and arbitrary binary
+R connections.
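As a minimal sketch of this workflow, the following R code creates a message, sets a field, writes the binary payload to a temporary file, and parses it back. It assumes that attaching the package makes the \texttt{tutorial.Person} message type from the bundled \texttt{addressbook.proto} file available; the field values are made up for illustration.

```r
library(RProtoBuf)

# Assumption: attaching RProtoBuf makes the bundled tutorial.Person
# descriptor (from addressbook.proto) available.
p <- new(tutorial.Person, name = "Ada", id = 1)

# Fields are read and written with the `$` operator.
p$email <- "ada@example.org"

# Serialize the binary payload to a file, then parse it back.
tf <- tempfile()
serialize(p, tf)
q <- read(tutorial.Person, tf)

# Print the text representation of the parsed message.
writeLines(as.character(q))
```

The same \texttt{serialize} call should also accept an open binary connection in place of the file name.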
+\subsection{Importing proto files}
+In contrast to the other languages (Java, C++, Python) that are officially
+supported by Google, the implementation used by the \texttt{RProtoBuf}
+package does not rely on the \texttt{protoc} compiler (with the exception of
+the two functions discussed in the previous section). This means that no
+initial step of statically compiling the proto file into C++ code that is
+then accessed by R code is necessary. Instead, \texttt{proto} files are
+parsed and processed \textsl{at runtime} by the protobuf C++ library---which
+is much more appropriate for a dynamic language.
+The \texttt{readProtoFiles} function allows importing \texttt{proto}
+files in several ways.
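The three import styles can be sketched as follows, using the \texttt{addressbook.proto} file that ships in the package's \texttt{proto} directory. Treat this as an illustration under the assumption that the file is installed at that location, not as an exhaustive description of the function's arguments.

```r
library(RProtoBuf)

# Locate the proto directory shipped with the package.
proto.dir <- system.file("proto", package = "RProtoBuf")
proto.file <- file.path(proto.dir, "addressbook.proto")

# 1. Import one or more files given by path.
readProtoFiles(proto.file)

# 2. Import every .proto file found in a directory.
readProtoFiles(dir = proto.dir)

# 3. Import the .proto files bundled in an installed package.
readProtoFiles(package = "RProtoBuf")
```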
+% Example code snippet.
+% TODO(mstokely): Remove this.
+ x <- 1:10
+ result <- myFunction(x)
+\section{Related work on IDLs (greatly expanded from what you have)}
+\section{Design tradeoffs: reflection vs proto compiler (not addressed
+ at all in current vignettes)}
+\subsection{Performance considerations}
+TODO RProtoBuf is quite flexible and easy to use for interactive
+analysis, but it is not designed for certain classes of operations one
+might like to do with protocol buffers. For example, taking a list of
+10,000 protocol buffers, extracting a named field from each one, and
+computing aggregate statistics on those values would be extremely
+slow with RProtoBuf, and while this is a useful class of operations,
+it is outside of the scope of RProtoBuf. We should be very clear
+about this to clarify the goals and strengths of RProtoBuf and its
+reflection and object mapping.
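The access pattern in question might look like the sketch below, where every field extraction is a separate call on a single message. The list of messages is hypothetical and built in memory for illustration (and kept small); no timings are claimed here.

```r
library(RProtoBuf)

# Hypothetical data: a list of messages built in memory for illustration.
msgs <- lapply(1:100, function(i) new(tutorial.Person, name = "x", id = i))

# Each `$id` access is a separate lookup on one message object.
ids <- vapply(msgs, function(m) m$id, numeric(1))

# Aggregate statistic over the extracted field values.
mean(ids)
```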
+\subsection{Serialization comparison}
+TODO comparison of protobuf serialization sizes/times for various vectors. Compared to R's native serialization. Discussion of the RHIPE approach of serializing any/all R objects, vs more specific protocol buffers for specific R objects.
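A starting point for such a comparison could be the sketch below, which measures the payload size of one small message against R's native serialization of a comparable list. The message contents are made up, and a real comparison would need to cover the vector types and sizes mentioned above.

```r
library(RProtoBuf)

p <- new(tutorial.Person, name = "Ada", id = 1)

# Protocol buffer payload as a raw vector (connection = NULL).
pb.bytes <- serialize(p, NULL)

# R's native serialization of a comparable list, for reference.
r.bytes <- base::serialize(list(name = "Ada", id = 1L), NULL)

# Sizes in bytes of the two encodings.
c(protobuf = length(pb.bytes), native = length(r.bytes))
```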
+\section{Basic usage example - tutorial.Person}
+\section{Application: Distributed data collection with MapReduce}
+We could describe a common MapReduce pattern of having the MR written
+in another language output protocol buffers that are later pulled into
+R. There is some text about this in section 2 of
+\section{Application: Sending/receiving Interaction With Servers}
+This file is only a basic article template. For full details of \emph{The R Journal} style and information on how to prepare your article for submission, see the \href{http://journal.r-project.org/latex/RJauthorguide.pdf}{Instructions for Authors}.
+\address{Author One\\
+ Affiliation\\
+ Address\\
+ Country}
+\email{author1 at work}
+\address{Author Two\\
+ Affiliation\\
+ Address\\
+ Country}
+\email{author2 at work}
+\address{Murray Stokely\\
+ Google, Inc.\\
+ 1600 Amphitheatre Parkway\\
+ Mountain View, CA 94043\\
+ USA}
+\email{mstokely at google.com}