From noreply at r-forge.r-project.org Mon Apr 13 02:05:41 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 02:05:41 +0200 (CEST) Subject: [Rprotobuf-commits] r938 - in papers/jss: . JSSstyle Message-ID: <20150413000541.5A280183EE6@r-forge.r-project.org> Author: edd Date: 2015-04-13 02:05:40 +0200 (Mon, 13 Apr 2015) New Revision: 938 Added: papers/jss/JSSstyle/ papers/jss/JSSstyle/JSSstyle.zip papers/jss/JSSstyle/README.txt papers/jss/JSSstyle/article.tex papers/jss/JSSstyle/bookreview.tex papers/jss/JSSstyle/codesnippet.tex papers/jss/JSSstyle/jss.bst papers/jss/JSSstyle/jss.cls papers/jss/JSSstyle/jss.dtx papers/jss/JSSstyle/jss.pdf papers/jss/JSSstyle/jsslogo.jpg papers/jss/JSSstyle/softwarereview.tex Log: added JSSstyle/ for reference Added: papers/jss/JSSstyle/JSSstyle.zip =================================================================== (Binary files differ) Property changes on: papers/jss/JSSstyle/JSSstyle.zip ___________________________________________________________________ Added: svn:mime-type + application/octet-stream Added: papers/jss/JSSstyle/README.txt =================================================================== --- papers/jss/JSSstyle/README.txt (rev 0) +++ papers/jss/JSSstyle/README.txt 2015-04-13 00:05:40 UTC (rev 938) @@ -0,0 +1,44 @@ +*************************************************** +** jss: A Document Class for Publications in the ** +** Journal of Statistical Software ** +*************************************************** + +This zip-archive contains the pdfLaTeX infrastructure for +publications in the Journal of Statistical Software. The files + - jss.cls (LaTeX2e class) + - jss.bst (BibTeX style) + - jsslogo.jpg (JPG logo) +need to be included in your search path (local working directory, +texmf or localtexmf tree). + +A manual on how to use jss.cls is provided in + - jss.pdf. + +Furthermore, there are templates for articles, code snippets, book +reviews and software reviews available in + - article.tex + - codesnippet.tex + - bookreview.tex + - softwarereview.tex + +JSS papers should be prepared using JSS styles; the submission of +the final version needs to include the full sources (.tex, .bib, and +all graphics). A quick check for the most important aspects of the +JSS style is given below; authors should make sure that all of them +are addressed in the final version: + - The manuscript can be compiled by pdfLaTeX. + - \proglang, \pkg and \code have been used for highlighting + throughout the paper (including titles and references), except + where explicitly escaped. + - References are provided in a .bib BibTeX database and included + in the text by \cite, \citep, \citet, etc. + - Titles and headers are formatted as described in the JSS manual: + - \title in title style, + - \section etc. in sentence style, + - all titles in the BibTeX file in title style. + - Figures, tables and equations are marked with a \label and + referred to by \ref, e.g., "Figure~\ref{...}". + - Software packages are \cite{}d properly. +For more details, see the style FAQ at http://www.jstatsoft.org/style +and the manual jss.pdf, in particular the style checklist in +Section 2.1. 
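As a minimal hypothetical illustration of the checklist above (the citation key and figure label below are invented for the example), a manuscript sentence using the highlighting, citation and cross-referencing macros could read:

    The \pkg{RProtoBuf} package \citep{examplekey} provides \proglang{R}
    bindings for protocol buffers; its \code{serialize_pb()} function is
    illustrated in Figure~\ref{fig:example}.

Here \pkg{}, \proglang{} and \code{} mark up package, language and function names, \citep{} points to an entry in the accompanying .bib database, and \ref{} resolves a \label{} placed at the corresponding figure environment.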
Added: papers/jss/JSSstyle/article.tex =================================================================== --- papers/jss/JSSstyle/article.tex (rev 0) +++ papers/jss/JSSstyle/article.tex 2015-04-13 00:05:40 UTC (rev 938) @@ -0,0 +1,64 @@ +\documentclass[article]{jss} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% almost as usual +\author{Achim Zeileis\\Universit\"at Innsbruck \And + Second Author\\Plus Affiliation} +\title{A Capitalized Title: Something about a Package \pkg{foo}} + +%% for pretty printing and a nice hypersummary also set: +\Plainauthor{Achim Zeileis, Second Author} %% comma-separated +\Plaintitle{A Capitalized Title: Something about a Package foo} %% without formatting +\Shorttitle{\pkg{foo}: A Capitalized Title} %% a short title (if necessary) + +%% an abstract and keywords +\Abstract{ + The abstract of the article. +} +\Keywords{keywords, comma-separated, not capitalized, \proglang{Java}} +\Plainkeywords{keywords, comma-separated, not capitalized, Java} %% without formatting +%% at least one keyword must be supplied + +%% publication information +%% NOTE: Typically, this can be left commented and will be filled out by the technical editor +%% \Volume{50} +%% \Issue{9} +%% \Month{June} +%% \Year{2012} +%% \Submitdate{2012-06-04} +%% \Acceptdate{2012-06-04} + +%% The address of (at least) one author should be given +%% in the following format: +\Address{ + Achim Zeileis\\ + Department of Statistics and Mathematics\\ + Faculty of Economics and Statistics\\ + Universit\"at Innsbruck\\ + 6020 Innsbruck, Austria\\ + E-mail: \email{Achim.Zeileis at uibk.ac.at}\\ + URL: \url{http://eeecon.uibk.ac.at/~zeileis/} +} +%% It is also possible to add a telephone and fax number +%% before the e-mail in the following format: +%% Telephone: +43/512/507-7103 +%% Fax: +43/512/507-2851 + +%% for those who use Sweave please include the following line (with % symbols): +%% need no \usepackage{Sweave.sty} + +%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +\begin{document} + +%% include your article here, just as usual +%% Note that you should use the \pkg{}, \proglang{} and \code{} commands. + +\section[About Java]{About \proglang{Java}} +%% Note: If there is markup in \(sub)section, then it has to be escape as above. 
+ +\end{document} Added: papers/jss/JSSstyle/bookreview.tex =================================================================== --- papers/jss/JSSstyle/bookreview.tex (rev 0) +++ papers/jss/JSSstyle/bookreview.tex 2015-04-13 00:05:40 UTC (rev 938) @@ -0,0 +1,51 @@ +\documentclass[bookreview]{jss} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% reviewer +\Reviewer{Frederic Udina\\Pompeu Fabra University} +\Plainreviewer{Frederic Udina} + +%% about the book +\Booktitle{Visualizing Categorical Data} +\Bookauthor{Michael Friendly} +\Publisher{SAS Institute Inc.} +\Pubaddress{Carey, NC} +\Pubyear{2000} +\ISBN{1-58025-660-0} +\Pages{456} +\Price{USD 69.95 (P)} +\URL{http://www.math.yorku.ca/SCS/vcd/} +%% if different from \Booktitle also set +%% \Plaintitle{Visualizing Categorical Data} +%% \Shorttitle{Visualizing Categorical Data} + +%% publication information +%% NOTE: Typically, this can be left commented and will be filled out by the technical editor +%% \Volume{50} +%% \Issue{7} +%% \Month{June} +%% \Year{2012} +%% \Submitdate{2012-06-04} + +%% address of (at least one) author +\Address{ + Frederic Udina\\ + Pompeu Fabra University\\ + Department of Economics and Business\\ + Barcelona, Spain 08005\\ + E-mail: \email{udina at upf.es}\\ + URL: \url{http://libiya.upf.es/} +} + +%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +\begin{document} + +%% include the review as usual +%% Note that you should use the \pkg{}, \proglang{} and \code{} commands. + +\end{document} Added: papers/jss/JSSstyle/codesnippet.tex =================================================================== --- papers/jss/JSSstyle/codesnippet.tex (rev 0) +++ papers/jss/JSSstyle/codesnippet.tex 2015-04-13 00:05:40 UTC (rev 938) @@ -0,0 +1,65 @@ +\documentclass[codesnippet]{jss} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +%% almost as usual +\author{Achim Zeileis\\Universit\"at Innsbruck \And + Second Author\\Plus Affiliation} +\title{A Capitalized Title: Possibly Containing More Details} + +%% for pretty printing and a nice hypersummary also set: +\Plainauthor{Achim Zeileis, Second Author} %% comma-separated +\Plaintitle{A Capitalized Title: Possibly Containing More Details} %% without formatting +\Shorttitle{A Capitalized Title} %% a short title (if necessary) + +%% an abstract and keywords +\Abstract{ + Here should be the abstract. 
+} +\Keywords{keywords, comma-separated, not capitalized, \proglang{Java}} +\Plainkeywords{keywords, comma-separated, not capitalized, Java} %% without formatting +%% at least one keyword must be supplied + +%% publication information +%% NOTE: Typically, this can be left commented and will be filled out by the technical editor +%% \Volume{50} +%% \Issue{9} +%% \Month{June} +%% \Year{2012} +%% \Submitdate{2012-06-04} +%% \Acceptdate{2012-06-04} + +%% The address of (at least) one author should be given +%% in the following format: +\Address{ + Achim Zeileis\\ + Department of Statistics and Mathematics\\ + Faculty of Economics and Statistics\\ + Universit\"at Innsbruck\\ + 6020 Innsbruck, Austria\\ + E-mail: \email{Achim.Zeileis at uibk.ac.at}\\ + URL: \url{http://eeecon.uibk.ac.at/~zeileis/} +} +%% It is also possible to add a telephone and fax number +%% before the e-mail in the following format: +%% Telephone: +43/512/507-7103 +%% Fax: +43/512/507-2851 + +%% for those who use Sweave please include the following line (with % symbols): +%% need no \usepackage{Sweave.sty} + +%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +\begin{document} + +%% include your article here, just as usual +%% Note that you should use the \pkg{}, \proglang{} and \code{} commands. + +\section[About Java]{About \proglang{Java}} +%% Note: If there is markup in \(sub)section, then it has to be escape as above. + + +\end{document} Added: papers/jss/JSSstyle/jss.bst =================================================================== --- papers/jss/JSSstyle/jss.bst (rev 0) +++ papers/jss/JSSstyle/jss.bst 2015-04-13 00:05:40 UTC (rev 938) @@ -0,0 +1,1631 @@ +%% +%% This is file `jss.bst', +%% generated with the docstrip utility. +%% +%% The original source files were: +%% +%% merlin.mbs (with options: `ay,nat,nm-rvx,keyxyr,dt-beg,yr-par,note-yr,tit-qq,atit-u,trnum-it,vol-bf,volp-com,num-xser,pre-edn,isbn,issn,edpar,pp,ed,xedn,xand,etal-it,revdata,eprint,url,url-blk,doi,nfss') +%% +%% ** BibTeX style file for JSS publications (http://www.jstatsoft.org/) +%% +%% Copyright 1994-2007 Patrick W Daly +%% License: GPL-2 + % =============================================================== + % IMPORTANT NOTICE: + % This bibliographic style (bst) file has been generated from one or + % more master bibliographic style (mbs) files, listed above, provided + % with kind permission of Patrick W Daly. + % + % This generated file can be redistributed and/or modified under the terms + % of the General Public License (Version 2). + % =============================================================== + % Name and version information of the main mbs file: + % \ProvidesFile{merlin.mbs}[2007/04/24 4.20 (PWD, AO, DPC)] + % For use with BibTeX version 0.99a or later + %------------------------------------------------------------------- + % This bibliography style file is intended for texts in ENGLISH + % This is an author-year citation style bibliography. As such, it is + % non-standard LaTeX, and requires a special package file to function properly. + % Such a package is natbib.sty by Patrick W. Daly + % The form of the \bibitem entries is + % \bibitem[Jones et al.(1990)]{key}... + % \bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}... + % The essential feature is that the label (the part in brackets) consists + % of the author names, as they should appear in the citation, with the year + % in parentheses following. There must be no space before the opening + % parenthesis! 
+ % With natbib v5.3, a full list of authors may also follow the year. + % In natbib.sty, it is possible to define the type of enclosures that is + % really wanted (brackets or parentheses), but in either case, there must + % be parentheses in the label. + % The \cite command functions as follows: + % \citet{key} ==>> Jones et al. (1990) + % \citet*{key} ==>> Jones, Baker, and Smith (1990) + % \citep{key} ==>> (Jones et al., 1990) + % \citep*{key} ==>> (Jones, Baker, and Smith, 1990) + % \citep[chap. 2]{key} ==>> (Jones et al., 1990, chap. 2) + % \citep[e.g.][]{key} ==>> (e.g. Jones et al., 1990) + % \citep[e.g.][p. 32]{key} ==>> (e.g. Jones et al., p. 32) + % \citeauthor{key} ==>> Jones et al. + % \citeauthor*{key} ==>> Jones, Baker, and Smith + % \citeyear{key} ==>> 1990 + %--------------------------------------------------------------------- + +ENTRY + { address + archive + author + booktitle + chapter + collaboration + doi + edition + editor + eid + eprint + howpublished + institution + isbn + issn + journal + key + month + note + number + numpages + organization + pages + publisher + school + series + title + type + url + volume + year + } + {} + { label extra.label sort.label short.list } +INTEGERS { output.state before.all mid.sentence after.sentence after.block } +FUNCTION {init.state.consts} +{ #0 'before.all := + #1 'mid.sentence := + #2 'after.sentence := + #3 'after.block := +} +STRINGS { s t} +FUNCTION {output.nonnull} +{ 's := + output.state mid.sentence = + { ", " * write$ } + { output.state after.block = + { add.period$ write$ + newline$ + "\newblock " write$ + } + { output.state before.all = + 'write$ + { add.period$ " " * write$ } + if$ + } + if$ + mid.sentence 'output.state := + } + if$ + s +} +FUNCTION {output} +{ duplicate$ empty$ + 'pop$ + 'output.nonnull + if$ +} +FUNCTION {output.check} +{ 't := + duplicate$ empty$ + { pop$ "empty " t * " in " * cite$ * warning$ } + 'output.nonnull + if$ +} +FUNCTION {fin.entry} +{ add.period$ + write$ + newline$ +} + +FUNCTION {new.block} +{ output.state before.all = + 'skip$ + { after.block 'output.state := } + if$ +} +FUNCTION {new.sentence} +{ output.state after.block = + 'skip$ + { output.state before.all = + 'skip$ + { after.sentence 'output.state := } + if$ + } + if$ +} +FUNCTION {add.blank} +{ " " * before.all 'output.state := +} + +FUNCTION {date.block} +{ + new.block +} + +FUNCTION {not} +{ { #0 } + { #1 } + if$ +} +FUNCTION {and} +{ 'skip$ + { pop$ #0 } + if$ +} +FUNCTION {or} +{ { pop$ #1 } + 'skip$ + if$ +} +FUNCTION {non.stop} +{ duplicate$ + "}" * add.period$ + #-1 #1 substring$ "." = +} + +STRINGS {z} +FUNCTION {remove.dots} +{ 'z := + "" + { z empty$ not } + { z #1 #1 substring$ + z #2 global.max$ substring$ 'z := + duplicate$ "." = 'pop$ + { * } + if$ + } + while$ +} +FUNCTION {new.block.checkb} +{ empty$ + swap$ empty$ + and + 'skip$ + 'new.block + if$ +} +FUNCTION {field.or.null} +{ duplicate$ empty$ + { pop$ "" } + 'skip$ + if$ +} +FUNCTION {emphasize} +{ duplicate$ empty$ + { pop$ "" } + { "\emph{" swap$ * "}" * } + if$ +} +FUNCTION {bolden} +{ duplicate$ empty$ + { pop$ "" } + { "\textbf{" swap$ * "}" * } + if$ +} +FUNCTION {tie.or.space.prefix} +{ duplicate$ text.length$ #3 < + { "~" } + { " " } + if$ + swap$ +} + +FUNCTION {capitalize} +{ "u" change.case$ "t" change.case$ } + +FUNCTION {space.word} +{ " " swap$ * " " * } + % Here are the language-specific definitions for explicit words. + % Each function has a name bbl.xxx where xxx is the English word. 
+ % The language selected here is ENGLISH +FUNCTION {bbl.and} +{ "and"} + +FUNCTION {bbl.etal} +{ "et~al." } + +FUNCTION {bbl.editors} +{ "eds." } + +FUNCTION {bbl.editor} +{ "ed." } + +FUNCTION {bbl.edby} +{ "edited by" } + +FUNCTION {bbl.edition} +{ "edition" } + +FUNCTION {bbl.volume} +{ "volume" } + +FUNCTION {bbl.of} +{ "of" } + +FUNCTION {bbl.number} +{ "number" } + +FUNCTION {bbl.nr} +{ "no." } + +FUNCTION {bbl.in} +{ "in" } + +FUNCTION {bbl.pages} +{ "pp." } + +FUNCTION {bbl.page} +{ "p." } + +FUNCTION {bbl.eidpp} +{ "pages" } + +FUNCTION {bbl.chapter} +{ "chapter" } + +FUNCTION {bbl.techrep} +{ "Technical Report" } + +FUNCTION {bbl.mthesis} +{ "Master's thesis" } + +FUNCTION {bbl.phdthesis} +{ "Ph.D. thesis" } + +MACRO {jan} {"January"} + +MACRO {feb} {"February"} + +MACRO {mar} {"March"} + +MACRO {apr} {"April"} + +MACRO {may} {"May"} + +MACRO {jun} {"June"} + +MACRO {jul} {"July"} + +MACRO {aug} {"August"} + +MACRO {sep} {"September"} + +MACRO {oct} {"October"} + +MACRO {nov} {"November"} + +MACRO {dec} {"December"} + +MACRO {acmcs} {"ACM Computing Surveys"} + +MACRO {acta} {"Acta Informatica"} + +MACRO {cacm} {"Communications of the ACM"} + +MACRO {ibmjrd} {"IBM Journal of Research and Development"} + +MACRO {ibmsj} {"IBM Systems Journal"} + +MACRO {ieeese} {"IEEE Transactions on Software Engineering"} + +MACRO {ieeetc} {"IEEE Transactions on Computers"} + +MACRO {ieeetcad} + {"IEEE Transactions on Computer-Aided Design of Integrated Circuits"} + +MACRO {ipl} {"Information Processing Letters"} + +MACRO {jacm} {"Journal of the ACM"} + +MACRO {jcss} {"Journal of Computer and System Sciences"} + +MACRO {scp} {"Science of Computer Programming"} + +MACRO {sicomp} {"SIAM Journal on Computing"} + +MACRO {tocs} {"ACM Transactions on Computer Systems"} + +MACRO {tods} {"ACM Transactions on Database Systems"} + +MACRO {tog} {"ACM Transactions on Graphics"} + +MACRO {toms} {"ACM Transactions on Mathematical Software"} + +MACRO {toois} {"ACM Transactions on Office Information Systems"} + +MACRO {toplas} {"ACM Transactions on Programming Languages and Systems"} + +MACRO {tcs} {"Theoretical Computer Science"} +FUNCTION {bibinfo.check} +{ swap$ + duplicate$ missing$ + { + pop$ pop$ + "" + } + { duplicate$ empty$ + { + swap$ pop$ + } + { swap$ + pop$ + } + if$ + } + if$ +} +FUNCTION {bibinfo.warn} +{ swap$ + duplicate$ missing$ + { + swap$ "missing " swap$ * " in " * cite$ * warning$ pop$ + "" + } + { duplicate$ empty$ + { + swap$ "empty " swap$ * " in " * cite$ * warning$ + } + { swap$ + pop$ + } + if$ + } + if$ +} +FUNCTION {format.eprint} +{ eprint duplicate$ empty$ + 'skip$ + { "\eprint" + archive empty$ + 'skip$ + { "[" * archive * "]" * } + if$ + "{" * swap$ * "}" * + } + if$ +} +FUNCTION {format.url} +{ url empty$ + { "" } + { "\urlprefix\url{" url * "}" * } + if$ +} + +INTEGERS { nameptr namesleft numnames } + + +STRINGS { bibinfo} + +FUNCTION {format.names} +{ 'bibinfo := + duplicate$ empty$ 'skip$ { + 's := + "" 't := + #1 'nameptr := + s num.names$ 'numnames := + numnames 'namesleft := + { namesleft #0 > } + { s nameptr + "{vv~}{ll}{ jj}{ f{}}" + format.name$ + remove.dots + bibinfo bibinfo.check + 't := + nameptr #1 > + { + namesleft #1 > + { ", " * t * } + { + s nameptr "{ll}" format.name$ duplicate$ "others" = + { 't := } + { pop$ } + if$ + "," * + t "others" = + { + " " * bbl.etal emphasize * + } + { " " * t * } + if$ + } + if$ + } + 't + if$ + nameptr #1 + 'nameptr := + namesleft #1 - 'namesleft := + } + while$ + } if$ +} +FUNCTION {format.names.ed} +{ + 'bibinfo := + 
duplicate$ empty$ 'skip$ { + 's := + "" 't := + #1 'nameptr := + s num.names$ 'numnames := + numnames 'namesleft := + { namesleft #0 > } + { s nameptr + "{f{}~}{vv~}{ll}{ jj}" + format.name$ + remove.dots + bibinfo bibinfo.check + 't := + nameptr #1 > + { + namesleft #1 > + { ", " * t * } + { + s nameptr "{ll}" format.name$ duplicate$ "others" = + { 't := } + { pop$ } + if$ + "," * + t "others" = + { + + " " * bbl.etal emphasize * + } + { " " * t * } + if$ + } + if$ + } + 't + if$ + nameptr #1 + 'nameptr := + namesleft #1 - 'namesleft := + } + while$ + } if$ +} +FUNCTION {format.key} +{ empty$ + { key field.or.null } + { "" } + if$ +} + +FUNCTION {format.authors} +{ author "author" format.names + duplicate$ empty$ 'skip$ + { collaboration "collaboration" bibinfo.check + duplicate$ empty$ 'skip$ + { " (" swap$ * ")" * } + if$ + * + } + if$ +} +FUNCTION {get.bbl.editor} +{ editor num.names$ #1 > 'bbl.editors 'bbl.editor if$ } + +FUNCTION {format.editors} +{ editor "editor" format.names duplicate$ empty$ 'skip$ + { + " " * + get.bbl.editor + "(" swap$ * ")" * + * + } + if$ +} +FUNCTION {format.isbn} +{ isbn "isbn" bibinfo.check + duplicate$ empty$ 'skip$ + { + new.block + "ISBN " swap$ * + } + if$ +} + +FUNCTION {format.issn} +{ issn "issn" bibinfo.check + duplicate$ empty$ 'skip$ + { + new.block + "ISSN " swap$ * + } + if$ +} + +FUNCTION {format.doi} +{ doi "doi" bibinfo.check + duplicate$ empty$ 'skip$ + { + new.block + "\doi{" swap$ * "}" * + } + if$ +} +FUNCTION {format.note} +{ + note empty$ + { "" } + { note #1 #1 substring$ + duplicate$ "{" = + 'skip$ + { output.state mid.sentence = + { "l" } + { "u" } + if$ + change.case$ + } + if$ + note #2 global.max$ substring$ * "note" bibinfo.check + } + if$ +} + +FUNCTION {format.title} +{ title + "title" bibinfo.check + duplicate$ empty$ 'skip$ + { + "\enquote{" swap$ * + add.period$ "}" * + } + if$ +} +FUNCTION {format.full.names} +{'s := + "" 't := + #1 'nameptr := + s num.names$ 'numnames := + numnames 'namesleft := + { namesleft #0 > } + { s nameptr + "{vv~}{ll}" format.name$ + 't := + nameptr #1 > + { + namesleft #1 > + { ", " * t * } + { + s nameptr "{ll}" format.name$ duplicate$ "others" = + { 't := } + { pop$ } + if$ + t "others" = + { + " " * bbl.etal emphasize * + } + { + numnames #2 > + { "," * } + 'skip$ + if$ + bbl.and + space.word * t * + } + if$ + } + if$ + } + 't + if$ + nameptr #1 + 'nameptr := + namesleft #1 - 'namesleft := + } + while$ +} + +FUNCTION {author.editor.key.full} +{ author empty$ + { editor empty$ + { key empty$ + { cite$ #1 #3 substring$ } + 'key + if$ + } + { editor format.full.names } + if$ + } + { author format.full.names } + if$ +} + +FUNCTION {author.key.full} +{ author empty$ + { key empty$ + { cite$ #1 #3 substring$ } + 'key + if$ + } + { author format.full.names } + if$ +} + +FUNCTION {editor.key.full} +{ editor empty$ + { key empty$ + { cite$ #1 #3 substring$ } + 'key + if$ + } + { editor format.full.names } + if$ +} + +FUNCTION {make.full.names} +{ type$ "book" = + type$ "inbook" = + or + 'author.editor.key.full + { type$ "proceedings" = + 'editor.key.full + 'author.key.full + if$ + } + if$ +} + +FUNCTION {output.bibitem} +{ newline$ + "\bibitem[{" write$ + label write$ + ")" make.full.names duplicate$ short.list = + { pop$ } + { * } + if$ + "}]{" * write$ + cite$ write$ + "}" write$ + newline$ + "" + before.all 'output.state := +} + +FUNCTION {n.dashify} +{ + 't := + "" + { t empty$ not } + { t #1 #1 substring$ "-" = + { t #1 #2 substring$ "--" = not + { "--" * + t #2 global.max$ substring$ 't := + } + 
{ { t #1 #1 substring$ "-" = } + { "-" * + t #2 global.max$ substring$ 't := + } + while$ + } + if$ + } + { t #1 #1 substring$ * + t #2 global.max$ substring$ 't := + } + if$ + } + while$ +} + +FUNCTION {word.in} +{ bbl.in capitalize + " " * } + +FUNCTION {format.date} +{ year "year" bibinfo.check duplicate$ empty$ + { + "empty year in " cite$ * "; set to ????" * warning$ + pop$ "????" + } + 'skip$ + if$ + extra.label * + before.all 'output.state := + " (" swap$ * ")" * +} +FUNCTION {format.btitle} +{ title "title" bibinfo.check + duplicate$ empty$ 'skip$ + { + emphasize + } + if$ +} +FUNCTION {either.or.check} +{ empty$ + 'pop$ + { "can't use both " swap$ * " fields in " * cite$ * warning$ } + if$ +} +FUNCTION {format.bvolume} +{ volume empty$ + { "" } + { bbl.volume volume tie.or.space.prefix + "volume" bibinfo.check * * + series "series" bibinfo.check + duplicate$ empty$ 'pop$ + { swap$ bbl.of space.word * swap$ + emphasize * } + if$ + "volume and number" number either.or.check + } + if$ +} +FUNCTION {format.number.series} +{ volume empty$ + { number empty$ + { series field.or.null } + { series empty$ + { number "number" bibinfo.check } + { output.state mid.sentence = + { bbl.number } + { bbl.number capitalize } + if$ + number tie.or.space.prefix "number" bibinfo.check * * + bbl.in space.word * + series "series" bibinfo.check * + } + if$ + } + if$ + } + { "" } + if$ +} + +FUNCTION {format.edition} +{ edition duplicate$ empty$ 'skip$ + { + output.state mid.sentence = + { "l" } + { "t" } + if$ change.case$ + "edition" bibinfo.check + " " * bbl.edition * + } + if$ +} +INTEGERS { multiresult } +FUNCTION {multi.page.check} +{ 't := + #0 'multiresult := + { multiresult not + t empty$ not + and + } + { t #1 #1 substring$ + duplicate$ "-" = + swap$ duplicate$ "," = + swap$ "+" = + or or + { #1 'multiresult := } + { t #2 global.max$ substring$ 't := } + if$ + } + while$ + multiresult +} +FUNCTION {format.pages} +{ pages duplicate$ empty$ 'skip$ + { duplicate$ multi.page.check + { + bbl.pages swap$ + n.dashify + } + { + bbl.page swap$ + } + if$ + tie.or.space.prefix + "pages" bibinfo.check + * * + } + if$ +} +FUNCTION {format.journal.pages} +{ pages duplicate$ empty$ 'pop$ + { swap$ duplicate$ empty$ + { pop$ pop$ format.pages } + { + ", " * + swap$ + n.dashify + "pages" bibinfo.check + * + } + if$ + } + if$ +} +FUNCTION {format.journal.eid} +{ eid "eid" bibinfo.check + duplicate$ empty$ 'pop$ + { swap$ duplicate$ empty$ 'skip$ + { + ", " * + } + if$ + swap$ * + numpages empty$ 'skip$ + { bbl.eidpp numpages tie.or.space.prefix + "numpages" bibinfo.check * * + " (" swap$ * ")" * * + } + if$ + } + if$ +} +FUNCTION {format.vol.num.pages} +{ volume field.or.null + duplicate$ empty$ 'skip$ + { + "volume" bibinfo.check + } + if$ + bolden + number "number" bibinfo.check duplicate$ empty$ 'skip$ + { + swap$ duplicate$ empty$ + { "there's a number but no volume in " cite$ * warning$ } + 'skip$ + if$ + swap$ + "(" swap$ * ")" * + } + if$ * + eid empty$ + { format.journal.pages } + { format.journal.eid } + if$ +} + +FUNCTION {format.chapter.pages} +{ chapter empty$ + 'format.pages + { type empty$ + { bbl.chapter } + { type "l" change.case$ + "type" bibinfo.check + } + if$ + chapter tie.or.space.prefix + "chapter" bibinfo.check + * * + pages empty$ + 'skip$ + { ", " * format.pages * } + if$ + } + if$ +} + +FUNCTION {format.booktitle} +{ + booktitle "booktitle" bibinfo.check + emphasize +} +FUNCTION {format.in.ed.booktitle} +{ format.booktitle duplicate$ empty$ 'skip$ + { + editor "editor" format.names.ed 
duplicate$ empty$ 'pop$ + { + " " * + get.bbl.editor + "(" swap$ * "), " * + * swap$ + * } + if$ + word.in swap$ * + } + if$ +} +FUNCTION {format.thesis.type} +{ type duplicate$ empty$ + 'pop$ + { swap$ pop$ + "t" change.case$ "type" bibinfo.check + } + if$ +} +FUNCTION {format.tr.number} +{ number "number" bibinfo.check + type duplicate$ empty$ + { pop$ bbl.techrep } + 'skip$ + if$ + "type" bibinfo.check + swap$ duplicate$ empty$ + { pop$ "t" change.case$ } + { tie.or.space.prefix * * } + if$ +} +FUNCTION {format.article.crossref} +{ + word.in + " \cite{" * crossref * "}" * +} +FUNCTION {format.book.crossref} +{ volume duplicate$ empty$ + { "empty volume in " cite$ * "'s crossref of " * crossref * warning$ + pop$ word.in + } + { bbl.volume + capitalize + swap$ tie.or.space.prefix "volume" bibinfo.check * * bbl.of space.word * + } + if$ + " \cite{" * crossref * "}" * +} +FUNCTION {format.incoll.inproc.crossref} +{ + word.in + " \cite{" * crossref * "}" * +} +FUNCTION {format.org.or.pub} +{ 't := + "" + address empty$ t empty$ and + 'skip$ + { + t empty$ + { address "address" bibinfo.check * + } + { t * + address empty$ + 'skip$ + { ", " * address "address" bibinfo.check * } + if$ + } + if$ + } + if$ +} +FUNCTION {format.publisher.address} +{ publisher "publisher" bibinfo.warn format.org.or.pub +} + +FUNCTION {format.organization.address} +{ organization "organization" bibinfo.check format.org.or.pub +} + +FUNCTION {article} +{ output.bibitem + format.authors "author" output.check + author format.key output + format.date "year" output.check + date.block + format.title "title" output.check + new.block + crossref missing$ + { + journal + "journal" bibinfo.check + emphasize + "journal" output.check + format.vol.num.pages output + } + { format.article.crossref output.nonnull + format.pages output + } + if$ + format.issn output + format.doi output + new.block + format.note output + format.eprint output + format.url output + fin.entry +} +FUNCTION {book} +{ output.bibitem + author empty$ + { format.editors "author and editor" output.check + editor format.key output + } + { format.authors output.nonnull + crossref missing$ + { "author and editor" editor either.or.check } + 'skip$ + if$ + } + if$ + format.date "year" output.check + date.block + format.btitle "title" output.check + crossref missing$ + { format.bvolume output + new.block + format.number.series output + format.edition output + new.sentence + format.publisher.address output + } + { + new.block + format.book.crossref output.nonnull + } + if$ + format.isbn output + format.doi output + new.block + format.note output + format.eprint output + format.url output + fin.entry +} +FUNCTION {booklet} +{ output.bibitem + format.authors output + author format.key output + format.date "year" output.check + date.block + format.title "title" output.check + new.block + howpublished "howpublished" bibinfo.check output + address "address" bibinfo.check output + format.isbn output + format.doi output + new.block + format.note output + format.eprint output + format.url output + fin.entry +} + +FUNCTION {inbook} +{ output.bibitem + author empty$ + { format.editors "author and editor" output.check + editor format.key output + } + { format.authors output.nonnull + crossref missing$ + { "author and editor" editor either.or.check } + 'skip$ + if$ + } + if$ + format.date "year" output.check + date.block + format.btitle "title" output.check + crossref missing$ + { + format.bvolume output + format.chapter.pages "chapter and pages" output.check + new.block + 
format.number.series output + format.edition output + new.sentence + format.publisher.address output + } + { + format.chapter.pages "chapter and pages" output.check + new.block + format.book.crossref output.nonnull + } + if$ + crossref missing$ + { format.isbn output } + 'skip$ + if$ + format.doi output + new.block + format.note output + format.eprint output + format.url output + fin.entry +} + +FUNCTION {incollection} +{ output.bibitem + format.authors "author" output.check + author format.key output + format.date "year" output.check + date.block + format.title "title" output.check + new.block + crossref missing$ + { format.in.ed.booktitle "booktitle" output.check + format.bvolume output + format.number.series output + format.edition output + format.chapter.pages output + new.sentence + format.publisher.address output + format.isbn output + } + { format.incoll.inproc.crossref output.nonnull + format.chapter.pages output + } + if$ + format.doi output + new.block + format.note output + format.eprint output + format.url output + fin.entry +} +FUNCTION {inproceedings} +{ output.bibitem + format.authors "author" output.check + author format.key output + format.date "year" output.check + date.block + format.title "title" output.check + new.block + crossref missing$ + { format.in.ed.booktitle "booktitle" output.check + format.bvolume output + format.number.series output + format.pages output + new.sentence + publisher empty$ + { format.organization.address output } + { organization "organization" bibinfo.check output + format.publisher.address output + } + if$ + format.isbn output + format.issn output + } + { format.incoll.inproc.crossref output.nonnull + format.pages output + } + if$ + format.doi output + new.block + format.note output + format.eprint output + format.url output [TRUNCATED] To get the complete diff run: svnlook diff /svnroot/rprotobuf -r 938 From noreply at r-forge.r-project.org Mon Apr 13 02:07:19 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 02:07:19 +0200 (CEST) Subject: [Rprotobuf-commits] r939 - papers/jss Message-ID: <20150413000719.E5CB41879E1@r-forge.r-project.org> Author: edd Date: 2015-04-13 02:07:19 +0200 (Mon, 13 Apr 2015) New Revision: 939 Added: papers/jss/conditional-acceptance-email-2015-04.txt papers/jss/conditional-acceptance-notes-2015-04.txt Log: replies from JSS and TODOs for finalizing, with my notes in notes.txt Added: papers/jss/conditional-acceptance-email-2015-04.txt =================================================================== --- papers/jss/conditional-acceptance-email-2015-04.txt (rev 0) +++ papers/jss/conditional-acceptance-email-2015-04.txt 2015-04-13 00:07:19 UTC (rev 939) @@ -0,0 +1,61 @@ +From: Editor of the Journal of Statistical Software +To: "edd at debian.org" +Subject: JSS 1313 Conditional Acceptance +Date: Mon, 6 Apr 2015 19:30:32 +0000 + +Dear author, + +Your submission + + JSS 1313 + +is conditionally accepted for publication in JSS. + +Your manuscript has just finished the post-processing stage. In order to +continue in the process there are a few changes that need to be made. Attached +to this email is a comments file where you can find all the necessary changes. + +For further questions please see FAQ at http://www.jstatsoft.org/style. + + +Please send the full sources for your submission to the technical editor ( +editor at jstatsoft.org). It should contain: + + + 1. The full sources of the latest version of the software. (Binary versions + can be provided in addition.) 
+ 2. The .tex, .bib, and all graphics for the manuscript: where the names of + your files should be - your .tex file should be called jssxxx.tex and your + .bib file should be called jssxxx.bib. As well as a compiled .pdf version + of your manuscript. + 3. Information on how to replicate the examples in the manuscript. (Typically, + this is a self-contained standalone script file that loads/calls/sources + the software package from (1). If data or other external files are needed, + that are not yet provided with the software package, include these as + well.) Please use subdirectories for Figures/ and Code/ + 4. Please wrap all these files into a single .zip (or .tar.gz) file. + 5. Please make sure the .zip file only contains the necessary files. That is, + please do not include .aux, .log, etc. files and any unused files such + as jss.cls, jss.bst, jsslogo.jpg, etc. + + +Note for R authors: If you have prepared your manuscript using Sweave, the +files in (2) can be produced by Sweave, those in (3) by Stangle (possibly +enhancing the comments). Also indicate in your e-mail that Sweave was used and +the technical editor will provide you with further Sweave-specific information. + +Thanks for choosing JSS and contributing to free statistical software. + +Best regards, + +Jan de Leeuw +Bettina Grün +Achim Zeileis + + + + + + +---------------------------------------------------------------------- +plain text (us-ascii): JSS 1313 post comments.txt, JSS 1313 [display] Added: papers/jss/conditional-acceptance-notes-2015-04.txt =================================================================== --- papers/jss/conditional-acceptance-notes-2015-04.txt (rev 0) +++ papers/jss/conditional-acceptance-notes-2015-04.txt 2015-04-13 00:07:19 UTC (rev 939) @@ -0,0 +1,85 @@ +JSS 1313: Eddelbuettel, Stokely, Ooms + +RProtoBuf: Efficient Cross-Language Data Serialization in R + +--------------------------------------------------------- +For further instruction on JSS style requirements please see the JSS style manual (in particular section 2.1 Style Checklist) at http://www.jstatsoft.org/downloads/JSSstyle.zip + + ## START DEdd: Inserted per copy/paste from jss.pdf: + + 2.1 Style checklist + + A quick check for the most important aspects of the JSS style is given + below. Authors should make sure that all of them are addressed in the final + version. More details can be found in the remainder of this manual. + - The manuscript can be compiled by pdfLaTeX. + - \proglang, \pkg and \code have been used for highlighting throughout the paper + (including titles and references), except where explicitly escaped. + - References are provided in a .bib BibTeX database and included in the text by \cite, + \citep, \citet, etc. + - Titles and headers are formatted properly: + - \title in title style, + - \section etc. in sentence style, + - all titles in the BibTeX file in title style. + - Figures, tables and equations are marked with a \label and referred to by \ref, e.g., + "Figure~\ref{...}". + - Software packages are \cite{}d properly. + + ## END DEdd: Inserted per copy/paste from jss.pdf: + +Also see FAQ at: http://www.jstatsoft.org/style + +For further references please see RECENT JSS papers for detailed documentation and examples. +--------------------------------------------------------- + + +From the editorial team: + +o From one reviewer: As far as I can see there's only one difference between +the two columns of Table 3. It would be nice to highlight this.
+ + ## DEdd: Done, added a sentence below table and tightened wording in that + Table note. + + +Manuscript style comments: + +o Code should have enough spaces to facilitate reading. Please include spaces before and after operators and after commas (unless spaces have syntactical meaning). + + ## DEdd: No change, we were good already + +o The table in Figure 2 should have row/column labels in sentence +style. (Only the first word of a label should be capitalized). + + ## DEdd: Done (not sure I like it better) + +o In all cases, code input/output must fit within the normal text width of the manuscript. Thus, code input should have appropriate line breaks and code output should preferably be generated with a suitable width (or otherwise edited). E.g., see p. 9. + + ## DEdd: Replaces the Sweave code with its latex output and manually broke the long line + +o For bullet lists/itemized lists please use either a comma, semi-colon, or period at the end of each item. + + ## DEdd: Done; one small change + +o As a reminder, please make sure that: + - \proglang, \pkg and \code have been used for highlighting throughout the paper (including titles and references), except where explicitly escaped. + + +References: + +o John Wiley & Sons (not: Wiley, John Wiley & Sons Inc.) + + ## DEdd We only had one 'Wiley' where I removed a stray ".com" + +o As a reminder, + - Please make sure that all software packages are \cite{}'d properly. + + - All references should be in title style. + + - See FAQ for specific reference instructions. + + ## DEdd Update bibliography to current version numbers, and title styled + +Code: + +o As a reminder, please make sure that the files needed to replicate all code/examples within the manuscript are included in a standalone replication script. 
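As a hypothetical illustration of the title-style and software-citation reminders above (the package name \pkg{foo}, its author, and the version number are invented for the example), a BibTeX entry of the requested form could look like:

    @Manual{foo,
      title = {foo: A Hypothetical Example Package},
      author = {Jane Doe},
      year = {2015},
      note = {R package version 1.0-0},
      url = {http://CRAN.R-project.org/package=foo},
    }

The package would then be cited in the text as \pkg{foo} \citep{foo}, with the entry's title kept in title style.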
From noreply at r-forge.r-project.org Mon Apr 13 02:08:37 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 02:08:37 +0200 (CEST) Subject: [Rprotobuf-commits] r940 - papers/jss Message-ID: <20150413000837.AC3B61879E5@r-forge.r-project.org> Author: edd Date: 2015-04-13 02:08:37 +0200 (Mon, 13 Apr 2015) New Revision: 940 Modified: papers/jss/article.R papers/jss/article.Rnw papers/jss/article.bib Log: today's changes Modified: papers/jss/article.R =================================================================== --- papers/jss/article.R 2015-04-13 00:07:19 UTC (rev 939) +++ papers/jss/article.R 2015-04-13 00:08:37 UTC (rev 940) @@ -108,23 +108,15 @@ ################################################### -### code chunk number 14: article.Rnw:719-722 +### code chunk number 14: article.Rnw:805-808 ################################################### -f <- tutorial.Person$fileDescriptor() -f -f$Person - - -################################################### -### code chunk number 15: article.Rnw:785-788 -################################################### if (!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { readProtoFiles(file="int64.proto") } ################################################### -### code chunk number 16: article.Rnw:810-814 +### code chunk number 15: article.Rnw:830-834 ################################################### as.integer(2^31-1) as.integer(2^31 - 1) + as.integer(1) @@ -133,20 +125,20 @@ ################################################### -### code chunk number 17: article.Rnw:826-827 +### code chunk number 16: article.Rnw:846-847 ################################################### 2^53 == (2^53 + 1) ################################################### -### code chunk number 18: article.Rnw:878-880 +### code chunk number 17: article.Rnw:898-900 ################################################### msg <- serialize_pb(iris, NULL) identical(iris, unserialize_pb(msg)) ################################################### -### code chunk number 19: article.Rnw:908-911 +### code chunk number 18: article.Rnw:928-931 ################################################### datasets <- as.data.frame(data(package="datasets")$results) datasets$name <- sub("\\s+.*$", "", datasets$Item) @@ -154,7 +146,7 @@ ################################################### -### code chunk number 20: article.Rnw:929-972 +### code chunk number 19: article.Rnw:949-992 ################################################### datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) @@ -202,7 +194,7 @@ ################################################### -### code chunk number 21: SER +### code chunk number 20: SER ################################################### old.mar<-par("mar") new.mar<-old.mar @@ -254,7 +246,7 @@ ################################################### -### code chunk number 22: article.Rnw:1211-1215 +### code chunk number 21: article.Rnw:1231-1235 ################################################### require(HistogramTools) readProtoFiles(package="HistogramTools") @@ -263,7 +255,7 @@ ################################################### -### code chunk number 23: article.Rnw:1303-1310 (eval = FALSE) +### code chunk number 22: article.Rnw:1323-1330 (eval = FALSE) ################################################### ## library("RProtoBuf") ## library("httr") @@ -275,7 +267,7 @@ ################################################### -### code chunk number 24: article.Rnw:1360-1376 (eval = FALSE) +### code 
chunk number 23: article.Rnw:1380-1396 (eval = FALSE) ################################################### ## library("httr") ## library("RProtoBuf") @@ -296,7 +288,7 @@ ################################################### -### code chunk number 25: article.Rnw:1380-1383 (eval = FALSE) +### code chunk number 24: article.Rnw:1400-1403 (eval = FALSE) ################################################### ## fnargs <- unserialize_pb(inputmsg) ## val <- do.call(stats::rnorm, fnargs) Modified: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2015-04-13 00:07:19 UTC (rev 939) +++ papers/jss/article.Rnw 2015-04-13 00:08:37 UTC (rev 940) @@ -612,7 +612,7 @@ \begin{itemize} \item The functional dispatch mechanism of the the form - \verb|method(object, arguments)| (common to \proglang{R}), and + \verb|method(object, arguments)| (common to \proglang{R}). \item The message passing object-oriented notation of the form \verb|object$method(arguments)|. \end{itemize} @@ -716,12 +716,30 @@ The \verb|$| operator can be used to retrieve named fields defined in the FileDescriptor, or to invoke methods. -<<>>= -f <- tutorial.Person$fileDescriptor() -f -f$Person -@ +% < < > > = +% f <- tutorial.Person$fileDescriptor() +% f +% f$Person +% @ +\begin{Schunk} +\begin{Sinput} +R> f <- tutorial.Person$fileDescriptor() +R> f +\end{Sinput} +\begin{Soutput} +file descriptor for package tutorial \ + (/usr/local/lib/R/site-library/RProtoBuf/proto/addressbook.proto) +\end{Soutput} +\begin{Sinput} +R> f$Person +\end{Sinput} +\begin{Soutput} +descriptor for type 'tutorial.Person' +\end{Soutput} +\end{Schunk} + + \section{Type coercion} \label{sec:types} @@ -739,7 +757,7 @@ \begin{table}[h] \centering \begin{small} -\begin{tabular}{lp{5cm}p{5cm}} +\begin{tabular}{lp{5cm}p{5.5cm}} \toprule Field type & \proglang{R} type (non repeated) & \proglang{R} type (repeated) \\ \cmidrule(r){2-3} @@ -765,10 +783,12 @@ \end{tabular} \end{small} \caption{\label{table-get-types}Correspondence between field type and - \proglang{R} type retrieved by the extractors. Note that \proglang{R} lacks native + \proglang{R} type retrieved by the extractors. \proglang{R} lacks native 64-bit integers, so the \code{RProtoBuf.int64AsString} option is available to return large integers as characters to avoid losing - precision. This option is described in Section~\ref{sec:int64}.} + precision; see Section~\ref{sec:int64} below. 
+ All but the \code{Message} type can be represented in vectors of one or + more elements; for the latter a list is used.} \end{table} \subsection{Booleans} @@ -1067,8 +1087,8 @@ %\begin{center} \begin{tabular}{rlrrrrr} \toprule - Data Set & object.size & \multicolumn{2}{c}{\proglang{R} Serialization} & - \multicolumn{2}{c}{RProtoBuf Serialization} \\ + Data set & object.size & \multicolumn{2}{c}{\proglang{R} serialization} & + \multicolumn{2}{c}{RProtoBuf serialization} \\ & & default & gzipped & default & gzipped \\ \cmidrule(r){2-6} crimtab & 7,936 & 4,641 (41.5\%) & 713 (91.0\%) & 1,655 (79.2\%) & 576 (92.7\%)\\ Modified: papers/jss/article.bib =================================================================== --- papers/jss/article.bib 2015-04-13 00:07:19 UTC (rev 939) +++ papers/jss/article.bib 2015-04-13 00:08:37 UTC (rev 940) @@ -7,7 +7,7 @@ publisher={JSTOR} } @article{azzalini1990look, - title={A look at some data on the Old Faithful geyser}, + title={A Look at Some Data on the Old Faithful Geyser}, author={Azzalini, A and Bowman, AW}, journal={Applied Statistics}, pages={357--365}, @@ -15,7 +15,7 @@ publisher={JSTOR} } @article{dean2009designs, - title={Designs, lessons and advice from building large distributed systems}, + title={Designs, Lessons and Advice from Building Large Distributed Systems}, author={Dean, Jeff}, journal={Keynote from LADIS}, year={2009} @@ -92,11 +92,10 @@ year = 2013 } - at article{clinec++, + at Manual{clinec++, title = {C++ FAQ}, author = {Marshall Cline}, - journal = {Also available as - http://www. parashift. com/c++-faq-lite/index. html}, + url = {http://www.parashift.com/c++-faq-lite/index.html}, year = 2013 } @@ -104,16 +103,16 @@ title = {RJSONIO: Serialize R objects to JSON, JavaScript Object Notation}, author = {Duncan {Temple Lang}}, - year = 2011, - note = {R package version 0.96-0}, + year = 2014, + note = {R package version 1.3-0}, url = {http://CRAN.R-project.org/package=RJSONIO}, } @Manual{rjson, title = {rjson: JSON for R}, author = {Alex Couture-Beil}, - year = 2012, - note = {R package version 0.2.10}, + year = 2014, + note = {R package version 0.2.15}, url = {http://CRAN.R-project.org/package=rjson}, } @@ -137,15 +136,15 @@ title = {int64: 64 Bit Integer Types}, author = {Romain Fran{\c{c}}ois}, year = 2011, - note = {R package version 1.1.2}, - url = {http://CRAN.R-project.org/package=int64}, + note = {Archived R package version 1.1.2}, + url = {http://cran.r-project.org/src/contrib/Archive/int64/}, } @Manual{bit64, - title = {bit64: A S3 class for Vectors of 64bit Integers}, + title = {bit64: A S3 Class for Vectors of 64bit Integers}, author = {Jens Oehlschl\"{a}gel}, - year = 2012, - note = {R package version 0.9-3}, + year = 2014, + note = {R package version 0.9-4}, url = {http://CRAN.R-project.org/package=bit64}, } @@ -446,14 +445,14 @@ author = {David W Scott}, volume = 383, year = 2009, - publisher = {Wiley. 
com} + publisher = {Wiley} } @Manual{httr, title = {httr: Tools for Working with URLs and HTTP}, author = {Hadley Wickham}, - year = 2014, - note = {R package version 0.3}, + year = 2015, + note = {R package version 0.6.1}, url = {http://CRAN.R-project.org/package=httr}, } @@ -512,7 +511,7 @@ author = {{Apache Software Foundation}}, title = {Apache Avro}, url = {http://avro.apache.org}, - note = {Data Serialization System, Version 1.7.6}, + note = {Data Serialization System, Version 1.7.7}, year = 2014 } @@ -520,7 +519,7 @@ author = {{Apache Software Foundation}}, title = {Apache Thrift}, url = {http://thrift.apache.org}, - note = {Software Framework for Scalable Cross-Language Services, Version 0.9.1}, - year = 2013 + note = {Software Framework for Scalable Cross-Language Services, Version 0.9.2}, + year = 2014 } From noreply at r-forge.r-project.org Mon Apr 13 02:28:03 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 02:28:03 +0200 (CEST) Subject: [Rprotobuf-commits] r941 - papers/jss Message-ID: <20150413002803.42F3A1877CF@r-forge.r-project.org> Author: edd Date: 2015-04-13 02:28:02 +0200 (Mon, 13 Apr 2015) New Revision: 941 Modified: papers/jss/article.bib Log: reformated and sorted (aka two emacs function calls) Modified: papers/jss/article.bib =================================================================== --- papers/jss/article.bib 2015-04-13 00:08:37 UTC (rev 940) +++ papers/jss/article.bib 2015-04-13 00:28:02 UTC (rev 941) @@ -1,80 +1,160 @@ - at article{garson1900metric, - title={The Metric System of Identification of Criminals, as Used in Great Britain and Ireland}, - author={Garson, John George}, - journal={Journal of the Anthropological Institute of Great Britain and Ireland}, - pages={161--198}, - year={1900}, - publisher={JSTOR} + at Misc{Apache:Avro, + author = {{Apache Software Foundation}}, + title = {Apache Avro}, + url = {http://avro.apache.org}, + note = {Data Serialization System, Version 1.7.7}, + year = 2014 } - at article{azzalini1990look, - title={A Look at Some Data on the Old Faithful Geyser}, - author={Azzalini, A and Bowman, AW}, - journal={Applied Statistics}, - pages={357--365}, - year={1990}, - publisher={JSTOR} + + at Misc{Apache:Thrift, + author = {{Apache Software Foundation}}, + title = {Apache Thrift}, + url = {http://thrift.apache.org}, + note = {Software Framework for Scalable Cross-Language + Services, Version 0.9.2}, + year = 2014 } - at article{dean2009designs, - title={Designs, Lessons and Advice from Building Large Distributed Systems}, - author={Dean, Jeff}, - journal={Keynote from LADIS}, - year={2009} + + at Manual{CRAN:Rserve, + title = {Rserve: Binary R server}, + author = {Simon Urbanek}, + year = 2013, + note = {R package version 1.7-3}, + url = {http://CRAN.R-Project.org/package=Rserve} } - at article{eddelbuettel2011rcpp, - title = {Rcpp: Seamless R and C++ Integration}, - author = {Dirk Eddelbuettel and Romain Fran{\c{c}}ois}, - journal = {Journal of Statistical Software}, - volume = 40, - number = 8, - pages = {1--18}, - year = 2011 + + at article{Manku:1998:AMO:276305.276342, + author = {Gurmeet Singh Manku and Sridhar Rajagopalan and + Bruce G. 
Lindsay}, + title = {Approximate medians and other quantiles in one pass + and with limited memory}, + journal = {SIGMOD Rec.}, + issue_date = {June 1998}, + volume = 27, + number = 2, + month = jun, + year = 1998, + issn = {0163-5808}, + pages = {426--435}, + numpages = 10, + url = {http://doi.acm.org/10.1145/276305.276342}, + doi = {10.1145/276305.276342}, + acmid = 276342, + publisher = {ACM}, + address = {New York, NY, USA}, } - at inproceedings{dremel, - title = {Dremel: Interactive Analysis of Web-Scale Datasets}, - author = {Sergey Melnik and Andrey Gubarev and Jing Jing Long - and Geoffrey Romer and Shiva Shivakumar and Matt - Tolton and Theo Vassilakis}, - year = 2010, - URL = {http://www.vldb2010.org/accept.htm}, - booktitle = {Proc. of the 36th Int'l Conf on Very Large Data - Bases}, - pages = {330-339} + + at article{Pike:2005:IDP:1239655.1239658, + author = {Rob Pike and Sean Dorward and Robert Griesemer and + Sean Quinlan}, + title = {Interpreting the data: Parallel analysis with + Sawzall}, + journal = {Sci. Program.}, + issue_date = {October 2005}, + volume = 13, + number = 4, + month = oct, + year = 2005, + issn = {1058-9244}, + pages = {277--298}, + numpages = 22, + acmid = 1239658, + publisher = {IOS Press}, + address = {Amsterdam, The Netherlands, The Netherlands}, } - at Manual{msgpackR, - title = {msgpackR: A library to serialize or unserialize data - in MessagePack format}, - author = {Mikiya Tanizawa}, - year = 2013, - note = {R package version 1.1}, - url = {http://CRAN.R-project.org/package=msgpackR}, + + at Manual{RJSONIO, + title = {RJSONIO: Serialize R objects to JSON, JavaScript + Object Notation}, + author = {Duncan {Temple Lang}}, + year = 2014, + note = {R package version 1.3-0}, + url = {http://CRAN.R-project.org/package=RJSONIO}, } - at inproceedings{sciencecloud, - title = {Projecting Disk Usage Based on Historical Trends in - a Cloud Environment}, - author = {Murray Stokely and Amaan Mehrabian and Christoph - Albrecht and Francois Labelle and Arif Merchant}, + + at Manual{RObjectTables, + title = {User-Defined Tables in the R Search Path}, + author = {Duncan {Temple Lang}}, year = 2012, - booktitle = {ScienceCloud 2012 Proceedings of the 3rd - International Workshop on Scientific Cloud - Computing}, - pages = {63--70} + url = + {http://www.omegahat.org/RObjectTables/RObjectTables.pdf}, } - at inproceedings{janus, - title = {Janus: Optimal Flash Provisioning for Cloud Storage - Workloads}, - author = {Christoph Albrecht and Arif Merchant and Murray - Stokely and Muhammad Waliji and Francois Labelle and - Nathan Coehlo and Xudong Shi and Eric Schrock}, - year = 2013, - URL = - {https://www.usenix.org/system/files/conference/atc13/atc13-albrecht.pdf}, - booktitle = {Proceedings of the USENIX Annual Technical - Conference}, - pages = {91--102}, - address = {2560 Ninth Street, Suite 215, Berkeley, CA 94710, - USA} + + at inproceedings{Sumaray:2012:CDS:2184751.2184810, + author = {Audie Sumaray and S. 
Kami Makki}, + title = {A Comparison of Data Serialization Formats for + Optimal Efficiency on a Mobile Platform}, + booktitle = {Proceedings of the 6th International Conference on + Ubiquitous Information Management and Communication}, + series = {ICUIMC '12}, + year = 2012, + isbn = {978-1-4503-1172-4}, + location = {Kuala Lumpur, Malaysia}, + pages = {48:1--48:6}, + articleno = 48, + numpages = 6, + url = {http://doi.acm.org/10.1145/2184751.2184810}, + doi = {10.1145/2184751.2184810}, + acmid = 2184810, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {Android, Dalvik, JSON, ProtoBuf, XML, data + serialization, thrift}, } + at InProceedings{Urbanek:2003:Rserve, + author = {Simon Urbanek}, + title = {{Rserve}: A Fast Way to Provide {R} Functionality to + Applications}, + booktitle = {Proceedings of the 3rd International Workshop on + Distributed Statistical Computing, Vienna, Austria}, + editor = {Kurt Hornik and Friedrich Leisch and Achim Zeileis}, + year = 2003, + url = + {http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/}, + note = {{ISSN 1609-395X}} +} + + at article{Wegiel:2010:CTT:1932682.1869479, + author = {Michal Wegiel and Chandra Krintz}, + title = {Cross-language, Type-safe, and Transparent Object + Sharing for Co-located Managed Runtimes}, + journal = {SIGPLAN Not.}, + issue_date = {October 2010}, + volume = 45, + number = 10, + month = oct, + year = 2010, + issn = {0362-1340}, + pages = {223--240}, + numpages = 18, + url = {http://doi.acm.org/10.1145/1932682.1869479}, + doi = {10.1145/1932682.1869479}, + acmid = 1869479, + publisher = {ACM}, + address = {New York, NY, USA}, + keywords = {collection, communication, cross-language, garbage, + managed, memory, model, object, rpc, runtimes, + shared, synchronization, transparent, type-safe}, +} + + at article{azzalini1990look, + title = {A Look at Some Data on the Old Faithful Geyser}, + author = {Azzalini, A and Bowman, AW}, + journal = {Applied Statistics}, + pages = {357--365}, + year = 1990, + publisher = {JSTOR} +} + + at Manual{bit64, + title = {bit64: A S3 Class for Vectors of 64bit Integers}, + author = {Jens Oehlschl\"{a}gel}, + year = 2014, + note = {R package version 0.9-4}, + url = {http://CRAN.R-project.org/package=bit64}, +} + @article{blocker2013, ajournal = "Bernoulli", author = "Alexander W. 
Blocker and Xiao-Li Meng", @@ -92,6 +172,28 @@ year = 2013 } + at article{bostock2011d3, + title = {D$^3$ Data-Driven Documents}, + author = {Michael Bostock and Vadim Ogievetsky and Jeffrey + Heer}, + journal = {Visualization and Computer Graphics, IEEE + Transactions on}, + volume = 17, + number = 12, + pages = {2301--2309}, + year = 2011, + publisher = {IEEE} +} + + at inproceedings{cantrill2004dynamic, + title = {Dynamic Instrumentation of Production Systems.}, + author = {Bryan Cantrill and Michael W Shapiro and Adam H + Leventhal and others}, + booktitle = {USENIX Annual Technical Conference, General Track}, + pages = {15--28}, + year = 2004 +} + @Manual{clinec++, title = {C++ FAQ}, author = {Marshall Cline}, @@ -99,53 +201,55 @@ year = 2013 } - at Manual{RJSONIO, - title = {RJSONIO: Serialize R objects to JSON, JavaScript - Object Notation}, - author = {Duncan {Temple Lang}}, - year = 2014, - note = {R package version 1.3-0}, - url = {http://CRAN.R-project.org/package=RJSONIO}, + at article{dean2008mapreduce, + title = {MapReduce: Simplified Data Processing on Large + Clusters}, + author = {Jeffrey Dean and Sanjay Ghemawat}, + journal = {Communications of the ACM}, + volume = 51, + number = 1, + pages = {107--113}, + year = 2008, + publisher = {ACM} } - at Manual{rjson, - title = {rjson: JSON for R}, - author = {Alex Couture-Beil}, - year = 2014, - note = {R package version 0.2.15}, - url = {http://CRAN.R-project.org/package=rjson}, + at article{dean2009designs, + title = {Designs, Lessons and Advice from Building Large + Distributed Systems}, + author = {Dean, Jeff}, + journal = {Keynote from LADIS}, + year = 2009 } - at article{jsonlite, - title = {The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects}, - journal = {arXiv: Computation (stat.CO); Mathematical Software (cs.MS); Software Engineering (cs.SE)}, - author = {Jeroen Ooms}, - year = 2014, - url = {http://arxiv.org/abs/1403.2805}, + at inproceedings{dremel, + title = {Dremel: Interactive Analysis of Web-Scale Datasets}, + author = {Sergey Melnik and Andrey Gubarev and Jing Jing Long + and Geoffrey Romer and Shiva Shivakumar and Matt + Tolton and Theo Vassilakis}, + year = 2010, + URL = {http://www.vldb2010.org/accept.htm}, + booktitle = {Proc. 
of the 36th Int'l Conf on Very Large Data + Bases}, + pages = {330-339} } - at Manual{rmongodb, - title = {rmongodb: R-MongoDB driver}, - author = {Gerald Lindsly}, - year = 2013, - note = {R package version 1.3.3}, - url = {http://CRAN.R-project.org/package=rmongodb}, + at article{eddelbuettel2011rcpp, + title = {Rcpp: Seamless R and C++ Integration}, + author = {Dirk Eddelbuettel and Romain Fran{\c{c}}ois}, + journal = {Journal of Statistical Software}, + volume = 40, + number = 8, + pages = {1--18}, + year = 2011 } - at Manual{int64, - title = {int64: 64 Bit Integer Types}, - author = {Romain Fran{\c{c}}ois}, - year = 2011, - note = {Archived R package version 1.1.2}, - url = {http://cran.r-project.org/src/contrib/Archive/int64/}, -} - - at Manual{bit64, - title = {bit64: A S3 Class for Vectors of 64bit Integers}, - author = {Jens Oehlschl\"{a}gel}, + at manual{eddelbuettel2013exposing, + title = {Exposing C++ Functions and Classes with Rcpp + Modules}, + author = {Dirk Eddelbuettel and Romain Fran{\c{c}}ois}, year = 2014, - note = {R package version 0.9-4}, - url = {http://CRAN.R-project.org/package=bit64}, + note = {Vignette included in R package Rcpp}, + url = {http://CRAN.R-project.org/package=Rcpp}, } @book{eddelbuettel2013seamless, @@ -155,62 +259,88 @@ publisher = {Springer-Verlag} } - at Manual{rhipe, - title = {RHIPE: A Distributed Environment for the Analysis of - Large and Complex Datasets}, - author = {Saptarshi Guha}, - year = 2010, - url = {http://www.stat.purdue.edu/~sguha/rhipe/}, + at Manual{emdist, + title = {emdist: Earth Mover's Distance}, + author = {Simon Urbanek and Yossi Rubner}, + year = 2012, + note = {R package version 0.3-1}, + url = {http://cran.r-project.org/package=emdist}, } - at misc{serialization, - author = {Luke Tierney}, - title = {A New Serialization Mechanism for R}, - url = {http://www.cs.uiowa.edu/~luke/R/serialize/serialize.ps}, - year = 2003, + at inproceedings{fang1999computing, + title = {Computing Iceberg Queries Efficiently.}, + author = {Min Fang and Narayanan Shivakumar and Hector + Garcia-Molina and Rajeev Motwani and Jeffrey D + Ullman}, + booktitle = {Internaational Conference on Very Large Databases + (VLDB'98), New York, August 1998}, + year = 1999, + organization = {Stanford InfoLab} } - at manual{eddelbuettel2013exposing, - title = {Exposing C++ Functions and Classes with Rcpp - Modules}, - author = {Dirk Eddelbuettel and Romain Fran{\c{c}}ois}, - year = 2014, - note = {Vignette included in R package Rcpp}, - url = {http://CRAN.R-project.org/package=Rcpp}, + at article{garson1900metric, + title = {The Metric System of Identification of Criminals, as + Used in Great Britain and Ireland}, + author = {Garson, John George}, + journal = {Journal of the Anthropological Institute of Great + Britain and Ireland}, + pages = {161--198}, + year = 1900, + publisher = {JSTOR} } - at inproceedings{cantrill2004dynamic, - title = {Dynamic Instrumentation of Production Systems.}, - author = {Bryan Cantrill and Michael W Shapiro and Adam H - Leventhal and others}, - booktitle = {USENIX Annual Technical Conference, General Track}, - pages = {15--28}, - year = 2004 + at Manual{histogramtools, + title = {HistogramTools: Utility Functions for R Histograms}, + author = {Murray Stokely}, + year = 2013, + note = {R package version 0.3}, + url = + {https://r-forge.r-project.org/projects/histogramtools/}, } - at article{swain1991color, - title = {Color indexing}, - author = {Michael J Swain and Dana H Ballard}, - journal = {International journal of computer 
vision}, - volume = 7, - number = 1, - pages = {11--32}, - year = 1991, - publisher = {Springer-Verlag} + at Manual{httr, + title = {httr: Tools for Working with URLs and HTTP}, + author = {Hadley Wickham}, + year = 2015, + note = {R package version 0.6.1}, + url = {http://CRAN.R-project.org/package=httr}, } - at article{rubner2000earth, - title = {The earth mover's distance as a metric for image - retrieval}, - author = {Yossi Rubner and Carlo Tomasi and Leonidas J Guibas}, - journal = {International Journal of Computer Vision}, - volume = 40, - number = 2, - pages = {99--121}, - year = 2000, - publisher = {Springer-Verlag} + at Manual{int64, + title = {int64: 64 Bit Integer Types}, + author = {Romain Fran{\c{c}}ois}, + year = 2011, + note = {Archived R package version 1.1.2}, + url = + {http://cran.r-project.org/src/contrib/Archive/int64/}, } + at inproceedings{janus, + title = {Janus: Optimal Flash Provisioning for Cloud Storage + Workloads}, + author = {Christoph Albrecht and Arif Merchant and Murray + Stokely and Muhammad Waliji and Francois Labelle and + Nathan Coehlo and Xudong Shi and Eric Schrock}, + year = 2013, + URL = + {https://www.usenix.org/system/files/conference/atc13/atc13-albrecht.pdf}, + booktitle = {Proceedings of the USENIX Annual Technical + Conference}, + pages = {91--102}, + address = {2560 Ninth Street, Suite 215, Berkeley, CA 94710, + USA} +} + + at article{jsonlite, + title = {The jsonlite Package: A Practical and Consistent + Mapping Between JSON Data and R Objects}, + journal = {arXiv: Computation (stat.CO); Mathematical Software + (cs.MS); Software Engineering (cs.SE)}, + author = {Jeroen Ooms}, + year = 2014, + url = {http://arxiv.org/abs/1403.2805}, +} + @book{kullback1997information, title = {Information theory and statistics}, author = {Solomon Kullback}, @@ -218,6 +348,53 @@ publisher = {Courier Dover Publications} } + at Manual{msgpackR, + title = {msgpackR: A library to serialize or unserialize data + in MessagePack format}, + author = {Mikiya Tanizawa}, + year = 2013, + note = {R package version 1.1}, + url = {http://CRAN.R-project.org/package=msgpackR}, +} + + at Manual{nlme, + title = {nlme: Linear and Nonlinear Mixed Effects Models}, + author = {Jos\'{e} Pinheiro and Douglas Bates and Saikat + DebRoy and Deepayan Sarkar and {EISPACK authors} and + {R Core}}, + year = 2013, + note = {R package version 3.1-113}, + url = {http://CRAN.R-project.org/package=nlme}, +} + + at book{nolan2013xml, + title = {XML and Web Technologies for Data Sciences with R}, + author = {Deborah Nolan and Duncan {Temple Lang}}, + year = 2013, + publisher = {Springer-Verlag} +} + + at article{opencpu, + journal = {arXiv: Computation (stat.CO); Mathematical Software + (cs.MS); Software Engineering (cs.SE)}, + title = {The OpenCPU System: Towards a Universal Interface + for Scientific Computing through Separation of + Concerns}, + author = {Jeroen Ooms}, + year = 2014, + url = {http://arxiv.org/abs/1406.4806}, +} +% celebrated article in this field. Also see the parallel paragraph. 
+ + at Manual{protobuf, + title = {Protocol Buffers: Developer Guide}, + author = {Google}, + year = 2012, + url = + {http://code.google.com/apis/protocolbuffers/docs/overview.html} +} +% Has a section on protocol buffers + @inproceedings{puzicha1997non, title = {Non-parametric similarity measures for unsupervised texture segmentation and image retrieval}, @@ -231,89 +408,40 @@ organization = {IEEE} } - at inproceedings{fang1999computing, - title = {Computing Iceberg Queries Efficiently.}, - author = {Min Fang and Narayanan Shivakumar and Hector - Garcia-Molina and Rajeev Motwani and Jeffrey D - Ullman}, - booktitle = {Internaational Conference on Very Large Databases - (VLDB'98), New York, August 1998}, - year = 1999, - organization = {Stanford InfoLab} + at Manual{r, + title = {R: A Language and Environment for Statistical + Computing}, + author = {{R Core Team}}, + organization = {R Foundation for Statistical Computing}, + address = {Vienna, Austria}, + year = 2014, + url = {http://www.R-project.org/}, } - at Manual{emdist, - title = {emdist: Earth Mover's Distance}, - author = {Simon Urbanek and Yossi Rubner}, - year = 2012, - note = {R package version 0.3-1}, - url = {http://cran.r-project.org/package=emdist}, -} - - at article{Wegiel:2010:CTT:1932682.1869479, - author = {Michal Wegiel and Chandra Krintz}, - title = {Cross-language, Type-safe, and Transparent Object - Sharing for Co-located Managed Runtimes}, - journal = {SIGPLAN Not.}, - issue_date = {October 2010}, - volume = 45, - number = 10, - month = oct, + at Manual{rhipe, + title = {RHIPE: A Distributed Environment for the Analysis of + Large and Complex Datasets}, + author = {Saptarshi Guha}, year = 2010, - issn = {0362-1340}, - pages = {223--240}, - numpages = 18, - url = {http://doi.acm.org/10.1145/1932682.1869479}, - doi = {10.1145/1932682.1869479}, - acmid = 1869479, - publisher = {ACM}, - address = {New York, NY, USA}, - keywords = {collection, communication, cross-language, garbage, - managed, memory, model, object, rpc, runtimes, - shared, synchronization, transparent, type-safe}, + url = {http://www.stat.purdue.edu/~sguha/rhipe/}, } - at article{wickham2011split, - title = {The split-apply-combine strategy for data analysis}, - author = {Hadley Wickham}, - journal = {Journal of Statistical Software}, - volume = 40, - number = 1, - pages = {1--29}, - year = 2011, - publisher = {Citeseer} + at Manual{rjson, + title = {rjson: JSON for R}, + author = {Alex Couture-Beil}, + year = 2014, + note = {R package version 0.2.15}, + url = {http://CRAN.R-project.org/package=rjson}, } - at inproceedings{Sumaray:2012:CDS:2184751.2184810, - author = {Audie Sumaray and S. 
Kami Makki}, - title = {A Comparison of Data Serialization Formats for - Optimal Efficiency on a Mobile Platform}, - booktitle = {Proceedings of the 6th International Conference on - Ubiquitous Information Management and Communication}, - series = {ICUIMC '12}, - year = 2012, - isbn = {978-1-4503-1172-4}, - location = {Kuala Lumpur, Malaysia}, - pages = {48:1--48:6}, - articleno = 48, - numpages = 6, - url = {http://doi.acm.org/10.1145/2184751.2184810}, - doi = {10.1145/2184751.2184810}, - acmid = 2184810, - publisher = {ACM}, - address = {New York, NY, USA}, - keywords = {Android, Dalvik, JSON, ProtoBuf, XML, data - serialization, thrift}, + at Manual{rmongodb, + title = {rmongodb: R-MongoDB driver}, + author = {Gerald Lindsly}, + year = 2013, + note = {R package version 1.3.3}, + url = {http://CRAN.R-project.org/package=rmongodb}, } - at Manual{RObjectTables, - title = {User-Defined Tables in the R Search Path}, - author = {Duncan {Temple Lang}}, - year = 2012, - url = - {http://www.omegahat.org/RObjectTables/RObjectTables.pdf}, -} - @Manual{rprotobuf, title = {RProtoBuf: R Interface to the Protocol Buffers API}, author = {Romain Francois and Dirk Eddelbuettel and Murray @@ -324,110 +452,30 @@ {http://cran.r-project.org/web/packages/RProtoBuf/index.html}, } - at Manual{r, - title = {R: A Language and Environment for Statistical - Computing}, - author = {{R Core Team}}, - organization = {R Foundation for Statistical Computing}, - address = {Vienna, Austria}, - year = 2014, - url = {http://www.R-project.org/}, -} - - at article{dean2008mapreduce, - title = {MapReduce: Simplified Data Processing on Large - Clusters}, - author = {Jeffrey Dean and Sanjay Ghemawat}, - journal = {Communications of the ACM}, - volume = 51, - number = 1, - pages = {107--113}, - year = 2008, - publisher = {ACM} -} - - at article{bostock2011d3, - title = {D$^3$ Data-Driven Documents}, - author = {Michael Bostock and Vadim Ogievetsky and Jeffrey - Heer}, - journal = {Visualization and Computer Graphics, IEEE - Transactions on}, - volume = 17, - number = 12, - pages = {2301--2309}, - year = 2011, - publisher = {IEEE} -} -% celebrated article in this field. Also see the parallel paragraph. - - at article{Manku:1998:AMO:276305.276342, - author = {Gurmeet Singh Manku and Sridhar Rajagopalan and - Bruce G. Lindsay}, - title = {Approximate medians and other quantiles in one pass - and with limited memory}, - journal = {SIGMOD Rec.}, - issue_date = {June 1998}, - volume = 27, + at article{rubner2000earth, + title = {The earth mover's distance as a metric for image + retrieval}, + author = {Yossi Rubner and Carlo Tomasi and Leonidas J Guibas}, + journal = {International Journal of Computer Vision}, + volume = 40, number = 2, - month = jun, - year = 1998, - issn = {0163-5808}, - pages = {426--435}, - numpages = 10, - url = {http://doi.acm.org/10.1145/276305.276342}, - doi = {10.1145/276305.276342}, - acmid = 276342, - publisher = {ACM}, - address = {New York, NY, USA}, -} -% Has a section on protocol buffers - - at article{Pike:2005:IDP:1239655.1239658, - author = {Rob Pike and Sean Dorward and Robert Griesemer and - Sean Quinlan}, - title = {Interpreting the data: Parallel analysis with - Sawzall}, - journal = {Sci. 
Program.}, - issue_date = {October 2005}, - volume = 13, - number = 4, - month = oct, - year = 2005, - issn = {1058-9244}, - pages = {277--298}, - numpages = 22, - acmid = 1239658, - publisher = {IOS Press}, - address = {Amsterdam, The Netherlands, The Netherlands}, + pages = {99--121}, + year = 2000, + publisher = {Springer-Verlag} } - at Manual{protobuf, - title = {Protocol Buffers: Developer Guide}, - author = {Google}, + at inproceedings{sciencecloud, + title = {Projecting Disk Usage Based on Historical Trends in + a Cloud Environment}, + author = {Murray Stokely and Amaan Mehrabian and Christoph + Albrecht and Francois Labelle and Arif Merchant}, year = 2012, - url = - {http://code.google.com/apis/protocolbuffers/docs/overview.html} + booktitle = {ScienceCloud 2012 Proceedings of the 3rd + International Workshop on Scientific Cloud + Computing}, + pages = {63--70} } - at article{sturges1926choice, - title = {The choice of a class interval}, - author = {Herbert A Sturges}, - journal = {Journal of the American Statistical Association}, - volume = 21, - number = 153, - pages = {65--66}, - year = 1926 -} - - at Manual{histogramtools, - title = {HistogramTools: Utility Functions for R Histograms}, - author = {Murray Stokely}, - year = 2013, - note = {R package version 0.3}, - url = - {https://r-forge.r-project.org/projects/histogramtools/}, -} - @article{scott1979optimal, title = {On Optimal and Data-Based Histograms}, author = {David W Scott}, @@ -448,22 +496,14 @@ publisher = {Wiley} } - at Manual{httr, - title = {httr: Tools for Working with URLs and HTTP}, - author = {Hadley Wickham}, - year = 2015, - note = {R package version 0.6.1}, - url = {http://CRAN.R-project.org/package=httr}, + at misc{serialization, + author = {Luke Tierney}, + title = {A New Serialization Mechanism for R}, + url = + {http://www.cs.uiowa.edu/~luke/R/serialize/serialize.ps}, + year = 2003, } - at article{opencpu, - journal = {arXiv: Computation (stat.CO); Mathematical Software (cs.MS); Software Engineering (cs.SE)}, - title = {The OpenCPU System: Towards a Universal Interface for Scientific Computing through Separation of Concerns}, - author = {Jeroen Ooms}, - year = 2014, - url = {http://arxiv.org/abs/1406.4806}, -} - @article{shafranovich2005common, title = {Common Format and Mime Type for Comma-Separated Values (csv) Files}, @@ -472,54 +512,34 @@ url = {http://tools.ietf.org/html/rfc4180} } - at book{nolan2013xml, - title = {XML and Web Technologies for Data Sciences with R}, - author = {Deborah Nolan and Duncan {Temple Lang}}, - year = 2013, - publisher = {Springer-Verlag} + at article{sturges1926choice, + title = {The choice of a class interval}, + author = {Herbert A Sturges}, + journal = {Journal of the American Statistical Association}, + volume = 21, + number = 153, + pages = {65--66}, + year = 1926 } - at Manual{nlme, - title = {nlme: Linear and Nonlinear Mixed Effects Models}, - author = {Jos\'{e} Pinheiro and Douglas Bates and Saikat DebRoy and Deepayan Sarkar and {EISPACK authors} and {R Core}}, - year = 2013, - note = {R package version 3.1-113}, - url = {http://CRAN.R-project.org/package=nlme}, + at article{swain1991color, + title = {Color indexing}, + author = {Michael J Swain and Dana H Ballard}, + journal = {International journal of computer vision}, + volume = 7, + number = 1, + pages = {11--32}, + year = 1991, + publisher = {Springer-Verlag} } - at Manual{CRAN:Rserve, - title = {Rserve: Binary R server}, - author = {Simon Urbanek}, - year = 2013, - note = {R package version 1.7-3}, - url = 
{http://CRAN.R-Project.org/package=Rserve} + at article{wickham2011split, + title = {The split-apply-combine strategy for data analysis}, + author = {Hadley Wickham}, + journal = {Journal of Statistical Software}, + volume = 40, + number = 1, + pages = {1--29}, + year = 2011, + publisher = {Citeseer} } - - at InProceedings{Urbanek:2003:Rserve, - author = {Simon Urbanek}, - title = {{Rserve}: A Fast Way to Provide {R} Functionality to - Applications}, - booktitle = {Proceedings of the 3rd International Workshop on Distributed - Statistical Computing, Vienna, Austria}, - editor = {Kurt Hornik and Friedrich Leisch and Achim Zeileis}, - year = {2003}, - url = {http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/}, - note = {{ISSN 1609-395X}} -} - - at Misc{Apache:Avro, - author = {{Apache Software Foundation}}, - title = {Apache Avro}, - url = {http://avro.apache.org}, - note = {Data Serialization System, Version 1.7.7}, - year = 2014 -} - - at Misc{Apache:Thrift, - author = {{Apache Software Foundation}}, - title = {Apache Thrift}, - url = {http://thrift.apache.org}, - note = {Software Framework for Scalable Cross-Language Services, Version 0.9.2}, - year = 2014 -} - From noreply at r-forge.r-project.org Mon Apr 13 19:51:23 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 19:51:23 +0200 (CEST) Subject: [Rprotobuf-commits] r942 - papers/jss Message-ID: <20150413175123.E4FDD187826@r-forge.r-project.org> Author: edd Date: 2015-04-13 19:51:23 +0200 (Mon, 13 Apr 2015) New Revision: 942 Added: papers/jss/jss1313.Rnw papers/jss/jss1313.bib Removed: papers/jss/article.Rnw papers/jss/article.bib Modified: papers/jss/Makefile Log: renaming article.Rnw to jss1313.Rnw as requested, along with support files, commit 1 Modified: papers/jss/Makefile =================================================================== --- papers/jss/Makefile 2015-04-13 00:28:02 UTC (rev 941) +++ papers/jss/Makefile 2015-04-13 17:51:23 UTC (rev 942) @@ -1,16 +1,17 @@ -all: clean article.pdf +article=jss1313 +all: clean ${article}.pdf clean: - rm -fr article.out article.aux article.log article.bbl \ - article.blg article.brf figures/fig-0??.pdf + rm -fr ${article}.out ${article}.aux ${article}.log ${article}.bbl \ + ${article}.blg ${article}.brf figures/fig-0??.pdf -article.pdf: article.Rnw - R CMD Sweave article.Rnw - pdflatex article.tex - bibtex article - pdflatex article.tex - pdflatex article.tex - R CMD Stangle article.Rnw +${article}.pdf: ${article}.Rnw + R CMD Sweave ${article}.Rnw + pdflatex ${article}.tex + bibtex ${article} + pdflatex ${article}.tex + pdflatex ${article}.tex + R CMD Stangle ${article}.Rnw jssarchive: (cd .. && zip -r jssarchive.zip jss/) Deleted: papers/jss/article.Rnw =================================================================== --- papers/jss/article.Rnw 2015-04-13 00:28:02 UTC (rev 941) +++ papers/jss/article.Rnw 2015-04-13 17:51:23 UTC (rev 942) @@ -1,1518 +0,0 @@ -\documentclass[article]{jss} -\usepackage{booktabs} -\usepackage{listings} -\usepackage[toc,page]{appendix} - -% Line numbers for drafts. -%\usepackage[switch, modulo]{lineno} -%\linenumbers - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% Spelling Standardization: -% Protocol Buffers, not protocol buffers -% large-scale, not large scale -% Oxford comma: foo, bar, and baz. - - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -% -% Local helpers to make this more compatible with R Journal style. 
-% -\RequirePackage{fancyvrb} -\RequirePackage{alltt} -\DefineVerbatimEnvironment{example}{Verbatim}{} -% Articles with many authors we should shorten to FirstAuthor, et al. -\shortcites{sciencecloud,janus,dremel,nlme} -\author{Dirk Eddelbuettel\\Debian Project \And - Murray Stokely\\Google, Inc \And - Jeroen Ooms\\UCLA} -\title{\pkg{RProtoBuf}: Efficient Cross-Language Data Serialization in \proglang{R}} - -%% for pretty printing and a nice hypersummary also set: -\Plainauthor{Dirk Eddelbuettel, Murray Stokely, Jeroen Ooms} %% comma-separated -\Plaintitle{RProtoBuf: Efficient Cross-Language Data Serialization in R} -\Shorttitle{\pkg{RProtoBuf}: Protocol Buffers in \proglang{R}} %% a short title (if necessary) - -%% an abstract and keywords -\Abstract{ - Modern data collection and analysis pipelines often involve - a sophisticated mix of applications written in general purpose and - specialized programming languages. - Many formats commonly used to import and export data between - different programs or systems, such as \code{CSV} or \code{JSON}, are - verbose, inefficient, not type-safe, or tied to a specific programming language. - Protocol Buffers are a popular - method of serializing structured data between applications---while remaining - independent of programming languages or operating systems. - They offer a unique combination of features, performance, and maturity that seems - particularly well suited for data-driven applications and numerical - computing. - The \pkg{RProtoBuf} package provides a complete interface to Protocol - Buffers from the - \proglang{R} environment for statistical computing. - This paper outlines the general class of data serialization - requirements for statistical computing, describes the implementation - of the \pkg{RProtoBuf} package, and illustrates its use with - example applications in large-scale data collection pipelines and web - services. - %% TODO(ms) keep it less than 150 words. -- I think this may be 154, - %% depending how emacs is counting. 
-} -\Keywords{\proglang{R}, \pkg{Rcpp}, Protocol Buffers, serialization, cross-platform} -\Plainkeywords{R, Rcpp, Protocol Buffers, serialization, cross-platform} %% without formatting -%% at least one keyword must be supplied - -%% publication information -%% NOTE: Typically, this can be left commented and will be filled out by the technical editor -%% \Volume{50} -%% \Issue{9} -%% \Month{June} -%% \Year{2012} -%% \Submitdate{2012-06-04} -%% \Acceptdate{2012-06-04} - -%% The address of (at least) one author should be given -%% in the following format: -\Address{ - Dirk Eddelbuettel \\ - Debian Project \\ - River Forest, IL, USA\\ - E-mail: \email{edd at debian.org}\\ - URL: \url{http://dirk.eddelbuettel.com}\\ - \\ - Murray Stokely\\ - Google, Inc.\\ - 1600 Amphitheatre Parkway\\ - Mountain View, CA, USA\\ - E-mail: \email{mstokely at google.com}\\ - URL: \url{http://www.stokely.org/}\\ - \\ - Jeroen Ooms\\ - UCLA Department of Statistics\\ - University of California\\ - Los Angeles, CA, USA\\ - E-mail: \email{jeroen.ooms at stat.ucla.edu}\\ - URL: \url{https://jeroenooms.github.io} -} -%% It is also possible to add a telephone and fax number -%% before the e-mail in the following format: -%% Telephone: +43/512/507-7103 -%% Fax: +43/512/507-2851 - -%% for those who use Sweave please include the following line (with % symbols): -%% need no \usepackage{Sweave.sty} - -%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - - -\begin{document} -\SweaveOpts{concordance=FALSE,prefix.string=figures/fig} - - -%% include your article here, just as usual -%% Note that you should use the \pkg{}, \proglang{} and \code{} commands. - - -% We don't want a left margin for Sinput or Soutput for our table 1. -%\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=0em} -%\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=0em} -%\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em} -% Setting the topsep to 0 reduces spacing from input to output and -% improves table 1. -\fvset{listparameters={\setlength{\topsep}{0pt}}} -\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}} - -%% DE: I tend to have wider option(width=...) so this -%% guarantees better line breaks -<>= -## cf http://www.jstatsoft.org/style#q12 -options(prompt = "R> ", - continue = "+ ", - width = 70, - useFancyQuotes = FALSE, - digits = 4) -@ - -\maketitle - -\section{Introduction} - -Modern data collection and analysis pipelines increasingly involve collections -of decoupled components in order to better manage software complexity -through reusability, modularity, and fault isolation \citep{Wegiel:2010:CTT:1932682.1869479}. -These pipelines are frequently built using different programming -languages for the different phases of data analysis --- collection, -cleaning, modeling, analysis, post-processing, and -presentation --- in order to take advantage of the unique combination of -performance, speed of development, and library support offered by -different environments and languages. Each stage of such a data -analysis pipeline may produce intermediate results that need to be -stored in a file, or sent over the network for further processing. - -Given these requirements, how do we safely and efficiently share intermediate results -between different applications, possibly written in different -languages, and possibly running on different computer systems? 
-In computer programming, \emph{serialization} is the process of -translating data structures, variables, and session state into a -format that can be stored or transmitted and then reconstructed in the -original form later \citep{clinec++}. -Programming -languages such as \proglang{R}, \proglang{Julia}, \proglang{Java}, and \proglang{Python} include built-in -support for serialization, but the default formats -are usually language-specific and thereby lock the user into a single -environment. - -Data analysts and researchers often use character-separated text formats such -as \code{CSV} \citep{shafranovich2005common} to export and import -data. However, anyone who has ever used \code{CSV} files will have noticed -that this method has many limitations: it is restricted to tabular data, -lacks type-safety, and has limited precision for numeric values. Moreover, -ambiguities in the format itself frequently cause problems. For example, -conventions on which character is used as the separator or decimal point vary by -country. \emph{Extensible Markup Language} (\code{XML}) is a -well-established and widely-supported format with the ability to define just -about any arbitrarily complex schema \citep{nolan2013xml}. However, it pays -for this complexity with comparatively large and verbose messages, and added -complexity at the parsing side (these problems are somewhat mitigated by the -availability of mature libraries and parsers). Because \code{XML} is -text-based and has no native notion of numeric types or arrays, it is usually not a -very practical format to store numeric data sets as they appear in statistical -applications. - - -A more modern format is \emph{JavaScript Object Notation} -(\code{JSON}), which is derived from the object literals of -\proglang{JavaScript}, and already widely-used on the world wide web. -Several \proglang{R} packages implement functions to parse and generate -\code{JSON} data from \proglang{R} objects \citep{rjson,RJSONIO,jsonlite}. -\code{JSON} natively supports arrays and four primitive types: numbers, strings, -booleans, and null. However, as it too is a text-based format, numbers are -stored as human-readable decimal notation which is inefficient and -leads to loss of type (double versus integer) and precision. -A number of binary formats based on \code{JSON} have been proposed -that reduce the parsing cost and improve efficiency, but these formats -are not widely supported. Furthermore, such formats lack a separate -schema for the serialized data and thus still duplicate field names -with each message sent over the network or stored in a file. - -Once the data serialization needs of an application become complex -enough, developers typically benefit from the use of an -\emph{interface description language}, or \emph{IDL}. IDLs like -Protocol Buffers \citep{protobuf}, Apache Thrift \citep{Apache:Thrift}, and Apache Avro \citep{Apache:Avro} -provide a compact well-documented schema for cross-language data -structures and efficient binary interchange formats. Since the schema -is provided separately from the data, the data can be -efficiently encoded to minimize storage costs when -compared with simple ``schema-less'' binary interchange formats. -%Many sources compare data serialization formats -%and show Protocol Buffers perform favorably to the alternatives; see -%\citet{Sumaray:2012:CDS:2184751.2184810} for one such comparison. -Protocol Buffers performs well in the comparison of such formats by -\citet{Sumaray:2012:CDS:2184751.2184810}.
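The size difference between a text-based and a schema-based binary encoding is easy to check interactively. The chunk below is an illustrative sketch only (it is not part of the quoted sources and is not evaluated): it assumes the \pkg{jsonlite} package is installed and uses the \code{serialize_pb} function discussed in Section~\ref{sec:evaluation} to compare a \code{JSON} rendering of a data set with the corresponding Protocol Buffer payload.

<<eval=FALSE>>=
library("jsonlite")
library("RProtoBuf")
## Text-based JSON rendering of the iris data set (illustrative)
json <- toJSON(iris)
nchar(json)                    # size in characters
## Binary Protocol Buffer payload of the same object
pb <- serialize_pb(iris, NULL)
length(pb)                     # size in bytes
@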
- -This paper describes an \proglang{R} interface to Protocol Buffers, -and is organized as follows. Section~\ref{sec:protobuf} -provides a general high-level overview of Protocol Buffers as well as a basic -motivation for their use. -Section~\ref{sec:rprotobuf-basic} describes the interactive \proglang{R} interface -provided by the \pkg{RProtoBuf} package, and introduces the two main abstractions: -\emph{Messages} and \emph{Descriptors}. Section~\ref{sec:rprotobuf-classes} -details the implementation of the main S4 classes and methods. -Section~\ref{sec:types} describes the challenges of type coercion -between \proglang{R} and other languages. Section~\ref{sec:evaluation} introduces a -general \proglang{R} language schema for serializing arbitrary \proglang{R} objects and compares it to -the serialization capabilities built directly into \proglang{R}. Sections~\ref{sec:mapreduce} -and \ref{sec:opencpu} provide real-world use cases of \pkg{RProtoBuf} -in MapReduce and web service environments, respectively, before -Section~\ref{sec:summary} concludes. - -\section{Protocol Buffers} -\label{sec:protobuf} - -Protocol Buffers are a modern, language-neutral, platform-neutral, -extensible mechanism for sharing and storing structured data. Some of their -features, particularly in the context of data analysis, are: - -\begin{itemize} -\item \emph{Portable}: Enable users to send and receive data between - applications as well as different computers or operating systems. -\item \emph{Efficient}: Data is serialized into a compact binary - representation for transmission or storage. -\item \emph{Extensible}: New fields can be added to Protocol Buffer schemas - in a forward-compatible way that does not break older applications. -\item \emph{Stable}: Protocol Buffers have been in wide use for over a - decade. -\end{itemize} - -%\begin{figure}[bp] -\begin{figure}[h!] -\begin{center} -\includegraphics[width=0.9\textwidth]{figures/protobuf-distributed-system-crop.pdf} -\end{center} -\caption{Example usage of Protocol Buffers.} -\label{fig:protobuf-distributed-usecase} -\end{figure} - -Figure~\ref{fig:protobuf-distributed-usecase} illustrates an example -communication work flow with Protocol Buffers and an interactive \proglang{R} session. -Common use cases include populating a request remote-procedure call (RPC) -Protocol Buffer in \proglang{R} that is then serialized and sent over the network to a -remote server. The server deserializes the message, acts on the -request, and responds with a new Protocol Buffer over the network. -The key difference to, say, a request to an \pkg{Rserve} -\citep{Urbanek:2003:Rserve,CRAN:Rserve} instance is that -the remote server may be implemented in any language. -%, with no dependence on \proglang{R}. - -While traditional IDLs have at times been criticized for code bloat and -complexity, Protocol Buffers are based on a simple list and records -model that is flexible and easy to use. The schema for structured -Protocol Buffer data is defined in \code{.proto} files, which may -contain one or more message types. Each message type has one or more -fields. A field is specified with a unique number (called a \emph{tag number}), a name, a value -type, and a field rule specifying whether the field is optional, -required, or repeated. The supported value types are numbers, -enumerations, booleans, strings, raw bytes, or other nested message -types. 
The \code{.proto} file syntax for defining the structure of Protocol -Buffer data is described comprehensively on Google Code\footnote{See -\url{http://code.google.com/apis/protocolbuffers/docs/proto.html}.}. -Table~\ref{tab:proto} shows an example \code{.proto} file that -defines the \code{tutorial.Person} type\footnote{The compound name - \code{tutorial.Person} in R is derived from the name of the - message (\emph{Person}) and the name of the package defined at the top of the - \code{.proto} file in which it is defined (\emph{tutorial}).}. The \proglang{R} code in the right -column shows an example of creating a new message of this type and -populating its fields. - -\noindent -\begin{table} -\begin{tabular}{p{0.45\textwidth}p{0.5\textwidth}} -\toprule -Schema : \code{addressbook.proto} & Example \proglang{R} session\\ -\cmidrule{1-2} -\begin{minipage}{.40\textwidth} -\vspace{2mm} -\begin{example} -package tutorial; -message Person { - required string name = 1; - required int32 id = 2; - optional string email = 3; - enum PhoneType { - MOBILE = 0; - HOME = 1; - WORK = 2; - } - message PhoneNumber { - required string number = 1; - optional PhoneType type = 2; - } - repeated PhoneNumber phone = 4; -} -\end{example} -\vspace{2mm} -\end{minipage} & \begin{minipage}{.55\textwidth} -<>= -library("RProtoBuf") -p <- new(tutorial.Person, id=1, - name="Dirk") -p$name -p$name <- "Murray" -cat(as.character(p)) -serialize(p, NULL) -class(p) -@ -\end{minipage} \\ -\bottomrule -\end{tabular} -\caption{The schema representation from a \code{.proto} file for the - \code{tutorial.Person} class (left) and simple \proglang{R} code for creating - an object of this class and accessing its fields (right).} -\label{tab:proto} -\end{table} - - -For added speed and efficiency, the \proglang{C++}, \proglang{Java}, -and \proglang{Python} bindings to -Protocol Buffers are used with a compiler that translates a Protocol -Buffer schema description file (ending in \code{.proto}) into -language-specific classes that can be used to create, read, write, and -manipulate Protocol Buffer messages. The \proglang{R} interface, in contrast, -uses a reflection-based API that makes some operations slightly -slower but which is much more convenient for interactive data analysis. -All messages in \proglang{R} have a single class -structure, but different accessor methods are created at runtime based -on the named fields of the specified message type, as described in the -next section. - -\section{Basic usage: Messages and descriptors} -\label{sec:rprotobuf-basic} - -This section describes how to use the \proglang{R} API to create and manipulate -Protocol Buffer messages in \proglang{R}, and how to read and write the -binary representation of the message (often called the \emph{payload}) to files and arbitrary binary -\proglang{R} connections. -The two fundamental building blocks of Protocol Buffers are \emph{Messages} -and \emph{Descriptors}. Messages provide a common abstract encapsulation of -structured data fields of the type specified in a Message Descriptor. -Message Descriptors are defined in \code{.proto} files and define a -schema for a particular named class of messages. - -% Note: We comment out subsections in favor of textbf blocks to save -% space and shrink down this section a little bit. 
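As a quick, purely illustrative sketch of the two abstractions before the subsections that follow (not part of the quoted sources), the \code{tutorial.Person} descriptor that ships with the package can be used directly to instantiate a message of that type:

<<eval=FALSE>>=
## A Descriptor describes a message type (illustrative variable names) ...
desc <- tutorial.Person
class(desc)
## ... and a Message holds data conforming to that type.
msg <- new(desc, name = "Example", id = 7)
class(msg)
@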
-%\subsection[Importing message descriptors from .proto files]{Importing message descriptors from \code{.proto} files} - -\subsection*{Importing message descriptors from \code{.proto} files} - -To create or parse a Protocol Buffer Message, one must first read in -the message descriptor (\emph{message type}) from a \code{.proto} file. -A small number of message types are imported when the package is first -loaded, including the \code{tutorial.Person} type we saw in the last -section. -All other types must be imported from -\code{.proto} files using the \code{readProtoFiles} -function, which can either import a single file, all files in a directory, -or every \code{.proto} file provided by a particular \proglang{R} package. - -After importing proto files, the corresponding message descriptors are -available by name from the \code{RProtoBuf:DescriptorPool} environment in -the \proglang{R} search path. This environment is implemented with the -user-defined tables framework from the \pkg{RObjectTables} package -available from the OmegaHat project \citep{RObjectTables}. Instead of -being associated with a static hash table, this environment -dynamically queries the in-memory database of loaded descriptors -during normal variable lookup. This allows new descriptors to be -parsed from \code{.proto} files and added to the global -namespace.\footnote{Note that there is a significant performance - overhead with this RObjectTable implementation. Because the table - is on the search path and is not cacheable, lookups of symbols that - are behind it in the search path cannot be added to the global object - cache, and R must perform an expensive lookup through all of the - attached environments and the protocol buffer definitions to find common - symbols (most notably those in base) from the global environment. - Fortunately, proper use of namespaces and package imports reduces - the impact of this for code in packages.} - -% Commented out for now because its too detailed. Lets shorten -% section 3 per referee feedback. - -%<>= -%ls("RProtoBuf:DescriptorPool") -%@ - -% \subsection{Creating a message} - -% \\ - -\subsection*{Creating, accessing, and modifying messages.} - -New messages are created with the \code{new} function which accepts -a Message Descriptor and optionally a list of ``name = value'' pairs -to set in the message. -%The objects contained in the special environment are -%descriptors for their associated message types. Descriptors will be -%discussed in detail in another part of this document, but for the -%purpose of this section, descriptors are just used with the \code{new} -%function to create messages. - -<<>>= -p <- new(tutorial.Person, name = "Murray", id = 1) -@ - -% \subsection*{Access and modify fields of a message} - -Once the message is created, its fields can be queried -and modified using the dollar operator of \proglang{R}, making Protocol -Buffer messages seem like lists. - -<<>>= -p$name -p$id -p$email <- "murray at stokely.org" -@ - -As opposed to \proglang{R} lists, no partial matching is performed -and the name must be given entirely. -The \verb|[[| operator can also be used to query and set fields -of a messages, supplying either their name or their tag number: - -<<>>= -p[["name"]] <- "Murray Stokely" -p[[ 2 ]] <- 3 -p[["email"]] -@ - -Protocol Buffers include a 64-bit integer type, but \proglang{R} lacks native -64-bit integer support. A workaround is available and described in -Section~\ref{sec:int64} for working with large integer values. 
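The underlying limitation can be demonstrated with base \proglang{R} alone; the short sketch below (added here for illustration, mirroring the examples used in Section~\ref{sec:int64}) shows why neither \proglang{R} integers nor doubles can represent the full \code{int64} range exactly.

<<eval=FALSE>>=
## The largest representable 32-bit signed integer in R ...
as.integer(2^31 - 1)
## ... overflows to NA (with a warning) when incremented.
as.integer(2^31 - 1) + as.integer(1)
## Doubles hold larger values but lose exact integer precision beyond 2^53.
2^53 == (2^53 + 1)
@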
- -\subsection*{Printing, reading, and writing Messages} - -%\\ - -% \textbf{Printing, Reading, and Writing Messages} - -Protocol Buffer messages and descriptors implement \code{show} -methods that provide basic information about the message: - -<<>>= -p -@ - -%For additional information, such as for debugging purposes, -The \code{as.character} method provides a more complete ASCII -representation of the contents of a message. - -<<>>= -writeLines(as.character(p)) -@ - -% \subsection{Serializing messages} - -A primary benefit of Protocol Buffers is an efficient -binary wire-format representation. -The \code{serialize} method is implemented for -Protocol Buffer messages to serialize a message into a sequence of -bytes (raw vector) that represents the message. -The raw bytes can then be parsed back into the original message safely -as long as the message type is known and its descriptor is available. - -<<>>= -serialize(p, NULL) -@ - -The same method can be used to serialize messages to files or arbitrary binary connections: - -<<>>= -tf1 <- tempfile() -serialize(p, tf1) -readBin(tf1, raw(0), 500) -@ - -% TODO(mstokely): Comment out, combined with last statement. make this -% shorter, more succinct summary of the key features of RProtoBuf. - -%Or to arbitrary binary connections: -% -%<<>>= -%tf2 <- tempfile() -%con <- file(tf2, open = "wb") -%serialize(p, con) -%close(con) -%readBin(tf2, raw(0), 500) -%@ - -% TODO(mstokely): commentd out per referee feedback, but see if this is -% covered in the package documentation well. -% -%\code{serialize} can also be called in a more traditional -%object-oriented fashion using the dollar operator. -% -%<<>>= -%p$serialize(tf1) -%con <- file(tf2, open = "wb") -%p$serialize(con) -%close(con) -%@ -% -%Here, we first serialize to a file \code{tf1} before we serialize to a binary -%connection to file \code{tf2}. - -%\subsection{Parsing messages} - -The \pkg{RProtoBuf} package defines the \code{read} and -\code{readASCII} functions to read messages from files, raw vectors, -or arbitrary connections. \code{read} expects to read the message -payload from binary files or connections and \code{readASCII} parses -the human-readable ASCII output that is created with -\code{as.character}. - -The binary representation of the message -does not contain information that can be used to dynamically -infer the message type, so we have to provide this information -to the \code{read} function in the form of a descriptor: - -<<>>= -msg <- read(tutorial.Person, tf1) -writeLines(as.character(msg)) -@ - -The \code{input} argument of \code{read} can also be a binary -readable \proglang{R} connection, such as a binary file connection, or a raw vector of serialized bytes. - -% <<>>= -% con <- file(tf2, open = "rb") -% message <- read(tutorial.Person, con) -% close(con) -% writeLines(as.character(message)) -% @ - -% Finally, the raw vector payload of the message can be used: -% -%<<>>= -%payload <- readBin(tf1, raw(0), 5000) -%message <- read(tutorial.Person, payload) -%@ - -% TODO(mstokely): comment out and use only one style, not both per -% referee feedback. Also avoid using the term 'pseudo-method' which -% is unclear. -% -%\code{read} can also be used as a method of the descriptor -%object: -% -%<<>>= -%message <- tutorial.Person$read(tf1) -%con <- file(tf2, open = "rb") -%message <- tutorial.Person$read(con) -%close(con) -%message <- tutorial.Person$read(payload) -%@ -% -%Here we read first from a file, then from a binary connection and lastly from -%a message payload. 
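As a compact summary of the serialization round trip described in this section, the following sketch (an illustration, not part of the quoted sources) serializes a message to a raw vector in memory and parses it back by supplying the descriptor explicitly:

<<eval=FALSE>>=
## Serialize the message p to a raw vector rather than a file ...
payload <- serialize(p, NULL)
## ... then parse it back; the message type must be given as a descriptor.
msg <- read(tutorial.Person, payload)
identical(as.character(msg), as.character(p))
@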
- -\section{Under the hood: S4 classes and methods} -\label{sec:rprotobuf-classes} - -The \pkg{RProtoBuf} package uses the S4 system to store -information about descriptors and messages. -Each \proglang{R} object -contains an external pointer to an object managed by the -\code{protobuf} \proglang{C++} library, and the \proglang{R} objects make calls into more -than 100 \proglang{C++} functions that provide the -glue code between the \proglang{R} language classes and the underlying \proglang{C++} -classes. -S4 objects are immutable, and so the methods that modify field values of a message return a new copy of the object with R's usual functional copy on modify semantics\footnote{RProtoBuf was designed and implemented before Reference Classes were introduced to offer a new class system with mutable objects. If RProtoBuf were -implemented today Reference Classes would almost certainly be a better -design choice than S4 classes.}. -Using the S4 system -allows the package to dispatch methods that are not -generic in the S3 sense, such as \code{new} and -\code{serialize}. - -The \pkg{Rcpp} package -\citep{eddelbuettel2011rcpp,eddelbuettel2013seamless} is used to -facilitate this integration of the \proglang{R} and \proglang{C++} code for these objects. -Each method is wrapped individually which allows us to add -user-friendly custom error handling, type coercion, and performance -improvements at the cost of a more verbose implementation. -The \pkg{RProtoBuf} package in many ways motivated -the development of \pkg{Rcpp} Modules \citep{eddelbuettel2013exposing}, -which provide a more concise way of wrapping \proglang{C++} functions and classes -in a single entity. - -Since \pkg{RProtoBuf} users are most often switching between two or -more different languages as part of a larger data analysis pipeline, -both generic function and message passing OO style calling conventions -are supported: - -\begin{itemize} -\item The functional dispatch mechanism of the the form - \verb|method(object, arguments)| (common to \proglang{R}). -\item The message passing object-oriented notation of the form - \verb|object$method(arguments)|. -\end{itemize} - -Additionally, \pkg{RProtoBuf} supports tab completion for all -classes. Completion possibilities include method names for all -classes, plus \emph{dynamic dispatch} on names or types specific to a given -object. This functionality is implemented with the -\code{.DollarNames} S3 generic function defined in the \pkg{utils} -package that is included with \proglang{R} \citep{r}. - - -Table~\ref{class-summary-table} lists the six primary Message and -Descriptor classes in \pkg{RProtoBuf}. -The package documentation provides a complete description of the slots and methods for -each class. - -% [bp] -\begin{table} -\centering -\begin{tabular}{lccl} -\toprule -Class & Slots & Methods & Dynamic dispatch\\ -\cmidrule{2-4} -Message & 2 & 20 & yes (field names)\\ -Descriptor & 2 & 16 & yes (field names, enum types, nested types)\\ -FieldDescriptor & 4 & 18 & no\\ -EnumDescriptor & 4 & 11 & yes (enum constant names)\\ -EnumValueDescriptor & 3 & \phantom{1}6 & no\\ -FileDescriptor & 3 & \phantom{1}6 & yes (message/field definitions)\\ -\bottomrule -\end{tabular} -\caption{\label{class-summary-table}Overview of class, slot, method and - dispatch relationships.} -\end{table} - -\subsection{Messages} - -The \code{Message} S4 class represents Protocol Buffer Messages and -is the core abstraction of \pkg{RProtoBuf}. 
Each \code{Message} -contains a pointer to a \code{Descriptor} which defines the schema -of the data defined in the Message, as well as a number of -\code{FieldDescriptors} for the individual fields of the message. - -<<>>= -new(tutorial.Person) -@ - -\subsection{Descriptors} - -Descriptors describe the type of a Message. This includes what fields -a message contains and what the types of those fields are. Message -descriptors are represented in \proglang{R} by the \emph{Descriptor} S4 -class. The class contains the slots \code{pointer} and -\code{type}. Similarly to messages, the \verb|$| operator can be -used to retrieve descriptors that are contained in the descriptor, or -invoke methods. - -When \pkg{RProtoBuf} is first loaded it calls -\code{readProtoFiles} to read in the example \code{addressbook.proto} file -included with the package. The \code{tutorial.Person} descriptor -and all other descriptors defined in the loaded \code{.proto} files are -then available on the search path\footnote{This explains why the example in -Table~\ref{tab:proto} lacked an explicit call to -\code{readProtoFiles}.}. - -\subsubsection*{Field descriptors} -\label{subsec-field-descriptor} - -<<>>= -tutorial.Person$email -tutorial.Person$email$is_required() -tutorial.Person$email$type() -tutorial.Person$email$as.character() -class(tutorial.Person$email) -@ - -\subsubsection*{Enum and EnumValue descriptors} -\label{subsec-enum-descriptor} - -The \code{EnumDescriptor} type contains information about what values a -type defines, while the \code{EnumValueDescriptor} describes an -individual enum constant of a particular type. The \verb|$| operator -can be used to retrieve the value of enum constants contained in the -EnumDescriptor, or to invoke methods. - -<<>>= -tutorial.Person$PhoneType -tutorial.Person$PhoneType$WORK -class(tutorial.Person$PhoneType) -tutorial.Person$PhoneType$value(1) -tutorial.Person$PhoneType$value(name="HOME") -tutorial.Person$PhoneType$value(number=1) -class(tutorial.Person$PhoneType$value(1)) -@ - -\subsubsection*{File descriptors} -\label{subsec-file-descriptor} - -The class \emph{FileDescriptor} represents file descriptors in \proglang{R}. -The \verb|$| operator can be used to retrieve named fields defined in -the FileDescriptor, or to invoke methods. - -% < < > > = -% f <- tutorial.Person$fileDescriptor() -% f -% f$Person -% @ - -\begin{Schunk} -\begin{Sinput} -R> f <- tutorial.Person$fileDescriptor() -R> f -\end{Sinput} -\begin{Soutput} -file descriptor for package tutorial \ - (/usr/local/lib/R/site-library/RProtoBuf/proto/addressbook.proto) -\end{Soutput} -\begin{Sinput} -R> f$Person -\end{Sinput} -\begin{Soutput} -descriptor for type 'tutorial.Person' -\end{Soutput} -\end{Schunk} - - -\section{Type coercion} -\label{sec:types} - -One of the benefits of using an Interface Definition Language (IDL) -like Protocol Buffers is that it provides a highly portable basic type -system. This permits different language and hardware implementations to map to -the most appropriate type in different environments. - -Table~\ref{table-get-types} details the correspondence between the -field type and the type of data that is retrieved by \verb|$| and \verb|[[| -extractors. Three types in particular need further attention due to -specific differences in the \proglang{R} language: booleans, unsigned -integers, and 64-bit integers. 
- -\begin{table}[h] -\centering -\begin{small} -\begin{tabular}{lp{5cm}p{5.5cm}} -\toprule -Field type & \proglang{R} type (non repeated) & \proglang{R} type (repeated) \\ -\cmidrule(r){2-3} -double & \code{double} vector & \code{double} vector \\ -float & \code{double} vector & \code{double} vector \\[3mm] -uint32 & \code{double} vector & \code{double} vector \\ -fixed32 & \code{double} vector & \code{double} vector \\[3mm] -int32 & \code{integer} vector & \code{integer} vector \\ -sint32 & \code{integer} vector & \code{integer} vector \\ -sfixed32 & \code{integer} vector & \code{integer} vector \\[3mm] -int64 & \code{integer} or \code{character} -vector & \code{integer} or \code{character} vector \\ -uint64 & \code{integer} or \code{character} vector & \code{integer} or \code{character} vector \\ -sint64 & \code{integer} or \code{character} vector & \code{integer} or \code{character} vector \\ -fixed64 & \code{integer} or \code{character} vector & \code{integer} or \code{character} vector \\ -sfixed64 & \code{integer} or \code{character} vector & \code{integer} or \code{character} vector \\[3mm] -bool & \code{logical} vector & \code{logical} vector \\[3mm] -string & \code{character} vector & \code{character} vector \\ -bytes & \code{character} vector & \code{character} vector \\[3mm] -enum & \code{integer} vector & \code{integer} vector \\[3mm] -message & \code{S4} object of class \code{Message} & \code{list} of \code{S4} objects of class \code{Message} \\ -\bottomrule -\end{tabular} -\end{small} -\caption{\label{table-get-types}Correspondence between field type and - \proglang{R} type retrieved by the extractors. \proglang{R} lacks native - 64-bit integers, so the \code{RProtoBuf.int64AsString} option is - available to return large integers as characters to avoid losing [TRUNCATED] To get the complete diff run: svnlook diff /svnroot/rprotobuf -r 942 From noreply at r-forge.r-project.org Mon Apr 13 20:25:14 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 20:25:14 +0200 (CEST) Subject: [Rprotobuf-commits] r943 - papers/jss Message-ID: <20150413182514.38FC6187771@r-forge.r-project.org> Author: edd Date: 2015-04-13 20:25:13 +0200 (Mon, 13 Apr 2015) New Revision: 943 Modified: papers/jss/jss1313.Rnw Log: update bibliography file reference Modified: papers/jss/jss1313.Rnw =================================================================== --- papers/jss/jss1313.Rnw 2015-04-13 17:51:23 UTC (rev 942) +++ papers/jss/jss1313.Rnw 2015-04-13 18:25:13 UTC (rev 943) @@ -1512,7 +1512,7 @@ \end{verbatim} % \end{appendices} \newpage -\bibliography{article} +\bibliography{jss1313} \end{document} From noreply at r-forge.r-project.org Mon Apr 13 21:33:59 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 21:33:59 +0200 (CEST) Subject: [Rprotobuf-commits] r944 - papers/jss Message-ID: <20150413193359.CBCB91878EE@r-forge.r-project.org> Author: edd Date: 2015-04-13 21:33:59 +0200 (Mon, 13 Apr 2015) New Revision: 944 Added: papers/jss/jss1313.R Removed: papers/jss/article.R Modified: papers/jss/ Log: more svn ignore Property changes on: papers/jss ___________________________________________________________________ Modified: svn:ignore - article.aux article.bbl article.log article.pdf article.tex + article.aux article.bbl article.log article.pdf article.tex jss1313.aux jss1313.bbl jss1313.log jss1313.pdf jss1313.tex Deleted: papers/jss/article.R 
=================================================================== --- papers/jss/article.R 2015-04-13 18:25:13 UTC (rev 943) +++ papers/jss/article.R 2015-04-13 19:33:59 UTC (rev 944) @@ -1,297 +0,0 @@ -### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/article.Rnw' - -################################################### -### code chunk number 1: article.Rnw:130-136 -################################################### -## cf http://www.jstatsoft.org/style#q12 -options(prompt = "R> ", - continue = "+ ", - width = 70, - useFancyQuotes = FALSE, - digits = 4) - - -################################################### -### code chunk number 2: article.Rnw:318-326 -################################################### -library("RProtoBuf") -p <- new(tutorial.Person, id=1, - name="Dirk") -p$name -p$name <- "Murray" -cat(as.character(p)) -serialize(p, NULL) -class(p) - - -################################################### -### code chunk number 3: article.Rnw:421-422 -################################################### -p <- new(tutorial.Person, name = "Murray", id = 1) - - -################################################### -### code chunk number 4: article.Rnw:431-434 -################################################### -p$name -p$id -p$email <- "murray at stokely.org" - - -################################################### -### code chunk number 5: article.Rnw:442-445 -################################################### -p[["name"]] <- "Murray Stokely" -p[[ 2 ]] <- 3 -p[["email"]] - - -################################################### -### code chunk number 6: article.Rnw:461-462 -################################################### -p - - -################################################### -### code chunk number 7: article.Rnw:469-470 -################################################### -writeLines(as.character(p)) - - -################################################### -### code chunk number 8: article.Rnw:483-484 -################################################### -serialize(p, NULL) - - -################################################### -### code chunk number 9: article.Rnw:489-492 -################################################### -tf1 <- tempfile() -serialize(p, tf1) -readBin(tf1, raw(0), 500) - - -################################################### -### code chunk number 10: article.Rnw:538-540 -################################################### -msg <- read(tutorial.Person, tf1) -writeLines(as.character(msg)) - - -################################################### -### code chunk number 11: article.Rnw:660-661 -################################################### -new(tutorial.Person) - - -################################################### -### code chunk number 12: article.Rnw:685-690 -################################################### -tutorial.Person$email -tutorial.Person$email$is_required() -tutorial.Person$email$type() -tutorial.Person$email$as.character() -class(tutorial.Person$email) - - -################################################### -### code chunk number 13: article.Rnw:702-709 -################################################### -tutorial.Person$PhoneType -tutorial.Person$PhoneType$WORK -class(tutorial.Person$PhoneType) -tutorial.Person$PhoneType$value(1) -tutorial.Person$PhoneType$value(name="HOME") -tutorial.Person$PhoneType$value(number=1) -class(tutorial.Person$PhoneType$value(1)) - - -################################################### -### code chunk number 14: article.Rnw:805-808 -################################################### -if 
(!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { - readProtoFiles(file="int64.proto") -} - - -################################################### -### code chunk number 15: article.Rnw:830-834 -################################################### -as.integer(2^31-1) -as.integer(2^31 - 1) + as.integer(1) -2^31 -class(2^31) - - -################################################### -### code chunk number 16: article.Rnw:846-847 -################################################### -2^53 == (2^53 + 1) - - -################################################### -### code chunk number 17: article.Rnw:898-900 -################################################### -msg <- serialize_pb(iris, NULL) -identical(iris, unserialize_pb(msg)) - - -################################################### -### code chunk number 18: article.Rnw:928-931 -################################################### -datasets <- as.data.frame(data(package="datasets")$results) -datasets$name <- sub("\\s+.*$", "", datasets$Item) -n <- nrow(datasets) - - -################################################### -### code chunk number 19: article.Rnw:949-992 -################################################### -datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) - -datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) - -datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) - -datasets$R.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize(eval(as.name(x)), NULL), "gzip")))) - -datasets$RProtoBuf.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize_pb(eval(as.name(x)), NULL)))) - -datasets$RProtoBuf.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize_pb(eval(as.name(x)), NULL), "gzip")))) - -clean.df <- data.frame(dataset=datasets$name, - object.size=datasets$object.size, - "serialized"=datasets$R.serialize.size, - "gzipped serialized"=datasets$R.serialize.size.gz, - "RProtoBuf"=datasets$RProtoBuf.serialize.size, - "gzipped RProtoBuf"=datasets$RProtoBuf.serialize.size.gz, - "ratio.serialized" = datasets$R.serialize.size / datasets$object.size, - "ratio.rprotobuf" = datasets$RProtoBuf.serialize.size / datasets$object.size, - "ratio.serialized.gz" = datasets$R.serialize.size.gz / datasets$object.size, - "ratio.rprotobuf.gz" = datasets$RProtoBuf.serialize.size.gz / datasets$object.size, - "savings.serialized" = 1-(datasets$R.serialize.size / datasets$object.size), - "savings.rprotobuf" = 1-(datasets$RProtoBuf.serialize.size / datasets$object.size), - "savings.serialized.gz" = 1-(datasets$R.serialize.size.gz / datasets$object.size), - "savings.rprotobuf.gz" = 1-(datasets$RProtoBuf.serialize.size.gz / datasets$object.size), - check.names=FALSE) - -all.df<-data.frame(dataset="TOTAL", object.size=sum(datasets$object.size), - "serialized"=sum(datasets$R.serialize.size), - "gzipped serialized"=sum(datasets$R.serialize.size.gz), - "RProtoBuf"=sum(datasets$RProtoBuf.serialize.size), - "gzipped RProtoBuf"=sum(datasets$RProtoBuf.serialize.size.gz), - "ratio.serialized" = sum(datasets$R.serialize.size) / sum(datasets$object.size), - "ratio.rprotobuf" = sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size), - "ratio.serialized.gz" = sum(datasets$R.serialize.size.gz) / sum(datasets$object.size), - "ratio.rprotobuf.gz" = sum(datasets$RProtoBuf.serialize.size.gz) / 
sum(datasets$object.size), - "savings.serialized" = 1-(sum(datasets$R.serialize.size) / sum(datasets$object.size)), - "savings.rprotobuf" = 1-(sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size)), - "savings.serialized.gz" = 1-(sum(datasets$R.serialize.size.gz) / sum(datasets$object.size)), - "savings.rprotobuf.gz" = 1-(sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size)), - check.names=FALSE) -clean.df<-rbind(clean.df, all.df) - - -################################################### -### code chunk number 20: SER -################################################### -old.mar<-par("mar") -new.mar<-old.mar -new.mar[3]<-0 -new.mar[4]<-0 -my.cex<-1.3 -par("mar"=new.mar) -plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings", xlim=c(0,1),ylim=c(0,1),cex.lab=my.cex, cex.axis=my.cex) -points(clean.df$savings.serialized.gz, clean.df$savings.rprotobuf.gz,pch=2, col="blue") -# grey dotted diagonal -abline(a=0,b=1, col="grey",lty=2,lwd=3) - -# find point furthest off the X axis. -clean.df$savings.diff <- clean.df$savings.serialized - clean.df$savings.rprotobuf -clean.df$savings.diff.gz <- clean.df$savings.serialized.gz - clean.df$savings.rprotobuf.gz - -# The one to label. -tmp.df <- clean.df[which(clean.df$savings.diff == min(clean.df$savings.diff)),] -# This minimum means most to the left of our line, so pos=2 is label to the left -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) - -# Some gziped version -# text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2, cex=my.cex) - -# Second one is also an outlier -tmp.df <- clean.df[which(clean.df$savings.diff == sort(clean.df$savings.diff)[2]),] -# This minimum means most to the left of our line, so pos=2 is label to the left -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) -#text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=my.cex) - - -tmp.df <- clean.df[which(clean.df$savings.diff == max(clean.df$savings.diff)),] -# This minimum means most to the right of the diagonal, so pos=4 is label to the right -# Only show the gziped one. 
-#text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4, cex=my.cex) -text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4, cex=my.cex) - -#outlier.dfs <- clean.df[c(which(clean.df$savings.diff == min(clean.df$savings.diff)), - -legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue"), cex=my.cex) - -interesting.df <- clean.df[unique(c(which(clean.df$savings.diff == min(clean.df$savings.diff)), - which(clean.df$savings.diff == max(clean.df$savings.diff)), - which(clean.df$savings.diff.gz == max(clean.df$savings.diff.gz)), - which(clean.df$dataset == "TOTAL"))),c("dataset", "object.size", "serialized", "gzipped serialized", "RProtoBuf", "gzipped RProtoBuf", "savings.serialized", "savings.serialized.gz", "savings.rprotobuf", "savings.rprotobuf.gz")] -# Print without .00 in xtable -interesting.df$object.size <- as.integer(interesting.df$object.size) -par("mar"=old.mar) - - -################################################### -### code chunk number 21: article.Rnw:1231-1235 -################################################### -require(HistogramTools) -readProtoFiles(package="HistogramTools") -hist <- HistogramTools.HistogramState$read("hist.pb") -plot(as.histogram(hist), main="") - - -################################################### -### code chunk number 22: article.Rnw:1323-1330 (eval = FALSE) -################################################### -## library("RProtoBuf") -## library("httr") -## -## req <- GET('https://demo.ocpu.io/MASS/data/Animals/pb') -## output <- unserialize_pb(req$content) -## -## identical(output, MASS::Animals) - - -################################################### -### code chunk number 23: article.Rnw:1380-1396 (eval = FALSE) -################################################### -## library("httr") -## library("RProtoBuf") -## -## args <- list(n=42, mean=100) -## payload <- serialize_pb(args, NULL) -## -## req <- POST ( -## url = "https://demo.ocpu.io/stats/R/rnorm/pb", -## body = payload, -## add_headers ( -## "Content-Type" = "application/x-protobuf" -## ) -## ) -## -## output <- unserialize_pb(req$content) -## print(output) - - -################################################### -### code chunk number 24: article.Rnw:1400-1403 (eval = FALSE) -################################################### -## fnargs <- unserialize_pb(inputmsg) -## val <- do.call(stats::rnorm, fnargs) -## outputmsg <- serialize_pb(val) - - Copied: papers/jss/jss1313.R (from rev 943, papers/jss/article.R) =================================================================== --- papers/jss/jss1313.R (rev 0) +++ papers/jss/jss1313.R 2015-04-13 19:33:59 UTC (rev 944) @@ -0,0 +1,297 @@ +### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/article.Rnw' + +################################################### +### code chunk number 1: article.Rnw:130-136 +################################################### +## cf http://www.jstatsoft.org/style#q12 +options(prompt = "R> ", + continue = "+ ", + width = 70, + useFancyQuotes = FALSE, + digits = 4) + + +################################################### +### code chunk number 2: article.Rnw:318-326 +################################################### +library("RProtoBuf") +p <- new(tutorial.Person, id=1, + name="Dirk") +p$name +p$name <- "Murray" +cat(as.character(p)) +serialize(p, NULL) +class(p) + + +################################################### +### code chunk number 3: article.Rnw:421-422 
+################################################### +p <- new(tutorial.Person, name = "Murray", id = 1) + + +################################################### +### code chunk number 4: article.Rnw:431-434 +################################################### +p$name +p$id +p$email <- "murray at stokely.org" + + +################################################### +### code chunk number 5: article.Rnw:442-445 +################################################### +p[["name"]] <- "Murray Stokely" +p[[ 2 ]] <- 3 +p[["email"]] + + +################################################### +### code chunk number 6: article.Rnw:461-462 +################################################### +p + + +################################################### +### code chunk number 7: article.Rnw:469-470 +################################################### +writeLines(as.character(p)) + + +################################################### +### code chunk number 8: article.Rnw:483-484 +################################################### +serialize(p, NULL) + + +################################################### +### code chunk number 9: article.Rnw:489-492 +################################################### +tf1 <- tempfile() +serialize(p, tf1) +readBin(tf1, raw(0), 500) + + +################################################### +### code chunk number 10: article.Rnw:538-540 +################################################### +msg <- read(tutorial.Person, tf1) +writeLines(as.character(msg)) + + +################################################### +### code chunk number 11: article.Rnw:660-661 +################################################### +new(tutorial.Person) + + +################################################### +### code chunk number 12: article.Rnw:685-690 +################################################### +tutorial.Person$email +tutorial.Person$email$is_required() +tutorial.Person$email$type() +tutorial.Person$email$as.character() +class(tutorial.Person$email) + + +################################################### +### code chunk number 13: article.Rnw:702-709 +################################################### +tutorial.Person$PhoneType +tutorial.Person$PhoneType$WORK +class(tutorial.Person$PhoneType) +tutorial.Person$PhoneType$value(1) +tutorial.Person$PhoneType$value(name="HOME") +tutorial.Person$PhoneType$value(number=1) +class(tutorial.Person$PhoneType$value(1)) + + +################################################### +### code chunk number 14: article.Rnw:805-808 +################################################### +if (!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { + readProtoFiles(file="int64.proto") +} + + +################################################### +### code chunk number 15: article.Rnw:830-834 +################################################### +as.integer(2^31-1) +as.integer(2^31 - 1) + as.integer(1) +2^31 +class(2^31) + + +################################################### +### code chunk number 16: article.Rnw:846-847 +################################################### +2^53 == (2^53 + 1) + + +################################################### +### code chunk number 17: article.Rnw:898-900 +################################################### +msg <- serialize_pb(iris, NULL) +identical(iris, unserialize_pb(msg)) + + +################################################### +### code chunk number 18: article.Rnw:928-931 +################################################### +datasets <- as.data.frame(data(package="datasets")$results) +datasets$name <- 
sub("\\s+.*$", "", datasets$Item) +n <- nrow(datasets) + + +################################################### +### code chunk number 19: article.Rnw:949-992 +################################################### +datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) + +datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) + +datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) + +datasets$R.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize(eval(as.name(x)), NULL), "gzip")))) + +datasets$RProtoBuf.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize_pb(eval(as.name(x)), NULL)))) + +datasets$RProtoBuf.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize_pb(eval(as.name(x)), NULL), "gzip")))) + +clean.df <- data.frame(dataset=datasets$name, + object.size=datasets$object.size, + "serialized"=datasets$R.serialize.size, + "gzipped serialized"=datasets$R.serialize.size.gz, + "RProtoBuf"=datasets$RProtoBuf.serialize.size, + "gzipped RProtoBuf"=datasets$RProtoBuf.serialize.size.gz, + "ratio.serialized" = datasets$R.serialize.size / datasets$object.size, + "ratio.rprotobuf" = datasets$RProtoBuf.serialize.size / datasets$object.size, + "ratio.serialized.gz" = datasets$R.serialize.size.gz / datasets$object.size, + "ratio.rprotobuf.gz" = datasets$RProtoBuf.serialize.size.gz / datasets$object.size, + "savings.serialized" = 1-(datasets$R.serialize.size / datasets$object.size), + "savings.rprotobuf" = 1-(datasets$RProtoBuf.serialize.size / datasets$object.size), + "savings.serialized.gz" = 1-(datasets$R.serialize.size.gz / datasets$object.size), + "savings.rprotobuf.gz" = 1-(datasets$RProtoBuf.serialize.size.gz / datasets$object.size), + check.names=FALSE) + +all.df<-data.frame(dataset="TOTAL", object.size=sum(datasets$object.size), + "serialized"=sum(datasets$R.serialize.size), + "gzipped serialized"=sum(datasets$R.serialize.size.gz), + "RProtoBuf"=sum(datasets$RProtoBuf.serialize.size), + "gzipped RProtoBuf"=sum(datasets$RProtoBuf.serialize.size.gz), + "ratio.serialized" = sum(datasets$R.serialize.size) / sum(datasets$object.size), + "ratio.rprotobuf" = sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size), + "ratio.serialized.gz" = sum(datasets$R.serialize.size.gz) / sum(datasets$object.size), + "ratio.rprotobuf.gz" = sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size), + "savings.serialized" = 1-(sum(datasets$R.serialize.size) / sum(datasets$object.size)), + "savings.rprotobuf" = 1-(sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size)), + "savings.serialized.gz" = 1-(sum(datasets$R.serialize.size.gz) / sum(datasets$object.size)), + "savings.rprotobuf.gz" = 1-(sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size)), + check.names=FALSE) +clean.df<-rbind(clean.df, all.df) + + +################################################### +### code chunk number 20: SER +################################################### +old.mar<-par("mar") +new.mar<-old.mar +new.mar[3]<-0 +new.mar[4]<-0 +my.cex<-1.3 +par("mar"=new.mar) +plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings", xlim=c(0,1),ylim=c(0,1),cex.lab=my.cex, cex.axis=my.cex) +points(clean.df$savings.serialized.gz, 
clean.df$savings.rprotobuf.gz,pch=2, col="blue") +# grey dotted diagonal +abline(a=0,b=1, col="grey",lty=2,lwd=3) + +# find point furthest off the X axis. +clean.df$savings.diff <- clean.df$savings.serialized - clean.df$savings.rprotobuf +clean.df$savings.diff.gz <- clean.df$savings.serialized.gz - clean.df$savings.rprotobuf.gz + +# The one to label. +tmp.df <- clean.df[which(clean.df$savings.diff == min(clean.df$savings.diff)),] +# This minimum means most to the left of our line, so pos=2 is label to the left +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) + +# Some gziped version +# text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2, cex=my.cex) + +# Second one is also an outlier +tmp.df <- clean.df[which(clean.df$savings.diff == sort(clean.df$savings.diff)[2]),] +# This minimum means most to the left of our line, so pos=2 is label to the left +text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) +#text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=my.cex) + + +tmp.df <- clean.df[which(clean.df$savings.diff == max(clean.df$savings.diff)),] +# This minimum means most to the right of the diagonal, so pos=4 is label to the right +# Only show the gziped one. +#text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4, cex=my.cex) +text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4, cex=my.cex) + +#outlier.dfs <- clean.df[c(which(clean.df$savings.diff == min(clean.df$savings.diff)), + +legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue"), cex=my.cex) + +interesting.df <- clean.df[unique(c(which(clean.df$savings.diff == min(clean.df$savings.diff)), + which(clean.df$savings.diff == max(clean.df$savings.diff)), + which(clean.df$savings.diff.gz == max(clean.df$savings.diff.gz)), + which(clean.df$dataset == "TOTAL"))),c("dataset", "object.size", "serialized", "gzipped serialized", "RProtoBuf", "gzipped RProtoBuf", "savings.serialized", "savings.serialized.gz", "savings.rprotobuf", "savings.rprotobuf.gz")] +# Print without .00 in xtable +interesting.df$object.size <- as.integer(interesting.df$object.size) +par("mar"=old.mar) + + +################################################### +### code chunk number 21: article.Rnw:1231-1235 +################################################### +require(HistogramTools) +readProtoFiles(package="HistogramTools") +hist <- HistogramTools.HistogramState$read("hist.pb") +plot(as.histogram(hist), main="") + + +################################################### +### code chunk number 22: article.Rnw:1323-1330 (eval = FALSE) +################################################### +## library("RProtoBuf") +## library("httr") +## +## req <- GET('https://demo.ocpu.io/MASS/data/Animals/pb') +## output <- unserialize_pb(req$content) +## +## identical(output, MASS::Animals) + + +################################################### +### code chunk number 23: article.Rnw:1380-1396 (eval = FALSE) +################################################### +## library("httr") +## library("RProtoBuf") +## +## args <- list(n=42, mean=100) +## payload <- serialize_pb(args, NULL) +## +## req <- POST ( +## url = "https://demo.ocpu.io/stats/R/rnorm/pb", +## body = payload, +## add_headers ( +## "Content-Type" = "application/x-protobuf" +## ) +## ) +## +## output <- unserialize_pb(req$content) +## print(output) + + 
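# Sketch (not one of the numbered chunks above, added for illustration): the
# argument list sent in the POST example should survive a local round trip
# through RProtoBuf's schema-less serialization, i.e. the same bytes that the
# OpenCPU call above puts on the wire.
args <- list(n = 42, mean = 100)
identical(args, unserialize_pb(serialize_pb(args, NULL)))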
+################################################### +### code chunk number 24: article.Rnw:1400-1403 (eval = FALSE) +################################################### +## fnargs <- unserialize_pb(inputmsg) +## val <- do.call(stats::rnorm, fnargs) +## outputmsg <- serialize_pb(val) + + From noreply at r-forge.r-project.org Mon Apr 13 21:51:21 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 21:51:21 +0200 (CEST) Subject: [Rprotobuf-commits] r945 - in papers/jss: . JSSemails Message-ID: <20150413195121.DB6611876F7@r-forge.r-project.org> Author: edd Date: 2015-04-13 21:51:21 +0200 (Mon, 13 Apr 2015) New Revision: 945 Added: papers/jss/JSSemails/ papers/jss/JSSemails/conditional-acceptance-email-2015-04.txt papers/jss/JSSemails/conditional-acceptance-notes-2015-04.txt Removed: papers/jss/conditional-acceptance-email-2015-04.txt papers/jss/conditional-acceptance-notes-2015-04.txt Modified: papers/jss/Makefile papers/jss/jss1313.R Log: some file cleanups Copied: papers/jss/JSSemails/conditional-acceptance-email-2015-04.txt (from rev 943, papers/jss/conditional-acceptance-email-2015-04.txt) =================================================================== --- papers/jss/JSSemails/conditional-acceptance-email-2015-04.txt (rev 0) +++ papers/jss/JSSemails/conditional-acceptance-email-2015-04.txt 2015-04-13 19:51:21 UTC (rev 945) @@ -0,0 +1,61 @@ +From: Editor of the Journal of Statistical Software +To: "edd at debian.org" +Subject: JSS 1313 Conditional Acceptance +Date: Mon, 6 Apr 2015 19:30:32 +0000 + +Dear author, + +Your submission + + JSS 1313 + +is conditionally accepted for publication in JSS. + +Your manuscript has just finished the post-processing stage. In order to +continue in the process there are a few changes that need to be made. Attached +to this email is a comments file where you can find all the necessary changes. + +For further questions please see FAQ at http://www.jstatsoft.org/style. + + +Please send the full sources for your submission to the technical editor ( +editor at jstatsoft.org). It should contain: + + + 1. The full sources of the latest version of the software. (Binary versions + can be provided in addition.) + 2. The .tex, .bib, and all graphics for the manuscript: where the names of + your files should be - your .tex file should be called jssxxx.tex and your + .bib file should be called jssxxx.bib. As well as a complied .pdf version + of your manuscript. + 3. Information on how to replicate the examples in the manuscript. (Typically, + this is a self-contained standalone script file that loads/calls/sources + the software package from (1). If data or other external files are needed, + that are not yet provided with the software package, include these as + well.) Please use subdirectories for Figures/ and Code/ + 4. Please wrap all these files into a single .zip (or .tar.gz) file. + 5. Please make sure the .zip files only contains the necessary files. That is, + please do not include .aux, .log, etc. files and any unused files such + as jss.cls, jss.bst, jsslogo.jpg, etc. + + +Note for R authors: If you have prepared your manuscript using Sweave, the +files in (2) can be produced by Sweave, those in (3) by Stangle (possibly +enhancing the comments). Also indicate in your e-mail that Sweave was used and +the technical editor will provide you with further Sweave-specific information. + +Thanks for choosing JSS and contributing to free statistical software. 
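A minimal sketch of the Sweave workflow mentioned in the note above, assuming the manuscript source follows the requested jssxxx naming convention (the file name jss1313.Rnw is illustrative):

Sweave("jss1313.Rnw")            # writes jss1313.tex and the figure files
Stangle("jss1313.Rnw")           # extracts the code chunks into jss1313.R
tools::texi2pdf("jss1313.tex")   # compiles the manuscript with pdfLaTeX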
+
+Best regards,
+
+Jan de Leeuw
+Bettina Grün
+Achim Zeileis
+
+
+
+
+
+
+----------------------------------------------------------------------
+plain text (us-ascii): JSS 1313 post comments.txt, JSS 1313 [display]

Copied: papers/jss/JSSemails/conditional-acceptance-notes-2015-04.txt (from rev 943, papers/jss/conditional-acceptance-notes-2015-04.txt)
===================================================================
--- papers/jss/JSSemails/conditional-acceptance-notes-2015-04.txt	(rev 0)
+++ papers/jss/JSSemails/conditional-acceptance-notes-2015-04.txt	2015-04-13 19:51:21 UTC (rev 945)
@@ -0,0 +1,85 @@
+JSS 1313: Eddelbuettel, Stokely, Ooms
+
+RProtoBuf: Efficient Cross-Language Data Serialization in R
+
+---------------------------------------------------------
+For further instruction on JSS style requirements please see the JSS style manual (in particular section 2.1 Style Checklist) at http://www.jstatsoft.org/downloads/JSSstyle.zip
+
+ ## START DEdd: Inserted per copy/paste from jss.pdf:
+
+ 2.1 Style checklist
+
+ A quick check for the most important aspects of the JSS style is given
+ below. Authors should make sure that all of them are addressed in the final
+ version. More details can be found in the remainder of this manual.
+ - The manuscript can be compiled by pdfLaTeX.
+ - \proglang, \pkg and \code have been used for highlighting throughout the paper
+   (including titles and references), except where explicitly escaped.
+ - References are provided in a .bib BibTeX database and included in the text by \cite,
+   \citep, \citet, etc.
+ - Titles and headers are formatted properly:
+   - \title in title style,
+   - \section etc. in sentence style,
+   - all titles in the BibTeX file in title style.
+ - Figures, tables and equations are marked with a \label and referred to by \ref, e.g.,
+   "Figure~\ref{...}".
+ - Software packages are \cite{}d properly.
+
+ ## END DEdd: Inserted per copy/paste from jss.pdf:
+
+Also see FAQ at: http://www.jstatsoft.org/style
+
+For further references please see RECENT JSS papers for detailed documentation and examples.
+---------------------------------------------------------
+
+
+From the editorial team:
+
+o From one reviewer: As far as I can see there's only one difference between
+the two columns of Table 3. It would be nice to highlight this.
+
+ ## DEdd: Done, added a sentence below table and tightened wording in that
+ Table note.
+
+
+Manuscript style comments:
+
+o Code should have enough spaces to facilitate reading. Please include spaces before and after operators and after commas (unless spaces have syntactical meaning).
+
+ ## DEdd: No change, we were good already
+
+o The table in Figure 2 should have row/column labels in sentence
+style. (Only the first word of a label should be capitalized).
+
+ ## DEdd: Done (not sure I like it better)
+
+o In all cases, code input/output must fit within the normal text width of the manuscript. Thus, code input should have appropriate line breaks and code output should preferably be generated with a suitable width (or otherwise edited). E.g., see p. 9.
+
+ ## DEdd: Replaces the Sweave code with its latex output and manually broke the long line
+
+o For bullet lists/itemized lists please use either a comma, semi-colon, or period at the end of each item.
+
+ ## DEdd: Done; one small change
+
+o As a reminder, please make sure that:
+ - \proglang, \pkg and \code have been used for highlighting throughout the paper (including titles and references), except where explicitly escaped.
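The code-width point above can be made concrete with a short sketch in the style of the replication script (the dataset names and the serialize() call are illustrative, not taken from the manuscript):

# Keep printed output within the roughly 70 character JSS text width
options(prompt = "R> ", width = 70, useFancyQuotes = FALSE)

# Wrap long input at argument boundaries rather than letting it overflow
sizes <- sapply(c("iris", "mtcars", "CO2"),
                function(x) length(serialize(get(x), NULL)))
sizes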
+ + +References: + +o John Wiley & Sons (not: Wiley, John Wiley & Sons Inc.) + + ## DEdd We only had one 'Wiley' where I removed a stray ".com" + +o As a reminder, + - Please make sure that all software packages are \cite{}'d properly. + + - All references should be in title style. + + - See FAQ for specific reference instructions. + + ## DEdd Update bibliography to current version numbers, and title styled + +Code: + +o As a reminder, please make sure that the files needed to replicate all code/examples within the manuscript are included in a standalone replication script. Modified: papers/jss/Makefile =================================================================== --- papers/jss/Makefile 2015-04-13 19:33:59 UTC (rev 944) +++ papers/jss/Makefile 2015-04-13 19:51:21 UTC (rev 945) @@ -3,7 +3,7 @@ clean: rm -fr ${article}.out ${article}.aux ${article}.log ${article}.bbl \ - ${article}.blg ${article}.brf figures/fig-0??.pdf + ${article}.blg ${article}.brf figures/fig-???.pdf ${article}.pdf: ${article}.Rnw R CMD Sweave ${article}.Rnw Deleted: papers/jss/conditional-acceptance-email-2015-04.txt =================================================================== --- papers/jss/conditional-acceptance-email-2015-04.txt 2015-04-13 19:33:59 UTC (rev 944) +++ papers/jss/conditional-acceptance-email-2015-04.txt 2015-04-13 19:51:21 UTC (rev 945) @@ -1,61 +0,0 @@ -From: Editor of the Journal of Statistical Software -To: "edd at debian.org" -Subject: JSS 1313 Conditional Acceptance -Date: Mon, 6 Apr 2015 19:30:32 +0000 - -Dear author, - -Your submission - - JSS 1313 - -is conditionally accepted for publication in JSS. - -Your manuscript has just finished the post-processing stage. In order to -continue in the process there are a few changes that need to be made. Attached -to this email is a comments file where you can find all the necessary changes. - -For further questions please see FAQ at http://www.jstatsoft.org/style. - - -Please send the full sources for your submission to the technical editor ( -editor at jstatsoft.org). It should contain: - - - 1. The full sources of the latest version of the software. (Binary versions - can be provided in addition.) - 2. The .tex, .bib, and all graphics for the manuscript: where the names of - your files should be - your .tex file should be called jssxxx.tex and your - .bib file should be called jssxxx.bib. As well as a complied .pdf version - of your manuscript. - 3. Information on how to replicate the examples in the manuscript. (Typically, - this is a self-contained standalone script file that loads/calls/sources - the software package from (1). If data or other external files are needed, - that are not yet provided with the software package, include these as - well.) Please use subdirectories for Figures/ and Code/ - 4. Please wrap all these files into a single .zip (or .tar.gz) file. - 5. Please make sure the .zip files only contains the necessary files. That is, - please do not include .aux, .log, etc. files and any unused files such - as jss.cls, jss.bst, jsslogo.jpg, etc. - - -Note for R authors: If you have prepared your manuscript using Sweave, the -files in (2) can be produced by Sweave, those in (3) by Stangle (possibly -enhancing the comments). Also indicate in your e-mail that Sweave was used and -the technical editor will provide you with further Sweave-specific information. - -Thanks for choosing JSS and contributing to free statistical software. 
- -Best regards, - -Jan de Leeuw -Bettina Gr?n -Achim Zeileis - - - - - - ----------------------------------------------------------------------- -xplain text (us-ascii): JSS 1313 post comments.txt, JSS 1313 [display] Deleted: papers/jss/conditional-acceptance-notes-2015-04.txt =================================================================== --- papers/jss/conditional-acceptance-notes-2015-04.txt 2015-04-13 19:33:59 UTC (rev 944) +++ papers/jss/conditional-acceptance-notes-2015-04.txt 2015-04-13 19:51:21 UTC (rev 945) @@ -1,85 +0,0 @@ -JSS 1313: Eddelbuettel, Stokely, Ooms - -RProtoBuf: Efficient Cross-Language Data Serialization in R - ---------------------------------------------------------- -For further instruction on JSS style requirements please see the JSS style manual (in particular section 2.1 Style Checklist) at http://www.jstatsoft.org/downloads/JSSstyle.zip - - ## START DEdd: Inserted per copy/paste from jss.pdf: - - 2.1 Style checklist - - A quick check for the most important aspects of the JSS style is given - below. Authors should make sure that all of them are addressed in the ?nal - version. More details can be found in the remainder of this manual. - ? The manuscript can be compiled by pdfLATEX. - ? \proglang, \pkg and \code have been used for highlighting throughout the paper - (including titles and references), except where explicitly escaped. - ? References are provided in a .bib BibTEX database and included in the text by \cite, - \citep, \citet, etc. - ? Titles and headers are formatted properly: - ? \title in title style, - ? \section etc. in sentence style, - ? all titles in the BibTEX ?le in title style. - ? Figures, tables and equations are marked with a \label and referred to by \ref, e.g., - ?Figure~\ref{...}?. - ? Software packages are \cite{}d properly. - - ## END DEdd: Inserted per copy/paste from jss.pdf: - -Also see FAQ at: http://www.jstatsoft.org/style - -For further references please see RECENT JSS papers for detailed documentation and examples. ---------------------------------------------------------- - - -From the editorial team: - -o From one reviewer: As far as I can see there's only one difference between -the two columns of Table 3. It would be nice to highlight this. - - ## DEdd: Done, added a sentence below table and tightened wording in that - Table note. - - -Manuscript style comments: - -o Code should have enough spaces to facilitate reading. Please include spaces before and after operators and after commas (unless spaces have syntactical meaning). - - ## DEdd: No change, we were good already - -o The table in Figure 2 should have row/column labels in sentence -style. (Only the first word of a label should be capitalized). - - ## DEdd: Done (not sure I like it better) - -o In all cases, code input/output must fit within the normal text width of the manuscript. Thus, code input should have appropriate line breaks and code output should preferably be generated with a suitable width (or otherwise edited). E.g., see p. 9. - - ## DEdd: Replaces the Sweave code with its latex output and manually broke the long line - -o For bullet lists/itemized lists please use either a comma, semi-colon, or period at the end of each item. - - ## DEdd: Done; one small change - -o As a reminder, please make sure that: - - \proglang, \pkg and \code have been used for highlighting throughout the paper (including titles and references), except where explicitly escaped. - - -References: - -o John Wiley & Sons (not: Wiley, John Wiley & Sons Inc.) 
- - ## DEdd We only had one 'Wiley' where I removed a stray ".com" - -o As a reminder, - - Please make sure that all software packages are \cite{}'d properly. - - - All references should be in title style. - - - See FAQ for specific reference instructions. - - ## DEdd Update bibliography to current version numbers, and title styled - -Code: - -o As a reminder, please make sure that the files needed to replicate all code/examples within the manuscript are included in a standalone replication script. Modified: papers/jss/jss1313.R =================================================================== --- papers/jss/jss1313.R 2015-04-13 19:33:59 UTC (rev 944) +++ papers/jss/jss1313.R 2015-04-13 19:51:21 UTC (rev 945) @@ -1,7 +1,7 @@ -### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/article.Rnw' +### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/jss1313.Rnw' ################################################### -### code chunk number 1: article.Rnw:130-136 +### code chunk number 1: jss1313.Rnw:130-136 ################################################### ## cf http://www.jstatsoft.org/style#q12 options(prompt = "R> ", @@ -12,7 +12,7 @@ ################################################### -### code chunk number 2: article.Rnw:318-326 +### code chunk number 2: jss1313.Rnw:318-326 ################################################### library("RProtoBuf") p <- new(tutorial.Person, id=1, @@ -25,13 +25,13 @@ ################################################### -### code chunk number 3: article.Rnw:421-422 +### code chunk number 3: jss1313.Rnw:421-422 ################################################### p <- new(tutorial.Person, name = "Murray", id = 1) ################################################### -### code chunk number 4: article.Rnw:431-434 +### code chunk number 4: jss1313.Rnw:431-434 ################################################### p$name p$id @@ -39,7 +39,7 @@ ################################################### -### code chunk number 5: article.Rnw:442-445 +### code chunk number 5: jss1313.Rnw:442-445 ################################################### p[["name"]] <- "Murray Stokely" p[[ 2 ]] <- 3 @@ -47,25 +47,25 @@ ################################################### -### code chunk number 6: article.Rnw:461-462 +### code chunk number 6: jss1313.Rnw:461-462 ################################################### p ################################################### -### code chunk number 7: article.Rnw:469-470 +### code chunk number 7: jss1313.Rnw:469-470 ################################################### writeLines(as.character(p)) ################################################### -### code chunk number 8: article.Rnw:483-484 +### code chunk number 8: jss1313.Rnw:483-484 ################################################### serialize(p, NULL) ################################################### -### code chunk number 9: article.Rnw:489-492 +### code chunk number 9: jss1313.Rnw:489-492 ################################################### tf1 <- tempfile() serialize(p, tf1) @@ -73,20 +73,20 @@ ################################################### -### code chunk number 10: article.Rnw:538-540 +### code chunk number 10: jss1313.Rnw:538-540 ################################################### msg <- read(tutorial.Person, tf1) writeLines(as.character(msg)) ################################################### -### code chunk number 11: article.Rnw:660-661 +### code chunk number 11: jss1313.Rnw:660-661 ################################################### new(tutorial.Person) 
################################################### -### code chunk number 12: article.Rnw:685-690 +### code chunk number 12: jss1313.Rnw:685-690 ################################################### tutorial.Person$email tutorial.Person$email$is_required() @@ -96,7 +96,7 @@ ################################################### -### code chunk number 13: article.Rnw:702-709 +### code chunk number 13: jss1313.Rnw:702-709 ################################################### tutorial.Person$PhoneType tutorial.Person$PhoneType$WORK @@ -108,7 +108,7 @@ ################################################### -### code chunk number 14: article.Rnw:805-808 +### code chunk number 14: jss1313.Rnw:805-808 ################################################### if (!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { readProtoFiles(file="int64.proto") @@ -116,7 +116,7 @@ ################################################### -### code chunk number 15: article.Rnw:830-834 +### code chunk number 15: jss1313.Rnw:830-834 ################################################### as.integer(2^31-1) as.integer(2^31 - 1) + as.integer(1) @@ -125,20 +125,20 @@ ################################################### -### code chunk number 16: article.Rnw:846-847 +### code chunk number 16: jss1313.Rnw:846-847 ################################################### 2^53 == (2^53 + 1) ################################################### -### code chunk number 17: article.Rnw:898-900 +### code chunk number 17: jss1313.Rnw:898-900 ################################################### msg <- serialize_pb(iris, NULL) identical(iris, unserialize_pb(msg)) ################################################### -### code chunk number 18: article.Rnw:928-931 +### code chunk number 18: jss1313.Rnw:928-931 ################################################### datasets <- as.data.frame(data(package="datasets")$results) datasets$name <- sub("\\s+.*$", "", datasets$Item) @@ -146,7 +146,7 @@ ################################################### -### code chunk number 19: article.Rnw:949-992 +### code chunk number 19: jss1313.Rnw:949-992 ################################################### datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) @@ -246,7 +246,7 @@ ################################################### -### code chunk number 21: article.Rnw:1231-1235 +### code chunk number 21: jss1313.Rnw:1231-1235 ################################################### require(HistogramTools) readProtoFiles(package="HistogramTools") @@ -255,7 +255,7 @@ ################################################### -### code chunk number 22: article.Rnw:1323-1330 (eval = FALSE) +### code chunk number 22: jss1313.Rnw:1323-1330 (eval = FALSE) ################################################### ## library("RProtoBuf") ## library("httr") @@ -267,7 +267,7 @@ ################################################### -### code chunk number 23: article.Rnw:1380-1396 (eval = FALSE) +### code chunk number 23: jss1313.Rnw:1380-1396 (eval = FALSE) ################################################### ## library("httr") ## library("RProtoBuf") @@ -288,7 +288,7 @@ ################################################### -### code chunk number 24: article.Rnw:1400-1403 (eval = FALSE) +### code chunk number 24: jss1313.Rnw:1400-1403 (eval = FALSE) ################################################### ## fnargs <- unserialize_pb(inputmsg) ## val <- do.call(stats::rnorm, fnargs) From noreply at r-forge.r-project.org Mon Apr 13 21:53:43 2015 From: noreply at 
r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 21:53:43 +0200 (CEST) Subject: [Rprotobuf-commits] r946 - in papers/jss: . JSSemails Message-ID: <20150413195343.8BE20187741@r-forge.r-project.org> Author: edd Date: 2015-04-13 21:53:43 +0200 (Mon, 13 Apr 2015) New Revision: 946 Added: papers/jss/JSSemails/article-submitted-2014-03.pdf papers/jss/JSSemails/response-to-reviewers.tex Removed: papers/jss/article-submitted-2014-03.pdf papers/jss/response-to-reviewers.tex Log: some file cleanups, part two Copied: papers/jss/JSSemails/article-submitted-2014-03.pdf (from rev 943, papers/jss/article-submitted-2014-03.pdf) =================================================================== (Binary files differ) Copied: papers/jss/JSSemails/response-to-reviewers.tex (from rev 943, papers/jss/response-to-reviewers.tex) =================================================================== --- papers/jss/JSSemails/response-to-reviewers.tex (rev 0) +++ papers/jss/JSSemails/response-to-reviewers.tex 2015-04-13 19:53:43 UTC (rev 946) @@ -0,0 +1,455 @@ + +\documentclass[10pt]{article} +\usepackage{url} +\usepackage{vmargin} +\setpapersize{USletter} +% left top right bottom -- headheight headsep footheight footskop +\setmarginsrb{1in}{1in}{1in}{0.5in}{0pt}{0mm}{10pt}{0.5in} +\usepackage{charter} + +\setlength{\parskip}{1ex plus1ex minus1ex} +\setlength{\parindent}{0pt} + +\newcommand{\proglang}[1]{\textsf{#1}} +\newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}} + +\newcommand{\pointRaised}[2]{\smallskip %\hrule + \textsl{{\fontseries{b}\selectfont #1}: #2}\newline} +\newcommand{\simplePointRaised}[1]{\bigskip \hrule\textsl{#1} } +\newcommand{\reply}[1]{\textbf{Reply}:\ #1 \smallskip } %\hrule \smallskip} + +\begin{document} + +\author{Dirk Eddelbuettel\\Debian Project \and + Murray Stokely\\Google, Inc \and + Jeroen Ooms\\UCLA} +\title{Submission JSS 1313: \\ Response to Reviewers' Comments} +\maketitle +\thispagestyle{empty} + +Thank you for reviewing our manuscript, and for giving us an opportunity to +rewrite, extend and and tighten both the paper and the underlying package. + +\smallskip +We truly appreciate the comments and suggestions. Below, we have regrouped the sets +of comments, and have provided detailed point-by-point replies. +% +We hope that this satisfies the request for changes necessary to proceed with +the publication of the revised and updated manuscript, along with the revised +and updated package (which was recently resubmitted to CRAN as version 0.4.2). + +\section*{Response to Reviewer \#1} + +\pointRaised{Comment 1}{Overall, I think this is a strong paper. Cross-language communication + is a challenging problem, and good solutions for R are important to + establish R as a well-behaved member of a data analysis pipeline. The + paper is well written, and I recommend that it be accepted subject to + the suggestions below.} +\reply{Thank you. We are providing a point-by-point reply below.} + +\subsubsection*{More big picture, less details} + +\pointRaised{Comment 2}{Overall, I think the paper provides too much detail on + relatively unimportant topics and not enough on the reasoning behind + important design decisions. I think you could comfortably reduce the paper + by 5-10 pages, referring the interested reader to the documentation for + more detail.} +\reply{The paper is now six pages shorter at just 23 pages. 
+ Sections 3 - 8 (all but Section 1 (``Introduction''), Section 2 (``Protocol Buffers''), + and Section 9 (``Conclusion'') have been thoroughly rewritten to address the specific and + general feedback in these reviews.} + +\pointRaised{Comment 3}{I'd recommend shrinking section 3 to ~2 pages, and removing the + subheadings. This section should quickly orient the reader to the + RProtobuf API so they understand the big picture before learning more + details in the subsequent sections. I'd recommend picking one OO style + and sticking to it in this section - two is confusing.} +\reply{We followed this recommendation, reduced section 3 to about + $2\frac{1}{2}$ pages, removed the subheadings and tightened the exposition.} + +\pointRaised{Comment 3}{Section 4 dives into the details without giving a good overview and + motivation. Why use S4 and not RC? How are the objects made mutable? + Why do you provide both generic function and message passing OO + styles? What does \$ do in this context? What the heck is a + pseudo-method? Spend more time on those big issues rather than + describing each class in detail. Reduce class descriptions to a + bulleted list giving a high-level overview, then encourage the reader + to refer to the documentation for further details. Similarly, Tables + 3-5 belong in the documentation, not in a vignette/paper.} +\reply{Done. RProtoBuf was designed and implemented before RC were + available, and this is now noted explicitly in a new footnote. Explanation of how + they are made mutable has been added. Better explanation of the + two styles and '\$' as been added. We are no longer using the + confusing term 'pseudo-method' anywhere. We also moved Tables 3-5 into the + documentation and out of the paper, as suggested.} + +\pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is + being used in practice at large scale for for large data, and is + useful for communicating between R and Python. How can you make that + message stronger while avoiding (for the purposes of this paper) the + relatively unimportant details of the map-reduce setup?} +\reply{Done. Rewritten with more motivation taking into account this feedback.} + +\subsubsection*{R to/from Protobuf translation} + +\pointRaised{Comment 5}{The discussion of R to/from Protobuf could be improved. Table 9 would be + much simpler if instead of Message, you provided a "vectorised" + Messages class (this would also make the interface more consistent and + hence the package easier to use).} +\reply{This is a good observation that only became clear to us after + significant usage of \texttt{RProtoBuf}. Providing a full ``vectorized'' Messages class would require slicing + operators that let you quickly extract a given field from each + element of the message vector in order to be really useful. This + would require significant amounts of C++ code for efficient + manipulation on the order of data.table or other similar large C++ R + packages on CRAN. There is another package called Motobuf by other authors + that takes this approach but in practice (at least for the several hundred + users at Google), the ease-of-use provided by the simple Message interface of RProtoBuf + has won with users. It is still future work to keep the simple + interactive interface of RProtoBuf with the vectorized efficiency of + Motobuf. For now, users typically do their slicing of vectors like + this through a distributed database (NewSQL is the term of the day?) 
+ like Dremel or other system and then just get the response Protocol + Buffers in return to the request.} + +\pointRaised{Comment 6}{Along these lines, I think it would make sense to combine sections 5 + and 6 and discuss translation challenges in both direction + simultaneously. At the minimum, add the equivalent for Table 9 that + shows how important R classes are converted to their protobuf + equivalents.} +\reply{Done. We have updated these sections to make it clearer that the main + distinction is between schema-based datastructures (Section 5) and + schema-less use where a catch-all \texttt{.proto} is used (Section 6). + Neither section is meant to focus on only a single direction of the + conversion, but how conversion works when you have a schema or not. + How important R classes are converted to their protobuf equivalents + isn't super useful as a C++, Java, or Python program is unlikely to + want to read in an R data.frame exactly as it is defined. Much more + likely is an application-specific message format is defined between the + two services, such as the HistogramTools example in the next section. + Much more detail has been added to an interesting part of section 6 -- + which datasets exactly are better served with RProtoBuf than + \texttt{base::serialize} and why?} + +\pointRaised{Comment 7}{You should discuss how missing values are handled for strings and + integers, and why enums are not equivalent to factors. I think you + could make explicit how coercion of factors, dates, times and matrices + occurs, and the implications of this on sharing data structures + between programming languages. For example, how do you share date/time + data between R and python using RProtoBuf?} +\reply{All of these details are application-specific, whereas + RProtoBuf is an infrastructure package. Distributed systems define + their own interfaces, with their own date/time fields, usually as + a double of fractional seconds since the unix epoch for the systems I + have worked on. An example is given for Histograms in the next + section. Factors could be represented as repeated enums in Protocol + Buffers, certainly, if that is how one wanted to define a schema.} + +\pointRaised{Comment 8}{Table 10 is dying to be a plot, and a natural companion would be to + show how long it takes to serialise data frames using both RProtoBuf + and R's native serialisation. Is there a performance penalty to using + protobufs?} +\reply{Done. Table 10 has been replaced with a plot, the outliers are + labeled, and the text now includes some interesting explanation + about the outliers. Page 4 explains that the R implementation of + Protocol Buffers uses reflection to make operations slower but makes + it more convenient for interactive data analysis. None of the + built-in datasets are large enough for performance to really come up + as an issue, and for any serialization method examples could be + found that significantly favor one over another in runtime, so we + don't think there will be benefit to adding anything here. } + +\subsubsection*{RObjectTables magic} + +\pointRaised{Comment 9}{The use of RObjectTables magic makes me uneasy. It doesn't seem like a + good fit for an infrastructure package and it's not clear what + advantages it has over explicitly loading a protobuf definition into + an object.} +\reply{Done. More information about the advantages and disadvantages of this + approach have been added.} + +\pointRaised{Comment 10}{Using global state makes understanding code much harder. 
In Table 1, + it's not obvious where \texttt{tutorial.Person} comes from. Is it loaded by + default by RProtobuf? This need some explanation. In Section 7, what + does \texttt{readProtoFiles()} do? Why does \texttt{RProtobuf} need to be attached + as well as \texttt{HistogramTools}? This needs more explanation, and a + comment on the implications of this approach on CRAN packages and + namespaces.} +\reply{Done. We followed this recommendation and added explanation for how + \texttt{tutorial.Person} is loaded, specifically : \emph{A small number of message types are imported when the + package is first loaded, including the tutorial.Person type we saw in + the last section.} Thank you also for spotting the superfluous attach + of \texttt{RProtoBuf}, it has been removed from the example.} + +\pointRaised{Comment 11}{ + I'd prefer you eliminate this magic from the magic, but failing that, + you need a good explanation of why.} +\reply{Done. We've added more explanation about this.} + +\subsubsection*{Code comments} + +\pointRaised{Comment 12}{Using \texttt{file.create()} to determine the absolute path seems like a bad idea.} +\reply{Done. We followed this recommendation and removed two instances of + \texttt{file.create()} for this purpose with calls to + \texttt{normalizePath} with \texttt{mustWork=FALSE}.} + +\subsubsection*{Minor niggles} + +\pointRaised{Comment 13}{Don't refer to the message passing style of OO as traditional.} +\reply{Done. We don't refer to this style as traditional anywhere in + the manuscript anymore.} + +\pointRaised{Comment 14}{In Section 3.4, if messages isn't a vectorised class, the default + print method should use \texttt{cat()} to eliminate the confusing \texttt{[1]}.} +\reply{Done, thanks.} + +\pointRaised{Comment 15}{The REXP definition would have been better defined using an enum that + matches R's SEXPTYPE "enum". But I guess that ship has sailed.} +\reply{Acknowledged. We chose to maintain compatibility with RHIPE here. The main +use of RProtoBuf is not with \texttt{rexp.proto} however -- it with +application-specific schemas in \texttt{.proto} files for sending data between +applications. Users that want to do something very R-specific are +welcome to use their own \texttt{.proto} files with an enum to represent R SEXPTYPEs.} + +\pointRaised{Comment 16}{Why does \texttt{serialize\_pb(CO2, NULL)} fail silently? Shouldn't it at least + warn that the serialization is partial?} +\reply{Done. We fixed this and \texttt{serialize\_pb} now works for all built-in datatypes in R + and no longer fails silently if it encounters something it can't serialize.} + +\section*{Response to Reviewer \#2} + +\pointRaised{Comment 1}{The paper gives an overview of the RProtoBuf package which implements an + R interface to the Protocol Buffers library for an efficient + serialization of objects. The paper is well written and easy to read. + Introductory code is clear and the package provides objects to play with + immediately without the need to jump through hoops, making it appealing. + The software implementation is executed well.} +\reply{Thank you.} + +\pointRaised{Comment 2}{There are, however, a few inconsistencies in the implementation and some + issues with specific sections in the paper. In the following both issues + will be addressed sequentially by their occurrence in the paper.} +\reply{Done. These and others have been identified and addressed. 
Thank you + for taking the time to enumerate these issues.} + +\pointRaised{Comment 3}{p.4 illustrates the use of messages. The class implements list-like + access via \texttt{[[} and \$, but strangely \texttt{names()} return NULL and \texttt{length() } + doesn't correspond to the number of fields leading to startling results like +the following:} + +\begin{verbatim} + > p +[1] "message of type 'tutorial.Person' with 2 fields set" + > length(p) +[1] 2 + > p[[3]] +[1] "" +\end{verbatim} +\reply{Done. We have corrected the list-like accessor, fixed \texttt{length()} to + correspond to the number of set fields, and added \texttt{names()}:} +\begin{verbatim} +> p +message of type 'tutorial.Person' with 0 fields set +> length(p) +[1] 0 +> p[[3]] +[1] "" +> p$id <- 1 +> length(p) +[1] 1 +> names(p) +[1] "name" "id" "email" "phone" +\end{verbatim} + +\pointRaised{Comment 3 cont.}{The inconsistencies get even more bizarre with descriptors (p.9):} + +\begin{verbatim} + > tutorial.Person$email +[1] "descriptor for field 'email' of type 'tutorial.Person' " + > tutorial.Person[["email"]] +Error in tutorial.Person[["email"]] : this S4 class is not subsettable + > names(tutorial.Person) +NULL + > length(tutorial.Person) +[1] 1 +\end{verbatim} +\reply{Done. We agree, and have addressed this inconsistency. Thank you for + catching this.} +\begin{verbatim} +> tutorial.Person$email +descriptor for field 'email' of type 'tutorial.Person' +> tutorial.Person[["email"]] +descriptor for field 'email' of type 'tutorial.Person' +> names(tutorial.Person) +[1] "name" "id" "email" "phone" "PhoneNumber" +[6] "PhoneType" +> length(tutorial.Person) +[1] 6 +\end{verbatim} + +\pointRaised{Comment 3 cont.}{It appears that there is no way to find out the fields of a descriptor + directly (although the low-level object methods seem to be exposed as + \texttt{\$field\_count()} and \texttt{\$fields()} - but that seems extremely cumbersome). + Again, implementing names() and subsetting may help here.} +\reply{Done. We have implemented \texttt{names} and subsetting. Thank you for the + suggestion.} +\begin{verbatim} +> tutorial.Person[[1]] +descriptor for field 'name' of type 'tutorial.Person' +> tutorial.Person[[2]] +descriptor for field 'id' of type 'tutorial.Person' +\end{verbatim} + +\pointRaised{Comment 4}{Another inconsistency concerns the \texttt{as.list()} method which by design + coerces objects to lists (see \texttt{?as.list}), but the implementation for + EnumDescriptor breaks that contract and returns a vector instead:} + +\begin{verbatim} + > is.list(as.list(tutorial.Person$PhoneType)) +[1] FALSE + > str(as.list(tutorial.Person$PhoneType)) + Named int [1:3] 0 1 2 + - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" +\end{verbatim} + +\reply{Done, thank you. New output below:} +\begin{verbatim} +> is.list(as.list(tutorial.Person$PhoneType)) +[1] TRUE +> str(as.list(tutorial.Person$PhoneType)) +List of 3 + $ MOBILE: int 0 + $ HOME : int 1 + $ WORK : int 2 +\end{verbatim} + +\pointRaised{Comment 4 cont}{As with the other interfaces, names() returns NULL so it is again quite + difficult to perform even simple operations such as finding out the + values. It may be natural use some of the standard methods like names(), + levels() or similar. As with the previous cases, the lack of [[ support + makes it impossible to map named enum values to codes and vice-versa.} +\reply{Done, thank you. 
New output:} +\begin{verbatim} +> names(tutorial.Person$PhoneType) +[1] "MOBILE" "HOME" "WORK" +> tutorial.Person$PhoneType[["HOME"]] +[1] 1 +\end{verbatim} + +\pointRaised{Comment 5}{In general, the package would benefit from one pass of checks to assess + the consistency of the API. Since the authors intend direct interaction + with the objects via basic standard R methods, the classes should behave + consistently.} +\reply{We made several passes, correcting issues as documented in the + \texttt{ChangeLog} and now present in our latest 0.4.2 release on CRAN.} + +\pointRaised{Comment 6}{Finally, most classes implement coercion to characters, which is not + mentioned and is not quite intuitive for some objects. For example, one + may think that \texttt{as.character()} on a file descriptor returns let's say the + filename, but we get:} + +\begin{verbatim} + > cat(as.character(tutorial.Person$fileDescriptor())) +syntax = "proto2"; + +package tutorial; + +option java_package = "com.example.tutorial"; +option java_outer_classname = "AddressBookProtos"; +[...] +\end{verbatim} +\reply{The behavior is documented in the package documentation but + seemed like a minor detail not important for an already-long paper. + In choosing the debug output for a file descriptor we agree + that \texttt{filename} is a reasonable thing to expect, but we also + think that the contents of the \texttt{.proto} file is also + reasonable, but more useful. We document this in the help for + ``FileDescriptor-class'', the vignette, and other sources. + \texttt{@filename} is one of the slots of the FileDescriptor class + and so very easy to find. The contents of the \texttt{.proto} are + not as easily accessible in a slot, however, and so we find it much + more useful to be output with \texttt{as.character()}.} + +\pointRaised{Comment 7}{It is not necessary clear what java\_package has to do with a file + descriptor in R. Depending on the intention here, it may be useful to + explain this feature. +} +\reply{Done. This snippet has been removed as part of the general move of + less relevant details to the package documentation, but for + reference the \texttt{.proto} file syntax is defined in the Protocol Buffers + language guide which is referenced earlier. It is a cross platform + library and so this syntax specifies some parameters when Java code + is used to access the structures defined in this file. No such + special syntax is required in the \texttt{.proto} files for R + language code and so this line about java\_package was not relevant + or needed in any way for RProtoBuf and is documented elsewhere.} + +\subsubsection*{Other comments:} + +\pointRaised{Comment 8}{p.17: "does not support ... function, language or environment. Such + objects have no native equivalent type in Protocol Buffers, and have + little meaning outside the context or R" + That is certainly false. Native mirror of environments are hash tables - + a very useful type indeed. Language objects are just lists, so there is + no reason to not include them - they can be useful to store expressions + that may not be necessary specific to R. Further on p. 18 your run into + the same problem that could be fixed so easily.} +\reply{Acknowledged. 
Environments are more than just hash + tables because they include other configuration parameters that must + also be serialized to make sure that + serialization/unserialization is idempotent, but we agree it is + cleaner, both for the package and for the exposition in the paper, to + simply serialize everything. We can now fall back to + \texttt{base::serialize()}, storing the bits in a rawString type of + RProtoBuf, to make the schema-less R serialization more complete.} + +\pointRaised{Comment 9}{The examples in sections 7 and 8 are somewhat weak. It does not seem + clear why one would wish to unleash the power of PB just to transfer + breaks and counts for plotting - even a simple ASCII file would do that + just fine. The main point in the example is presumably that there are + code generation methods for Hadoop based on PB IDL such that Hadoop can + be made aware of the data types, thus making a histogram a proper record + that won't be split, can be combined etc. -- yet that is not mentioned + nor a way presented how that can be leveraged in practice. The Python + example code simply uses a static example with constants to simulate the + output of a reducer so it doesn't illustrate the point - the reader is + left confused why something as trivial would require PB while a savvy + reader is not able to replicate the illustrated process. Possibly + explaining the benefits and providing more details on how one would + write such a job would make it much more relevant.} +\reply{Yes, we added more detail about the advantages you mention of using a + proper data type for the histograms in this example -- the + ability to write combiners, to prevent arbitrary splitting of the + records, and so on -- which can greatly improve performance. We agree with + the other reviewer that we do not want to get bogged down in details + about a particular MapReduce implementation (such as Hadoop), and we + now state that goal explicitly. We also draw a clearer connection + between the abstract MapReduce example and the simpler, static Python + example code.} + +\pointRaised{Comment 10}{Section 8 is not very well motivated. It is much easier to use other + formats for HTTP exchange - JSON is probably the most popular, but even + CSV works in simple settings. PB is a much less common standard. The + main advantage of PB is the performance over the alternatives, but HTTP + services are not necessarily known for their high-throughput so why one + would sacrifice interoperability by using PB (they are still more hassle + and require special installations)? It would be useful if the reason + could be made explicit here or a better example chosen.} +\reply{Done. This section has been reworded to make it shorter and + crisper, with fewer extraneous details about OpenCPU. Protocol + Buffers is an efficient protocol used between distributed systems at + many of the world's largest internet companies (Twitter, Sony, + Google, etc.), but the design and implementation of a large + enterprise-scale distributed system with complex RPC and + serialization needs is well beyond the scope of what we can add to a + paper about RProtoBuf.
We chose this example because it is a much + more accessible example that any reader can use to easily + send/receive RPCs and parse the results with RProtoBuf.} + +\end{document} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: Deleted: papers/jss/article-submitted-2014-03.pdf =================================================================== (Binary files differ) Deleted: papers/jss/response-to-reviewers.tex =================================================================== --- papers/jss/response-to-reviewers.tex 2015-04-13 19:51:21 UTC (rev 945) +++ papers/jss/response-to-reviewers.tex 2015-04-13 19:53:43 UTC (rev 946) @@ -1,455 +0,0 @@ - -\documentclass[10pt]{article} -\usepackage{url} -\usepackage{vmargin} -\setpapersize{USletter} -% left top right bottom -- headheight headsep footheight footskop -\setmarginsrb{1in}{1in}{1in}{0.5in}{0pt}{0mm}{10pt}{0.5in} -\usepackage{charter} - -\setlength{\parskip}{1ex plus1ex minus1ex} -\setlength{\parindent}{0pt} - -\newcommand{\proglang}[1]{\textsf{#1}} -\newcommand{\pkg}[1]{{\fontseries{b}\selectfont #1}} - -\newcommand{\pointRaised}[2]{\smallskip %\hrule - \textsl{{\fontseries{b}\selectfont #1}: #2}\newline} -\newcommand{\simplePointRaised}[1]{\bigskip \hrule\textsl{#1} } -\newcommand{\reply}[1]{\textbf{Reply}:\ #1 \smallskip } %\hrule \smallskip} - -\begin{document} - -\author{Dirk Eddelbuettel\\Debian Project \and - Murray Stokely\\Google, Inc \and - Jeroen Ooms\\UCLA} -\title{Submission JSS 1313: \\ Response to Reviewers' Comments} -\maketitle -\thispagestyle{empty} - -Thank you for reviewing our manuscript, and for giving us an opportunity to -rewrite, extend and and tighten both the paper and the underlying package. - -\smallskip -We truly appreciate the comments and suggestions. Below, we have regrouped the sets -of comments, and have provided detailed point-by-point replies. -% -We hope that this satisfies the request for changes necessary to proceed with -the publication of the revised and updated manuscript, along with the revised -and updated package (which was recently resubmitted to CRAN as version 0.4.2). - -\section*{Response to Reviewer \#1} - -\pointRaised{Comment 1}{Overall, I think this is a strong paper. Cross-language communication - is a challenging problem, and good solutions for R are important to - establish R as a well-behaved member of a data analysis pipeline. The - paper is well written, and I recommend that it be accepted subject to - the suggestions below.} -\reply{Thank you. We are providing a point-by-point reply below.} - -\subsubsection*{More big picture, less details} - -\pointRaised{Comment 2}{Overall, I think the paper provides too much detail on - relatively unimportant topics and not enough on the reasoning behind - important design decisions. I think you could comfortably reduce the paper - by 5-10 pages, referring the interested reader to the documentation for - more detail.} -\reply{The paper is now six pages shorter at just 23 pages. - Sections 3 - 8 (all but Section 1 (``Introduction''), Section 2 (``Protocol Buffers''), - and Section 9 (``Conclusion'') have been thoroughly rewritten to address the specific and - general feedback in these reviews.} - -\pointRaised{Comment 3}{I'd recommend shrinking section 3 to ~2 pages, and removing the - subheadings. This section should quickly orient the reader to the - RProtobuf API so they understand the big picture before learning more - details in the subsequent sections. 
I'd recommend picking one OO style - and sticking to it in this section - two is confusing.} -\reply{We followed this recommendation, reduced section 3 to about - $2\frac{1}{2}$ pages, removed the subheadings and tightened the exposition.} - -\pointRaised{Comment 3}{Section 4 dives into the details without giving a good overview and - motivation. Why use S4 and not RC? How are the objects made mutable? - Why do you provide both generic function and message passing OO - styles? What does \$ do in this context? What the heck is a - pseudo-method? Spend more time on those big issues rather than - describing each class in detail. Reduce class descriptions to a - bulleted list giving a high-level overview, then encourage the reader - to refer to the documentation for further details. Similarly, Tables - 3-5 belong in the documentation, not in a vignette/paper.} -\reply{Done. RProtoBuf was designed and implemented before RC were - available, and this is now noted explicitly in a new footnote. Explanation of how - they are made mutable has been added. Better explanation of the - two styles and '\$' as been added. We are no longer using the - confusing term 'pseudo-method' anywhere. We also moved Tables 3-5 into the - documentation and out of the paper, as suggested.} - -\pointRaised{Comment 4}{Section 7 is weak. I think the important message is that RProtobuf is - being used in practice at large scale for for large data, and is - useful for communicating between R and Python. How can you make that - message stronger while avoiding (for the purposes of this paper) the - relatively unimportant details of the map-reduce setup?} -\reply{Done. Rewritten with more motivation taking into account this feedback.} - -\subsubsection*{R to/from Protobuf translation} - -\pointRaised{Comment 5}{The discussion of R to/from Protobuf could be improved. Table 9 would be - much simpler if instead of Message, you provided a "vectorised" - Messages class (this would also make the interface more consistent and - hence the package easier to use).} -\reply{This is a good observation that only became clear to us after - significant usage of \texttt{RProtoBuf}. Providing a full ``vectorized'' Messages class would require slicing - operators that let you quickly extract a given field from each - element of the message vector in order to be really useful. This - would require significant amounts of C++ code for efficient - manipulation on the order of data.table or other similar large C++ R - packages on CRAN. There is another package called Motobuf by other authors - that takes this approach but in practice (at least for the several hundred - users at Google), the ease-of-use provided by the simple Message interface of RProtoBuf - has won with users. It is still future work to keep the simple - interactive interface of RProtoBuf with the vectorized efficiency of - Motobuf. For now, users typically do their slicing of vectors like - this through a distributed database (NewSQL is the term of the day?) - like Dremel or other system and then just get the response Protocol - Buffers in return to the request.} - -\pointRaised{Comment 6}{Along these lines, I think it would make sense to combine sections 5 - and 6 and discuss translation challenges in both direction - simultaneously. At the minimum, add the equivalent for Table 9 that - shows how important R classes are converted to their protobuf - equivalents.} -\reply{Done. 
We have updated these sections to make it clearer that the main - distinction is between schema-based datastructures (Section 5) and - schema-less use where a catch-all \texttt{.proto} is used (Section 6). - Neither section is meant to focus on only a single direction of the - conversion, but how conversion works when you have a schema or not. - How important R classes are converted to their protobuf equivalents - isn't super useful as a C++, Java, or Python program is unlikely to - want to read in an R data.frame exactly as it is defined. Much more - likely is an application-specific message format is defined between the - two services, such as the HistogramTools example in the next section. - Much more detail has been added to an interesting part of section 6 -- - which datasets exactly are better served with RProtoBuf than - \texttt{base::serialize} and why?} - -\pointRaised{Comment 7}{You should discuss how missing values are handled for strings and - integers, and why enums are not equivalent to factors. I think you - could make explicit how coercion of factors, dates, times and matrices - occurs, and the implications of this on sharing data structures - between programming languages. For example, how do you share date/time - data between R and python using RProtoBuf?} -\reply{All of these details are application-specific, whereas - RProtoBuf is an infrastructure package. Distributed systems define - their own interfaces, with their own date/time fields, usually as - a double of fractional seconds since the unix epoch for the systems I - have worked on. An example is given for Histograms in the next - section. Factors could be represented as repeated enums in Protocol - Buffers, certainly, if that is how one wanted to define a schema.} - -\pointRaised{Comment 8}{Table 10 is dying to be a plot, and a natural companion would be to - show how long it takes to serialise data frames using both RProtoBuf - and R's native serialisation. Is there a performance penalty to using - protobufs?} -\reply{Done. Table 10 has been replaced with a plot, the outliers are - labeled, and the text now includes some interesting explanation - about the outliers. Page 4 explains that the R implementation of - Protocol Buffers uses reflection to make operations slower but makes - it more convenient for interactive data analysis. None of the - built-in datasets are large enough for performance to really come up - as an issue, and for any serialization method examples could be - found that significantly favor one over another in runtime, so we - don't think there will be benefit to adding anything here. } - -\subsubsection*{RObjectTables magic} - [TRUNCATED] To get the complete diff run: svnlook diff /svnroot/rprotobuf -r 946 From noreply at r-forge.r-project.org Mon Apr 13 21:54:15 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 21:54:15 +0200 (CEST) Subject: [Rprotobuf-commits] r947 - in papers/jss: . 
JSSemails Message-ID: <20150413195415.9F8C8187741@r-forge.r-project.org> Author: edd Date: 2015-04-13 21:54:15 +0200 (Mon, 13 Apr 2015) New Revision: 947 Added: papers/jss/JSSemails/JSS_1313_comments.txt Removed: papers/jss/JSS_1313_comments.txt Log: some file cleanups, part three Deleted: papers/jss/JSS_1313_comments.txt =================================================================== --- papers/jss/JSS_1313_comments.txt 2015-04-13 19:53:43 UTC (rev 946) +++ papers/jss/JSS_1313_comments.txt 2015-04-13 19:54:15 UTC (rev 947) @@ -1,224 +0,0 @@ -This submission is important, but needs some work on both the paper and -the software before it can be accepted. The authors should address the -concerns of the two reviewers (below). - - -Overall, I think this is a strong paper. Cross-language communication -is a challenging problem, and good solutions for R are important to -establish R as a well-behaved member of a data analysis pipeline. The -paper is well written, and I recommend that it be accepted subject to -the suggestions below. - -# More big picture, less details - -Overall, I think the paper provides too much detail on relatively -unimportant topics and not enough on the reasoning behind important -design decisions. I think you could comfortably reduce the paper by -5-10 pages, referring the interested reader to the documentation for -more detail. - -I'd recommend shrinking section 3 to ~2 pages, and removing the -subheadings. This section should quickly orient the reader to the -RProtobuf API so they understand the big picture before learning more -details in the subsequent sections. I'd recommend picking one OO style -and sticking to it in this section - two is confusing. - -Section 4 dives into the details without giving a good overview and -motivation. Why use S4 and not RC? How are the objects made mutable? -Why do you provide both generic function and message passing OO -styles? What does `$` do in this context? What the heck is a -pseudo-method? Spend more time on those big issues rather than -describing each class in detail. Reduce class descriptions to a -bulleted list giving a high-level overview, then encourage the reader -to refer to the documentation for further details. Similarly, Tables -3-5 belong in the documentation, not in a vignette/paper. - -Section 7 is weak. I think the important message is that RProtobuf is -being used in practice at large scale for for large data, and is -useful for communicating between R and Python. How can you make that -message stronger while avoiding (for the purposes of this paper) the -relatively unimportant details of the map-reduce setup? - -# R <-> Protobuf translation - -The discussion of R <-> Protobuf could be improved. Table 9 would be -much simpler if instead of Message, you provided a "vectorised" -Messages class (this would also make the interface more consistent and -hence the package easier to use). - -Along these lines, I think it would make sense to combine sections 5 -and 6 and discuss translation challenges in both direction -simultaneously. At the minimum, add the equivalent for Table 9 that -shows how important R classes are converted to their protobuf -equivalents. - -You should discuss how missing values are handled for strings and -integers, and why enums are not equivalent to factors. I think you -could make explicit how coercion of factors, dates, times and matrices -occurs, and the implications of this on sharing data structures -between programming languages. 
For example, how do you share date/time -data between R and python using RProtoBuf? - -Table 10 is dying to be a plot, and a natural companion would be to -show how long it takes to serialise data frames using both RProtoBuf -and R's native serialisation. Is there a performance penalty to using -protobufs? - -# RObjectTables magic - -The use of RObjectTables magic makes me uneasy. It doesn't seem like a -good fit for an infrastructure package and it's not clear what -advantages it has over explicitly loading a protobuf definition into -an object. - -Using global state makes understanding code much harder. In Table 1, -it's not obvious where `tutorial.Person` comes from. Is it loaded by -default by RProtobuf? This need some explanation. In Section 7, what -does `readProtoFiles()` do? Why does `RProtobuf` need to be attached -as well as `HistogramTools`? This needs more explanation, and a -comment on the implications of this approach on CRAN packages and -namespaces. - -I'd prefer you eliminate this magic from the magic, but failing that, -you need a good explanation of why. - -# Code comments - -* Using `file.create()` to determine the absolute path seems like a bad -idea. - - -# Minor niggles - -* Don't refer to the message passing style of OO as traditional. - -* In Section 3.4, if messages isn't a vectorised class, the default - print method should use `cat()` to eliminate the confusing `[1]`. - -* The REXP definition would have been better defined using an enum that - matches R's SEXPTYPE "enum". But I guess that ship has sailed. - -* Why does `serialize_pb(CO2, NULL)` fail silently? Shouldn't it at least - warn that the serialization is partial? - - - -??????????????????????????????????????????????????????? -??????????????????????????????????????????????????????? - - - -The paper gives an overview of the RProtoBuf package which implements an -R interface to the Protocol Buffers library for an efficient -serialization of objects. The paper is well written and easy to read. -Introductory code is clear and the package provides objects to play with -immediately without the need to jump through hoops, making it appealing. -The software implementation is executed well. - -There are, however, a few inconsistencies in the implementation and some -issues with specific sections in the paper. In the following both issues -will be addressed sequentially by their occurrence in the paper. - - -p.4 illustrates the use of messages. The class implements list-like -access via [[ and $, but strangely names() return NULL and length() -doesn't correspond to the number of fields leading to startling results like - - > p -[1] "message of type 'tutorial.Person' with 2 fields set" - > length(p) -[1] 2 - > p[[3]] -[1] "" - -The inconsistencies get even more bizarre with descriptors (p.9): - - > tutorial.Person$email -[1] "descriptor for field 'email' of type 'tutorial.Person' " - > tutorial.Person[["email"]] -Error in tutorial.Person[["email"]] : this S4 class is not subsettable - > names(tutorial.Person) -NULL - > length(tutorial.Person) -[1] 1 - -It appears that there is no way to find out the fields of a descriptor -directly (although the low-level object methods seem to be exposed as -$field_count() and $fields() - but that seems extremely cumbersome). -Again, implementing names() and subsetting may help here. 
- -Another inconsistency concerns the as.list() method which by design -coerces objects to lists (see ?as.list), but the implementation for -EnumDescriptor breaks that contract and returns a vector instead: - - > is.list(as.list(tutorial.Person$PhoneType)) -[1] FALSE - > str(as.list(tutorial.Person$PhoneType)) - Named int [1:3] 0 1 2 - - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" - -As with the other interfaces, names() returns NULL so it is again quite -difficult to perform even simple operations such as finding out the -values. It may be natural use some of the standard methods like names(), -levels() or similar. As with the previous cases, the lack of [[ support -makes it impossible to map named enum values to codes and vice-versa. - -In general, the package would benefit from one pass of checks to assess -the consistency of the API. Since the authors intend direct interaction -with the objects via basic standard R methods, the classes should behave -consistently. - -Finally, most classes implement coercion to characters, which is not -mentioned and is not quite intuitive for some objects. For example, one -may think that as.character() on a file descriptor returns let's say the -filename, but we get: - - > cat(as.character(tutorial.Person$fileDescriptor())) -syntax = "proto2"; - -package tutorial; - -option java_package = "com.example.tutorial"; -option java_outer_classname = "AddressBookProtos"; -[...] - -It is not necessary clear what java_package has to do with a file -descriptor in R. Depending on the intention here, it may be useful to -explain this feature. - -Other comments: - -p.17: "does not support ... function, language or environment. Such -objects have no native equivalent type in Protocol Buffers, and have -little meaning outside the context or R" -That is certainly false. Native mirror of environments are hash tables - -a very useful type indeed. Language objects are just lists, so there is -no reason to not include them - they can be useful to store expressions -that may not be necessary specific to R. Further on p. 18 your run into -the same problem that could be fixed so easily. - -The examples in sections 7 and 8 are somewhat weak. It does not seem -clear why one would wish to unleash the power of PB just to transfer -breaks and counts for plotting - even a simple ASCII file would do that -just fine. The main point in the example is presumably that there are -code generation methods for Hadoop based on PB IDL such that Hadoop can -be made aware of the data types, thus making a histogram a proper record -that won't be split, can be combined etc. -- yet that is not mentioned -nor a way presented how that can be leveraged in practice. The Python -example code simply uses a static example with constants to simulate the -output of a reducer so it doesn't illustrate the point - the reader is -left confused why something as trivial would require PB while a savvy -reader is not able to replicate the illustrated process. Possibly -explaining the benefits and providing more details on how one would -write such a job would make it much more relevant. - -Section 8 is not very well motivated. It is much easier to use other -formats for HTTP exchange - JSON is probably the most popular, but even -CSV works in simple settings. PB is a much less common standard. 
The -main advantage of PB is the performance over the alternatives, but HTTP -services are not necessarily known for their high-throughput so why one -would sacrifice interoperability by using PB (they are still more hassle -and require special installations)? It would be useful if the reason -could be made explicit here or a better example chosen. - - Copied: papers/jss/JSSemails/JSS_1313_comments.txt (from rev 943, papers/jss/JSS_1313_comments.txt) =================================================================== --- papers/jss/JSSemails/JSS_1313_comments.txt (rev 0) +++ papers/jss/JSSemails/JSS_1313_comments.txt 2015-04-13 19:54:15 UTC (rev 947) @@ -0,0 +1,224 @@ +This submission is important, but needs some work on both the paper and +the software before it can be accepted. The authors should address the +concerns of the two reviewers (below). + + +Overall, I think this is a strong paper. Cross-language communication +is a challenging problem, and good solutions for R are important to +establish R as a well-behaved member of a data analysis pipeline. The +paper is well written, and I recommend that it be accepted subject to +the suggestions below. + +# More big picture, less details + +Overall, I think the paper provides too much detail on relatively +unimportant topics and not enough on the reasoning behind important +design decisions. I think you could comfortably reduce the paper by +5-10 pages, referring the interested reader to the documentation for +more detail. + +I'd recommend shrinking section 3 to ~2 pages, and removing the +subheadings. This section should quickly orient the reader to the +RProtobuf API so they understand the big picture before learning more +details in the subsequent sections. I'd recommend picking one OO style +and sticking to it in this section - two is confusing. + +Section 4 dives into the details without giving a good overview and +motivation. Why use S4 and not RC? How are the objects made mutable? +Why do you provide both generic function and message passing OO +styles? What does `$` do in this context? What the heck is a +pseudo-method? Spend more time on those big issues rather than +describing each class in detail. Reduce class descriptions to a +bulleted list giving a high-level overview, then encourage the reader +to refer to the documentation for further details. Similarly, Tables +3-5 belong in the documentation, not in a vignette/paper. + +Section 7 is weak. I think the important message is that RProtobuf is +being used in practice at large scale for for large data, and is +useful for communicating between R and Python. How can you make that +message stronger while avoiding (for the purposes of this paper) the +relatively unimportant details of the map-reduce setup? + +# R <-> Protobuf translation + +The discussion of R <-> Protobuf could be improved. Table 9 would be +much simpler if instead of Message, you provided a "vectorised" +Messages class (this would also make the interface more consistent and +hence the package easier to use). + +Along these lines, I think it would make sense to combine sections 5 +and 6 and discuss translation challenges in both direction +simultaneously. At the minimum, add the equivalent for Table 9 that +shows how important R classes are converted to their protobuf +equivalents. + +You should discuss how missing values are handled for strings and +integers, and why enums are not equivalent to factors. 
I think you +could make explicit how coercion of factors, dates, times and matrices +occurs, and the implications of this on sharing data structures +between programming languages. For example, how do you share date/time +data between R and python using RProtoBuf? + +Table 10 is dying to be a plot, and a natural companion would be to +show how long it takes to serialise data frames using both RProtoBuf +and R's native serialisation. Is there a performance penalty to using +protobufs? + +# RObjectTables magic + +The use of RObjectTables magic makes me uneasy. It doesn't seem like a +good fit for an infrastructure package and it's not clear what +advantages it has over explicitly loading a protobuf definition into +an object. + +Using global state makes understanding code much harder. In Table 1, +it's not obvious where `tutorial.Person` comes from. Is it loaded by +default by RProtobuf? This need some explanation. In Section 7, what +does `readProtoFiles()` do? Why does `RProtobuf` need to be attached +as well as `HistogramTools`? This needs more explanation, and a +comment on the implications of this approach on CRAN packages and +namespaces. + +I'd prefer you eliminate this magic from the magic, but failing that, +you need a good explanation of why. + +# Code comments + +* Using `file.create()` to determine the absolute path seems like a bad +idea. + + +# Minor niggles + +* Don't refer to the message passing style of OO as traditional. + +* In Section 3.4, if messages isn't a vectorised class, the default + print method should use `cat()` to eliminate the confusing `[1]`. + +* The REXP definition would have been better defined using an enum that + matches R's SEXPTYPE "enum". But I guess that ship has sailed. + +* Why does `serialize_pb(CO2, NULL)` fail silently? Shouldn't it at least + warn that the serialization is partial? + + + +??????????????????????????????????????????????????????? +??????????????????????????????????????????????????????? + + + +The paper gives an overview of the RProtoBuf package which implements an +R interface to the Protocol Buffers library for an efficient +serialization of objects. The paper is well written and easy to read. +Introductory code is clear and the package provides objects to play with +immediately without the need to jump through hoops, making it appealing. +The software implementation is executed well. + +There are, however, a few inconsistencies in the implementation and some +issues with specific sections in the paper. In the following both issues +will be addressed sequentially by their occurrence in the paper. + + +p.4 illustrates the use of messages. The class implements list-like +access via [[ and $, but strangely names() return NULL and length() +doesn't correspond to the number of fields leading to startling results like + + > p +[1] "message of type 'tutorial.Person' with 2 fields set" + > length(p) +[1] 2 + > p[[3]] +[1] "" + +The inconsistencies get even more bizarre with descriptors (p.9): + + > tutorial.Person$email +[1] "descriptor for field 'email' of type 'tutorial.Person' " + > tutorial.Person[["email"]] +Error in tutorial.Person[["email"]] : this S4 class is not subsettable + > names(tutorial.Person) +NULL + > length(tutorial.Person) +[1] 1 + +It appears that there is no way to find out the fields of a descriptor +directly (although the low-level object methods seem to be exposed as +$field_count() and $fields() - but that seems extremely cumbersome). +Again, implementing names() and subsetting may help here. 
+ +Another inconsistency concerns the as.list() method which by design +coerces objects to lists (see ?as.list), but the implementation for +EnumDescriptor breaks that contract and returns a vector instead: + + > is.list(as.list(tutorial.Person$PhoneType)) +[1] FALSE + > str(as.list(tutorial.Person$PhoneType)) + Named int [1:3] 0 1 2 + - attr(*, "names")= chr [1:3] "MOBILE" "HOME" "WORK" + +As with the other interfaces, names() returns NULL so it is again quite +difficult to perform even simple operations such as finding out the +values. It may be natural use some of the standard methods like names(), +levels() or similar. As with the previous cases, the lack of [[ support +makes it impossible to map named enum values to codes and vice-versa. + +In general, the package would benefit from one pass of checks to assess +the consistency of the API. Since the authors intend direct interaction +with the objects via basic standard R methods, the classes should behave +consistently. + +Finally, most classes implement coercion to characters, which is not +mentioned and is not quite intuitive for some objects. For example, one +may think that as.character() on a file descriptor returns let's say the +filename, but we get: + + > cat(as.character(tutorial.Person$fileDescriptor())) +syntax = "proto2"; + +package tutorial; + +option java_package = "com.example.tutorial"; +option java_outer_classname = "AddressBookProtos"; +[...] + +It is not necessary clear what java_package has to do with a file +descriptor in R. Depending on the intention here, it may be useful to +explain this feature. + +Other comments: + +p.17: "does not support ... function, language or environment. Such +objects have no native equivalent type in Protocol Buffers, and have +little meaning outside the context or R" +That is certainly false. Native mirror of environments are hash tables - +a very useful type indeed. Language objects are just lists, so there is +no reason to not include them - they can be useful to store expressions +that may not be necessary specific to R. Further on p. 18 your run into +the same problem that could be fixed so easily. + +The examples in sections 7 and 8 are somewhat weak. It does not seem +clear why one would wish to unleash the power of PB just to transfer +breaks and counts for plotting - even a simple ASCII file would do that +just fine. The main point in the example is presumably that there are +code generation methods for Hadoop based on PB IDL such that Hadoop can +be made aware of the data types, thus making a histogram a proper record +that won't be split, can be combined etc. -- yet that is not mentioned +nor a way presented how that can be leveraged in practice. The Python +example code simply uses a static example with constants to simulate the +output of a reducer so it doesn't illustrate the point - the reader is +left confused why something as trivial would require PB while a savvy +reader is not able to replicate the illustrated process. Possibly +explaining the benefits and providing more details on how one would +write such a job would make it much more relevant. + +Section 8 is not very well motivated. It is much easier to use other +formats for HTTP exchange - JSON is probably the most popular, but even +CSV works in simple settings. PB is a much less common standard. 
The +main advantage of PB is the performance over the alternatives, but HTTP +services are not necessarily known for their high-throughput so why one +would sacrifice interoperability by using PB (they are still more hassle +and require special installations)? It would be useful if the reason +could be made explicit here or a better example chosen. + + From noreply at r-forge.r-project.org Mon Apr 13 21:59:43 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Mon, 13 Apr 2015 21:59:43 +0200 (CEST) Subject: [Rprotobuf-commits] r948 - papers/jss Message-ID: <20150413195943.91300187849@r-forge.r-project.org> Author: edd Date: 2015-04-13 21:59:43 +0200 (Mon, 13 Apr 2015) New Revision: 948 Modified: papers/jss/Makefile Log: finishing Modified: papers/jss/Makefile =================================================================== --- papers/jss/Makefile 2015-04-13 19:54:15 UTC (rev 947) +++ papers/jss/Makefile 2015-04-13 19:59:43 UTC (rev 948) @@ -14,4 +14,4 @@ R CMD Stangle ${article}.Rnw jssarchive: - (cd .. && zip -r jssarchive.zip jss/) + (cd .. && zip -r jss1313_submission_$$(date "+%Y-%m-%d").zip jss/ -x "jss/JSSstyle/*" -x "jss/JSSemails/*") From noreply at r-forge.r-project.org Tue Apr 14 05:05:38 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 14 Apr 2015 05:05:38 +0200 (CEST) Subject: [Rprotobuf-commits] r949 - in papers/jss: . JSSemails Message-ID: <20150414030538.93B3F18798C@r-forge.r-project.org> Author: edd Date: 2015-04-14 05:05:29 +0200 (Tue, 14 Apr 2015) New Revision: 949 Added: papers/jss/JSSemails/response-to-reviewers.pdf Removed: papers/jss/jss1313.R Log: one more for JSSemails/ dir Added: papers/jss/JSSemails/response-to-reviewers.pdf =================================================================== (Binary files differ) Property changes on: papers/jss/JSSemails/response-to-reviewers.pdf ___________________________________________________________________ Added: svn:mime-type + application/octet-stream Deleted: papers/jss/jss1313.R =================================================================== --- papers/jss/jss1313.R 2015-04-13 19:59:43 UTC (rev 948) +++ papers/jss/jss1313.R 2015-04-14 03:05:29 UTC (rev 949) @@ -1,297 +0,0 @@ -### R code from vignette source '/home/edd/svn/rprotobuf/papers/jss/jss1313.Rnw' - -################################################### -### code chunk number 1: jss1313.Rnw:130-136 -################################################### -## cf http://www.jstatsoft.org/style#q12 -options(prompt = "R> ", - continue = "+ ", - width = 70, - useFancyQuotes = FALSE, - digits = 4) - - -################################################### -### code chunk number 2: jss1313.Rnw:318-326 -################################################### -library("RProtoBuf") -p <- new(tutorial.Person, id=1, - name="Dirk") -p$name -p$name <- "Murray" -cat(as.character(p)) -serialize(p, NULL) -class(p) - - -################################################### -### code chunk number 3: jss1313.Rnw:421-422 -################################################### -p <- new(tutorial.Person, name = "Murray", id = 1) - - -################################################### -### code chunk number 4: jss1313.Rnw:431-434 -################################################### -p$name -p$id -p$email <- "murray at stokely.org" - - -################################################### -### code chunk number 5: jss1313.Rnw:442-445 -################################################### -p[["name"]] <- "Murray Stokely" -p[[ 
2 ]] <- 3 -p[["email"]] - - -################################################### -### code chunk number 6: jss1313.Rnw:461-462 -################################################### -p - - -################################################### -### code chunk number 7: jss1313.Rnw:469-470 -################################################### -writeLines(as.character(p)) - - -################################################### -### code chunk number 8: jss1313.Rnw:483-484 -################################################### -serialize(p, NULL) - - -################################################### -### code chunk number 9: jss1313.Rnw:489-492 -################################################### -tf1 <- tempfile() -serialize(p, tf1) -readBin(tf1, raw(0), 500) - - -################################################### -### code chunk number 10: jss1313.Rnw:538-540 -################################################### -msg <- read(tutorial.Person, tf1) -writeLines(as.character(msg)) - - -################################################### -### code chunk number 11: jss1313.Rnw:660-661 -################################################### -new(tutorial.Person) - - -################################################### -### code chunk number 12: jss1313.Rnw:685-690 -################################################### -tutorial.Person$email -tutorial.Person$email$is_required() -tutorial.Person$email$type() -tutorial.Person$email$as.character() -class(tutorial.Person$email) - - -################################################### -### code chunk number 13: jss1313.Rnw:702-709 -################################################### -tutorial.Person$PhoneType -tutorial.Person$PhoneType$WORK -class(tutorial.Person$PhoneType) -tutorial.Person$PhoneType$value(1) -tutorial.Person$PhoneType$value(name="HOME") -tutorial.Person$PhoneType$value(number=1) -class(tutorial.Person$PhoneType$value(1)) - - -################################################### -### code chunk number 14: jss1313.Rnw:805-808 -################################################### -if (!exists("JSSPaper.Example1", "RProtoBuf:DescriptorPool")) { - readProtoFiles(file="int64.proto") -} - - -################################################### -### code chunk number 15: jss1313.Rnw:830-834 -################################################### -as.integer(2^31-1) -as.integer(2^31 - 1) + as.integer(1) -2^31 -class(2^31) - - -################################################### -### code chunk number 16: jss1313.Rnw:846-847 -################################################### -2^53 == (2^53 + 1) - - -################################################### -### code chunk number 17: jss1313.Rnw:898-900 -################################################### -msg <- serialize_pb(iris, NULL) -identical(iris, unserialize_pb(msg)) - - -################################################### -### code chunk number 18: jss1313.Rnw:928-931 -################################################### -datasets <- as.data.frame(data(package="datasets")$results) -datasets$name <- sub("\\s+.*$", "", datasets$Item) -n <- nrow(datasets) - - -################################################### -### code chunk number 19: jss1313.Rnw:949-992 -################################################### -datasets$object.size <- unname(sapply(datasets$name, function(x) object.size(eval(as.name(x))))) - -datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize(eval(as.name(x)), NULL)))) - -datasets$R.serialize.size <- unname(sapply(datasets$name, function(x) 
length(serialize(eval(as.name(x)), NULL)))) - -datasets$R.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize(eval(as.name(x)), NULL), "gzip")))) - -datasets$RProtoBuf.serialize.size <- unname(sapply(datasets$name, function(x) length(serialize_pb(eval(as.name(x)), NULL)))) - -datasets$RProtoBuf.serialize.size.gz <- unname(sapply(datasets$name, function(x) length(memCompress(serialize_pb(eval(as.name(x)), NULL), "gzip")))) - -clean.df <- data.frame(dataset=datasets$name, - object.size=datasets$object.size, - "serialized"=datasets$R.serialize.size, - "gzipped serialized"=datasets$R.serialize.size.gz, - "RProtoBuf"=datasets$RProtoBuf.serialize.size, - "gzipped RProtoBuf"=datasets$RProtoBuf.serialize.size.gz, - "ratio.serialized" = datasets$R.serialize.size / datasets$object.size, - "ratio.rprotobuf" = datasets$RProtoBuf.serialize.size / datasets$object.size, - "ratio.serialized.gz" = datasets$R.serialize.size.gz / datasets$object.size, - "ratio.rprotobuf.gz" = datasets$RProtoBuf.serialize.size.gz / datasets$object.size, - "savings.serialized" = 1-(datasets$R.serialize.size / datasets$object.size), - "savings.rprotobuf" = 1-(datasets$RProtoBuf.serialize.size / datasets$object.size), - "savings.serialized.gz" = 1-(datasets$R.serialize.size.gz / datasets$object.size), - "savings.rprotobuf.gz" = 1-(datasets$RProtoBuf.serialize.size.gz / datasets$object.size), - check.names=FALSE) - -all.df<-data.frame(dataset="TOTAL", object.size=sum(datasets$object.size), - "serialized"=sum(datasets$R.serialize.size), - "gzipped serialized"=sum(datasets$R.serialize.size.gz), - "RProtoBuf"=sum(datasets$RProtoBuf.serialize.size), - "gzipped RProtoBuf"=sum(datasets$RProtoBuf.serialize.size.gz), - "ratio.serialized" = sum(datasets$R.serialize.size) / sum(datasets$object.size), - "ratio.rprotobuf" = sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size), - "ratio.serialized.gz" = sum(datasets$R.serialize.size.gz) / sum(datasets$object.size), - "ratio.rprotobuf.gz" = sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size), - "savings.serialized" = 1-(sum(datasets$R.serialize.size) / sum(datasets$object.size)), - "savings.rprotobuf" = 1-(sum(datasets$RProtoBuf.serialize.size) / sum(datasets$object.size)), - "savings.serialized.gz" = 1-(sum(datasets$R.serialize.size.gz) / sum(datasets$object.size)), - "savings.rprotobuf.gz" = 1-(sum(datasets$RProtoBuf.serialize.size.gz) / sum(datasets$object.size)), - check.names=FALSE) -clean.df<-rbind(clean.df, all.df) - - -################################################### -### code chunk number 20: SER -################################################### -old.mar<-par("mar") -new.mar<-old.mar -new.mar[3]<-0 -new.mar[4]<-0 -my.cex<-1.3 -par("mar"=new.mar) -plot(clean.df$savings.serialized, clean.df$savings.rprotobuf, pch=1, col="red", las=1, xlab="Serialization Space Savings", ylab="Protocol Buffer Space Savings", xlim=c(0,1),ylim=c(0,1),cex.lab=my.cex, cex.axis=my.cex) -points(clean.df$savings.serialized.gz, clean.df$savings.rprotobuf.gz,pch=2, col="blue") -# grey dotted diagonal -abline(a=0,b=1, col="grey",lty=2,lwd=3) - -# find point furthest off the X axis. -clean.df$savings.diff <- clean.df$savings.serialized - clean.df$savings.rprotobuf -clean.df$savings.diff.gz <- clean.df$savings.serialized.gz - clean.df$savings.rprotobuf.gz - -# The one to label. 
-tmp.df <- clean.df[which(clean.df$savings.diff == min(clean.df$savings.diff)),] -# This minimum means most to the left of our line, so pos=2 is label to the left -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) - -# Some gziped version -# text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=2, cex=my.cex) - -# Second one is also an outlier -tmp.df <- clean.df[which(clean.df$savings.diff == sort(clean.df$savings.diff)[2]),] -# This minimum means most to the left of our line, so pos=2 is label to the left -text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=2, cex=my.cex) -#text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=my.cex) - - -tmp.df <- clean.df[which(clean.df$savings.diff == max(clean.df$savings.diff)),] -# This minimum means most to the right of the diagonal, so pos=4 is label to the right -# Only show the gziped one. -#text(tmp.df$savings.serialized, tmp.df$savings.rprotobuf, labels=tmp.df$dataset, pos=4, cex=my.cex) -text(tmp.df$savings.serialized.gz, tmp.df$savings.rprotobuf.gz, labels=tmp.df$dataset, pos=4, cex=my.cex) - -#outlier.dfs <- clean.df[c(which(clean.df$savings.diff == min(clean.df$savings.diff)), - -legend("topleft", c("Raw", "Gzip Compressed"), pch=1:2, col=c("red", "blue"), cex=my.cex) - -interesting.df <- clean.df[unique(c(which(clean.df$savings.diff == min(clean.df$savings.diff)), - which(clean.df$savings.diff == max(clean.df$savings.diff)), - which(clean.df$savings.diff.gz == max(clean.df$savings.diff.gz)), - which(clean.df$dataset == "TOTAL"))),c("dataset", "object.size", "serialized", "gzipped serialized", "RProtoBuf", "gzipped RProtoBuf", "savings.serialized", "savings.serialized.gz", "savings.rprotobuf", "savings.rprotobuf.gz")] -# Print without .00 in xtable -interesting.df$object.size <- as.integer(interesting.df$object.size) -par("mar"=old.mar) - - -################################################### -### code chunk number 21: jss1313.Rnw:1231-1235 -################################################### -require(HistogramTools) -readProtoFiles(package="HistogramTools") -hist <- HistogramTools.HistogramState$read("hist.pb") -plot(as.histogram(hist), main="") - - -################################################### -### code chunk number 22: jss1313.Rnw:1323-1330 (eval = FALSE) -################################################### -## library("RProtoBuf") -## library("httr") -## -## req <- GET('https://demo.ocpu.io/MASS/data/Animals/pb') -## output <- unserialize_pb(req$content) -## -## identical(output, MASS::Animals) - - -################################################### -### code chunk number 23: jss1313.Rnw:1380-1396 (eval = FALSE) -################################################### -## library("httr") -## library("RProtoBuf") -## -## args <- list(n=42, mean=100) -## payload <- serialize_pb(args, NULL) -## -## req <- POST ( -## url = "https://demo.ocpu.io/stats/R/rnorm/pb", -## body = payload, -## add_headers ( -## "Content-Type" = "application/x-protobuf" -## ) -## ) -## -## output <- unserialize_pb(req$content) -## print(output) - - -################################################### -### code chunk number 24: jss1313.Rnw:1400-1403 (eval = FALSE) -################################################### -## fnargs <- unserialize_pb(inputmsg) -## val <- do.call(stats::rnorm, fnargs) -## outputmsg <- serialize_pb(val) - - From noreply at r-forge.r-project.org Tue Apr 14 05:09:21 2015 From: 
noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 14 Apr 2015 05:09:21 +0200 (CEST) Subject: [Rprotobuf-commits] r950 - papers/jss/JSSemails Message-ID: <20150414030921.67A5318500B@r-forge.r-project.org> Author: edd Date: 2015-04-14 05:09:20 +0200 (Tue, 14 Apr 2015) New Revision: 950 Added: papers/jss/JSSemails/jss-reply-1.txt Removed: papers/jss/JSSemails/response-to-reviewers.pdf Log: and one more Added: papers/jss/JSSemails/jss-reply-1.txt =================================================================== --- papers/jss/JSSemails/jss-reply-1.txt (rev 0) +++ papers/jss/JSSemails/jss-reply-1.txt 2015-04-14 03:09:20 UTC (rev 950) @@ -0,0 +1,73 @@ +From: Josh EmBree +To: Dirk Eddelbuettel +Subject: JSS 1313 has finished pre-screening +Date: Tue, 11 Mar 2014 11:23:54 -0700 + +Dear author + +Your submission + + JSS 1313 + +has just finished the pre-screening stage. + +As it does not meet all submission requirements of JSS, see + http://www.jstatsoft.org/instructions + http://www.jstatsoft.org/style +we currently cannot process it further. + +In order to continue in the process there are a few changes that need to be +made. Attached in this email is a comments file where you can find all the +necessary changes. + +Thank you for considering JSS and contributing to free statistical software. + +Best regards, + +Jan de Leeuw +Bettina Grün +Achim Zeileis + +---------------------------------------------------------------------- +JSS 1313: Eddelbuettel, Stokely, Ooms + +RProtoBuf: Efficient Cross-Language Data Serialization in R + + +For further instruction on JSS style requirements please see the JSS style +manual (in particular section 2.1 Style Checklist) at +http://www.jstatsoft.org/downloads/JSSstyle.zip + +Please see FAQ at: http://www.jstatsoft.org/style +And for further references please see RECENT JSS papers for detailed +documentation and examples. + +From the editorial team: + +o Not all software mentioned is actually cited. Please correct this. + +o Also the replication materials could be +made more accessible. + + +Manuscript: + +o \section, \subsection, etc. in sentence style +o Annotations of figures/tables (including captions) should be in sentence style (*include period at end of caption). +o Do not use additional formatting for specific words unless explicitly required by the JSS style guide, e.g., +o All table row/column headers should also be in sentence style. + +o For R-related manuscripts: The first argument of data() and library() should always be quoted, e.g., library("foo"). +o The code presented in the manuscript should not contain comments within the verbatim code. Instead the comments should be made in the normal LaTeX text. +o In all cases, code input/output must fit within the normal text width of the manuscript. Thus, code input should have appropriate line breaks and code output should preferably be generated with a suitable width (or otherwise edited). + +o As a reminder, please make sure that: + - \proglang, \pkg and \code have been used for highlighting throughout +the paper (including titles and references), except where explicitly escaped. + +References: +o Springer-Verlag (not: Springer) +o Please make sure that all Software packages are \cite{}'d properly. +o All references should be in title style. See FAQ.
+ +---------------------------------------------------------------------- Deleted: papers/jss/JSSemails/response-to-reviewers.pdf =================================================================== (Binary files differ) From noreply at r-forge.r-project.org Tue Apr 14 05:15:20 2015 From: noreply at r-forge.r-project.org (noreply at r-forge.r-project.org) Date: Tue, 14 Apr 2015 05:15:20 +0200 (CEST) Subject: [Rprotobuf-commits] r951 - papers Message-ID: <20150414031520.47F521878E1@r-forge.r-project.org> Author: edd Date: 2015-04-14 05:15:19 +0200 (Tue, 14 Apr 2015) New Revision: 951 Added: papers/jss1313_submission_2015-04-13.zip Log: final (?) submission Added: papers/jss1313_submission_2015-04-13.zip =================================================================== (Binary files differ) Property changes on: papers/jss1313_submission_2015-04-13.zip ___________________________________________________________________ Added: svn:mime-type + application/octet-stream