[Rcpp-commits] r3680 - in pkg/RcppCNPy: . inst man vignettes

noreply at r-forge.r-project.org
Sat Jul 7 19:02:45 CEST 2012


Author: edd
Date: 2012-07-07 19:02:45 +0200 (Sat, 07 Jul 2012)
New Revision: 3680

Modified:
   pkg/RcppCNPy/ChangeLog
   pkg/RcppCNPy/DESCRIPTION
   pkg/RcppCNPy/inst/NEWS.Rd
   pkg/RcppCNPy/man/RcppCNPy-package.Rd
   pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw
   pkg/RcppCNPy/vignettes/RcppCNPy-intro.pdf
Log:
Release 0.1.0


Modified: pkg/RcppCNPy/ChangeLog
===================================================================
--- pkg/RcppCNPy/ChangeLog	2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/ChangeLog	2012-07-07 17:02:45 UTC (rev 3680)
@@ -1,3 +1,9 @@
+2012-07-07  Dirk Eddelbuettel  <edd at debian.org>
+
+	* vignettes/RcppCNPy-intro.Rnw: Added vignette documentation
+
+	* demo/timings.R: Added simple timing benchmark demo
+
 2012-07-06  Dirk Eddelbuettel  <edd at debian.org>
 
 	* src/cnpy.h: Include cstdint for int64_t if C++11 has been enabled

Modified: pkg/RcppCNPy/DESCRIPTION
===================================================================
--- pkg/RcppCNPy/DESCRIPTION	2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/DESCRIPTION	2012-07-07 17:02:45 UTC (rev 3680)
@@ -1,14 +1,15 @@
 Package: RcppCNPy
 Type: Package
 Title: Rcpp bindings for NumPy files
-Version: 0.0.2
+Version: 0.1.0
 Date: $Date$
 Author: Dirk Eddelbuettel
 Maintainer: Dirk Eddelbuettel <edd at debian.org>
 Description: This package provides access to the cnpy library by Carl Rogers
  which provides read and write facilities for files created with (or for) the
- NumPY extension for Python.  Vectors and matrices of either numeric or
- integer types can be read or written. Compressed files can be read as well.
+ NumPy extension for Python.  Vectors and matrices of numeric types can be
+ read or written; compressed files can be read as well. Support for integer
+ files is available if the package (and Rcpp) are compiled with -std=c++11.
 License: GPL (>= 2)
 LazyLoad: yes
 Depends: methods, Rcpp (>= 0.9.13)

Modified: pkg/RcppCNPy/inst/NEWS.Rd
===================================================================
--- pkg/RcppCNPy/inst/NEWS.Rd	2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/inst/NEWS.Rd	2012-07-07 17:02:45 UTC (rev 3680)
@@ -2,7 +2,7 @@
 \title{News for Package \pkg{RcppCNPy}}
 \newcommand{\cpkg}{\href{http://CRAN.R-project.org/package=#1}{\pkg{#1}}}
 
-\section{Changes in version 0.0.2 (2012-07-xx)}{
+\section{Changes in version 0.1.0 (2012-07-07)}{
   \itemize{
     \item Added automatic use of transpose to automagically account for
     Fortran-vs-C major storage defaults between Python and R.
@@ -12,6 +12,7 @@
     \item Added support for reading gzip'ed files ending in ".npy.gz"
     \item Added regression tests in directory \code{tests/}
     \item Added a vignette describing the package}
+    \item Added a timing benchmark in demo/timings.R}
   }
 }
 \section{Changes in version 0.0.1 (2012-07-04)}{

Modified: pkg/RcppCNPy/man/RcppCNPy-package.Rd
===================================================================
--- pkg/RcppCNPy/man/RcppCNPy-package.Rd	2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/man/RcppCNPy-package.Rd	2012-07-07 17:02:45 UTC (rev 3680)
@@ -12,8 +12,10 @@
   which provides read and write facilities for files created with (or for) the
   NumPy extension for Python.
 
-  Support is currently still pretty limited to reading and writing of
-  either vectors or matrices of either numeric or integer type.
+  Support is currently limited to reading and writing of either vectors
+  or matrices of numeric types. Integer support can be added if the
+  package, as well as \pkg{Rcpp}, is recompiled using the \code{-std=c++11}
+  flag.
 
   Files with \code{gzip} compression can be transparently read as well.
 }
@@ -33,27 +35,16 @@
   \tabular{ll}{
     Package: \tab RcppCNPy\cr
     Type: \tab Package\cr
-    Version: \tab 0.0.1\cr
-    Date: \tab 2012-07-04\cr
-    License: \tab What license is it under?\cr
-    LazyLoad: \tab yes\cr
+    Version: \tab 0.1.0\cr
+    Date: \tab 2012-07-07\cr
+    License: \tab GPL (>= 2)\cr
   }
 
-  The package uses Rcpp modules to provide R bindings \code{npyLoadNM()}
-  and \code{npyLoadIM()} which wrap the \code{npy_load()}
-  function. Currently, only two-dimensional matrices are suppported but
-  this can be extended easily to vectors.
-
-  The following minor changes were made to \code{cnpy}:
-  \itemize{
-    \item the \code{printf(...); abort()} combination was replaced in
-    three instances with \code{REprintf(...)} per CRAN Policy guidelines.
-    \item \code{long long} was commented out in two places (which we can revert once
-    CRAN switches to a new compiler and c++11 becomes standard) and one
-    \code{unsigned long long} was replaced by \code{unsigned long}.
-    \item several unused variables were commented out.
-  }
-  
+  The package uses Rcpp modules to provide R bindings \code{npyLoad()}
+  and \code{npySave()} which wrap the \code{npy_load()} and
+  \code{npy_save()} functions. Currently, only one- and two-dimensional
+  vectors and matrices are supported; higher-dimensional arrays could
+  be added.
 }
 \author{
   Dirk Eddelbuettel provide the binding to R (using the Rcpp package).
@@ -73,5 +64,5 @@
   \code{\link[Rcpp:Rcpp-package]{Rcpp}} 
 }
 \examples{
-  ## TODO
+  ## TODO, but see demo()
 }

Modified: pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw
===================================================================
--- pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw	2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw	2012-07-07 17:02:45 UTC (rev 3680)
@@ -16,7 +16,7 @@
 %\usepackage[T1]{fontenc}
 
 
-\usepackage{color,alltt,url}
+\usepackage{color,alltt,url,booktabs}
 \usepackage[authoryear,round,longnamesfirst]{natbib}
 \usepackage[colorlinks]{hyperref}
 \definecolor{link}{rgb}{0,0,0.3}	%% next few lines courtesy of RJournal.sty
@@ -276,11 +276,8 @@
   \hlstd{}\hlopt{{[}}\hlstd{}\hlnum{1}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{0.0}\hlstd{\ \ }\hlnum{1.1}\hlstd{\ \ }\hlnum{2.2}\hlstd{\ \ }\hlnum{3.3}\hspace*{\fill}\\
   \hlstd{}\hlopt{{[}}\hlstd{}\hlnum{2}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{4.4}\hlstd{\ \ }\hlnum{5.5}\hlstd{\ \ }\hlnum{6.6}\hlstd{\ \ }\hlnum{7.7}\hspace*{\fill}\\
   \hlstd{}\hlopt{{[}}\hlstd{}\hlnum{3}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{8.8}\hlstd{\ \ }\hlnum{9.9\ 11.0\ 12.1}\hspace*{\fill}\\
-  \hlstd{R}\hlopt{$>$\ }\hlstd{}\hspace*{\fill}\\
-  \mbox{}
   \normalfont
   \normalsize
-
 \end{quote}
 
 Support for compressed file is currently limited to reading, but could be
@@ -327,7 +324,6 @@
   \hlstd{R}\hlopt{$>$\ }\hlstd{v}\hspace*{\fill}\\
   \hlopt{{[}}\hlstd{}\hlnum{1}\hlstd{}\hlopt{{]}\ }\hlstd{}\hlnum{10\ 11\ 12}\hspace*{\fill}\\
   \hlstd{R}\hlopt{$>$\ }\hlstd{}\hlkwd{npySave}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"simplevec.npy"}\hlstd{}\hlopt{,\ }\hlstd{v}\hlopt{)}\hspace*{\fill}\\
-  \hlstd{R}\hlopt{$>$\ }\hlstd{}\hspace*{\fill}\\
   \mbox{}
   \normalfont
   \normalsize
@@ -349,6 +345,7 @@
 
 \subsection{Data reading in \Python}
 
+Reading the data back in \Python is straightforward too:
 \begin{quote}
   \small
 
@@ -363,35 +360,85 @@
   \hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{v\ }\hlopt{=\ }\hlstd{np}\hlopt{.}\hlstd{}\hlkwd{load}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"simpleve.npy"}\hlstd{}\hlopt{)}\hspace*{\fill}\\
   \hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{v}\hspace*{\fill}\\
   \hlkwd{array}\hlstd{}\hlopt{({[}\ }\hlstd{}\hlnum{10}\hlstd{}\hlopt{.,}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{11}\hlstd{}\hlopt{.,}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{12}\hlstd{}\hlopt{.{]})}\hspace*{\fill}\\
-  \hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{}\hspace*{\fill}\\
-  \mbox{}
   \normalfont
   \normalsize
 \end{quote}
 
+\section{Performance}
+
+The \R script \code{timings} in the \code{demo/} directory of package
+\pkg{RcppCNPy} provides a simple benchmark.  Given two values $n$ and $k$, a
+matrix with $n$ rows and $k$ columns is created. It is
+written to temporary files in
+% \begin{enumerate}
+% \item ascii format using \code{write.table()};
+% \item \code{NumPy} format using \code{npySave()}; and
+% \item \code{NumPy} format using \code{npySave()} followed by a call to \code{gzip}.
+% \end{enumerate}
+i) ascii format using \code{write.table()};
+ii) \code{NumPy} format using \code{npySave()}; and
+iii) \code{NumPy} format using \code{npySave()} followed by a call to \code{gzip}.
+
+Table~\ref{tab:benchmark} shows some timing comparisons for a matrix with
+five million elements.  Reading the \code{npy} file is clearly fastest as it
+requires only parsing of the header, followed by a single large binary read
+(and the transpose required to translate to the representation used by \Rns). The
+compressed file requires only one-fourth of the disk space, but takes
+approximately 2.5 times as long to read as the binary stream has to be
+decompressed first.  Lastly, the default ascii reading mode is by far the
+slowest.
+
+\begin{table}[bt]
+  \begin{center}
+    \begin{small}
+      \begin{tabular}{rrr}
+        \toprule
+        {\bf Access method \phantom{X}} & {\bf Time in sec.} & {\bf Relative to best} \\
+        \cmidrule(r){1-3}
+     \code{npyLoad(pyfile)}   &    1.95 &  1.00 \\
+   \code{npyLoad(pygzfile)}   &    4.92 &  2.53 \\
+ \code{read.table(txtfile)}   &  128.85 & 66.24 \\
+        \bottomrule
+      \end{tabular}
+    \end{small}
+    \caption{Performance comparison of data reads using a matrix of size
+      $10^5 \times 50$. File sizes are 39.7 MB for ascii, 40.0 MB for npy and
+      10.8 MB for npy.gz. Ten replications were performed, and total times
+      are shown.}
+    \label{tab:benchmark}
+  \end{center}
+\end{table}
+
+
+
 \section{Limitations}
 
 \subsection{Integer support}
 
-Support for integer data types is available, but conditional on use of the
-\code{-std=c++11} compiler extension. Only the newer standard supports the
-\code{long long int} type needed to represent \code{int64} data on a 32-bit
-OS.  So until \R switches to allowing \code{-std=c++11} on CRAN packages,
-users will need to rebuild both \pkg{Rcpp} and \pkg{RcppCNPy} with the switch
-enabled.
+Support for integer data types is conditional on use of the \code{-std=c++11}
+compiler switch. Only the newer standard supports the \code{long long int}
+type needed to represent \code{int64} data on a 32-bit OS.  So until \R
+switches to allowing \code{-std=c++11} in CRAN packages, users will need to
+rebuild both \pkg{Rcpp} and \pkg{RcppCNPy} with the switch enabled. As shown
+in the previous examples, integers also transparently convert to float types.
 
-As shown in the previous examples, integers also transparently convert to
-float types.
-
 \subsection{Higher-dimensional arrays}
 
 \pkg{Rcpp} supports three-dimensional arrays, this could be support in
 \pkg{RcppCNPy} as well.
 
+\subsection{\code{npz} files}
+
+The \pkg{cnpy} library supports reading and writing of sets of arrays; this
+feature could also be exported.
+
 \section{Summary}
 
 The \pkg{RcppCNPy} package provides simple reading and writing of \pkg{NumPy}
-files, using the \pkg{cnpy} library.  Reading of compressed files is also
-supported as an extension.
+files, using the \pkg{cnpy} library.
 
+Reading of compressed files is also supported as an extension. This offers
+users a trade-off: more compact storage at the price of slightly longer
+read times.
+
 \end{document}

Modified: pkg/RcppCNPy/vignettes/RcppCNPy-intro.pdf
===================================================================
(Binary files differ)
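
The Performance section added to the vignette above says that reading an npy file needs only a header parse, one large binary read, and a transpose to get from the storage order on disk to the one R uses. As a rough illustration of that layout, here is a minimal Python sketch using only the standard library. It is not the cnpy or RcppCNPy code: the helper names write_npy_v1, read_npy_v1 and as_rows are hypothetical, only little-endian float64 ('<f8') data in format version 1.0 is handled, and padding the header to a 16-byte boundary is an assumption made for this sketch.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"  # six magic bytes that open every .npy file


def write_npy_v1(values, shape, fortran_order=False):
    """Hypothetical helper: pack float64 values as version 1.0 .npy bytes."""
    header = "{'descr': '<f8', 'fortran_order': %s, 'shape': %r, }" % (
        bool(fortran_order), tuple(shape))
    # Assumption: pad with spaces so the data starts on a 16-byte boundary;
    # the header text ends with a newline.
    prefix_len = len(MAGIC) + 2 + 2  # magic + version + header length field
    pad = -(prefix_len + len(header) + 1) % 16
    header_bytes = (header + " " * pad + "\n").encode("latin1")
    payload = struct.pack("<%dd" % len(values), *values)
    return (MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes))
            + header_bytes + payload)


def read_npy_v1(blob):
    """Hypothetical helper: parse the header, then do one bulk binary read."""
    if blob[:6] != MAGIC:
        raise ValueError("not an .npy file")
    if (blob[6], blob[7]) != (1, 0):
        raise ValueError("this sketch only handles format version 1.0")
    (header_len,) = struct.unpack_from("<H", blob, 8)
    meta = ast.literal_eval(blob[10:10 + header_len].decode("latin1"))
    if meta["descr"] != "<f8":
        raise ValueError("this sketch only handles little-endian float64")
    count = 1
    for dim in meta["shape"]:
        count *= dim
    data = struct.unpack_from("<%dd" % count, blob, 10 + header_len)
    return meta, list(data)


def as_rows(meta, data):
    """Rebuild a list of rows from flat data in C or Fortran order."""
    nrow, ncol = meta["shape"]
    if meta["fortran_order"]:
        # Column-major storage: element (r, c) sits at c * nrow + r.
        return [[data[c * nrow + r] for c in range(ncol)] for r in range(nrow)]
    # Row-major storage: each row is one contiguous slice.
    return [data[r * ncol:(r + 1) * ncol] for r in range(nrow)]


rows = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]
c_flat = [x for row in rows for x in row]                   # row-major
f_flat = [rows[r][c] for c in range(4) for r in range(3)]   # column-major

c_blob = write_npy_v1(c_flat, (3, 4))
f_blob = write_npy_v1(f_flat, (3, 4), fortran_order=True)
meta_c, data_c = read_npy_v1(c_blob)
meta_f, data_f = read_npy_v1(f_blob)
```

Both blobs describe the same 3 x 4 matrix; only the order of the flat payload differs, which is why a reader has to consult the fortran_order flag (or transpose) before handing the data on.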