[Rcpp-commits] r3680 - in pkg/RcppCNPy: . inst man vignettes
noreply at r-forge.r-project.org
Sat Jul 7 19:02:45 CEST 2012
Author: edd
Date: 2012-07-07 19:02:45 +0200 (Sat, 07 Jul 2012)
New Revision: 3680
Modified:
pkg/RcppCNPy/ChangeLog
pkg/RcppCNPy/DESCRIPTION
pkg/RcppCNPy/inst/NEWS.Rd
pkg/RcppCNPy/man/RcppCNPy-package.Rd
pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw
pkg/RcppCNPy/vignettes/RcppCNPy-intro.pdf
Log:
Release 0.1.0
Modified: pkg/RcppCNPy/ChangeLog
===================================================================
--- pkg/RcppCNPy/ChangeLog 2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/ChangeLog 2012-07-07 17:02:45 UTC (rev 3680)
@@ -1,3 +1,9 @@
+2012-07-07 Dirk Eddelbuettel <edd at debian.org>
+
+ * vignettes/RcppCNPy-intro.Rnw: Added vignette documentation
+
+ * demo/timings.R: Added simple timing benchmark demo
+
2012-07-06 Dirk Eddelbuettel <edd at debian.org>
* src/cnpy.h: Include cstdint for int64_t if C++11 has been enabled
Modified: pkg/RcppCNPy/DESCRIPTION
===================================================================
--- pkg/RcppCNPy/DESCRIPTION 2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/DESCRIPTION 2012-07-07 17:02:45 UTC (rev 3680)
@@ -1,14 +1,15 @@
Package: RcppCNPy
Type: Package
Title: Rcpp bindings for NumPy files
-Version: 0.0.2
+Version: 0.1.0
Date: $Date$
Author: Dirk Eddelbuettel
Maintainer: Dirk Eddelbuettel <edd at debian.org>
Description: This package provides access to the cnpy library by Carl Rogers
which provides read and write facilities for files created with (or for) the
- NumPY extension for Python. Vectors and matrices of either numeric or
- integer types can be read or written. Compressed files can be read as well.
+ NumPy extension for Python. Vectors and matrices of numeric types can be
+ read or written; compressed files can be read as well. Support for integer
+ files is available if the package (and Rcpp) are compiled with -std=c++11.
License: GPL (>= 2)
LazyLoad: yes
Depends: methods, Rcpp (>= 0.9.13)
Modified: pkg/RcppCNPy/inst/NEWS.Rd
===================================================================
--- pkg/RcppCNPy/inst/NEWS.Rd 2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/inst/NEWS.Rd 2012-07-07 17:02:45 UTC (rev 3680)
@@ -2,7 +2,7 @@
\title{News for Package \pkg{RcppCNPy}}
\newcommand{\cpkg}{\href{http://CRAN.R-project.org/package=#1}{\pkg{#1}}}
-\section{Changes in version 0.0.2 (2012-07-xx)}{
+\section{Changes in version 0.1.0 (2012-07-07)}{
\itemize{
\item Added automatic use of transpose to automagically account for
Fortran-vs-C major storage defaults between Python and R.
@@ -12,6 +12,7 @@
\item Added support for reading gzip'ed files ending in ".npy.gz"
\item Added regression tests in directory \code{tests/}
\item Added a vignette describing the package}
+ \item Added a timing benchmark in demo/timings.R}
}
}
\section{Changes in version 0.0.1 (2012-07-04)}{
Modified: pkg/RcppCNPy/man/RcppCNPy-package.Rd
===================================================================
--- pkg/RcppCNPy/man/RcppCNPy-package.Rd 2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/man/RcppCNPy-package.Rd 2012-07-07 17:02:45 UTC (rev 3680)
@@ -12,8 +12,10 @@
which provides read and write facilities for files created with (or for) the
NumPy extension for Python.
- Support is currently still pretty limited to reading and writing of
- either vectors or matrices of either numeric or integer type.
+ Support is currently limited to reading and writing of either vectors
+ or matrices of numeric types. Integer support can be added if the
+ package, as well as \pkg{Rcpp}, are recompiled using the \code{-std=c++11}
+ flag.
Files with \code{gzip} compression can be transparently read as well.
}
@@ -33,27 +35,16 @@
\tabular{ll}{
Package: \tab RcppCNPy\cr
Type: \tab Package\cr
- Version: \tab 0.0.1\cr
- Date: \tab 2012-07-04\cr
- License: \tab What license is it under?\cr
- LazyLoad: \tab yes\cr
+ Version: \tab 0.1.0\cr
+ Date: \tab 2012-07-07\cr
+ License: \tab GPL (>= 2)\cr
}
- The package uses Rcpp modules to provide R bindings \code{npyLoadNM()}
- and \code{npyLoadIM()} which wrap the \code{npy_load()}
- function. Currently, only two-dimensional matrices are suppported but
- this can be extended easily to vectors.
-
- The following minor changes were made to \code{cnpy}:
- \itemize{
- \item the \code{printf(...); abort()} combination was replaced in
- three instances with \code{REprintf(...)} per CRAN Policy guidelines.
- \item \code{long long} was commented out in two places (which we can revert once
- CRAN switches to a new compiler and c++11 becomes standard) and one
- \code{unsigned long long} was replaced by \code{unsigned long}.
- \item several unused variables were commented out.
- }
-
+ The package uses Rcpp modules to provide R bindings \code{npyLoad()}
+ and \code{npySave()} which wrap the \code{npy_load()} and
+ \code{npy_save()} functions. Currently, only one- and two-dimensional
+ vectors and matrices are supported; higher-dimensional arrays could
+ be added.
}
\author{
Dirk Eddelbuettel provide the binding to R (using the Rcpp package).
@@ -73,5 +64,5 @@
\code{\link[Rcpp:Rcpp-package]{Rcpp}}
}
\examples{
- ## TODO
+ ## TODO, but see demo()
}
Modified: pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw
===================================================================
--- pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw 2012-07-07 16:57:50 UTC (rev 3679)
+++ pkg/RcppCNPy/vignettes/RcppCNPy-intro.Rnw 2012-07-07 17:02:45 UTC (rev 3680)
@@ -16,7 +16,7 @@
%\usepackage[T1]{fontenc}
-\usepackage{color,alltt,url}
+\usepackage{color,alltt,url,booktabs}
\usepackage[authoryear,round,longnamesfirst]{natbib}
\usepackage[colorlinks]{hyperref}
\definecolor{link}{rgb}{0,0,0.3} %% next few lines courtesy of RJournal.sty
@@ -276,11 +276,8 @@
\hlstd{}\hlopt{{[}}\hlstd{}\hlnum{1}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{0.0}\hlstd{\ \ }\hlnum{1.1}\hlstd{\ \ }\hlnum{2.2}\hlstd{\ \ }\hlnum{3.3}\hspace*{\fill}\\
\hlstd{}\hlopt{{[}}\hlstd{}\hlnum{2}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{4.4}\hlstd{\ \ }\hlnum{5.5}\hlstd{\ \ }\hlnum{6.6}\hlstd{\ \ }\hlnum{7.7}\hspace*{\fill}\\
\hlstd{}\hlopt{{[}}\hlstd{}\hlnum{3}\hlstd{}\hlopt{,{]}}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{8.8}\hlstd{\ \ }\hlnum{9.9\ 11.0\ 12.1}\hspace*{\fill}\\
- \hlstd{R}\hlopt{$>$\ }\hlstd{}\hspace*{\fill}\\
- \mbox{}
\normalfont
\normalsize
-
\end{quote}
Support for compressed file is currently limited to reading, but could be
@@ -327,7 +324,6 @@
\hlstd{R}\hlopt{$>$\ }\hlstd{v}\hspace*{\fill}\\
\hlopt{{[}}\hlstd{}\hlnum{1}\hlstd{}\hlopt{{]}\ }\hlstd{}\hlnum{10\ 11\ 12}\hspace*{\fill}\\
\hlstd{R}\hlopt{$>$\ }\hlstd{}\hlkwd{npySave}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"simplevec.npy"}\hlstd{}\hlopt{,\ }\hlstd{v}\hlopt{)}\hspace*{\fill}\\
- \hlstd{R}\hlopt{$>$\ }\hlstd{}\hspace*{\fill}\\
\mbox{}
\normalfont
\normalsize
@@ -349,6 +345,7 @@
\subsection{Data reading in \Python}
+Reading the data back in \Python is straightforward too:
\begin{quote}
\small
@@ -363,35 +360,85 @@
\hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{v\ }\hlopt{=\ }\hlstd{np}\hlopt{.}\hlstd{}\hlkwd{load}\hlstd{}\hlopt{(}\hlstd{}\hlstr{"simplevec.npy"}\hlstd{}\hlopt{)}\hspace*{\fill}\\
\hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{v}\hspace*{\fill}\\
\hlkwd{array}\hlstd{}\hlopt{({[}\ }\hlstd{}\hlnum{10}\hlstd{}\hlopt{.,}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{11}\hlstd{}\hlopt{.,}\hlstd{\ \ }\hlopt{}\hlstd{}\hlnum{12}\hlstd{}\hlopt{.{]})}\hspace*{\fill}\\
- \hlstd{}\hlopt{$>$$>$$>$\ }\hlstd{}\hspace*{\fill}\\
- \mbox{}
\normalfont
\normalsize
\end{quote}
+\section{Performance}
+
+The \R script \code{timings} in the \code{demo/} directory of package
+\pkg{RcppCNPy} provides a simple benchmark. Given two values $n$ and $k$, a
+matrix with $n$ rows and $k$ columns is created. It is
+written to temporary files in
+% \begin{enumerate}
+% \item ascii format using \code{write.table()};
+% \item \code{NumPy} format using \code{npySave()}; and
+% \item \code{NumPy} format using \code{npySave()} followed by a call to \code{gzip}.
+% \end{enumerate}
+i) ascii format using \code{write.table()};
+ii) \code{NumPy} format using \code{npySave()}; and
+iii) \code{NumPy} format using \code{npySave()} followed by a call to \code{gzip}.
+
+Table~\ref{tab:benchmark} shows some timing comparisons for a matrix with
+five million elements. Reading the \code{npy} file is clearly fastest as it
+requires only parsing of the header, followed by a single large binary read
+(and the transpose required to translate to the representation used by \Rns). The
+compressed file requires only about one-fourth of the disk space, but takes
+approximately 2.5 times as long to read as the binary stream has to be
+decompressed first. Lastly, the default ascii reading mode is by far the
+slowest.
+
+\begin{table}[bt]
+ \begin{center}
+ \begin{small}
+ \begin{tabular}{rrr}
+ \toprule
+ {\bf Access method \phantom{X}} & {\bf Time in sec.} & {\bf Relative to best} \\
+ \cmidrule(r){1-3}
+ \code{npyLoad(pyfile)} & 1.95 & 1.00 \\
+ \code{npyLoad(pygzfile)} & 4.92 & 2.53 \\
+ \code{read.table(txtfile)} & 128.85 & 66.24 \\
+ \bottomrule
+ \end{tabular}
+ \end{small}
+ \caption{Performance comparison of data reads using a matrix of size
+ $10^5 \times 50$. File sizes are 39.7 MB for ascii, 40.0 MB for npy and
+ 10.8 MB for npy.gz. Ten replications were performed, and total times
+ are shown.}
+ \label{tab:benchmark}
+ \end{center}
+\end{table}
+
+
+
\section{Limitations}
\subsection{Integer support}
-Support for integer data types is available, but conditional on use of the
-\code{-std=c++11} compiler extension. Only the newer standard supports the
-\code{long long int} type needed to represent \code{int64} data on a 32-bit
-OS. So until \R switches to allowing \code{-std=c++11} on CRAN packages,
-users will need to rebuild both \pkg{Rcpp} and \pkg{RcppCNPy} with the switch
-enabled.
+Support for integer data types is conditional on use of the \code{-std=c++11}
+compiler extension. Only the newer standard supports the \code{long long int}
+type needed to represent \code{int64} data on a 32-bit OS. So until \R
+switches to allowing \code{-std=c++11} on CRAN packages, users will need to
+rebuild both \pkg{Rcpp} and \pkg{RcppCNPy} with the switch enabled. As shown
+in the previous examples, integers also transparently convert to float types.
-As shown in the previous examples, integers also transparently convert to
-float types.
-
\subsection{Higher-dimensional arrays}
\pkg{Rcpp} supports three-dimensional arrays; this could be supported in
\pkg{RcppCNPy} as well.
+\subsection{\code{npz} files}
+
+The \pkg{cnpy} library supports reading and writing of sets of arrays; this
+feature could also be exported.
+
\section{Summary}
The \pkg{RcppCNPy} package provides simple reading and writing of \pkg{NumPy}
-files, using the \pkg{cnpy} library. Reading of compressed files is also
-supported as an extension.
+files, using the \pkg{cnpy} library.
+Reading of compressed files is also supported as an extension. This offers
+users a trade-off: more compact storage at the price of slightly longer
+read times.
+
\end{document}
Modified: pkg/RcppCNPy/vignettes/RcppCNPy-intro.pdf
===================================================================
(Binary files differ)
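
Illustration (not part of the commit): the Performance section added to the
vignette above says that reading an npy file needs only a header parse, one
large binary read, and a transpose into R's storage order. The Python sketch
below shows those three steps using only the standard library. It assumes the
NPY format version 1.0 layout (magic string, two version bytes, a 2-byte
little-endian header length, then an ASCII dict padded to a 64-byte boundary)
and handles only little-endian float64 data in C order. The names write_npy,
read_npy and to_column_major are hypothetical helpers for this sketch; this is
not how cnpy or RcppCNPy implement the read.

```python
import ast
import os
import struct
import tempfile

MAGIC = b"\x93NUMPY"  # 6-byte magic string that starts every .npy file


def write_npy(path, rows):
    """Write equal-length rows of floats as a C-ordered float64 .npy (format 1.0)."""
    n, k = len(rows), len(rows[0])
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d, %d), }" % (n, k)
    # Pad with spaces so that magic + version + length field + header
    # (including the final newline) is a multiple of 64 bytes.
    pad = -(len(MAGIC) + 4 + len(header) + 1) % 64
    header_bytes = (header + " " * pad + "\n").encode("latin-1")
    with open(path, "wb") as fh:
        fh.write(MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes)))
        fh.write(header_bytes)
        for row in rows:
            fh.write(struct.pack("<%dd" % k, *row))


def read_npy(path):
    """Return (shape, flat tuple of floats in row-major order)."""
    with open(path, "rb") as fh:
        if fh.read(6) != MAGIC:
            raise ValueError("not an npy file")
        major, _minor = fh.read(2)
        if major != 1:
            raise ValueError("this sketch only handles format version 1.x")
        (header_len,) = struct.unpack("<H", fh.read(2))
        # Step 1: parse the small ASCII header (a Python dict literal).
        meta = ast.literal_eval(fh.read(header_len).decode("latin-1").strip())
        if meta["descr"] != "<f8" or meta["fortran_order"]:
            raise ValueError("this sketch only handles C-ordered '<f8' data")
        shape = meta["shape"]
        count = 1
        for dim in shape:
            count *= dim
        # Step 2: a single large binary read of the payload.
        data = struct.unpack("<%dd" % count, fh.read(8 * count))
    return shape, data


def to_column_major(shape, data):
    """Step 3: reorder row-major values into column-major (R's layout)."""
    n, k = shape
    return [data[i * k + j] for j in range(k) for i in range(n)]


if __name__ == "__main__":
    # Same 3 x 4 matrix as in the vignette's example output.
    rows = [[0.0, 1.1, 2.2, 3.3], [4.4, 5.5, 6.6, 7.7], [8.8, 9.9, 11.0, 12.1]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fmat.npy")
        write_npy(path, rows)
        shape, data = read_npy(path)
        print(shape, to_column_major(shape, data))
```

The same arithmetic accounts for the file size quoted in the table caption:
10^5 x 50 float64 values take 5,000,000 x 8 = 40,000,000 bytes, plus a header
of well under a kilobyte.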