[Rcpp-devel] Dirk's benchmarking of code for creation of random binary matrices

Tue Sep 4 18:39:53 CEST 2012

Dirk Eddelbuettel recently blogged on "Faster creation of binomial
matrices" (http://dirk.eddelbuettel.com/blog/code/snippets/) comparing
various approaches in R and using C++ callable from R with the Rcpp
package. I used cut-and-paste to create a script from the code in his
posting  but omitting the arma and sugar functions that return the
wrong answer (uniform(0,1) random variates rather than random binary
(0/1) responses with equal probability).  I added another function
using RcppEigen with disappointing results.  It came back slower than
sugarFloor and slower than David Smith's pure R versilon.  The results
are enclosed.

The armaFloor version, which is fastest, uses a different random
number generator, and thus doesn't offer the same controls as the
other versions, e.g. using set.seed in R to create a reproducible
experiment.

I decided to compare these results with Julia (and I think I got Dirk
a bit irritated by doing so).  Again, there is the caveat that these
results are using a different uniform random number generator from R's
so you should not take comparative timings too seriously.  In Julia
(www.julialang.org) all functions are generics using multiple dispatch
(like S4 generics and methods in R) and a 500 by 100 matrix of uniform
(0,1) random variates can be created as rand(500, 100).  Applying
iround to this produces 0/1 responses stored as integers.
julia> iround(rand(500, 100))
500x100 Int64 Array:
 0  0  1  0  0  1  0  1  1  0  1  1  1  :  0  0  1  1  1  0  1  0  1  0  0  1
 0  1  0  0  0  1  0  1  0  1  0  1  0     0  1  0  1  1  1  0  0  0  0  1  1
 1  1  1  1  0  1  0  1  0  0  0  1  1     1  1  1  0  0  1  1  1  1  0  0  1
 0  1  1  1  1  1  1  1  1  1  0  1  0     0  0  1  0  0  1  0  1  0  1  0  0
 1  0  1  0  0  0  0  1  1  1  1  0  1     0  1  1  1  1  0  0  0  0  0  0  0
 1  1  1  1  1  0  0  0  1  0  0  0  0  :  1  1  0  1  1  0  0  0  0  1  1  1
 0  0  0  1  1  1  0  1  0  1  0  1  0     1  0  0  0  1  1  0  1  1  0  0  1
 1  1  0  1  1  1  1  0  0  1  0  1  1     1  0  1  0  0  1  0  1  0  0  0  0
 1  1  0  1  1  1  1  0  1  0  0  0  1     1  0  1  1  0  1  1  0  1  1  0  1
 0  1  0  0  1  0  0  0  1  0  1  0  1     0  1  0  1  1  1  0  0  1  1  1  1
 :              :              :
 1  0  0  0  0  1  1  0  0  1  1  1  0     1  0  0  1  1  1  1  0  1  0  0  1
 1  1  1  1  0  0  0  0  1  1  1  0  0     0  0  1  1  0  1  1  0  0  1  1  1
 1  1  0  0  1  0  0  0  0  1  1  1  0     1  0  0  0  1  1  1  0  1  0  1  1
 1  1  1  0  1  1  1  0  0  1  0  0  0     1  1  1  0  1  1  0  1  0  1  0  1
 1  1  0  1  0  1  1  0  1  0  0  1  0  :  1  1  1  0  0  1  1  0  1  1  0  0
 1  0  1  1  0  1  0  1  0  1  1  0  0     1  0  1  1  1  0  1  1  0  0  0  1
 1  0  1  1  0  0  1  0  1  1  0  0  0     0  0  0  0  0  1  1  1  1  0  1  0
 0  0  0  0  0  1  1  1  1  0  1  1  1     1  0  1  0  0  1  1  1  0  0  1  0
 1  1  1  1  0  1  0  0  0  0  1  1  0     1  1  0  1  0  0  1  1  1  0  0  0

Timing 100 replications of this and then repeating the timing 10 times
for consistency produces
julia> [@elapsed sum([@elapsed iround(rand(500, 100)) for k in 1:100])
for l in 1:10 ]
10-element Float64 Array:
 0.0764122
 0.0762029
 0.075958
 0.0767078
 0.07639
 0.0763202
 0.078331
 0.07691
 0.085947
 0.076138

julia> [@elapsed sum([@elapsed iround(rand(500, 100)) for k in 1:100])
for l in 1:10 ]
10-element Float64 Array:
 0.078619
 0.078573
 0.0764699
 0.079267
 0.0765851
 0.0762711
 0.077647
 0.0764351
 0.0777521
 0.078192

There is a slightly faster version using a comprehension, which is a
way of creating an array that essentially folds the loops inside an
expression.

julia> [iround(rand()) for i in 1:500, j in 1:100]
500x100 Int64 Array:
 0  0  0  0  0  1  0  0  1  0  0  1  1  :  1  1  0  1  1  1  0  1  1  1  1  0
 0  1  1  1  1  1  1  1  0  1  1  1  0     0  1  0  1  1  0  0  0  0  1  1  0
 0  1  1  1  0  0  0  0  0  0  1  0  0     1  0  1  1  0  0  1  0  1  1  1  1
 1  1  0  0  1  0  0  1  1  1  1  1  1     1  0  1  1  0  1  0  0  1  1  1  0
 0  0  1  1  0  1  0  1  0  0  0  0  1     0  0  1  0  1  1  1  1  1  0  0  1
 0  0  0  0  0  0  0  1  0  1  1  0  1  :  1  0  1  1  0  1  1  1  1  0  1  1
 1  0  1  1  0  1  0  0  1  0  1  0  1     0  0  1  1  1  1  0  0  0  0  1  0
 0  1  1  1  1  1  0  0  0  0  0  0  0     1  0  0  0  1  0  1  0  1  1  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  1     0  0  1  0  1  1  1  0  0  0  0  1
 1  1  1  1  1  1  1  0  1  0  1  0  0     0  1  1  1  0  1  1  0  1  1  1  1
 :              :              :
 0  0  1  0  0  1  1  1  1  0  1  0  1     1  1  0  1  1  1  0  1  1  1  0  1
 0  1  0  0  0  0  0  0  1  1  0  1  1     0  0  0  1  0  0  1  1  0  1  0  0
 1  0  0  0  1  1  1  0  0  1  0  0  1     0  0  1  1  0  0  1  0  1  0  1  0
 1  1  0  1  0  1  1  1  0  1  1  0  0     0  0  1  1  1  1  1  1  1  1  0  1
 1  0  1  0  0  0  1  1  1  0  0  1  0  :  1  1  1  0  1  0  0  1  1  0  0  0
 0  1  0  1  0  0  1  1  1  0  1  0  0     1  0  1  1  1  1  1  0  0  0  0  1
 1  1  1  0  0  0  1  1  1  0  0  0  0     1  1  1  0  1  0  0  1  0  1  1  1
 0  0  0  1  1  0  0  0  1  1  0  1  1     1  0  1  0  1  0  0  0  1  0  1  1
 0  1  1  0  1  1  0  1  0  1  0  0  1     1  1  0  1  1  1  1  0  0  0  1  0

The reason the comprehension is slightly faster (emphasis on the
"slightly") is because it does not create a temporary array that must
be written to memory and read back in.  The scalar function,
iround(rand()), is compiled then applied directly to generate the 500
by 100 array.

julia> [ @elapsed sum([@elapsed [iround(rand()) for i in 1:500, j in
1:100] for k in 1:100]) for l in 1:10 ]
10-element Float64 Array:
 0.0591459
 0.068851
 0.0588531
 0.0701399
 0.059118
 0.0688679
 0.0588412
 0.0696108
 0.0589628
 0.0688081

It ends up approximately twice as fast as the armaFloor version but
such comparisons are to be taken with a grain of salt because most of
the time will be spend in the random number generator so we are not
comparing apples with apples.  Also, it is a rather silly benchmark
because the bigger question is "what are you going to do with this
matrix of random binary values?".
-------------- next part --------------

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(inline)
> library(compiler)
> library(rbenchmark)
> 
> n <- 500
> k <- 100
> scott <- function(N, K) {
+     mm <- matrix(0, N, K)
+     apply(mm, c(1, 2), function(x) sample(c(0, 1), 1))
+ }
> scottComp <- cmpfun(scott)
> ted <- function(N, K) {
+     matrix(rbinom(N * K, 1, 0.5), ncol = K, nrow = N)
+ }
> david <- function(m, n) {
+     matrix(sample(0:1, m * n, replace = TRUE), m, n)
+ }
> luis <- function(m, n) {
+      round(matrix(runif(m * n), m, n))
+ }
> armaFloor <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "RcppArmadillo", body='
+    int n = Rcpp::as<int>(ns);
+    int k = Rcpp::as<int>(ks);
+    return wrap(arma::floor(arma::randu(n, k) + 0.5));
+ ')
> sugarFloor <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "Rcpp", body='
+    int n = Rcpp::as<int>(ns);
+    int k = Rcpp::as<int>(ks);
+    Rcpp::RNGScope tmp;
+    Rcpp::NumericVector draws = Rcpp::floor(Rcpp::runif(n*k)+0.5);
+    return Rcpp::NumericMatrix(n, k, draws.begin());
+ ')
> eigenFloor <- cxxfunction(signature(ns="integer", ks="integer"), plugin = "RcppEigen", body='
+    int n(Rf_asInteger(ns)), k(Rf_asInteger(ks));
+    Rcpp::RNGScope tmp;
+    return Rcpp::wrap(Eigen::ArrayXXd(n, k).unaryExpr(std::ptr_fun(rbinary)));
+ ', includes='
+    double rbinary(double x) {return std::floor(Rf_runif(0.5, 1.5));}
+ ')
Loading required package: lattice

Attaching package: ?RcppEigen?

The following object(s) are masked from ?package:RcppArmadillo?:

    fastLm, fastLmPure

> res <- benchmark(scott(n, k), scottComp(n,k),
+                  ted(n, k), david(n, k), luis(n, k),
+                  armaFloor(n, k), sugarFloor(n, k), eigenFloor(n, k),
+                  order="relative", replications=100)
> print(res[,1:4])
              test replications elapsed   relative
6  armaFloor(n, k)          100   0.130   1.000000
7 sugarFloor(n, k)          100   0.159   1.223077
4      david(n, k)          100   0.215   1.653846
8 eigenFloor(n, k)          100   0.266   2.046154
3        ted(n, k)          100   0.565   4.346154
5       luis(n, k)          100   0.756   5.815385
1      scott(n, k)          100  45.012 346.246154
2  scottComp(n, k)          100  45.054 346.569231
> 
> proc.time()
   user  system elapsed 
108.318   1.032 109.449