[Rcpp-devel] When does using iterators for subscripting help?

Hadley Wickham hadley at rice.edu
Wed Jan 4 15:16:03 CET 2012


Hi all,

Slightly less dense question (hopefully).  In the code below I have
two versions of the same function - one uses operator[] and the other
uses iterators.  Following the Rcpp introduction, I had expected the
iterator version to be substantially faster, but I'm only seeing a
minor improvement (~10%).  Why doesn't using iterators help me much
here?  Possible explanations:

* I'm using iterators incorrectly in my code

* Iterators help most when the vector access is sequential, and here
the counts index is bouncing all over the place, so I shouldn't expect
much improvement.
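
To make the second possibility concrete, here is a minimal standalone sketch of the loop with x walked sequentially by incrementing an iterator. It is only an illustration under assumptions: it uses std::vector instead of Rcpp::NumericVector so it compiles without R, the name count_bin_seq is hypothetical, and the bounds check is an addition that the functions below do not have. Even in this form, the write into counts lands at a data-dependent position, so only the read side is sequential.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical standalone version of the binning loop (not the Rcpp code
// from this post): std::vector stands in for Rcpp::NumericVector.
std::vector<double> count_bin_seq(const std::vector<double>& x,
                                  double binwidth, double origin,
                                  std::size_t nbins) {
  std::vector<double> counts(nbins, 0.0);

  // x is read strictly in order by incrementing the iterator.
  for (std::vector<double>::const_iterator it = x.begin();
       it != x.end(); ++it) {
    double pos = (*it - origin) / binwidth;

    // Added safety check (an assumption, not in the original functions):
    // skip values that would fall outside [0, nbins).
    if (pos < 0.0 || pos >= static_cast<double>(nbins)) continue;

    // The bin index depends on the data, so this access into counts is
    // effectively random no matter how x is traversed.
    counts[static_cast<std::size_t>(pos)]++;
  }
  return counts;
}
```

If this reasoning is right, the only part an iterator could speed up is the read of x, which would be consistent with the small difference I'm seeing.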

Any ideas would be much appreciated.  Thanks!

Hadley


library(inline)

count_bin <- cxxfunction(
  signature(x = "numeric", binwidth = "numeric",
            origin = "numeric", nbins = "integer"), '
  int nbins_ =  as<int>(nbins);
  double binwidth_ = as<double>(binwidth);
  double origin_ = as<double>(origin);

  Rcpp::NumericVector counts(nbins_);
  Rcpp::NumericVector x_(x);

  int n = x_.size();

  for(int i = 0; i < n; i++) {
    counts[(int) ((x_[i] - origin_) / binwidth_)]++;
  }

  return counts;
', plugin = "Rcpp")

count_bini <- cxxfunction(
  signature(x = "numeric", binwidth = "numeric",
            origin = "numeric", nbins = "integer"), '
  int nbins_ =  as<int>(nbins);
  double binwidth_ = as<double>(binwidth);
  double origin_ = as<double>(origin);

  Rcpp::NumericVector counts(nbins_);
  Rcpp::NumericVector x_(x);

  int n = x_.size();

  Rcpp::NumericVector::iterator x_i = x_.begin();
  Rcpp::NumericVector::iterator counts_i = counts.begin();

  for(int i = 0; i < n; i++) {
    counts_i[(int) ((x_i[i] - origin_) / binwidth_)]++;
  }

  return counts;
', plugin = "Rcpp")

x <- rnorm(1e7, sd = 3)
origin <- min(x)
binwidth <- 1
n <- ceiling((max(x) - origin) / binwidth)

system.time(y1 <- count_bin(x, binwidth, origin, nbins = n))
system.time(y2 <- count_bini(x, binwidth, origin, nbins = n))
all.equal(y1, y2)

library(microbenchmark)
microbenchmark(
  operator = count_bin(x, binwidth, origin, nbins = n),
  iterator = count_bini(x, binwidth, origin, nbins = n)
)

# The real reason I'm exploring this is to get a more efficient alternative
# to tabulate for equal-width bin counts.  The Rcpp version is about 10x
# faster, mainly (I think) because it avoids creating a modified copy of the
# vector (the (x - origin) / binwidth + 1 temporary in the call below).

system.time(y3 <- tabulate((x - origin) / binwidth + 1, nbins = n))
all.equal(y1, y3)

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
