[Rcpp-devel] When does using iterators for subscripting help?
Romain François
romain at r-enthusiasts.com
Wed Jan 4 15:32:52 CET 2012
Hi,
NumericVector::iterator is actually an alias for double*. Here is a trick
to check that (it probably does not work on non-gcc compilers):
> cxxfunction( , 'Rprintf( "%s", DEMANGLE(NumericVector::iterator)); ',
plugin = "Rcpp" )()
double*NULL
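For readers without Rcpp at hand, here is a minimal standalone sketch of the same idea. It assumes a gcc or clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangled_name is hypothetical and is not part of Rcpp, and I am not claiming this is how the DEMANGLE macro is implemented.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

// Hypothetical helper (not part of Rcpp): returns a human-readable name for T
// using the Itanium C++ ABI demangler shipped with gcc and clang.
// Not portable to compilers that lack <cxxabi.h>.
template <typename T>
std::string demangled_name() {
    int status = 0;
    // __cxa_demangle returns a malloc()-allocated buffer on success, NULL on failure.
    char* raw = abi::__cxa_demangle(typeid(T).name(), nullptr, nullptr, &status);
    assert(status == 0 || raw == nullptr);
    // Fall back to the mangled name if demangling failed.
    std::string out = (status == 0 && raw != nullptr) ? raw : typeid(T).name();
    std::free(raw);  // free(NULL) is a no-op, so this is safe on failure too
    return out;
}
```

Calling demangled_name<double*>() should yield "double*", matching the first part of the output above; the trailing NULL there is presumably just the return value of the generated function as printed by R.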
We put a lot of work into optimizing NumericVector::operator[] as much as
possible. In theory, with proper inlining, we should not even see a
difference between the two versions.
I don't see any misuse of iterators in your code, nor an obvious way to make
it faster still.
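To illustrate why the two versions end up so close, here is a minimal sketch in plain C++ without Rcpp. ThinVector is a hypothetical stand-in whose operator[] is a thin inline wrapper over the same storage that its double* iterator points to; it is an assumption for illustration and not Rcpp's actual implementation. The function names are hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in (NOT Rcpp's implementation): operator[] is a thin
// inline wrapper, and the iterator type is simply a raw double*.
class ThinVector {
public:
    typedef double* iterator;
    explicit ThinVector(std::size_t n) : data_(n, 0.0) {}
    explicit ThinVector(const std::vector<double>& v) : data_(v) {}
    iterator begin() { return data_.data(); }
    std::size_t size() const { return data_.size(); }
    inline double& operator[](std::size_t i) { return data_[i]; }
private:
    std::vector<double> data_;
};

// Bin counts through operator[]. The caller must ensure every value lies in
// [origin, origin + counts.size() * binwidth); no bounds checking is done.
void count_with_operator(ThinVector& x, ThinVector& counts,
                         double origin, double binwidth) {
    assert(binwidth > 0.0);
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        counts[static_cast<std::size_t>((x[i] - origin) / binwidth)]++;
    }
}

// Same loop through raw-pointer iterators, with the same precondition.
void count_with_iterator(ThinVector& x, ThinVector& counts,
                         double origin, double binwidth) {
    assert(binwidth > 0.0);
    ThinVector::iterator x_i = x.begin();
    ThinVector::iterator counts_i = counts.begin();
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        counts_i[static_cast<std::size_t>((x_i[i] - origin) / binwidth)]++;
    }
}
```

Once the compiler inlines operator[], both loops reduce to the same pointer arithmetic, so the remaining cost is presumably dominated by the division and the scattered writes into counts rather than by the access syntax.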
Romain
On 04/01/12 15:16, Hadley Wickham wrote:
> Hi all,
>
> Slightly less dense question (hopefully). In the code below I have
> two versions of the same function - one uses operator[] and the other
> uses iterators. Following the Rcpp introduction, I had expected the
> iterator version to be substantially faster, but I'm only seeing a
> minor improvement (~10%). Why doesn't using iterators help me much
> here? Possible explanations:
>
> * I'm using iterators incorrectly in my code
>
> * Iterators help most when the vector access is sequential, and here
> the counts index is bouncing all over the place, so I shouldn't expect
> much improvement.
>
> Any ideas would be much appreciated. Thanks!
>
> Hadley
>
>
> library(inline)
>
> count_bin<- cxxfunction(signature(x = "numeric", binwidth =
> "numeric", origin = "numeric", nbins = "integer"), '
> int nbins_ = as<int>(nbins);
> double binwidth_ = as<double>(binwidth);
> double origin_ = as<double>(origin);
>
> Rcpp::NumericVector counts(nbins_);
> Rcpp::NumericVector x_(x);
>
> int n = x_.size();
>
> for(int i = 0; i< n; i++) {
> counts[(int) ((x_[i] - origin_) / binwidth_)]++;
> }
>
> return counts;
> ', plugin = "Rcpp")
>
> count_bini<- cxxfunction(signature(x = "numeric", binwidth =
> "numeric", origin = "numeric", nbins = "integer"), '
> int nbins_ = as<int>(nbins);
> double binwidth_ = as<double>(binwidth);
> double origin_ = as<double>(origin);
>
> Rcpp::NumericVector counts(nbins_);
> Rcpp::NumericVector x_(x);
>
> int n = x_.size();
>
> Rcpp::NumericVector::iterator x_i = x_.begin();
> Rcpp::NumericVector::iterator counts_i = counts.begin();
>
> for(int i = 0; i< n; i++) {
> counts_i[(int) ((x_i[i] - origin_) / binwidth_)]++;
> }
>
> return counts;
> ', plugin = "Rcpp")
>
> x<- rnorm(1e7, sd = 3)
> origin<- min(x)
> binwidth<- 1
> n<- ceiling((max(x) - origin) / binwidth)
>
> system.time(y1<- count_bin(x, binwidth, origin, nbins = n))
> system.time(y2<- count_bini(x, binwidth, origin, nbins = n))
> all.equal(y1, y2)
>
> library(microbenchmark)
> microbenchmark(
> operator = count_bin(x, binwidth, origin, nbins = n),
> iterator = count_bini(x, binwidth, origin, nbins = n)
> )
>
> # The real reason I'm exploring this is as a more efficient version
> # of tabulate for doing equal bin counts. The Rcpp version is about 10x
> # faster, mainly (I think) because it avoids creating a modified copy of the
> # vector
>
> system.time(y3<- tabulate((x - origin) / binwidth + 1, nbins = n))
> all.equal(y1, y3)
>
--
Romain Francois
Professional R Enthusiast
http://romainfrancois.blog.free.fr