[Rcpp-devel] Struggling with Rcpp sugar

Romain Francois romain at r-enthusiasts.com
Sat Nov 17 15:47:54 CET 2012


Le 17/11/12 14:42, Hadley Wickham a écrit :
> Hi all,
>
> I've included what seems to be a simple application of Rcpp sugar
> below, but I'm getting some very strange results.  Any help would be
> much appreciate!
>
> Thanks,
>
> Hadley
>
> library(Rcpp)
> library(microbenchmark)
>
> # Compute distance between single point and vector of points
> pdist1 <- function(x, ys) {
>    (x - ys) ^ 2
> }
>
> cppFunction('
>    NumericVector pdist2(double x, NumericVector ys) {
>      int n = ys.size();
>      NumericVector out(n);
>
>      for(int i = 0; i < n; ++i) {
>        out[i] = pow(ys[i] - x, 2);
>      }
>      return out;
>    }
> ')
>
> ys <- runif(1e4)
> all.equal(pdist1(0.5, ys), pdist2(0.5, ys))
>
> library(microbenchmark)
> microbenchmark(
>    pdist1(0.5, ys),
>    pdist2(0.5, ys)
> )
> # C++ version about twice as fast, presumably because it avoids a
> # complete vector allocation.
>
>
> # Sugar version:
> cppFunction('
>    NumericVector pdist3(double x, NumericVector ys) {
>      return pow((x - ys), 2);
>    }
> ')
> all.equal(pdist1(0.5, ys), pdist3(0.5, ys))
>
> microbenchmark(
>    pdist1(0.5, ys),
>    pdist2(0.5, ys),
>    pdist3(0.5, ys)
> )
> # 10-fold slower??  Maybe it's because I'm using a double instead of
> # a numeric vector?

That's weird. Not sure where the ineffiency is. Some hunting.
My guess is that sugar pow needs a refresh.

> cppFunction('
>    NumericVector pdist4(NumericVector x, NumericVector ys) {
>      return pow((x - ys), 2);
>    }
> ')
> all.equal(pdist1(0.5, ys), pdist4(0.5, ys))
>
> # Is this a bug in sugar? Should recycle to length of longest vector.
> # Let's try flipping the order of operations:

There's no recycling. We pick the size of "x" and consider that y is of 
the same length. Something worth mentionning in documentation. Recycling 
would add some expense.

> cppFunction('
>    NumericVector pdist5(NumericVector x, NumericVector ys) {
>      return pow((ys - x), 2);
>    }
> ')
> all.equal(pdist1(0.5, ys), pdist5(0.5, ys))
> # Where are the missing values coming from??

Same here. We pick the length of ys (the lhs of the binary operation).

Both pdist4 and pdist5 are undeterminate behaviour.


I would prefer to keep it that way and document it that vector should 
have the same size, ... rather than introduce the extra cost of recyling.



-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com
`- http://bit.ly/SweN1Z : SuperStorm Sandy

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible



More information about the Rcpp-devel mailing list