[Rcpp-devel] Using pointers with Numeric Vectors

Romain Francois romain at r-enthusiasts.com
Fri Dec 21 09:04:37 CET 2012


Hello,

I'm arriving after the battle here.

Let's clear out some misconceptions. The copy constructor of 
NumericVector is very cheap, it just copies a pointer to the underlying 
R object, i.e. the data is not copied. This is the way Rcpp classes are 
implemented, they encapsulate an R object and provide nicer syntax, but 
they work with the data of the R object.

So when you call e.g.

double get_var(NumericVector v,double m,int l)

which triggers a copy of v, this is cheap, you don't have to worry about 
it. I would still recommend to use reference semantics, like others have 
pointers and delcare your function as:

double get_var(const NumericVector& v,double m,int l)

Learning about references and const references is very useful in C++.



Now, the data you first pass in from R is an integer vector:

 > a=1:1000000
 > typeof( a )
[1] "integer"

So creating a NumericVector from it will be expensive because a coercion 
from an integer vector to a numeric vector is needed.

Consider this code:

#include <Rcpp/Benchmark/Timer.h>

// [[Rcpp::export]]
SEXP construct(SEXP x, int n){
     Timer timer ;
     timer.step("start");
     for(int i=0; i<n; i++){
         NumericVector vec(x) ;
     }
     timer.step("NumericVector ctor") ;
     return timer ;
}

This uses our internal Timer, i.e.. to time things at the C++ level to 
measure how many nanoseconds are needed for n constructions of a 
NumericVector from an R object.

Now things are very different if I feed in an integer vector or a 
numeric vector:

 > a <- 1:1e+06

 > b <- seq(1, 1e+06, by = 1)

 > typeof(a)
[1] "integer"

 > typeof(b)
[1] "double"

 > n <- 1000

 > construct(a, n)[-1]/n
NumericVector ctor
            4128426

 > construct(b, n)[-1]/n
NumericVector ctor
             36.767


On average it took 4 128 426 ns (4ms) to construct s, this is because a 
copy is made.

but it takes 36 ns to create the vector from a numeric vector. not free, 
but very cheap. This only includes copying a pointer, and doing the 
business of protection of the underlying object from R's garbage 
collection.


Now JJ's timings of callRcppVar:

// [[Rcpp::export]]
void callRcppVar(const NumericVector& v) {
    for (int i=0; i<100; i++)
       rcppVar(v);
}

This takes about the same time it takes to construct the numeric vector, 
so I suspect there is some compiler magic taking place, i.e. since the 
returned value of rcppVar is just ignored, I'm guessing the compiler 
just does not bother. Let's split things :

// [[Rcpp::export]]
double returnDouble() {
    return 1.0 ;
}

 > system.time(returnDouble())
    user  system elapsed
       0       0       0

The price of returning a double is none, or very small. I'm just trying 
to evaluate the cost of the invocation of whatever sourceCpp makes since 
it does very nice work of marsjalling inputs and outputs.

Now, let's do the same, but taking the NumericVector as before:

// [[Rcpp::export]]
double returnDouble(const NumericVector& v) {
    return 1.0 ;
}

 > system.time(returnDouble(a))
    user  system elapsed
   0.003   0.000   0.003

Those 3 ms are the time it takes to construct the NumericVector from an 
integer vector.

Now,

 > system.time(callRcppVar(a))
    user  system elapsed
   0.003   0.000   0.002

I believe nothing is done here. Here is a version that won't be 
optimized away by the compiler because we need the value:

// [[Rcpp::export]]
double callRcppVar2(const NumericVector& v) {
    double x = 0.0 ;
    for (int i=0; i<100; i++)
       x += rcppVar(v);
    return x ;
}

 > system.time(callRcppVar2(a))
    user  system elapsed
   0.198   0.000   0.198

That's more like it. And we are not paying the price of +=

// [[Rcpp::export]]
double callRcppVar3(const NumericVector& v) {
    double x = 0.0 ;
    for (int i=0; i<100; i++)
       x += 2.0;
    return x ;
}

 > system.time(callRcppVar3(a))
    user  system elapsed
   0.003   0.000   0.003


Sorry for the long-ish email, I hope this clarifies things.

Romain

PS: Alon. You are welcome to use stack overflow or here, but if you use 
stack overflow, please play the game as it is supposed to. You've asked 
5 questions there, got some really valid answers but you never bothered 
to accept any of them. You are not likely to get more answers if you 
don't play the game correctly. Please read the faq. 
http://stackoverflow.com/faq

stack overflow is really good, but it is only useful if people use it 
appropriately.


Le 20/12/12 18:35, Alon Honig a écrit :
> My current RCPP program is only 4 times faster than its R byte code
> compiled equivalent. I am trying to speed up my code and from what I
> understand using pointers prevents an external function from copying the
> entire object when executing its methods.   I am having difficulty
> finding salient examples that use points and NumericVectors. Please tell
> me how in this same code I could get the functions "get_sum" and
> "get_var" to point to the NumericVector object "v" rather than copy the
> whole thing.
>
>     library(inline)
>     library(Rcpp)
>       a=1:1000000
>     Rcpp.var = cxxfunction(signature(input="numeric"), plugin="Rcpp",
>     body="
>
>           NumericVector v = input;
>
>           int n = v.size();
>
>           double v_mean =  get_sum(v,n)/n;
>
>           double v_var = get_var(v,v_mean,n);
>
>           return wrap(v_var);
>
>         ",includes="
>
>           double get_var(NumericVector v,double m,int l)
>
>           {double a = 0;
>
>               for (int i = 0; i <l;i++)
>
>         {a += (v[i]-m)*(v[i]-m);}
>
>         return(a/l);
>
>           }
>
>
>           double get_sum(NumericVector v,int l)
>
>           { double s = 0;
>
>             for (int i = 0; i <l;i++)
>
>         {s += v[i];}
>
>         return(s);
>
>           }
>
>       ")
>
>       b=system.time(for (i in 1:100)Rcpp.var (a))
>       c= system.time(for (i in 1:100)var (a))
>
>
>
> Thank you Alon.
>
> P.S. I am aware that the "get_var" function provides the population
> variance and not the sample variance.


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30

R Graph Gallery: http://gallery.r-enthusiasts.com

blog:            http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible



More information about the Rcpp-devel mailing list