[Rcpp-devel] Using pointers with Numeric Vectors
Romain Francois
romain at r-enthusiasts.com
Fri Dec 21 09:04:37 CET 2012
Hello,
I'm arriving after the battle here.
Let's clear out some misconceptions. The copy constructor of
NumericVector is very cheap, it just copies a pointer to the underlying
R object, i.e. the data is not copied. This is the way Rcpp classes are
implemented, they encapsulate an R object and provide nicer syntax, but
they work with the data of the R object.
So when you call e.g.
double get_var(NumericVector v,double m,int l)
which triggers a copy of v, this is cheap, you don't have to worry about
it. I would still recommend to use reference semantics, like others have
pointers and delcare your function as:
double get_var(const NumericVector& v,double m,int l)
Learning about references and const references is very useful in C++.
Now, the data you first pass in from R is an integer vector:
> a=1:1000000
> typeof( a )
[1] "integer"
So creating a NumericVector from it will be expensive because a coercion
from an integer vector to a numeric vector is needed.
Consider this code:
#include <Rcpp/Benchmark/Timer.h>
// [[Rcpp::export]]
SEXP construct(SEXP x, int n){
Timer timer ;
timer.step("start");
for(int i=0; i<n; i++){
NumericVector vec(x) ;
}
timer.step("NumericVector ctor") ;
return timer ;
}
This uses our internal Timer, i.e.. to time things at the C++ level to
measure how many nanoseconds are needed for n constructions of a
NumericVector from an R object.
Now things are very different if I feed in an integer vector or a
numeric vector:
> a <- 1:1e+06
> b <- seq(1, 1e+06, by = 1)
> typeof(a)
[1] "integer"
> typeof(b)
[1] "double"
> n <- 1000
> construct(a, n)[-1]/n
NumericVector ctor
4128426
> construct(b, n)[-1]/n
NumericVector ctor
36.767
On average it took 4 128 426 ns (4ms) to construct s, this is because a
copy is made.
but it takes 36 ns to create the vector from a numeric vector. not free,
but very cheap. This only includes copying a pointer, and doing the
business of protection of the underlying object from R's garbage
collection.
Now JJ's timings of callRcppVar:
// [[Rcpp::export]]
void callRcppVar(const NumericVector& v) {
for (int i=0; i<100; i++)
rcppVar(v);
}
This takes about the same time it takes to construct the numeric vector,
so I suspect there is some compiler magic taking place, i.e. since the
returned value of rcppVar is just ignored, I'm guessing the compiler
just does not bother. Let's split things :
// [[Rcpp::export]]
double returnDouble() {
return 1.0 ;
}
> system.time(returnDouble())
user system elapsed
0 0 0
The price of returning a double is none, or very small. I'm just trying
to evaluate the cost of the invocation of whatever sourceCpp makes since
it does very nice work of marsjalling inputs and outputs.
Now, let's do the same, but taking the NumericVector as before:
// [[Rcpp::export]]
double returnDouble(const NumericVector& v) {
return 1.0 ;
}
> system.time(returnDouble(a))
user system elapsed
0.003 0.000 0.003
Those 3 ms are the time it takes to construct the NumericVector from an
integer vector.
Now,
> system.time(callRcppVar(a))
user system elapsed
0.003 0.000 0.002
I believe nothing is done here. Here is a version that won't be
optimized away by the compiler because we need the value:
// [[Rcpp::export]]
double callRcppVar2(const NumericVector& v) {
double x = 0.0 ;
for (int i=0; i<100; i++)
x += rcppVar(v);
return x ;
}
> system.time(callRcppVar2(a))
user system elapsed
0.198 0.000 0.198
That's more like it. And we are not paying the price of +=
// [[Rcpp::export]]
double callRcppVar3(const NumericVector& v) {
double x = 0.0 ;
for (int i=0; i<100; i++)
x += 2.0;
return x ;
}
> system.time(callRcppVar3(a))
user system elapsed
0.003 0.000 0.003
Sorry for the long-ish email, I hope this clarifies things.
Romain
PS: Alon. You are welcome to use stack overflow or here, but if you use
stack overflow, please play the game as it is supposed to. You've asked
5 questions there, got some really valid answers but you never bothered
to accept any of them. You are not likely to get more answers if you
don't play the game correctly. Please read the faq.
http://stackoverflow.com/faq
stack overflow is really good, but it is only useful if people use it
appropriately.
Le 20/12/12 18:35, Alon Honig a écrit :
> My current RCPP program is only 4 times faster than its R byte code
> compiled equivalent. I am trying to speed up my code and from what I
> understand using pointers prevents an external function from copying the
> entire object when executing its methods. I am having difficulty
> finding salient examples that use points and NumericVectors. Please tell
> me how in this same code I could get the functions "get_sum" and
> "get_var" to point to the NumericVector object "v" rather than copy the
> whole thing.
>
> library(inline)
> library(Rcpp)
> a=1:1000000
> Rcpp.var = cxxfunction(signature(input="numeric"), plugin="Rcpp",
> body="
>
> NumericVector v = input;
>
> int n = v.size();
>
> double v_mean = get_sum(v,n)/n;
>
> double v_var = get_var(v,v_mean,n);
>
> return wrap(v_var);
>
> ",includes="
>
> double get_var(NumericVector v,double m,int l)
>
> {double a = 0;
>
> for (int i = 0; i <l;i++)
>
> {a += (v[i]-m)*(v[i]-m);}
>
> return(a/l);
>
> }
>
>
> double get_sum(NumericVector v,int l)
>
> { double s = 0;
>
> for (int i = 0; i <l;i++)
>
> {s += v[i];}
>
> return(s);
>
> }
>
> ")
>
> b=system.time(for (i in 1:100)Rcpp.var (a))
> c= system.time(for (i in 1:100)var (a))
>
>
>
> Thank you Alon.
>
> P.S. I am aware that the "get_var" function provides the population
> variance and not the sample variance.
--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
R Graph Gallery: http://gallery.r-enthusiasts.com
blog: http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible
More information about the Rcpp-devel
mailing list