[Rcpp-devel] R.e. Speed gain assistance (Wray, Christopher)

Christian Gunning xian at unm.edu
Fri Aug 19 02:48:23 CEST 2011


On Thu, Aug 18, 2011 at 7:57 AM, Hadley Wickham <hadley at rice.edu> wrote:
>> Take a look at this (unweighted) sample() function.  It's giving R a
>> run for it's money, and is pretty fast even for very large n, and it
>> looks statistically correct (not sure if I'm glossing over ugly,
>> machine-specific details of double->int conversion here). Does this
>> shed any light on your question?
>> mysample<-cxxfunction( signature(x='numeric', n="numeric"), src1, plugin='Rcpp')
>>
>> system.time(result <- mysample(1:50, 1e7))
>> system.time(resultR <- sample(1:50, 1e7, replace=T))
>
> Don't forget about sample.int:
>
>> system.time(result <- mysample(1:50, 1e7))
>   user  system elapsed
>  0.493   0.064   0.557
>> system.time(resultR <- sample(1:50, 1e7, replace=T))
>   user  system elapsed
>  0.872   0.089   0.962
>> system.time(resultR <- sample.int(1:50, 1e7, replace=T))
>   user  system elapsed
>  0.209   0.001   0.212
>
> Hadley

So, I can't figure out how to import and use sample.int into C++.  I
keep getting the following when I exchange sample.int for sample,
below.  I'm guess it's the interaction between IntegerVector and R??
Not sure it matters, but I'm surprised/confused:

cpp_exception("invalid first argument", "Rcpp::eval_error")

src2<-'
NumericVector xx(x);
int xx_sz = xx.size(); // index of last xx
// generate 0:(length(xx)-1)
IntegerVector index(xx_sz, 1.0); // cant use 1 here, 1.0 still required
std::partial_sum(index.begin(), index.end(), index.begin());
index = index -1;

// sample out of index
int nn=as<int>(n);
IntegerVector sindex(nn);
NumericVector ret(nn);
//Function sample("sample.int"); // doesnt work
Function sample("sample");
RNGScope scope;
// sindex has length nn
sindex = sample(index, _["size"]=nn, _["replace"] = true);
for (int i=0; i<nn; i++) {
    ret[i] = xx[ sindex[i] ];
};
return(ret);
'

mysample2<-cxxfunction( signature(n="numeric", x='numeric'), src2,
plugin='Rcpp')
print(system.time(result <- mysample2( 1e7, 50:1)))

I can shave a little time off by changing everything to NumericVectors
and using ret in place of sindex (and thus avoiding an extra Vector
allocation of size nn), but then I don't expect sample.int to work
anyway.

Moral: In this case, a pure C++ loop is cleaner *and* faster.  Not to
beat a dead horse or anything...
-Christian


-- 
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama!


More information about the Rcpp-devel mailing list