[Rcpp-devel] Is Rcpp:: not a good choice for iterators?
Tim Keitt
tkeitt at utexas.edu
Tue Aug 11 18:06:24 CEST 2015
1) Figure out where you are making full copies of your data structures
(casting NumericVector to std::vector<double>)
2) Use eg R::rbinom instead of rbinom if you do not want the return as an
Rcpp Vector type.
3) Move the outer loop (your replicate call) into C++
The copying of data is what is slowing things down most likely.
T.
http://www.keittlab.org/
On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtoledo at gmail.com>
wrote:
> Dear All,
>
> It is my first post here at Rcpp-devel List. I'm passionate with the
> awesome work and with all perspectives that it can opens for speed up codes
> and embedding of C++ in R package seamless. Thus, I wish to thank all
> developers.
>
> After, some reading over the Book, online vignettes and so on. I
> translated one of my R scripts using two approaches: taking all advantages
> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP
> objects, working only with std:: objects and wrapping the return.
>
> So far so good, however, by my surprise I found huge difference regarding
> the performance between both implementations. If it's not enough, even the
> raw R code shown better performance compared to the "full Rcpp".
>
> I made a short example to show you the point. First of all let me explain
> the process for who do not have any idea about meiosis: We take a
> hypothetical genome (that gives us the SNPs positions, vector Map); then
> based on this length we draw a Poisson value (where length is lambda), with
> this value (e.g., n) we draw n uniform within the genome length. With this,
> after some manipulations (findInterval % 2) we know where Crossing Over
> (recombination) happened. Then we can take this prototype and generate a
> gamete of some entry genotype (DNA tapes A and B).
>
> The codes are:
>
> Raw R script:
>
> ## Meiosis process through Poisson/Uniform
> MeiosisR <- function( Map, TapeA, TapeB )
> {
> ## Number of Crossing Overs **Poisson Draw**
> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))
> ## Assigning Crossing Overs on the Genome **Uniform Draw**
> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max =
> max(Map)))
> ## Building the "Prototype" **mixture between the 2 haplotypes**
> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2
> ## Raffle for the reference haplotype
> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) )
> {
> ## Building the gamete
> Gamete <- ifelse(test = as.logical(Prototype),
> yes = TapeA,
> no = TapeB)
> } else {
> ## idem with other "guide"
> Gamete <- ifelse(test = as.logical(Prototype),
> yes = TapeB,
> no = TapeA)
> }
> return( Gamete )
> }
>
> STL cpp version:
>
> // [[Rcpp::plugins(cpp11)]]
> #include <algorithm>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> std::vector<int> MeiosisSTL( std::vector<double> Map,
> std::vector<int> TapeA,
> std::vector<int> TapeB ) {
>
> // Obtaining Chromossome's parameter: length, number & positions of
> SNPs
> double L = *std::max_element(Map.begin(), Map.end());
> int Size = Map.size();
>
> // RNG process: draw the number & positions of Crossing Overs
> double CrossingNumber = as<double>( rpois(1, L) );
> std::vector<double> CrossingPositions = as<std::vector<double> >( runif(
> CrossingNumber, 0.0, L ) );
> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>
> // Building a Gamete "Prototype"
> std::vector<bool> Prototype( Size );
>
> // "findInterval" adapted from Wickham's Rcpp Chapter (
> http://adv-r.had.co.nz/Rcpp.html)
> std::vector<double>::iterator Iter, Positions;
> std::vector<bool>::iterator ProtoIter;
> std::vector<double>::iterator MapBegin = Map.begin();
> std::vector<double>::iterator MapEnd = Map.end();
> std::vector<bool>::iterator ProtoBegin = Prototype.begin();
> std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin();
> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();
>
> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
> ++ProtoIter) {
> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
> even
> }
>
> std::vector<int> Gamete;
> Gamete.reserve( Size );
> // raffle: which tape shall be the "guide"?
> bool v01 = as<bool>( rbinom(1, 1, .5) );
>
> if ( v01 ) {
> for ( int i = 0; i < Size; i++ ) {
> // Build the Gamete from the Prototype
> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( i
> )) );
> }
> } else {
> for ( int i = 0; i < Size; i++ ) {
> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( i
> )) );
> }
> }
>
> return Gamete ;
> }
>
> Full Rcpp version:
>
> // [[Rcpp::plugins(cpp11)]]
> #include <algorithm>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> IntegerVector MeiosisRcpp( NumericVector Map,
> IntegerVector TapeA,
> IntegerVector TapeB ) {
>
> double L = max( Map ); // sugar :)
> int Size = Map.size();
>
> double CrossingNumber = as<double>( rpois(1, L) );
> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );
> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>
> LogicalVector Prototype( Size );
>
> NumericVector::iterator Iter, Positions;
> LogicalVector::iterator ProtoIter;
> NumericVector::iterator MapBegin = Map.begin();
> NumericVector::iterator MapEnd = Map.end();
> LogicalVector::iterator ProtoBegin = Prototype.begin();
> NumericVector::iterator CrossPosBegin = CrossingPositions.begin();
> NumericVector::iterator CrossPosEnd = CrossingPositions.end();
>
> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
> ++ProtoIter) {
> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
> even
> }
>
> IntegerVector Gamete;
> bool v01 = as<bool>( rbinom(1, 1, .5) );
>
> if ( v01 ) {
> for ( int i = 0; i < Size; i++ ) {
> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) );
>
> }
> } else {
> for ( int i = 0; i < Size; i++ ) {
> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );
> }
> }
>
> return Gamete ;
> }
>
> If we take the entry as:
>
> ## Genome Setup
> N <- 1E4 # number of gametes
> nM <- 400 # number of Markers (SNPs)
> L <- 4 # genome length (in Morgans)
>
> ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes)
> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions
> **regular**
> map <- sort(runif(nM, 0, L)) # idem **random**
> tapeA <- rep(1, nM)
> tapeB <- rep(2, nM)
>
> You will found results like that:
>
> evaluated expressions:
>
> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB =
> tapeB))
> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA,
> TapeB = tapeB))
> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, TapeB
> = tapeB))
>
> - microbenchmark:
>
> Unit: seconds
> expr min lq mean median uq max neval
> cld
> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 100
> b
> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100
> a
> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565 100
> c
>
> - rbenchmark:
>
> test replications elapsed user.self sys.self
> 2 cppSTL 100 18.048 18.020 0.020
> 1 RawR 100 126.264 124.888 1.324
> 3 Rcpp 100 208.647 208.548 0.012
>
> Just to report, my sessionInfo() is as follow:
>
> sessionInfo() # Useful Infos **Reproducible Research**
> R version 3.2.1 (2015-06-18)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 8 (jessie)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=en_US.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1
> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1
> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8
> [13] proto_0.3-10 munsell_0.4.2
>
> After that, I wish to request help to evaluate those results. Is "full
> Rcpp" not a good choice when using iterators? Let me know if I am not
> taking care about something that may change the things. You may notice that
> I left all parameters as fixed as possible, then, they should not influence
> on that.
>
> I am open to share the files or give further information.
>
> Thank you very much indeed regarding any insight on that.
>
> Cheers,
> FH
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150811/59706b8c/attachment.html>
More information about the Rcpp-devel
mailing list