[Rcpp-devel] Is Rcpp:: not a good choice for iterators?

Tim Keitt tkeitt at utexas.edu
Tue Aug 11 18:06:24 CEST 2015


1) Figure out where you are making full copies of your data structures
(casting NumericVector to std::vector<double>)
2) Use eg R::rbinom instead of rbinom if you do not want the return as an
Rcpp Vector type.
3) Move the outer loop (your replicate call) into C++

The copying of data is what is slowing things down most likely.

T.

http://www.keittlab.org/

On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtoledo at gmail.com>
wrote:

> Dear All,
>
> It is my first post here at Rcpp-devel List. I'm passionate with the
> awesome work and with all perspectives that it can opens for speed up codes
> and embedding of C++ in R package seamless. Thus, I wish to thank all
> developers.
>
> After, some reading over the Book, online vignettes and so on. I
> translated one of my R scripts using two approaches: taking all advantages
> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP
> objects, working only with std:: objects and wrapping the return.
>
> So far so good, however, by my surprise I found huge difference regarding
> the performance between both implementations. If it's not enough, even the
> raw R code shown better performance compared to the "full Rcpp".
>
> I made a short example to show you the point. First of all let me explain
> the process for who do not have any idea about meiosis: We take a
> hypothetical genome (that gives us the SNPs positions, vector Map); then
> based on this length we draw a Poisson value (where length is lambda), with
> this value (e.g., n) we draw n uniform within the genome length. With this,
> after some manipulations (findInterval % 2) we know where Crossing Over
> (recombination) happened. Then we can take this prototype and generate a
> gamete of some entry genotype (DNA tapes A and B).
>
> The codes are:
>
> Raw R script:
>
> ## Meiosis process through Poisson/Uniform
> MeiosisR <- function( Map, TapeA, TapeB )
>     {
>         ## Number of Crossing Overs **Poisson Draw**
>         nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))
>         ## Assigning Crossing Overs on the Genome **Uniform Draw**
>         posCrossOver <- sort(runif(n = nCrossOver, min = 0, max =
> max(Map)))
>         ## Building the "Prototype" **mixture between the 2 haplotypes**
>         Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2
>         ## Raffle for the reference haplotype
>         if ( as.logical( rbinom(n = 1, size = 1, prob = .5)  ) )
>             {
>                 ## Building the gamete
>                 Gamete <- ifelse(test = as.logical(Prototype),
>                                  yes = TapeA,
>                                  no = TapeB)
>             } else {
>                 ## idem with other "guide"
>                 Gamete <- ifelse(test = as.logical(Prototype),
>                                  yes = TapeB,
>                                  no = TapeA)
>             }
>         return( Gamete )
>     }
>
> STL cpp version:
>
> // [[Rcpp::plugins(cpp11)]]
> #include <algorithm>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> std::vector<int> MeiosisSTL( std::vector<double> Map,
>                              std::vector<int> TapeA,
>                              std::vector<int> TapeB ) {
>
>   // Obtaining Chromossome's parameter: length, number & positions of
> SNPs
>   double L = *std::max_element(Map.begin(), Map.end());
>   int Size = Map.size();
>
>   // RNG process: draw the number & positions of Crossing Overs
>   double CrossingNumber = as<double>( rpois(1, L) );
>   std::vector<double> CrossingPositions = as<std::vector<double> >( runif(
> CrossingNumber, 0.0, L ) );
>   std::sort(CrossingPositions.begin(), CrossingPositions.end());
>
>   // Building a Gamete "Prototype"
>   std::vector<bool> Prototype( Size );
>
>   // "findInterval" adapted from Wickham's Rcpp Chapter (
> http://adv-r.had.co.nz/Rcpp.html)
>   std::vector<double>::iterator Iter, Positions;
>   std::vector<bool>::iterator ProtoIter;
>   std::vector<double>::iterator MapBegin = Map.begin();
>   std::vector<double>::iterator MapEnd = Map.end();
>   std::vector<bool>::iterator ProtoBegin = Prototype.begin();
>   std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin();
>   std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();
>
>   for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
> ++ProtoIter) {
>     Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>     *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
> even
>   }
>
>   std::vector<int> Gamete;
>   Gamete.reserve( Size );
>   // raffle: which tape shall be the "guide"?
>   bool v01 = as<bool>( rbinom(1, 1, .5) );
>
>   if ( v01 ) {
>     for ( int i = 0; i < Size; i++ ) {
>       // Build the Gamete from the Prototype
>       Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( i
> )) );
>     }
>   } else {
>     for ( int i = 0; i < Size; i++ ) {
>       Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( i
> )) );
>     }
>   }
>
>   return Gamete ;
> }
>
> Full Rcpp version:
>
> // [[Rcpp::plugins(cpp11)]]
> #include <algorithm>
> #include <Rcpp.h>
>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> IntegerVector MeiosisRcpp( NumericVector Map,
>                            IntegerVector TapeA,
>                            IntegerVector TapeB ) {
>
>   double L = max( Map ); // sugar :)
>   int Size = Map.size();
>
>   double CrossingNumber = as<double>( rpois(1, L) );
>   NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );
>   std::sort(CrossingPositions.begin(), CrossingPositions.end());
>
>   LogicalVector Prototype( Size );
>
>   NumericVector::iterator Iter, Positions;
>   LogicalVector::iterator ProtoIter;
>   NumericVector::iterator MapBegin = Map.begin();
>   NumericVector::iterator MapEnd = Map.end();
>   LogicalVector::iterator ProtoBegin = Prototype.begin();
>   NumericVector::iterator CrossPosBegin = CrossingPositions.begin();
>   NumericVector::iterator CrossPosEnd = CrossingPositions.end();
>
>   for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
> ++ProtoIter) {
>     Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>     *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
> even
>   }
>
>   IntegerVector Gamete;
>   bool v01 = as<bool>( rbinom(1, 1, .5) );
>
>   if ( v01 ) {
>     for ( int i = 0; i < Size; i++ ) {
>       Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) );
>
>     }
>   } else {
>     for ( int i = 0; i < Size; i++ ) {
>       Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );
>     }
>   }
>
>   return Gamete ;
> }
>
> If we take the entry as:
>
> ## Genome Setup
> N <- 1E4 # number of gametes
> nM <- 400 # number of Markers (SNPs)
> L <-  4 # genome length (in Morgans)
>
> ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes)
> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions
> **regular**
> map <- sort(runif(nM, 0, L)) # idem **random**
> tapeA <- rep(1, nM)
> tapeB <- rep(2, nM)
>
> You will found results like that:
>
> evaluated expressions:
>
> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB =
> tapeB))
> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA,
> TapeB = tapeB))
> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, TapeB
> = tapeB))
>
> - microbenchmark:
>
> Unit: seconds
>    expr       min        lq      mean   median        uq       max neval
> cld
>    RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700   100
> b
>  cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556   100
> a
>    Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565   100
> c
>
> - rbenchmark:
>
>     test replications elapsed user.self sys.self
> 2 cppSTL          100  18.048    18.020    0.020
> 1   RawR          100 126.264   124.888    1.324
> 3   Rcpp          100 208.647   208.548    0.012
>
> Just to report, my sessionInfo() is as follow:
>
> sessionInfo() # Useful Infos **Reproducible Research**
> R version 3.2.1 (2015-06-18)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Debian GNU/Linux 8 (jessie)
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rbenchmark_1.0.0     microbenchmark_1.4-2 Rcpp_0.11.5
>
> loaded via a namespace (and not attached):
>  [1] MASS_7.3-40      colorspace_1.2-6 scales_0.2.4     compiler_3.2.1
>  [5] plyr_1.8.2       tools_3.2.1      gtable_0.1.2     reshape2_1.4.1
>  [9] ggplot2_1.0.1    grid_3.2.1       stringr_0.6.2    digest_0.6.8
> [13] proto_0.3-10     munsell_0.4.2
>
> After that, I wish to request help to evaluate those results. Is "full
> Rcpp" not a good choice when using iterators? Let me know if I am not
> taking care about something that may change the things. You may notice that
> I left all parameters as fixed as possible, then, they should not influence
> on that.
>
> I am open to share the files or give further information.
>
> Thank you very much indeed regarding any insight on that.
>
> Cheers,
> FH
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150811/59706b8c/attachment.html>


More information about the Rcpp-devel mailing list