[Rcpp-devel] Is Rcpp:: not a good choice for iterators?

Hao Ye hye at ucsd.edu
Tue Aug 11 18:16:59 CEST 2015


It also looks like you are not initializing the size of Gamete in the "full
Rcpp" version, so it is resizing on every loop iteration...

Best,
--
Hao Ye
hye at ucsd.edu

On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <tkeitt at utexas.edu> wrote:

> 1) Figure out where you are making full copies of your data structures
> (casting NumericVector to std::vector<double>)
> 2) Use eg R::rbinom instead of rbinom if you do not want the return as an
> Rcpp Vector type.
> 3) Move the outer loop (your replicate call) into C++
>
> The copying of data is what is slowing things down most likely.
>
> T.
>
> http://www.keittlab.org/
>
> On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtoledo at gmail.com>
> wrote:
>
>> Dear All,
>>
>> It is my first post here at Rcpp-devel List. I'm passionate with the
>> awesome work and with all perspectives that it can opens for speed up codes
>> and embedding of C++ in R package seamless. Thus, I wish to thank all
>> developers.
>>
>> After, some reading over the Book, online vignettes and so on. I
>> translated one of my R scripts using two approaches: taking all advantages
>> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP
>> objects, working only with std:: objects and wrapping the return.
>>
>> So far so good, however, by my surprise I found huge difference regarding
>> the performance between both implementations. If it's not enough, even the
>> raw R code shown better performance compared to the "full Rcpp".
>>
>> I made a short example to show you the point. First of all let me explain
>> the process for who do not have any idea about meiosis: We take a
>> hypothetical genome (that gives us the SNPs positions, vector Map); then
>> based on this length we draw a Poisson value (where length is lambda), with
>> this value (e.g., n) we draw n uniform within the genome length. With this,
>> after some manipulations (findInterval % 2) we know where Crossing Over
>> (recombination) happened. Then we can take this prototype and generate a
>> gamete of some entry genotype (DNA tapes A and B).
>>
>> The codes are:
>>
>> Raw R script:
>>
>> ## Meiosis process through Poisson/Uniform
>> MeiosisR <- function( Map, TapeA, TapeB )
>>     {
>>         ## Number of Crossing Overs **Poisson Draw**
>>         nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))
>>         ## Assigning Crossing Overs on the Genome **Uniform Draw**
>>         posCrossOver <- sort(runif(n = nCrossOver, min = 0, max =
>> max(Map)))
>>         ## Building the "Prototype" **mixture between the 2 haplotypes**
>>         Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2
>>         ## Raffle for the reference haplotype
>>         if ( as.logical( rbinom(n = 1, size = 1, prob = .5)  ) )
>>             {
>>                 ## Building the gamete
>>                 Gamete <- ifelse(test = as.logical(Prototype),
>>                                  yes = TapeA,
>>                                  no = TapeB)
>>             } else {
>>                 ## idem with other "guide"
>>                 Gamete <- ifelse(test = as.logical(Prototype),
>>                                  yes = TapeB,
>>                                  no = TapeA)
>>             }
>>         return( Gamete )
>>     }
>>
>> STL cpp version:
>>
>> // [[Rcpp::plugins(cpp11)]]
>> #include <algorithm>
>> #include <Rcpp.h>
>>
>> using namespace Rcpp;
>>
>> // [[Rcpp::export]]
>> std::vector<int> MeiosisSTL( std::vector<double> Map,
>>                              std::vector<int> TapeA,
>>                              std::vector<int> TapeB ) {
>>
>>   // Obtaining Chromossome's parameter: length, number & positions of
>> SNPs
>>   double L = *std::max_element(Map.begin(), Map.end());
>>   int Size = Map.size();
>>
>>   // RNG process: draw the number & positions of Crossing Overs
>>   double CrossingNumber = as<double>( rpois(1, L) );
>>   std::vector<double> CrossingPositions = as<std::vector<double> >(
>> runif( CrossingNumber, 0.0, L ) );
>>   std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>
>>   // Building a Gamete "Prototype"
>>   std::vector<bool> Prototype( Size );
>>
>>   // "findInterval" adapted from Wickham's Rcpp Chapter (
>> http://adv-r.had.co.nz/Rcpp.html)
>>   std::vector<double>::iterator Iter, Positions;
>>   std::vector<bool>::iterator ProtoIter;
>>   std::vector<double>::iterator MapBegin = Map.begin();
>>   std::vector<double>::iterator MapEnd = Map.end();
>>   std::vector<bool>::iterator ProtoBegin = Prototype.begin();
>>   std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin();
>>   std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();
>>
>>   for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>> ++ProtoIter) {
>>     Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>>     *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>> even
>>   }
>>
>>   std::vector<int> Gamete;
>>   Gamete.reserve( Size );
>>   // raffle: which tape shall be the "guide"?
>>   bool v01 = as<bool>( rbinom(1, 1, .5) );
>>
>>   if ( v01 ) {
>>     for ( int i = 0; i < Size; i++ ) {
>>       // Build the Gamete from the Prototype
>>       Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at(
>> i )) );
>>     }
>>   } else {
>>     for ( int i = 0; i < Size; i++ ) {
>>       Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at(
>> i )) );
>>     }
>>   }
>>
>>   return Gamete ;
>> }
>>
>> Full Rcpp version:
>>
>> // [[Rcpp::plugins(cpp11)]]
>> #include <algorithm>
>> #include <Rcpp.h>
>>
>> using namespace Rcpp;
>>
>> // [[Rcpp::export]]
>> IntegerVector MeiosisRcpp( NumericVector Map,
>>                            IntegerVector TapeA,
>>                            IntegerVector TapeB ) {
>>
>>   double L = max( Map ); // sugar :)
>>   int Size = Map.size();
>>
>>   double CrossingNumber = as<double>( rpois(1, L) );
>>   NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );
>>   std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>
>>   LogicalVector Prototype( Size );
>>
>>   NumericVector::iterator Iter, Positions;
>>   LogicalVector::iterator ProtoIter;
>>   NumericVector::iterator MapBegin = Map.begin();
>>   NumericVector::iterator MapEnd = Map.end();
>>   LogicalVector::iterator ProtoBegin = Prototype.begin();
>>   NumericVector::iterator CrossPosBegin = CrossingPositions.begin();
>>   NumericVector::iterator CrossPosEnd = CrossingPositions.end();
>>
>>   for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>> ++ProtoIter) {
>>     Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>>     *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>> even
>>   }
>>
>>   IntegerVector Gamete;
>>   bool v01 = as<bool>( rbinom(1, 1, .5) );
>>
>>   if ( v01 ) {
>>     for ( int i = 0; i < Size; i++ ) {
>>       Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) );
>>
>>     }
>>   } else {
>>     for ( int i = 0; i < Size; i++ ) {
>>       Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );
>>     }
>>   }
>>
>>   return Gamete ;
>> }
>>
>> If we take the entry as:
>>
>> ## Genome Setup
>> N <- 1E4 # number of gametes
>> nM <- 400 # number of Markers (SNPs)
>> L <-  4 # genome length (in Morgans)
>>
>> ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes)
>> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions
>> **regular**
>> map <- sort(runif(nM, 0, L)) # idem **random**
>> tapeA <- rep(1, nM)
>> tapeB <- rep(2, nM)
>>
>> You will found results like that:
>>
>> evaluated expressions:
>>
>> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB =
>> tapeB))
>> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA,
>> TapeB = tapeB))
>> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA,
>> TapeB = tapeB))
>>
>> - microbenchmark:
>>
>> Unit: seconds
>>    expr       min        lq      mean   median        uq       max neval
>> cld
>>    RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700   100
>> b
>>  cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556   100
>> a
>>    Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565
>> 100   c
>>
>> - rbenchmark:
>>
>>     test replications elapsed user.self sys.self
>> 2 cppSTL          100  18.048    18.020    0.020
>> 1   RawR          100 126.264   124.888    1.324
>> 3   Rcpp          100 208.647   208.548    0.012
>>
>> Just to report, my sessionInfo() is as follow:
>>
>> sessionInfo() # Useful Infos **Reproducible Research**
>> R version 3.2.1 (2015-06-18)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Debian GNU/Linux 8 (jessie)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] rbenchmark_1.0.0     microbenchmark_1.4-2 Rcpp_0.11.5
>>
>> loaded via a namespace (and not attached):
>>  [1] MASS_7.3-40      colorspace_1.2-6 scales_0.2.4     compiler_3.2.1
>>  [5] plyr_1.8.2       tools_3.2.1      gtable_0.1.2     reshape2_1.4.1
>>  [9] ggplot2_1.0.1    grid_3.2.1       stringr_0.6.2    digest_0.6.8
>> [13] proto_0.3-10     munsell_0.4.2
>>
>> After that, I wish to request help to evaluate those results. Is "full
>> Rcpp" not a good choice when using iterators? Let me know if I am not
>> taking care about something that may change the things. You may notice that
>> I left all parameters as fixed as possible, then, they should not influence
>> on that.
>>
>> I am open to share the files or give further information.
>>
>> Thank you very much indeed regarding any insight on that.
>>
>> Cheers,
>> FH
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150811/bf875e2c/attachment-0001.html>


More information about the Rcpp-devel mailing list