<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">FHRB Toledo</b> <span dir="ltr"><<a href="mailto:fernandohtoledo@gmail.com">fernandohtoledo@gmail.com</a>></span><br>Date: 11 August 2015 at 12:07<br>Subject: Re: [Rcpp-devel] Is Rcpp:: not a good choice for iterators?<br>To: Hao Ye <<a href="mailto:hye@ucsd.edu">hye@ucsd.edu</a>><br><br><br><div dir="ltr">Hi Hao Ye,<br><br>Great, , you are right! I tried your suggestion through:<br><br>NumericVector Gamete = no_init( Size );<br><br>And then change the loop such as:<span class=""><br><br> for ( int i = 0; i < Size; i++ ) {<br></span> Gamete[ i ] = ( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); <br> }<br><br>With those, the results seem much more reasonable:<span class=""><br><br> expr min lq mean median uq max neval cld<br></span> RawR 1.2258215 1.2904011 1.3287819 1.3133870 1.3466908 1.6158531 100 c<br> cppSTL 0.1741156 0.1791741 0.1933450 0.1835020 0.2083610 0.2628995 100 a<br> Rcpp 0.2019144 0.2086193 0.2217021 0.2154814 0.2234401 0.4010162 100 b<br><br>and as rebuttal:<span class=""><br><br> test replications elapsed user.self sys.self<br></span>2 cppSTL 100 18.788 18.700 0.080<br>3 Rcpp 100 21.451 21.444 0.000<br>1 RawR 100 133.003 131.604 1.344<br><br>Consistently, both C++ versions are faster than raw R.<br><br>Thank you very much indeed.<br><br>P.S.: As I mentioned, I also will try the Eddelbuette's comment and asap report the benchmark for complementarity over the thread.<br><br>Cheers,<br>FH</div><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On 11 August 2015 at 11:16, Hao Ye <span dir="ltr"><<a href="mailto:hye@ucsd.edu" target="_blank">hye@ucsd.edu</a>></span> wrote:<br></span><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">It also looks like you are not initializing the size of Gamete in the "full Rcpp" version, so it is resizing on every loop iteration...<div class="gmail_extra"><br></div><div class="gmail_extra">Best,<br clear="all"><div><div><div>--</div>Hao Ye<br><a href="mailto:hye@ucsd.edu" target="_blank">hye@ucsd.edu</a></div></div><div><div>
<br><div class="gmail_quote">On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <span dir="ltr"><<a href="mailto:tkeitt@utexas.edu" target="_blank">tkeitt@utexas.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">1) Figure out where you are making full copies of your data structures (casting NumericVector to std::vector<double>)<div>2) Use eg R::rbinom instead of rbinom if you do not want the return as an Rcpp Vector type.</div><div>3) Move the outer loop (your replicate call) into C++</div><div><br></div><div>The copying of data is what is slowing things down most likely.</div><div><br></div><div>T.<br><div class="gmail_extra"><br clear="all"><div><div><div dir="ltr"><a href="http://www.keittlab.org/" target="_blank">http://www.keittlab.org/</a></div></div></div>
<br><div class="gmail_quote">On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <span dir="ltr"><<a href="mailto:fernandohtoledo@gmail.com" target="_blank">fernandohtoledo@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear All,<br><br>It is my first post here at Rcpp-devel List. I'm passionate with the awesome work and with all perspectives that it can opens for speed up codes and embedding of C++ in R package seamless. Thus, I wish to thank all developers.<br><br>After, some reading over the Book, online vignettes and so on. I translated one of my R scripts using two approaches: taking all advantages of Rcpp; and using just the "seamless" integration, i.e., casting SEXP objects, working only with std:: objects and wrapping the return.<br><br>So far so good, however, by my surprise I found huge difference regarding the performance between both implementations. If it's not enough, even the raw R code shown better performance compared to the "full Rcpp".<br><br>I made a short example to show you the point. First of all let me explain the process for who do not have any idea about meiosis: We take a hypothetical genome (that gives us the SNPs positions, vector Map); then based on this length we draw a Poisson value (where length is lambda), with this value (e.g., n) we draw n uniform within the genome length. With this, after some manipulations (findInterval % 2) we know where Crossing Over (recombination) happened. Then we can take this prototype and generate a gamete of some entry genotype (DNA tapes A and B).<br><br>The codes are:<br><br>Raw R script:<br><br>## Meiosis process through Poisson/Uniform<br>MeiosisR <- function( Map, TapeA, TapeB )<br> {<br> ## Number of Crossing Overs **Poisson Draw**<br> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))<br> ## Assigning Crossing Overs on the Genome **Uniform Draw**<br> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max = max(Map)))<br> ## Building the "Prototype" **mixture between the 2 haplotypes**<br> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2<br> ## Raffle for the reference haplotype<br> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) )<br> {<br> ## Building the gamete<br> Gamete <- ifelse(test = as.logical(Prototype),<br> yes = TapeA,<br> no = TapeB)<br> } else {<br> ## idem with other "guide"<br> Gamete <- ifelse(test = as.logical(Prototype),<br> yes = TapeB,<br> no = TapeA)<br> }<br> return( Gamete ) <br> }<br><br>STL cpp version:<br><br>// [[Rcpp::plugins(cpp11)]]<br>#include <algorithm><br>#include <Rcpp.h><br><br>using namespace Rcpp;<br><br>// [[Rcpp::export]]<br>std::vector<int> MeiosisSTL( std::vector<double> Map, <br> std::vector<int> TapeA, <br> std::vector<int> TapeB ) {<br><br> // Obtaining Chromossome's parameter: length, number & positions of SNPs <br> double L = *std::max_element(Map.begin(), Map.end());<br> int Size = Map.size();<br><br> // RNG process: draw the number & positions of Crossing Overs<br> double CrossingNumber = as<double>( rpois(1, L) );<br> std::vector<double> CrossingPositions = as<std::vector<double> >( runif( CrossingNumber, 0.0, L ) );<br> std::sort(CrossingPositions.begin(), CrossingPositions.end());<br><br> // Building a Gamete "Prototype"<br> std::vector<bool> Prototype( Size );<br> <br> // "findInterval" adapted from Wickham's Rcpp Chapter (<a href="http://adv-r.had.co.nz/Rcpp.html" target="_blank">http://adv-r.had.co.nz/Rcpp.html</a>)<br> std::vector<double>::iterator Iter, Positions;<br> std::vector<bool>::iterator ProtoIter;<br> std::vector<double>::iterator MapBegin = Map.begin();<br> std::vector<double>::iterator MapEnd = Map.end();<br> std::vector<bool>::iterator ProtoBegin = Prototype.begin();<br> std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin();<br> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();<br><br> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, ++ProtoIter) {<br> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);<br> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or even<br> }<br><br> std::vector<int> Gamete;<br> Gamete.reserve( Size );<br> // raffle: which tape shall be the "guide"?<br> bool v01 = as<bool>( rbinom(1, 1, .5) );<br><br> if ( v01 ) {<br> for ( int i = 0; i < Size; i++ ) { <br> // Build the Gamete from the Prototype<br> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at( i )) ); <br> }<br> } else {<br> for ( int i = 0; i < Size; i++ ) {<br> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at( i )) );<br> }<br> }<br> <br> return Gamete ;<br>}<br><br>Full Rcpp version:<br><br>// [[Rcpp::plugins(cpp11)]]<br>#include <algorithm><br>#include <Rcpp.h><br><br>using namespace Rcpp;<br><br>// [[Rcpp::export]]<br>IntegerVector MeiosisRcpp( NumericVector Map, <br> IntegerVector TapeA, <br> IntegerVector TapeB ) {<br><br> double L = max( Map ); // sugar :)<br> int Size = Map.size();<br><br> double CrossingNumber = as<double>( rpois(1, L) );<br> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );<br> std::sort(CrossingPositions.begin(), CrossingPositions.end());<br><br> LogicalVector Prototype( Size );<br><br> NumericVector::iterator Iter, Positions;<br> LogicalVector::iterator ProtoIter;<br> NumericVector::iterator MapBegin = Map.begin();<br> NumericVector::iterator MapEnd = Map.end();<br> LogicalVector::iterator ProtoBegin = Prototype.begin();<br> NumericVector::iterator CrossPosBegin = CrossingPositions.begin();<br> NumericVector::iterator CrossPosEnd = CrossingPositions.end();<br><br> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter, ++ProtoIter) {<br> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);<br> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or even<br> }<br><br> IntegerVector Gamete;<br> bool v01 = as<bool>( rbinom(1, 1, .5) );<br><br> if ( v01 ) {<br> for ( int i = 0; i < Size; i++ ) { <br> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) ); <br> }<br> } else {<br> for ( int i = 0; i < Size; i++ ) {<br> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );<br> }<br> }<br> <br> return Gamete ;<br>}<br><br>If we take the entry as:<br><br>## Genome Setup<br>N <- 1E4 # number of gametes<br>nM <- 400 # number of Markers (SNPs)<br>L <- 4 # genome length (in Morgans)<br><br>## building the genome & full heterozigous genotype (diploid, 2 DNA tapes)<br>## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions **regular**<br>map <- sort(runif(nM, 0, L)) # idem **random**<br>tapeA <- rep(1, nM)<br>tapeB <- rep(2, nM)<br><br>You will found results like that:<br><br>evaluated expressions: <br><br>RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB = tapeB))<br>cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA, TapeB = tapeB))<br>Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA, TapeB = tapeB))<br><br>- microbenchmark:<br><br>Unit: seconds<br> expr min lq mean median uq max neval cld<br> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 100 b <br> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100 a <br> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565 100 c<br><br>- rbenchmark:<br><br> test replications elapsed user.self sys.self<br>2 cppSTL 100 18.048 18.020 0.020<br>1 RawR 100 126.264 124.888 1.324<br>3 Rcpp 100 208.647 208.548 0.012<br><br>Just to report, my sessionInfo() is as follow:<br><br>sessionInfo() # Useful Infos **Reproducible Research**<br>R version 3.2.1 (2015-06-18)<br>Platform: x86_64-pc-linux-gnu (64-bit)<br>Running under: Debian GNU/Linux 8 (jessie)<br><br>locale:<br> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C <br> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8 <br> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8 <br> [7] LC_PAPER=en_US.utf8 LC_NAME=C <br> [9] LC_ADDRESS=C LC_TELEPHONE=C <br>[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C <br><br>attached base packages:<br>[1] stats graphics grDevices utils datasets methods base <br><br>other attached packages:<br>[1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5 <br><br>loaded via a namespace (and not attached):<br> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1 <br> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1 <br> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8 <br>[13] proto_0.3-10 munsell_0.4.2<br><br>After that, I wish to request help to evaluate those results. Is "full Rcpp" not a good choice when using iterators? Let me know if I am not taking care about something that may change the things. You may notice that I left all parameters as fixed as possible, then, they should not influence on that.<br><br>I am open to share the files or give further information.<br><br>Thank you very much indeed regarding any insight on that.<br><br>Cheers,<br>FH<br></div>
<br>_______________________________________________<br>
Rcpp-devel mailing list<br>
<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org" target="_blank">Rcpp-devel@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" rel="noreferrer" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br></blockquote></div><br></div></div></div>
<br>_______________________________________________<br>
Rcpp-devel mailing list<br>
<a href="mailto:Rcpp-devel@lists.r-forge.r-project.org" target="_blank">Rcpp-devel@lists.r-forge.r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel" rel="noreferrer" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel</a><br></blockquote></div><br></div></div></div></div>
</blockquote></div></div></div><br></div>
</div><br></div>