[Rcpp-devel] Is Rcpp:: not a good choice for iterators?
Hao Ye
hye at ucsd.edu
Tue Aug 11 18:16:59 CEST 2015
It also looks like you are not initializing the size of Gamete in the "full
Rcpp" version, so it is resizing on every loop iteration...
Best,
--
Hao Ye
hye at ucsd.edu
On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
> 1) Figure out where you are making full copies of your data structures
> (casting NumericVector to std::vector<double>)
> 2) Use eg R::rbinom instead of rbinom if you do not want the return as an
> Rcpp Vector type.
> 3) Move the outer loop (your replicate call) into C++
>
> The copying of data is what is slowing things down most likely.
>
> T.
>
> http://www.keittlab.org/
>
> On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtoledo at gmail.com>
> wrote:
>
>> Dear All,
>>
>> It is my first post here at Rcpp-devel List. I'm passionate with the
>> awesome work and with all perspectives that it can opens for speed up codes
>> and embedding of C++ in R package seamless. Thus, I wish to thank all
>> developers.
>>
>> After, some reading over the Book, online vignettes and so on. I
>> translated one of my R scripts using two approaches: taking all advantages
>> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP
>> objects, working only with std:: objects and wrapping the return.
>>
>> So far so good, however, by my surprise I found huge difference regarding
>> the performance between both implementations. If it's not enough, even the
>> raw R code shown better performance compared to the "full Rcpp".
>>
>> I made a short example to show you the point. First of all let me explain
>> the process for who do not have any idea about meiosis: We take a
>> hypothetical genome (that gives us the SNPs positions, vector Map); then
>> based on this length we draw a Poisson value (where length is lambda), with
>> this value (e.g., n) we draw n uniform within the genome length. With this,
>> after some manipulations (findInterval % 2) we know where Crossing Over
>> (recombination) happened. Then we can take this prototype and generate a
>> gamete of some entry genotype (DNA tapes A and B).
>>
>> The codes are:
>>
>> Raw R script:
>>
>> ## Meiosis process through Poisson/Uniform
>> MeiosisR <- function( Map, TapeA, TapeB )
>> {
>> ## Number of Crossing Overs **Poisson Draw**
>> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))
>> ## Assigning Crossing Overs on the Genome **Uniform Draw**
>> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max =
>> max(Map)))
>> ## Building the "Prototype" **mixture between the 2 haplotypes**
>> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2
>> ## Raffle for the reference haplotype
>> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) )
>> {
>> ## Building the gamete
>> Gamete <- ifelse(test = as.logical(Prototype),
>> yes = TapeA,
>> no = TapeB)
>> } else {
>> ## idem with other "guide"
>> Gamete <- ifelse(test = as.logical(Prototype),
>> yes = TapeB,
>> no = TapeA)
>> }
>> return( Gamete )
>> }
>>
>> STL cpp version:
>>
>> // [[Rcpp::plugins(cpp11)]]
>> #include <algorithm>
>> #include <Rcpp.h>
>>
>> using namespace Rcpp;
>>
>> // [[Rcpp::export]]
>> std::vector<int> MeiosisSTL( std::vector<double> Map,
>> std::vector<int> TapeA,
>> std::vector<int> TapeB ) {
>>
>> // Obtaining Chromossome's parameter: length, number & positions of
>> SNPs
>> double L = *std::max_element(Map.begin(), Map.end());
>> int Size = Map.size();
>>
>> // RNG process: draw the number & positions of Crossing Overs
>> double CrossingNumber = as<double>( rpois(1, L) );
>> std::vector<double> CrossingPositions = as<std::vector<double> >(
>> runif( CrossingNumber, 0.0, L ) );
>> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>
>> // Building a Gamete "Prototype"
>> std::vector<bool> Prototype( Size );
>>
>> // "findInterval" adapted from Wickham's Rcpp Chapter (
>> http://adv-r.had.co.nz/Rcpp.html)
>> std::vector<double>::iterator Iter, Positions;
>> std::vector<bool>::iterator ProtoIter;
>> std::vector<double>::iterator MapBegin = Map.begin();
>> std::vector<double>::iterator MapEnd = Map.end();
>> std::vector<bool>::iterator ProtoBegin = Prototype.begin();
>> std::vector<double>::iterator CrossPosBegin = CrossingPositions.begin();
>> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();
>>
>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>> ++ProtoIter) {
>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>> even
>> }
>>
>> std::vector<int> Gamete;
>> Gamete.reserve( Size );
>> // raffle: which tape shall be the "guide"?
>> bool v01 = as<bool>( rbinom(1, 1, .5) );
>>
>> if ( v01 ) {
>> for ( int i = 0; i < Size; i++ ) {
>> // Build the Gamete from the Prototype
>> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at(
>> i )) );
>> }
>> } else {
>> for ( int i = 0; i < Size; i++ ) {
>> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at(
>> i )) );
>> }
>> }
>>
>> return Gamete ;
>> }
>>
>> Full Rcpp version:
>>
>> // [[Rcpp::plugins(cpp11)]]
>> #include <algorithm>
>> #include <Rcpp.h>
>>
>> using namespace Rcpp;
>>
>> // [[Rcpp::export]]
>> IntegerVector MeiosisRcpp( NumericVector Map,
>> IntegerVector TapeA,
>> IntegerVector TapeB ) {
>>
>> double L = max( Map ); // sugar :)
>> int Size = Map.size();
>>
>> double CrossingNumber = as<double>( rpois(1, L) );
>> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );
>> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>
>> LogicalVector Prototype( Size );
>>
>> NumericVector::iterator Iter, Positions;
>> LogicalVector::iterator ProtoIter;
>> NumericVector::iterator MapBegin = Map.begin();
>> NumericVector::iterator MapEnd = Map.end();
>> LogicalVector::iterator ProtoBegin = Prototype.begin();
>> NumericVector::iterator CrossPosBegin = CrossingPositions.begin();
>> NumericVector::iterator CrossPosEnd = CrossingPositions.end();
>>
>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>> ++ProtoIter) {
>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>> even
>> }
>>
>> IntegerVector Gamete;
>> bool v01 = as<bool>( rbinom(1, 1, .5) );
>>
>> if ( v01 ) {
>> for ( int i = 0; i < Size; i++ ) {
>> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) );
>>
>> }
>> } else {
>> for ( int i = 0; i < Size; i++ ) {
>> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );
>> }
>> }
>>
>> return Gamete ;
>> }
>>
>> If we take the entry as:
>>
>> ## Genome Setup
>> N <- 1E4 # number of gametes
>> nM <- 400 # number of Markers (SNPs)
>> L <- 4 # genome length (in Morgans)
>>
>> ## building the genome & full heterozigous genotype (diploid, 2 DNA tapes)
>> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions
>> **regular**
>> map <- sort(runif(nM, 0, L)) # idem **random**
>> tapeA <- rep(1, nM)
>> tapeB <- rep(2, nM)
>>
>> You will found results like that:
>>
>> evaluated expressions:
>>
>> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB =
>> tapeB))
>> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA,
>> TapeB = tapeB))
>> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA,
>> TapeB = tapeB))
>>
>> - microbenchmark:
>>
>> Unit: seconds
>> expr min lq mean median uq max neval
>> cld
>> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700 100
>> b
>> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100
>> a
>> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565
>> 100 c
>>
>> - rbenchmark:
>>
>> test replications elapsed user.self sys.self
>> 2 cppSTL 100 18.048 18.020 0.020
>> 1 RawR 100 126.264 124.888 1.324
>> 3 Rcpp 100 208.647 208.548 0.012
>>
>> Just to report, my sessionInfo() is as follow:
>>
>> sessionInfo() # Useful Infos **Reproducible Research**
>> R version 3.2.1 (2015-06-18)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Debian GNU/Linux 8 (jessie)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5
>>
>> loaded via a namespace (and not attached):
>> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1
>> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1
>> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8
>> [13] proto_0.3-10 munsell_0.4.2
>>
>> After that, I wish to request help to evaluate those results. Is "full
>> Rcpp" not a good choice when using iterators? Let me know if I am not
>> taking care about something that may change the things. You may notice that
>> I left all parameters as fixed as possible, then, they should not influence
>> on that.
>>
>> I am open to share the files or give further information.
>>
>> Thank you very much indeed regarding any insight on that.
>>
>> Cheers,
>> FH
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150811/bf875e2c/attachment-0001.html>
More information about the Rcpp-devel
mailing list