[Rcpp-devel] Fwd: Is Rcpp:: not a good choice for iterators?
FHRB Toledo
fernandohtoledo at gmail.com
Tue Aug 11 20:47:55 CEST 2015
---------- Forwarded message ----------
From: FHRB Toledo <fernandohtoledo at gmail.com>
Date: 11 August 2015 at 12:07
Subject: Re: [Rcpp-devel] Is Rcpp:: not a good choice for iterators?
To: Hao Ye <hye at ucsd.edu>
Hi Hao Ye,
Great, , you are right! I tried your suggestion through:
NumericVector Gamete = no_init( Size );
And then change the loop such as:
for ( int i = 0; i < Size; i++ ) {
Gamete[ i ] = ( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ])
);
}
With those, the results seem much more reasonable:
expr min lq mean median uq max neval cld
RawR 1.2258215 1.2904011 1.3287819 1.3133870 1.3466908 1.6158531 100
c
cppSTL 0.1741156 0.1791741 0.1933450 0.1835020 0.2083610 0.2628995 100 a
Rcpp 0.2019144 0.2086193 0.2217021 0.2154814 0.2234401 0.4010162 100 b
and as rebuttal:
test replications elapsed user.self sys.self
2 cppSTL 100 18.788 18.700 0.080
3 Rcpp 100 21.451 21.444 0.000
1 RawR 100 133.003 131.604 1.344
Consistently, both C++ versions are faster than raw R.
Thank you very much indeed.
P.S.: As I mentioned, I also will try the Eddelbuette's comment and asap
report the benchmark for complementarity over the thread.
Cheers,
FH
On 11 August 2015 at 11:16, Hao Ye <hye at ucsd.edu> wrote:
> It also looks like you are not initializing the size of Gamete in the
> "full Rcpp" version, so it is resizing on every loop iteration...
>
> Best,
> --
> Hao Ye
> hye at ucsd.edu
>
> On Tue, Aug 11, 2015 at 12:06 PM, Tim Keitt <tkeitt at utexas.edu> wrote:
>
>> 1) Figure out where you are making full copies of your data structures
>> (casting NumericVector to std::vector<double>)
>> 2) Use eg R::rbinom instead of rbinom if you do not want the return as an
>> Rcpp Vector type.
>> 3) Move the outer loop (your replicate call) into C++
>>
>> The copying of data is what is slowing things down most likely.
>>
>> T.
>>
>> http://www.keittlab.org/
>>
>> On Tue, Aug 11, 2015 at 10:13 AM, FHRB Toledo <fernandohtoledo at gmail.com>
>> wrote:
>>
>>> Dear All,
>>>
>>> It is my first post here at Rcpp-devel List. I'm passionate with the
>>> awesome work and with all perspectives that it can opens for speed up codes
>>> and embedding of C++ in R package seamless. Thus, I wish to thank all
>>> developers.
>>>
>>> After, some reading over the Book, online vignettes and so on. I
>>> translated one of my R scripts using two approaches: taking all advantages
>>> of Rcpp; and using just the "seamless" integration, i.e., casting SEXP
>>> objects, working only with std:: objects and wrapping the return.
>>>
>>> So far so good, however, by my surprise I found huge difference
>>> regarding the performance between both implementations. If it's not enough,
>>> even the raw R code shown better performance compared to the "full Rcpp".
>>>
>>> I made a short example to show you the point. First of all let me
>>> explain the process for who do not have any idea about meiosis: We take a
>>> hypothetical genome (that gives us the SNPs positions, vector Map); then
>>> based on this length we draw a Poisson value (where length is lambda), with
>>> this value (e.g., n) we draw n uniform within the genome length. With this,
>>> after some manipulations (findInterval % 2) we know where Crossing Over
>>> (recombination) happened. Then we can take this prototype and generate a
>>> gamete of some entry genotype (DNA tapes A and B).
>>>
>>> The codes are:
>>>
>>> Raw R script:
>>>
>>> ## Meiosis process through Poisson/Uniform
>>> MeiosisR <- function( Map, TapeA, TapeB )
>>> {
>>> ## Number of Crossing Overs **Poisson Draw**
>>> nCrossOver <- rpois(n = 1, lambda = ceiling(max(Map)))
>>> ## Assigning Crossing Overs on the Genome **Uniform Draw**
>>> posCrossOver <- sort(runif(n = nCrossOver, min = 0, max =
>>> max(Map)))
>>> ## Building the "Prototype" **mixture between the 2 haplotypes**
>>> Prototype <- findInterval(x = Map, vec = posCrossOver) %% 2
>>> ## Raffle for the reference haplotype
>>> if ( as.logical( rbinom(n = 1, size = 1, prob = .5) ) )
>>> {
>>> ## Building the gamete
>>> Gamete <- ifelse(test = as.logical(Prototype),
>>> yes = TapeA,
>>> no = TapeB)
>>> } else {
>>> ## idem with other "guide"
>>> Gamete <- ifelse(test = as.logical(Prototype),
>>> yes = TapeB,
>>> no = TapeA)
>>> }
>>> return( Gamete )
>>> }
>>>
>>> STL cpp version:
>>>
>>> // [[Rcpp::plugins(cpp11)]]
>>> #include <algorithm>
>>> #include <Rcpp.h>
>>>
>>> using namespace Rcpp;
>>>
>>> // [[Rcpp::export]]
>>> std::vector<int> MeiosisSTL( std::vector<double> Map,
>>> std::vector<int> TapeA,
>>> std::vector<int> TapeB ) {
>>>
>>> // Obtaining Chromossome's parameter: length, number & positions of
>>> SNPs
>>> double L = *std::max_element(Map.begin(), Map.end());
>>> int Size = Map.size();
>>>
>>> // RNG process: draw the number & positions of Crossing Overs
>>> double CrossingNumber = as<double>( rpois(1, L) );
>>> std::vector<double> CrossingPositions = as<std::vector<double> >(
>>> runif( CrossingNumber, 0.0, L ) );
>>> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>>
>>> // Building a Gamete "Prototype"
>>> std::vector<bool> Prototype( Size );
>>>
>>> // "findInterval" adapted from Wickham's Rcpp Chapter (
>>> http://adv-r.had.co.nz/Rcpp.html)
>>> std::vector<double>::iterator Iter, Positions;
>>> std::vector<bool>::iterator ProtoIter;
>>> std::vector<double>::iterator MapBegin = Map.begin();
>>> std::vector<double>::iterator MapEnd = Map.end();
>>> std::vector<bool>::iterator ProtoBegin = Prototype.begin();
>>> std::vector<double>::iterator CrossPosBegin =
>>> CrossingPositions.begin();
>>> std::vector<double>::iterator CrossPosEnd = CrossingPositions.end();
>>>
>>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>>> ++ProtoIter) {
>>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>>> even
>>> }
>>>
>>> std::vector<int> Gamete;
>>> Gamete.reserve( Size );
>>> // raffle: which tape shall be the "guide"?
>>> bool v01 = as<bool>( rbinom(1, 1, .5) );
>>>
>>> if ( v01 ) {
>>> for ( int i = 0; i < Size; i++ ) {
>>> // Build the Gamete from the Prototype
>>> Gamete.push_back( ((Prototype.at( i )) ? TapeA.at( i ) : TapeB.at(
>>> i )) );
>>> }
>>> } else {
>>> for ( int i = 0; i < Size; i++ ) {
>>> Gamete.push_back( ((Prototype.at( i )) ? TapeB.at( i ) : TapeA.at(
>>> i )) );
>>> }
>>> }
>>>
>>> return Gamete ;
>>> }
>>>
>>> Full Rcpp version:
>>>
>>> // [[Rcpp::plugins(cpp11)]]
>>> #include <algorithm>
>>> #include <Rcpp.h>
>>>
>>> using namespace Rcpp;
>>>
>>> // [[Rcpp::export]]
>>> IntegerVector MeiosisRcpp( NumericVector Map,
>>> IntegerVector TapeA,
>>> IntegerVector TapeB ) {
>>>
>>> double L = max( Map ); // sugar :)
>>> int Size = Map.size();
>>>
>>> double CrossingNumber = as<double>( rpois(1, L) );
>>> NumericVector CrossingPositions = runif( CrossingNumber, 0.0, L );
>>> std::sort(CrossingPositions.begin(), CrossingPositions.end());
>>>
>>> LogicalVector Prototype( Size );
>>>
>>> NumericVector::iterator Iter, Positions;
>>> LogicalVector::iterator ProtoIter;
>>> NumericVector::iterator MapBegin = Map.begin();
>>> NumericVector::iterator MapEnd = Map.end();
>>> LogicalVector::iterator ProtoBegin = Prototype.begin();
>>> NumericVector::iterator CrossPosBegin = CrossingPositions.begin();
>>> NumericVector::iterator CrossPosEnd = CrossingPositions.end();
>>>
>>> for(Iter = MapBegin, ProtoIter = ProtoBegin; Iter != MapEnd; ++Iter,
>>> ++ProtoIter) {
>>> Positions = std::upper_bound(CrossPosBegin, CrossPosEnd, *Iter);
>>> *ProtoIter = std::distance(CrossPosBegin, Positions) % 2; // odd or
>>> even
>>> }
>>>
>>> IntegerVector Gamete;
>>> bool v01 = as<bool>( rbinom(1, 1, .5) );
>>>
>>> if ( v01 ) {
>>> for ( int i = 0; i < Size; i++ ) {
>>> Gamete.push_back( ((Prototype[ i ]) ? TapeA[ i ] : TapeB[ i ]) );
>>>
>>> }
>>> } else {
>>> for ( int i = 0; i < Size; i++ ) {
>>> Gamete.push_back( ((Prototype[ i ]) ? TapeB[ i ] : TapeA[ i ]) );
>>> }
>>> }
>>>
>>> return Gamete ;
>>> }
>>>
>>> If we take the entry as:
>>>
>>> ## Genome Setup
>>> N <- 1E4 # number of gametes
>>> nM <- 400 # number of Markers (SNPs)
>>> L <- 4 # genome length (in Morgans)
>>>
>>> ## building the genome & full heterozigous genotype (diploid, 2 DNA
>>> tapes)
>>> ## map <- seq(from = .1, to = L, length.out = nM) # SNPs positions
>>> **regular**
>>> map <- sort(runif(nM, 0, L)) # idem **random**
>>> tapeA <- rep(1, nM)
>>> tapeB <- rep(2, nM)
>>>
>>> You will found results like that:
>>>
>>> evaluated expressions:
>>>
>>> RawR = replicate(n = N, expr = MeiosisR(Map = map, TapeA = tapeA, TapeB
>>> = tapeB))
>>> cppSTL = replicate(n = N, expr = MeiosisSTL(Map = map, TapeA = tapeA,
>>> TapeB = tapeB))
>>> Rcpp = replicate(n = N, expr = MeiosisRcpp(Map = map, TapeA = tapeA,
>>> TapeB = tapeB))
>>>
>>> - microbenchmark:
>>>
>>> Unit: seconds
>>> expr min lq mean median uq max neval
>>> cld
>>> RawR 1.2169414 1.2655129 1.2978896 1.286561 1.3055376 1.5020700
>>> 100 b
>>> cppSTL 0.1721323 0.1773904 0.1867463 0.180910 0.1863087 0.2835556 100
>>> a
>>> Rcpp 1.9190921 2.0101288 2.0633424 2.046132 2.1081319 2.2832565
>>> 100 c
>>>
>>> - rbenchmark:
>>>
>>> test replications elapsed user.self sys.self
>>> 2 cppSTL 100 18.048 18.020 0.020
>>> 1 RawR 100 126.264 124.888 1.324
>>> 3 Rcpp 100 208.647 208.548 0.012
>>>
>>> Just to report, my sessionInfo() is as follow:
>>>
>>> sessionInfo() # Useful Infos **Reproducible Research**
>>> R version 3.2.1 (2015-06-18)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Debian GNU/Linux 8 (jessie)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
>>> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
>>> [7] LC_PAPER=en_US.utf8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] rbenchmark_1.0.0 microbenchmark_1.4-2 Rcpp_0.11.5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-40 colorspace_1.2-6 scales_0.2.4 compiler_3.2.1
>>> [5] plyr_1.8.2 tools_3.2.1 gtable_0.1.2 reshape2_1.4.1
>>> [9] ggplot2_1.0.1 grid_3.2.1 stringr_0.6.2 digest_0.6.8
>>> [13] proto_0.3-10 munsell_0.4.2
>>>
>>> After that, I wish to request help to evaluate those results. Is "full
>>> Rcpp" not a good choice when using iterators? Let me know if I am not
>>> taking care about something that may change the things. You may notice that
>>> I left all parameters as fixed as possible, then, they should not influence
>>> on that.
>>>
>>> I am open to share the files or give further information.
>>>
>>> Thank you very much indeed regarding any insight on that.
>>>
>>> Cheers,
>>> FH
>>>
>>> _______________________________________________
>>> Rcpp-devel mailing list
>>> Rcpp-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>>
>>
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20150811/0b3f84de/attachment-0001.html>
More information about the Rcpp-devel
mailing list