[Rcpp-devel] When calling same Rcpp function several times different results are returned

JJ Allaire jj.allaire at gmail.com
Wed Jul 15 15:20:53 CEST 2015


It's because several instances of your object will be created (via the
splitting constructor) and then joined together using the join method.
If all instances share an output variable then they will overwrite
each others' output, giving them their own allows the split/join logic
to work correctly.


On Wed, Jul 15, 2015 at 8:37 AM, Vaidas Zemlys <mpiktas at gmail.com> wrote:
> Hi,
>
> I’ve finally managed to produce code that works. I hope. The key was to
> leave output variable out of constructor:
>
> #include <Rcpp.h>
> using namespace Rcpp;
> // [[Rcpp::depends(RcppParallel)]]
> #include <RcppParallel.h>
>
> using namespace RcppParallel;
> struct SumsInGroups5: public Worker
> {
>     const RVector<double> v;
>     const RVector<int> g;
>
>     std::vector<double> s;
>
>     SumsInGroups5(const NumericVector v, const IntegerVector g): v(v), g(g),
> s(*std::max_element(g.begin(), g.end()) + 1, 0.0){ }
>
>     SumsInGroups5(const SumsInGroups5& p, Split): v(p.v), g(p.g),
> s(*std::max_element(g.begin(), g.end()) + 1, 0.0) {}
>
>     void operator()(std::size_t begin, std::size_t end) {
>         for (std::size_t i = begin; i < end; ++i) {
>             s[g[i]]+=v[i];
>         }
>
>     }
>
>     void join(const SumsInGroups5& rhs) {
>         for(std::size_t i = 0; i < s.size(); i++) {
>             s[i] += rhs.s[i];
>         }
>     }
> };
>
> // [[Rcpp::export]]
> NumericVector sg5(NumericVector v, IntegerVector g) {
>     SumsInGroups5 p(v, g);
>     parallelReduce(0, v.length(), p);
>     return wrap(p.s);
> }
>
> /*** R
> a <- 1:10
> g <- c(rep(0,5),rep(1,5))
>
>
> bb <- lapply(1:10000,function(x)sg5(a,g))
> cc<-do.call("rbind",bb)
> table(cc[,1])
> table(cc[,2])
> */
>
> Looking at the example
> http://gallery.rcpp.org/articles/parallel-distance-matrix/, the ouput is
> initialized in constructor and everything works. So if anybody can explain
> why in this case leaving out the output from the constructor made the code
> work I would be all ears.
>
> Vaidotas Zemlys
>
>
> Le mar. 14 juil. 2015 à 15:22, Danas Zuokas <danas.zuokas at gmail.com> a écrit
> :
>>
>> I have tried this with no luck.
>>
>> 2015-07-14 14:34 GMT+03:00 Romain Francois <romain at r-enthusiasts.com>:
>>>
>>> Or just use a std::vector<double> for sg. That should do it.
>>>
>>> Romain
>>>
>>> Envoyé de mon iPhone
>>>
>>> > Le 14 juil. 2015 à 13:21, JJ Allaire <jj.allaire at gmail.com> a écrit :
>>> >
>>> > The examples in the RcppParallel documentation assume that access to
>>> > vectors and matrixes are *aligned* (i.e. fall into neat buckets
>>> > whereby reading and writing doesn't overlap between worker instances).
>>> > Your example appears to access arbitrary elements of sg (depending on
>>> > what's passed in gi) which probably creates overlapping reads/writes.
>>> > You should also study the documentation for join carefully.
>>> >
>>> > There's nothing incorrect about RcppParallel's behavior here, rather
>>> > you need to think more carefully about the access patterns of your
>>> > data and how they might conflict. You may need to introduce locking to
>>> > overcome the conflicts, which in turn could kill the performance
>>> > benefit you gain from parallelism. No easy answers here :-\
>>> >
>>> >> On Tue, Jul 14, 2015 at 7:15 AM, Danas Zuokas <danas.zuokas at gmail.com>
>>> >> wrote:
>>> >> Yes it is the same question on SO and I did consider RHertel's
>>> >> comments.
>>> >> But this problem (sums by group id) is not parallelFor it is
>>> >> parallelReduce:
>>> >> I split vector, calculate sums and then aggregate those sums.
>>> >> Please correct me if I am wrong.
>>> >>
>>> >> 2015-07-14 13:54 GMT+03:00 Dirk Eddelbuettel <edd at debian.org>:
>>> >>>
>>> >>>
>>> >>> On 14 July 2015 at 09:25, Danas Zuokas wrote:
>>> >>> | I have written parallel implementation of sums in groups using
>>> >>> RcppParallel.
>>> >>>
>>> >>> Isn't this the same question as
>>> >>>
>>> >>>
>>> >>> http://stackoverflow.com/questions/31318419/when-calling-same-rcpp-function-several-times-different-results-are-returned
>>> >>>
>>> >>> You got some excellent comments there by SO user 'RHertel'. Did you
>>> >>> consider those?
>>> >>>
>>> >>> Dirk
>>> >>>
>>> >>> | // [[Rcpp::depends(RcppParallel)]]
>>> >>> | #include <Rcpp.h>
>>> >>> | #include <RcppParallel.h>
>>> >>> | using namespace Rcpp;
>>> >>> | using namespace RcppParallel;
>>> >>> |
>>> >>> | struct SumsG: public Worker
>>> >>> | {
>>> >>> |   const RVector<double> v;
>>> >>> |   const RVector<int> gi;
>>> >>> |
>>> >>> |   RVector<double> sg;
>>> >>> |
>>> >>> |   SumsG(const NumericVector v, const IntegerVector gi,
>>> >>> NumericVector
>>> >>> sg): v(v), gi(gi), sg(sg) {}
>>> >>> |   SumsG(const SumsG& p, Split): v(p.v), gi(p.gi), sg(p.sg) {}
>>> >>> |
>>> >>> |   void operator()(std::size_t begin, std::size_t end) {
>>> >>> |     for (std::size_t i = begin; i < end; i++) {
>>> >>> |       sg[gi[i]] += v[i];
>>> >>> |     }
>>> >>> |   }
>>> >>> |
>>> >>> |   void join(const SumsG& p) {
>>> >>> |     for(std::size_t i = 0; i < sg.length(); i++) {
>>> >>> |       sg[i] += p.sg[i];
>>> >>> |     }
>>> >>> |   }
>>> >>> | };
>>> >>> |
>>> >>> | // [[Rcpp::export]]
>>> >>> | List sumsingroups(NumericVector v, IntegerVector gi, int ni) {
>>> >>> |   NumericVector sg(ni);
>>> >>> |   SumsG p(v, gi, sg);
>>> >>> |   parallelReduce(0, v.length(), p);
>>> >>> |
>>> >>> |   return List::create(_["sg"] = p.sg);
>>> >>> | }
>>> >>> |
>>> >>> | It compiles using Rcpp::sourceCpp. Now when I call it from R
>>> >>> sumsingroups(1:10,
>>> >>> | rep(0:1, each = 5), 2) several times I get the right answer (15 40)
>>> >>> and
>>> >>> then
>>> >>> | something different (usually some multiplicative of the right
>>> >>> answer).
>>> >>> Running
>>> >>> |
>>> >>> |
>>> >>> | res <- sumsingroups(1:10, rep(0:1, each = 5), 2)
>>> >>> | for(i in 1:1000) {
>>> >>> |     tmp <- sumsingroups(1:10, rep(0:1, each = 5), 2)
>>> >>> |     if(res[[1]][1] != tmp[[1]][1]) break
>>> >>> |     Sys.sleep(0.1)
>>> >>> | }
>>> >>> |
>>> >>> | breaks at random iteration returning
>>> >>> |
>>> >>> | $sg
>>> >>> | [1]  60 160
>>> >>> |
>>> >>> | or
>>> >>> |
>>> >>> | $sg
>>> >>> | [1] 30 80
>>> >>> |
>>> >>> | I am new to Rcpp and RcppParallel and do not know what could cause
>>> >>> such
>>> >>> | behavior.
>>> >>> |
>>> >>> | Things that did not help:
>>> >>> |
>>> >>> |  1. Added for (std::size_t i = 0; i < sg.length(); i++) sg[i] = 0;
>>> >>> to
>>> >>> both of
>>> >>> |     constructors.
>>> >>> |  2. Changed names so that they are different in Worker definition
>>> >>> and in
>>> >>> |     function implementation.
>>> >>> |
>>> >>> | _______________________________________________
>>> >>> | Rcpp-devel mailing list
>>> >>> | Rcpp-devel at lists.r-forge.r-project.org
>>> >>> |
>>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>> >>>
>>> >>> --
>>> >>> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>>> >>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Rcpp-devel mailing list
>>> >> Rcpp-devel at lists.r-forge.r-project.org
>>> >>
>>> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>> > _______________________________________________
>>> > Rcpp-devel mailing list
>>> > Rcpp-devel at lists.r-forge.r-project.org
>>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>>
>>
>> _______________________________________________
>> Rcpp-devel mailing list
>> Rcpp-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>
>
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel


More information about the Rcpp-devel mailing list