[Rcpp-devel] Some beginner questions

Smith, Dale (Norcross) Dale.Smith at Fiserv.com
Mon Nov 25 13:30:12 CET 2013


Alessandro,

If you are somewhat inexperienced with C++, I suggest reading Effective C++ by Scott Meyers. It's easy to get lost in some of his explanations as they are very detailed, but you can just follow his advice, and come back to them later.

Dale Smith, Ph.D.
Senior Financial Quantitative Analyst
Financial & Risk Management Solutions
Fiserv
Office: 678-375-5315
www.fiserv.com

-----Original Message-----
From: rcpp-devel-bounces at r-forge.wu-wien.ac.at [mailto:rcpp-devel-bounces at r-forge.wu-wien.ac.at] On Behalf Of Kevin Ushey
Sent: Sunday, November 24, 2013 2:32 PM
To: Alessandro Mammana
Cc: rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] Some beginner questions

Hi Ale,

My guess: the elements are not being initialized in the order you expect.

In fact, class members in C++ are initialized _in the order they are declared in the class_, not the order you place them in the initializer list. So, based on that, your code tries to first initialize run, then rlen, but rlen depends on rlens[0] which has not yet been initialized, and so things go wrong.
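
To illustrate the ordering rule, here is a minimal sketch of the iterator with its members declared in dependency order. It is plain C++, with std::vector standing in for Rcpp::IntegerVector so that it compiles without R. The names RleIter, firstRunLength, value, and expand are made up for this sketch and do not come from the original code, and it assumes every run length is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the rleIter class from the thread (hypothetical names,
// std::vector instead of Rcpp containers).
class RleIter {
    // Members are initialized in THIS declaration order, whatever order the
    // initializer list uses, so the containers come before anything reading them.
    std::vector<int> rlens;          // run lengths
    std::vector<int> values;         // 1-based factor codes, one per run
    std::vector<std::string> names;  // factor levels
    int rlen;                        // length of the first run (reads rlens)
    std::size_t run;                 // index of the current run
    int rpos;                        // position inside the current run

public:
    RleIter(std::vector<int> lengths, std::vector<int> codes,
            std::vector<std::string> levels)
        : rlens(std::move(lengths)),
          values(std::move(codes)),
          names(std::move(levels)),
          rlen(rlens.empty() ? 0 : rlens[0]),  // safe: rlens already built
          run(0),
          rpos(0) {
        assert(rlens.size() == values.size());
    }

    bool empty() const { return rlens.empty(); }
    int firstRunLength() const { return rlen; }

    // Advance by one element; returns false once the last run is exhausted.
    bool next() {
        ++rpos;
        if (rpos == rlens[run]) {  // end of the run, go to the next
            ++run;
            rpos = 0;
            if (run == rlens.size()) return false;
        }
        return true;
    }

    // Level name of the current element.
    const std::string& value() const {
        return names[static_cast<std::size_t>(values[run] - 1)];
    }
};

// Expand the run-length encoding back into one level name per element.
std::vector<std::string> expand(RleIter it) {
    std::vector<std::string> out;
    if (it.empty()) return out;
    do {
        out.push_back(it.value());
    } while (it.next());
    return out;
}
```

With the declarations in this order, the initializer list and the declaration order agree, so -Wreorder has nothing to report and rlens[0] is read only after rlens exists.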

If you turn on compiler warnings (-Wall), you get informative warnings.
In fact, clang points right at the problem for me ;)

test.cpp:16:3: warning: field 'names' will be initialized after field 'rlen' [-Wreorder]
  names(as<std::vector<std::string> >(values.attr("levels"))),
  ^
test.cpp:17:3: warning: field 'rlen' will be initialized after field 'run' [-Wreorder]
  rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
  ^

(sidenote: I highly recommend creating a file '~/.R/Makevars' and inserting the lines:

    CFLAGS="-g -O2 -Wall -pedantic"
    CXXFLAGS="-g -O2 -Wall -pedantic"

so that your compiler picks out these code smells for you whenever compiling C/C++ code with R)

As for your other questions re: copying: RObjects are merely thin wrappers over pointers, so copying an RObject does not copy the memory behind an R object, just the pointer to that object. Rcpp containers wrap the R object directly when the R type matches the container type -- e.g., an IntegerVector wraps R's integer vector without copying, but a numeric (double) R vector forces a copy / coercion. Make sure the type of object you think you're passing from R matches the container you're using in Rcpp -- check what typeof(rle@lengths) gives you (mode() is less useful here, since it reports "numeric" for both integer and double vectors).

All of Rcpp's containers are very light, so I doubt you gain much e.g.
passing an Rcpp::IntegerVector by reference rather than by value.
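
As a rough analogy for the copying behaviour described above, the sketch below uses a small handle class in plain C++. IntHandle, setFirst, and copySharesStorage are hypothetical names. This is not how Rcpp is implemented (Rcpp holds and protects a SEXP rather than a std::shared_ptr); it only mimics the fact that copying the wrapper shares the underlying data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an Rcpp container: a small handle pointing at
// shared storage, much as an Rcpp object holds a pointer to an R object.
class IntHandle {
    std::shared_ptr<std::vector<int>> data;

public:
    explicit IntHandle(std::vector<int> v)
        : data(std::make_shared<std::vector<int>>(std::move(v))) {}

    int& operator[](std::size_t i) { return (*data)[i]; }
    int operator[](std::size_t i) const { return (*data)[i]; }

    // Address of the underlying storage, to check whether two handles share it.
    const int* address() const { return data->data(); }
};

// Pass by value: only the handle is copied, so the write lands in the
// storage that the caller's handle also points at.
void setFirst(IntHandle h, int x) { h[0] = x; }

bool copySharesStorage() {
    IntHandle a(std::vector<int>{1, 2, 3});
    IntHandle b = a;  // copies the pointer, not the three ints
    setFirst(b, 42);  // another cheap handle copy
    return a.address() == b.address() && a[0] == 42;
}
```

The same reasoning is why passing such a handle by reference instead of by value saves very little: either way, no element data is copied.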

-Kevin

On Sun, Nov 24, 2013 at 10:08 AM, Alessandro Mammana <mammana at molgen.mpg.de> wrote:
> Dear all,
> I had some problems figuring out how to write some code for iterating 
> through the values of a run-length-encoded factor (Rle). Now I kind of 
> made it work, but I am not sure that the code does exactly what I
> expect. My questions are both about Rcpp and about C++; tell me if
> this is not the right place to ask them.
>
> The function I am writing should iterate through an object of formal
> class 'Rle' (from the "IRanges" package), which looks like this:
> 1. It has two slots: 'values' and 'lengths'. They have the same
> length; values is a factor and lengths is an integer vector.
> 2. values is a factor: an integer vector with an associated character 
> vector (attribute "levels"), and the integer vector points to elements 
> in the character vector.
>
> For instance, the factor f = factor(c('a','a','a','a','b','c','c')),
> when it is run-length-encoded as rle = Rle(f), looks like this:
> rle@values ~ c(1, 2, 3)
> attributes(rle@values)$levels ~ c("a","b","c")
> rle@lengths ~ c(4, 1, 2)
>
> To make things a bit more complicated, in my situation this Rle object
> is contained in a GRanges object 'gr': rle = gr@seqnames
>
> I wanted to write the code for a class that encapsulates the iteration 
> through such an object (maybe that's a bit Java-style). This was
> my first version that compiled:
>
> class rleIter {
>     int run;
>     int rlen;
>     int rpos;
> //should I declare them references if I don't want any unnecessary copying?
>     IntegerVector rlens;
>     IntegerVector values;
>     std::vector<std::string> names;
>     public:
>         rleIter(RObject& rle):
>             rlens(as<IntegerVector>(rle.slot("lengths"))), // is the vector copied here?
>             values(as<IntegerVector>(rle.slot("values"))),
>             names(as<std::vector<std::string> >(values.attr("levels"))),
>             rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
>             run(0), rpos(0)
>         {}
>
>         bool next(){
>             ++rpos;
>             if (rpos == rlens[run]){ //end of the run, go to the next
>                 ++run; rpos = 0;
>                 if (run == rlens.length())
>                     return false;
>             }
>             return true;
>         }
>
>         const std::string& getValue(){
>             return names[values[run]-1];
>         }
>
> };
>
>
> void readRle(RObject gr){ //passed in by value (it was a mistake)
>     RObject rle = as<RObject>(gr.slot("seqnames")); // <- is this vector copied here?
>     rleIter iter(rle);
>     bool finished = false;
>     for (; !finished; finished = !iter.next()){
>         Rcout << iter.getValue() << std::endl;
>     }
> }
>
> // [[Rcpp::export]]
> void test(RObject gr){
>     readRle(gr);
> }
>
> in R:
>
> library(GenomicRanges)
> gr <- GRanges(seqnames=c("chr1", "chr1","chr2"),
> ranges=IRanges(start=c(1,10,7),end=c(10,101,74)))
> library(my_package_under_development_with_the_rcpp_code_shown_above)
> test(gr)
>
> SEGFAULT
>
> Questions:
>
> 1. This code gives segfault at the point that I indicated. Why? Maybe 
> I am pointing within the initializer list to areas of memory that are 
> allocated and filled in in the initializer list and maybe this is 
> forbidden?
> 2. If I change the signature of the function readRle and I pass the gr
> object by reference, the segfault disappears. Why? If I copy the gr
> object, the copy should be identical, so why do they behave
> differently?
> 3. I don't understand if doing:
> RObject rle = as<RObject>(gr.slot("seqnames")); causes the vector rle 
> to be copied, and, what is worse, I have no idea about what resources 
> to look up to find it out, or what reasoning/principles to think 
> about, other than posting in this mailing list or attempting to look 
> at the source code for hours...
> 4. If I replace the line above with:
> RObject& rle = as<RObject>(gr.slot("seqnames")); so that I am sure 
> that the vector is not copied, the compiler complains saying that
> as<RObject>(gr.slot("seqnames")) is an rvalue, and if I want to 
> reference it, the reference should be constant. How do I create a
> non-constant reference to a slot of an S4 object then?
>
> If you made it through the end of this very long and boring email and 
> if you could give me some help I would be extremely grateful.
>
> Ale
>
> --
> Alessandro Mammana, PhD Student
> Max Planck Institute for Molecular Genetics Ihnestraße 63-73
> D-14195 Berlin, Germany
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel