[Rcpp-devel] Some beginner questions

Kevin Ushey kevinushey at gmail.com
Sun Nov 24 20:32:26 CET 2013


Hi Ale,

My guess: the elements are not being initialized in the order you expect.

In fact, class members in C++ are initialized _in the order they are
declared in the class_, not the order you place them in the
initializer list. So, based on that, your code tries to first
initialize run, then rlen, but rlen depends on rlens[0] which has not
yet been initialized, and so things go wrong.
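
A minimal sketch of the fix, using only the standard library: std::vector<int> stands in for Rcpp::IntegerVector so the snippet compiles without R, and the class name RleIterFixed, its constructor signature and firstRunLength() are made up for illustration. The change that matters is that the members are now declared in the order they depend on one another:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the rleIter class from the question.
class RleIterFixed {
    // Members are initialized top to bottom in THIS order, regardless of
    // how the initializer list below is written.
    std::vector<int> rlens;
    std::vector<int> values;
    std::vector<std::string> names;
    int rlen;            // depends on rlens, so it is declared after rlens
    std::size_t run;
    std::size_t rpos;

public:
    RleIterFixed(const std::vector<int>& lengths,
                 const std::vector<int>& codes,
                 const std::vector<std::string>& levels)
        : rlens(lengths),
          values(codes),
          names(levels),
          rlen(rlens.at(0)),  // rlens already exists here; at() throws if it is empty
          run(0),
          rpos(0) {}

    int firstRunLength() const { return rlen; }

    // Advance by one element; returns false once the last run is exhausted.
    bool next() {
        ++rpos;
        if (rpos == static_cast<std::size_t>(rlens[run])) {
            ++run;
            rpos = 0;
            if (run == rlens.size()) return false;
        }
        return true;
    }

    // Factor codes are 1-based, hence the -1 when indexing the levels.
    const std::string& getValue() const {
        return names[static_cast<std::size_t>(values[run]) - 1];
    }
};
```

Keeping the initializer list in the same order as the declarations also silences -Wreorder, so the compiler stays quiet unless the two drift apart again.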

If you turn on compiler warnings (-Wall) you get informative warnings.
In fact, clang points right at the problem for me ;)

test.cpp:16:3: warning: field 'names' will be initialized after field
'rlen' [-Wreorder]
  names(as<std::vector<std::string> >(values.attr("levels"))),
  ^
test.cpp:17:3: warning: field 'rlen' will be initialized after field
'run' [-Wreorder]
  rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
  ^

(sidenote: I highly recommend creating a file '~/.R/Makevars', and
inserting the line:

    CFLAGS=-g -O2 -Wall -pedantic
    CXXFLAGS=-g -O2 -Wall -pedantic

so that your compiler picks out these code smells for you whenever
compiling C/C++ code with R)

As for your other questions re: copying: RObjects are merely thin
wrappers over pointers, so copying an RObject does not involve copying
all the memory encompassing an R object, just the pointer to that
object. Rcpp containers wrap the R object directly when the R type
matches the container type -- e.g., an IntegerVector wraps R's
integer vector without copying, but a copy / coercion is forced when
you pass it a numeric (double) R vector. Make sure the type of object
you think you're passing from R matches the container you're using in
Rcpp -- check what typeof(rle@lengths) gives you (mode() reports
"numeric" for both integer and double vectors).

All of Rcpp's containers are very light, so I doubt you gain much e.g.
passing an Rcpp::IntegerVector by reference rather than by value.
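
To make the "thin wrapper" point concrete, here is a rough standard-library analogy. IntHandle, sharesStorageWith() and setFirst() are hypothetical names, and std::shared_ptr is only a stand-in: this is not how Rcpp is implemented, it just mimics the behaviour that copying a handle copies a pointer rather than the elements.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle type: it owns nothing but a pointer to shared storage.
class IntHandle {
    std::shared_ptr<std::vector<int>> data_;

public:
    explicit IntHandle(std::vector<int> v)
        : data_(std::make_shared<std::vector<int>>(std::move(v))) {}

    int& operator[](std::size_t i) { return (*data_)[i]; }
    int operator[](std::size_t i) const { return (*data_)[i]; }

    // True when both handles point at the same underlying storage.
    bool sharesStorageWith(const IntHandle& other) const {
        return data_ == other.data_;
    }
};

// Takes the handle BY VALUE: only the pointer is copied, so the write
// below is visible through every other handle to the same storage.
inline void setFirst(IntHandle h, int value) { h[0] = value; }
```

Under that analogy, passing by value costs about as much as passing by reference, and a by-value callee can still modify the caller's data. Keep that second part in mind when the R type really does match the container.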

-Kevin

On Sun, Nov 24, 2013 at 10:08 AM, Alessandro Mammana
<mammana at molgen.mpg.de> wrote:
> Dear all,
> I had some problems figuring out how to write some code for iterating
> through the values of a run-length-encoded factor (Rle). Now I kind of
> made it work, but I am not sure that the code does exactly what I
> expect. My questions are both about Rcpp and about C++; tell me if
> this is not the right place to ask them.
>
> The function I am writing should iterate through an object of formal
> class 'Rle' (from the "IRanges" package), which looks like this:
> 1. It has two slots: 'values' and 'lengths'. They have the same
> length, values is a factor and lengths is an integer vector.
> 2. values is a factor: an integer vector with an associated character
> vector (attribute "levels"), and the integer vector points to elements
> in the character vector.
>
> For instance, the factor f = factor(c('a','a','a','a','b','c','c')),
> when it is run-length-encoded as rle = Rle(f), looks like this:
> rle@values ~ c(1, 2, 3)
> attributes(rle@values)$levels ~ c("a","b","c")
> rle@lengths ~ c(4, 1, 2)
>
> To make things a bit more complicated, in my situation this Rle object
> is contained in a GRanges object 'gr': rle = gr@seqnames
>
> I wanted to write the code for a class that encapsulates the iteration
> through such an object (maybe that's a bit java-style). And that was
> my first version that compiled:
>
> class rleIter {
>     int run;
>     int rlen;
>     int rpos;
> //should I declare them references if I don't want any unnecessary copying?
>     IntegerVector rlens;
>     IntegerVector values;
>     std::vector<std::string> names;
>     public:
>         rleIter(RObject& rle):
>             rlens(as<IntegerVector>(rle.slot("lengths"))), // is here the vector copied?
>             values(as<IntegerVector>(rle.slot("values"))),
>             names(as<std::vector<std::string> >(values.attr("levels"))),
>             rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
>             run(0), rpos(0)
>         {}
>
>         bool next(){
>             ++rpos;
>             if (rpos == rlens[run]){ //end of the run, go to the next
>                 ++run; rpos = 0;
>                 if (run == rlens.length())
>                     return false;
>             }
>             return true;
>         }
>
>         const std::string& getValue(){
>             return names[values[run]-1];
>         }
>
> };
>
>
> void readRle(RObject gr){ //passed in by value (it was a mistake)
>     RObject rle = as<RObject>(gr.slot("seqnames")); //<- is this vector copied here?
>     rleIter iter(rle);
>     bool finished = false;
>     for (; !finished; finished = !iter.next()){
>         Rcout << iter.getValue() << std::endl;
>     }
> }
>
> // [[Rcpp::export]]
> void test(RObject gr){
>     readRle(gr);
> }
>
> in R:
>
> library(GenomicRanges)
> gr <- GRanges(seqnames=c("chr1", "chr1","chr2"),
> ranges=IRanges(start=c(1,10,7),end=c(10,101,74)))
> library(my_package_under_development_with_the_rcpp_code_shown_above)
> test(gr)
>
> SEGFAULT
>
> Questions:
>
> 1. This code gives segfault at the point that I indicated. Why? Maybe
> I am pointing within the initializer list to areas of memory that are
> allocated and filled in in the initializer list and maybe this is
> forbidden?
> 2. If I change the signature of the function readRle and I pass the gr
> object by reference, the segfault disappears, why? If I copy the gr
> object the copy should be identical, why do they have different
> behaviours?
> 3. I don't understand if doing:
> RObject rle = as<RObject>(gr.slot("seqnames"));
> causes the vector rle to be copied, and, what is worse, I have no idea
> about what resources to look up to find it out, or what
> reasoning/principles to think about, other than posting in this
> mailing list or attempting to look at the source code for hours...
> 4. If I replace the line above with:
> RObject& rle = as<RObject>(gr.slot("seqnames"));
> so that I am sure that the vector is not copied, the compiler
> complains saying that
> as<RObject>(gr.slot("seqnames")) is an rvalue, and if I want to
> reference it, the reference should be constant. How do I create a
> non-constant reference to a slot of an S4 object then?
>
> If you made it through the end of this very long and boring email and
> if you could give me some help I would be extremely grateful.
>
> Ale
>
> --
> Alessandro Mammana, PhD Student
> Max Planck Institute for Molecular Genetics
> Ihnestraße 63-73
> D-14195 Berlin, Germany
> _______________________________________________
> Rcpp-devel mailing list
> Rcpp-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

