[Rcpp-devel] Some beginner questions

Alessandro Mammana mammana at molgen.mpg.de
Sun Nov 24 19:08:59 CET 2013


Dear all,
I had some problems figuring out how to write code for iterating
through the values of a run-length-encoded factor (Rle). I have now
more or less made it work, but I am not sure that the code does exactly
what I expect. My questions are about both Rcpp and C++; tell me if
this is not the right place to ask them.

The function I am writing should iterate through an object of formal
class 'Rle' (from the "IRanges" package), which looks like this:
1. It has two slots: 'values' and 'lengths'. They have the same
length, values is a factor and lengths is an integer vector.
2. values is a factor: an integer vector with an associated character
vector (attribute "levels"), and the integer vector points to elements
in the character vector.

For instance, the factor f = factor(c('a','a','a','a','b','c','c')),
when it is run-length-encoded with rle = Rle(f), looks like this:
rle@values ~ c(1, 2, 3)
attributes(rle@values)$levels ~ c("a","b","c")
rle@lengths ~ c(4,1,2)
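
To make the layout concrete, here is a minimal sketch in plain C++ of how the three pieces fit together. It does not use Rcpp or IRanges: the struct RleFactor and the function decode are hypothetical names, and std::vector stands in for the R vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Rle-encoded factor described above:
// plain std::vector members instead of R objects.
struct RleFactor {
    std::vector<int> values;          // 1-based codes into `levels`
    std::vector<int> lengths;         // run lengths, same size as `values`
    std::vector<std::string> levels;  // the factor's "levels" attribute
};

// Expand the runs back into one label per element.
std::vector<std::string> decode(const RleFactor& rle) {
    assert(rle.values.size() == rle.lengths.size());
    std::vector<std::string> out;
    for (std::size_t run = 0; run < rle.values.size(); ++run) {
        // Factor codes are 1-based, C++ indices are 0-based.
        const std::string& label = rle.levels[rle.values[run] - 1];
        out.insert(out.end(), static_cast<std::size_t>(rle.lengths[run]), label);
    }
    return out;
}
```

Decoding values {1, 2, 3} with lengths {4, 1, 2} and levels {"a", "b", "c"} gives back the seven labels of f.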

To make things a bit more complicated, in my situation this Rle object
is contained in a GRanges object 'gr': rle = gr@seqnames

I wanted to write a class that encapsulates the iteration through
such an object (maybe that's a bit Java-style). This was my first
version that compiled:

class rleIter {
    int run;
    int rlen;
    int rpos;
//should I declare them references if I don't want any unnecessary copying?
    IntegerVector rlens;
    IntegerVector values;
    std::vector<std::string> names;
    public:
        rleIter(RObject& rle):
            rlens(as<IntegerVector>(rle.slot("lengths"))), // is the vector copied here?
            values(as<IntegerVector>(rle.slot("values"))),
            names(as<std::vector<std::string> >(values.attr("levels"))),
            rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
            run(0), rpos(0)
        {}

        bool next(){
            ++rpos;
            if (rpos == rlens[run]){ //end of the run, go to the next
                ++run; rpos = 0;
                if (run == rlens.length())
                    return false;
            }
            return true;
        }

        const std::string& getValue(){
            return names[values[run]-1];
        }

};


void readRle(RObject gr){ //passed in by value (it was a mistake)
    RObject rle = as<RObject>(gr.slot("seqnames")); //<- is this vector copied here?
    rleIter iter(rle);
    bool finished = false;
    for (; !finished; finished = !iter.next()){
        Rcout << iter.getValue() << std::endl;
    }
}

// [[Rcpp::export]]
void test(RObject gr){
    readRle(gr);
}

in R:

library(GenomicRanges)
gr <- GRanges(seqnames=c("chr1", "chr1","chr2"),
ranges=IRanges(start=c(1,10,7),end=c(10,101,74)))
library(my_package_under_development_with_the_rcpp_code_shown_above)
test(gr)

SEGFAULT

Questions:

1. This code gives segfault at the point that I indicated. Why? Maybe
I am pointing within the initializer list to areas of memory that are
allocated and filled in in the initializer list and maybe this is
forbidden?
2. If I change the signature of the function readRle and pass the gr
object by reference, the segfault disappears. Why? If I copy the gr
object, the copy should be identical, so why do they behave
differently?
3. I don't understand if doing:
RObject rle = as<RObject>(gr.slot("seqnames"));
causes the vector rle to be copied, and, what is worse, I have no idea
about what resources to look up to find it out, or what
reasoning/principles to think about, other than posting in this
mailing list or attempting to look at the source code for hours...
4. If I replace the line above with:
RObject& rle = as<RObject>(gr.slot("seqnames"));
so that I am sure that the vector is not copied, the compiler
complains saying that
as<RObject>(gr.slot("seqnames")) is an rvalue, and if I want to
reference it, the reference should be constant. How do I create a
non-constant reference to a slot of an S4 object then?
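
One thing worth checking for question 1: C++ initializes non-static data members in the order they are declared in the class, not in the order they appear in the initializer list. In rleIter, rlen is declared before rlens, so rlen(rlens[0]) would read rlens before it has been constructed. Below is a minimal sketch of a declaration order that avoids this. It is plain C++ without Rcpp: RunIter and currentRun are hypothetical names, and std::vector<int> stands in for IntegerVector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified iterator over run lengths only.
class RunIter {
    std::vector<int> rlens;  // declared first, so constructed first
    std::size_t run;
    int rlen;                // declared after rlens: safe to read rlens here
    int rpos;
public:
    explicit RunIter(const std::vector<int>& lengths)
        : rlens(lengths),      // copies the vector (fine for a sketch)
          run(0),
          rlen(rlens.at(0)),   // at() throws instead of reading out of bounds
          rpos(0) {}

    // Advance one element; returns false once every run is exhausted.
    bool next() {
        assert(run < rlens.size());  // must not be called after the end
        ++rpos;
        if (rpos == rlen) {          // end of the run, go to the next
            ++run;
            rpos = 0;
            if (run == rlens.size()) return false;
            rlen = rlens[run];
        }
        return true;
    }

    std::size_t currentRun() const { return run; }
};
```

Keeping the initializer list in the same order as the declarations also lets the compiler's reorder warning (for example -Wreorder in GCC and Clang) catch this kind of mistake.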

If you made it to the end of this very long and boring email and
could give me some help, I would be extremely grateful.

Ale

-- 
Alessandro Mammana, PhD Student
Max Planck Institute for Molecular Genetics
Ihnestraße 63-73
D-14195 Berlin, Germany

