[Rcpp-devel] Some beginner questions
Alessandro Mammana
mammana at molgen.mpg.de
Sun Nov 24 19:08:59 CET 2013
Dear all,
I had some problems figuring out how to write code that iterates
through the values of a run-length-encoded factor (Rle). I have now more
or less made it work, but I am not sure that the code does exactly what I
expect. My questions are about both Rcpp and C++; tell me if
this is not the right place to ask them.
The function I am writing should iterate through an object of formal
class 'Rle' (from the "IRanges" package), which looks like this:
1. It has two slots: 'values' and 'lengths'. They have the same
length; 'values' is a factor and 'lengths' is an integer vector.
2. values is a factor: an integer vector with an associated character
vector (attribute "levels"), and the integer vector points to elements
in the character vector.
For instance, when the factor f = factor(c('a','a','a','a','b','c','c'))
is run-length-encoded with rle = Rle(f), it looks like this:
rle@values ~ c(1, 2, 3)
attributes(rle@values)$levels ~ c("a","b","c")
rle@lengths ~ c(4,1,2)
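To make the layout concrete, here is a minimal sketch of the same encoding in plain C++ (no Rcpp). RleFactor and decode are made-up names for illustration only; they are not part of IRanges or Rcpp, and the real Rle class may store things differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a run-length-encoded factor.
struct RleFactor {
    std::vector<int> values;          // 1-based indices into levels, one per run
    std::vector<int> lengths;         // number of elements in each run
    std::vector<std::string> levels;  // the factor levels
};

// Expand the runs back into one level name per element.
std::vector<std::string> decode(const RleFactor& rle) {
    // The two slots must have the same length (one entry per run).
    assert(rle.values.size() == rle.lengths.size());
    std::vector<std::string> out;
    for (std::size_t run = 0; run < rle.values.size(); ++run) {
        // Factor codes are 1-based, C++ vectors are 0-based.
        const std::string& name = rle.levels[rle.values[run] - 1];
        out.insert(out.end(), static_cast<std::size_t>(rle.lengths[run]), name);
    }
    return out;
}
```

For the factor f above, the runs are four times "a", once "b" and twice "c", so decoding values {1, 2, 3} with levels {"a", "b", "c"} should give back the seven original elements.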
To make things a bit more complicated, in my situation this Rle object
is contained in a GRanges object 'gr': rle = gr@seqnames
I wanted to write a class that encapsulates the iteration
through such an object (maybe that's a bit Java-style). This was
my first version that compiled:
#include <Rcpp.h>
using namespace Rcpp;

class rleIter {
    int run;
    int rlen;
    int rpos;
    // should I declare them references if I don't want any unnecessary copying?
    IntegerVector rlens;
    IntegerVector values;
    std::vector<std::string> names;
public:
    rleIter(RObject& rle):
        rlens(as<IntegerVector>(rle.slot("lengths"))), // is the vector copied here?
        values(as<IntegerVector>(rle.slot("values"))),
        names(as<std::vector<std::string> >(values.attr("levels"))),
        rlen(rlens[0]), // <--- THIS CAUSES SEGFAULT!!!!
        run(0), rpos(0)
    {}
    bool next(){
        ++rpos;
        if (rpos == rlens[run]){ // end of the run, go to the next
            ++run; rpos = 0;
            if (run == rlens.length())
                return false;
        }
        return true;
    }
    const std::string& getValue(){
        return names[values[run]-1];
    }
};

void readRle(RObject gr){ // passed in by value (it was a mistake)
    RObject rle = as<RObject>(gr.slot("seqnames")); // <- is this vector copied here?
    rleIter iter(rle);
    bool finished = false;
    for (; !finished; finished = !iter.next()){
        Rcout << iter.getValue() << std::endl;
    }
}

// [[Rcpp::export]]
void test(RObject gr){
    readRle(gr);
}
In R:
library(GenomicRanges)
gr <- GRanges(seqnames=c("chr1", "chr1","chr2"),
              ranges=IRanges(start=c(1,10,7),end=c(10,101,74)))
library(my_package_under_development_with_the_rcpp_code_shown_above)
test(gr)
SEGFAULT
Questions:
1. This code segfaults at the point that I indicated. Why? Maybe,
within the initializer list, I am pointing to areas of memory that are
allocated and filled in by that same initializer list, and maybe this is
forbidden?
2. If I change the signature of the function readRle and pass the gr
object by reference, the segfault disappears. Why? If I copy the gr
object, the copy should be identical, so why do the two versions behave
differently?
3. I don't understand whether
RObject rle = as<RObject>(gr.slot("seqnames"));
causes the vector rle to be copied and, what is worse, I have no idea
which resources to look up to find out, or which
reasoning/principles to apply, other than posting to this
mailing list or reading the source code for hours...
4. If I replace the line above with:
RObject& rle = as<RObject>(gr.slot("seqnames"));
so that I am sure that the vector is not copied, the compiler
complains that
as<RObject>(gr.slot("seqnames")) is an rvalue and that, if I want to
reference it, the reference should be constant. How do I create a
non-constant reference to a slot of an S4 object then?
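A reduced sketch for question 1, in plain C++ without Rcpp, may help narrow things down. Tracer, Holder, construction_order and check_order are made-up names for illustration only. The sketch relies on a C++ rule: non-static data members are initialized in the order they are declared in the class, not in the order they appear in the initializer list. If that rule applies to rleIter as written, rlen(rlens[0]) would run before rlens has been constructed, which could be the source of the segfault; this has not been checked against the Rcpp code itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which members are constructed.
static std::vector<std::string> g_log;

// Hypothetical helper: logs its own name when it is constructed.
struct Tracer {
    explicit Tracer(const char* name) { g_log.push_back(name); }
};

// Same shape as rleIter: rlen is declared BEFORE rlens,
// but the initializer list mentions rlens first.
struct Holder {
    Tracer rlen;
    Tracer rlens;
    Holder() : rlens("rlens"), rlen("rlen") {}  // compilers may warn (-Wreorder)
};

// Builds one Holder and returns the observed construction order.
std::vector<std::string> construction_order() {
    g_log.clear();
    Holder h;
    (void)h;  // only the side effect of construction matters here
    return g_log;
}

// Usage check: declaration order wins, so rlen is built before rlens.
void check_order() {
    const std::vector<std::string> order = construction_order();
    assert(order.size() == 2);
    assert(order[0] == "rlen");
    assert(order[1] == "rlens");
}
```

If this is indeed the cause, declaring rlens before rlen in the class, or assigning rlen inside the constructor body, would be two possible ways to avoid reading rlens too early.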
If you made it to the end of this very long and boring email and
if you could give me some help I would be extremely grateful.
Ale
--
Alessandro Mammana, PhD Student
Max Planck Institute for Molecular Genetics
Ihnestraße 63-73
D-14195 Berlin, Germany