[Rcpp-devel] Segfault, is it because of iterators/pointers?
Alessandro Mammana
mammana at molgen.mpg.de
Wed Feb 12 13:36:20 CET 2014
Ah wait, my bad (as always T.T), I found a much simpler explanation:
colset <- sample(3e7-nr, 1e7)
storage.mode(colset)
[1] "integer"
storage.mode(colset-1)
[1] "double"
So when I was unwrapping colset, Rcpp allocated new memory to convert
it from double to integer, and that memory was no longer valid once
the IntegerVector holding it went out of scope.
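
A minimal sketch of that mechanism and of one way around it, in plain
C++ without Rcpp so that it compiles on its own. RVec, as_integer,
OwningGapMat, make_gapmat and col_sums are hypothetical stand-ins, not
Rcpp types or functions. The idea is that the struct keeps the handles
instead of raw pointers, so a converted copy cannot die before the
struct that reads from it:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for an R vector whose storage mode is either
// "integer" or "double"; exactly one of the two buffers is set.
struct RVec {
    std::shared_ptr<std::vector<int>> ints;
    std::shared_ptr<std::vector<double>> dbls;
};

// Models the unwrapping step: an integer input is shared as-is, a double
// input is converted into a freshly allocated buffer owned by the result.
std::shared_ptr<std::vector<int>> as_integer(const RVec& x) {
    if (x.ints) return x.ints;
    auto converted = std::make_shared<std::vector<int>>();
    converted->reserve(x.dbls->size());
    for (double d : *x.dbls) converted->push_back(static_cast<int>(d));
    return converted;
}

// Unlike the raw-pointer GapMat, this variant stores the handles themselves,
// so a converted copy lives exactly as long as the struct that reads it.
struct OwningGapMat {
    std::shared_ptr<std::vector<int>> vec;
    std::shared_ptr<std::vector<int>> colset;
    int nrow;

    int ncol() const { return static_cast<int>(colset->size()); }

    const int* colptr(int col) const {
        assert(col >= 0 && col < ncol());
        return vec->data() + (*colset)[col];
    }
};

OwningGapMat make_gapmat(const RVec& vec, const RVec& colset, int nrow) {
    return OwningGapMat{as_integer(vec), as_integer(colset), nrow};
}

// Same double loop as colSumsGapMat, reading through the owning struct.
std::vector<int> col_sums(const OwningGapMat& m) {
    std::vector<int> res(m.ncol(), 0);
    for (int i = 0; i < m.ncol(); ++i)
        for (int j = 0; j < m.nrow; ++j)
            res[i] += m.colptr(i)[j];
    return res;
}
```

In the Rcpp version the analogous change would presumably be to store
the two IntegerVector objects in GapMat rather than the int* returned
by .begin().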
I think it is a bit dangerous that you never know if you are
allocating memory or just wrapping R objects when parsing arguments in
Rcpp.
Is there a way of ensuring that NOTHING gets copied when parsing
arguments? Can you throw an exception if the type you try to cast to
is not the one you expect?
You might imagine that with large datasets this is important.
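
To make the request concrete, here is a sketch of the behaviour I have
in mind, again in plain C++ so it stands alone. RArg and
int_view_or_throw are hypothetical names, not part of Rcpp. The
function hands out a pointer into the existing storage and refuses to
convert. In Rcpp itself I would expect the corresponding test to be
TYPEOF(x) == INTSXP on the SEXP before building the IntegerVector, with
Rcpp::stop() to raise the error; on the R side, writing colset - 1L
keeps the integer storage mode.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an R argument with a runtime storage mode.
struct RArg {
    enum class Mode { Integer, Double };
    Mode mode;
    std::vector<int> ints;      // filled when mode == Mode::Integer
    std::vector<double> dbls;   // filled when mode == Mode::Double
};

// Returns a pointer into the argument's own integer storage and never
// allocates. A non-integer argument is rejected instead of converted.
const int* int_view_or_throw(const RArg& x, const std::string& name) {
    if (x.mode != RArg::Mode::Integer)
        throw std::invalid_argument(name + " must have storage mode \"integer\"");
    assert(x.dbls.empty());  // an integer-mode argument carries no doubles
    return x.ints.data();
}
```

With a check like this, the colset - 1 argument would fail immediately
with a readable error instead of segfaulting several iterations later.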
Sorry for bothering and thanks again,
Ale
On Wed, Feb 12, 2014 at 1:10 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> On 12 February 2014 at 11:47, Alessandro Mammana wrote:
> | Ok I was able to find the code causing the bug. So it looks like the
>
> Thanks for the added detail.
>
> | pointers you get from an Rcpp::Vector using .begin() become invalid
> | after the Rcpp::Vector goes out of scope (and this makes sense).
> | What I do not understand is that this Rcpp::Vector was allocated in R
> | and should still be "living" during the execution of the Rcpp call
> | (that's why I wasn't expecting the pointer to be invalid).
> |
> | This is the exact code (the one above is probably fine):
> | @@@@@@@@@@@@@@ in CPP @@@@@@@@@@@@@@
> |
> | struct GapMat {
> |     int* ptr;
> |     int* colset;
> |     int nrow;
> |     int ncol;
> |
> |     inline int* colptr(int col){
> |         return ptr + colset[col];
> |     }
> |
> |     GapMat(){}
> |
> |     GapMat(int* _ptr, int* _colset, int _nrow, int _ncol):
> |         ptr(_ptr), colset(_colset), nrow(_nrow), ncol(_ncol){}
> | };
> |
> |
> | GapMat getGapMat(Rcpp::List gapmat){
> |     IntegerVector vec = gapmat["vec"];
> |     IntegerVector pos = gapmat["colset"];
> |     int nrow = gapmat["nrow"];
> |
> |     return GapMat(vec.begin(), pos.begin(), nrow, pos.length());
> | }
> |
> | // [[Rcpp::export]]
> | IntegerVector colSumsGapMat(Rcpp::List gapmat){
> |
> |     GapMat mat = getGapMat(gapmat);
> |     IntegerVector res(mat.ncol);
> |
> |     for (int i = 0; i < mat.ncol; ++i){
> |         for (int j = 0; j < mat.nrow; ++j){
> |             res[i] += mat.colptr(i)[j];
> |         }
> |     }
> |
> |     return res;
> | }
> |
> | @@@@@@@@@@@@@@ in R (with gdb debugger as suggested by Dirk) @@@@@@@@@@@@@@
> | library(Rcpp)
> | sourceCpp("scratchpad.cpp")
> |
> | vec <- rnbinom(3e7, mu=0.1, size=1); storage.mode(vec) <- "integer"
> | nr <- 80
> |
> | colset <- sample(3e7-nr, 1e7)
> | foo <- vec[colset] # this is only to trigger some obscure garbage collection mechanisms...
> |
> | for (i in 1:10){
> |     colset <- sample(3e7-nr, 1e7)
> |     gapmat <- list(vec=vec, nrow=nr, colset=colset-1)
> |     cs <- colSumsGapMat(gapmat)
> |     print(sum(cs))
> | }
> |
> | [1] 80000000
> | [1] 80000000
> | [1] 80016890
> | [1] 80008144
> | [1] 80016022
> | [1] 80021609
> |
> | Program received signal SIGSEGV, Segmentation fault.
> | 0x00007ffff18a5455 in GapMat::colptr (this=0x7fffffffc120, col=0) at
> | scratchpad.cpp:295
> | 295 return ptr + colset[col];
> |
> | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> |
> | Why did it happen? What should I do to make sure that my pointers
> | remain valid? My goal is to convert safely some vectors or matrices
> | that "exist" in R to some pointers, how can I do that?
>
> Not sure. It looks fine at first glance. But then it's early in the morning
> and I had very little coffee yet...
>
> Maybe the fact that you tickle the gc() via vec[colset] has something to do
> with it, maybe it has not. Maybe I would try the decomposition of the List
> object inside the colSumsGapMat() function to keep it simpler. Or if you
> _really_ want an external object to iterate over, memcpy it out.
>
> With really large objects, you may be stressing parts of the code that have
> not been stressed the same way. If it breaks, you do get to keep both pieces.
>
> Dirk
>
> --
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
--
Alessandro Mammana, PhD Student
Max Planck Institute for Molecular Genetics
Ihnestraße 63-73
D-14195 Berlin, Germany