[Rcpp-devel] Segfault, is it because of iterators/pointers?

Alessandro Mammana mammana at molgen.mpg.de
Wed Feb 12 16:19:35 CET 2014


I like the "Strict mode" idea and will use it, thanks!
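
For the archive, here is a minimal sketch of the behaviour I am after, written in plain C++ without Rcpp so that it compiles on its own. It is only a model under stated assumptions: RValue is a hypothetical stand-in for an R vector (a storage tag plus payload), and strictIntData and checkStrict are names I made up, not Rcpp API. The point is that a strict accessor either returns a pointer into the existing storage or throws, so it never allocates a converted copy whose lifetime I would have to track.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an R vector: a storage tag plus the payload.
struct RValue {
    enum Kind { Integer, Double };
    Kind kind;
    std::vector<int> ints;      // payload when kind == Integer
    std::vector<double> reals;  // payload when kind == Double
};

// Strict access: return a pointer into the existing storage, or throw.
// Nothing is allocated, so the pointer is valid for as long as v is.
const int* strictIntData(const RValue& v) {
    if (v.kind != RValue::Integer)
        throw std::invalid_argument("not compatible");
    return v.ints.data();
}

// Small self-check: integer storage is passed through without a copy,
// double storage (like colset-1) is refused instead of converted.
void checkStrict() {
    RValue colset;
    colset.kind = RValue::Integer;
    colset.ints = {4, 9, 2};
    assert(strictIntData(colset) == colset.ints.data());

    RValue shifted;
    shifted.kind = RValue::Double;
    shifted.reals = {3.0, 8.0, 1.0};
    bool refused = false;
    try {
        strictIntData(shifted);
    } catch (const std::invalid_argument&) {
        refused = true;
    }
    assert(refused);
}
```

In my real code the same check would sit where the list elements are unwrapped, and on the R side I would pass colset-1L instead of colset-1 so that the storage mode stays integer.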

On Wed, Feb 12, 2014 at 2:34 PM, Romain Francois
<romain at r-enthusiasts.com> wrote:
>
> On 12 Feb 2014, at 13:36, Alessandro Mammana <mammana at molgen.mpg.de> wrote:
>
>> Ah wait, my bad (as always T.T), I found a much simpler explanation:
>>
>> colset <- sample(3e7-nr, 1e7)
>> storage.mode(colset)
>> [1] "integer"
>> storage.mode(colset-1)
>> [1] "double"
>>
>> So when I was unwrapping colset, Rcpp allocated new memory to convert
>> from double to integer, and that memory was no longer valid once the
>> IntegerVector went out of scope.
>> I think it is a bit dangerous that you never know if you are
>> allocating memory or just wrapping R objects when parsing arguments in
>> Rcpp.
>> Is there a way of ensuring that NOTHING gets copied when parsing
>> arguments? Can you throw an exception if the type you try to cast to
>> is not the one you expect?
>> You might imagine that with large datasets this is important.
>
> Silent coercion was added by design. Rcpp does not give you a "strict" mode.
>
> One thing you can do is something like this:
>
> #include <Rcpp.h>
> using namespace Rcpp ;
>
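> // Wrapper that refuses to coerce: construction fails unless x already
> // has the SEXP type that T expects, so x is wrapped without being copied.
> template <typename T>
> class Strict : public T {
> public:
>   Strict( SEXP x ) {
>     if( TYPEOF(x) != T::r_type::value )
>       stop( "not compatible" ) ;
>     T::Storage::set__(x) ;  // wrap x directly
>   }
>
> } ;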
>
> // [[Rcpp::export]]
> int foo( Strict<NumericVector> v ){
>   return v.size() ;
> }
>
> You'd get e.g.
>
>> foo(rnorm(10))
> [1] 10
>
>> foo(1:10)
> Error in eval(expr, envir, enclos) : not compatible
> Calls: sourceCpp ... source -> withVisible -> eval -> eval -> foo -> <Anonymous>
> Execution halted
>
>
>
>> Sorry for bothering and thanks again,
>> Ale
>>
>>
>> On Wed, Feb 12, 2014 at 1:10 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>>>
>>> On 12 February 2014 at 11:47, Alessandro Mammana wrote:
>>> | Ok I was able to find the code causing the bug. So it looks like the
>>>
>>> Thanks for the added detail.
>>>
>>> | pointers you get from an Rcpp::Vector using .begin() become invalid
>>> | after the Rcpp::Vector goes out of scope (and this makes sense),
>>> | what I do not understand is that this Rcpp::Vector was allocated in R
>>> | and should still be "living" during the execution of the Rcpp call
>>> | (that's why I wasn't expecting the pointer to be invalid).
>>> |
>>> | This is the exact code (the one above is probably fine):
>>> | @@@@@@@@@@@@@@ in CPP @@@@@@@@@@@@@@
>>> |
>>> | struct GapMat {
>>> |     int* ptr;
>>> |     int* colset;
>>> |     int nrow;
>>> |     int ncol;
>>> |
>>> |
>>> |     inline int* colptr(int col){
>>> |         return ptr + colset[col];
>>> |     }
>>> |
>>> |     GapMat(){}
>>> |
>>> |     GapMat(int* _ptr, int* _colset, int _nrow, int _ncol):
>>> |         ptr(_ptr), colset(_colset), nrow(_nrow), ncol(_ncol){}
>>> | };
>>> |
>>> |
>>> | GapMat getGapMat(Rcpp::List gapmat){
>>> |     IntegerVector vec = gapmat["vec"];
>>> |     IntegerVector pos = gapmat["colset"];
>>> |     int nrow = gapmat["nrow"];
>>> |
>>> |     return GapMat(vec.begin(), pos.begin(), nrow, pos.length());
>>> | }
>>> |
>>> | // [[Rcpp::export]]
>>> | IntegerVector colSumsGapMat(Rcpp::List gapmat){
>>> |
>>> |     GapMat mat = getGapMat(gapmat);
>>> |     IntegerVector res(mat.ncol);
>>> |
>>> |     for (int i = 0; i < mat.ncol; ++i){
>>> |         for (int j = 0; j < mat.nrow; ++j){
>>> |             res[i] += mat.colptr(i)[j];
>>> |         }
>>> |     }
>>> |
>>> |     return res;
>>> | }
>>> |
>>> | @@@@@@@@@@@@@@ in R (with gdb debugger as suggested by Dirk) @@@@@@@@@@@@@@
>>> | library(Rcpp)
>>> | sourceCpp("scratchpad.cpp")
>>> |
>>> | vec <- rnbinom(3e7, mu=0.1, size=1); storage.mode(vec) <- "integer"
>>> | nr <- 80
>>> |
>>> | colset <- sample(3e7-nr, 1e7)
>>> | foo <- vec[colset] # this is only to trigger some obscure garbage collection mechanisms...
>>> |
>>> | for (i in 1:10){
>>> |     colset <- sample(3e7-nr, 1e7)
>>> |     gapmat <- list(vec=vec, nrow=nr, colset=colset-1)
>>> |     cs <- colSumsGapMat(gapmat)
>>> |     print(sum(cs))
>>> | }
>>> |
>>> | [1] 80000000
>>> | [1] 80000000
>>> | [1] 80016890
>>> | [1] 80008144
>>> | [1] 80016022
>>> | [1] 80021609
>>> |
>>> | Program received signal SIGSEGV, Segmentation fault.
>>> | 0x00007ffff18a5455 in GapMat::colptr (this=0x7fffffffc120, col=0) at scratchpad.cpp:295
>>> | 295            return ptr + colset[col];
>>> |
>>> | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>>> |
>>> | Why did it happen? What should I do to make sure that my pointers
>>> | remain valid? My goal is to convert safely some vectors or matrices
>>> | that "exist" in R to some pointers, how can I do that?
>>>
>>> Not sure. It looks fine at first glance. But then it's early in the morning
>>> and I have had very little coffee yet...
>>>
>>> Maybe the fact that you tickle the gc() via vec[colset] has something to do
>>> with it, maybe it has not.  Maybe I would try the decomposition of the List
>>> object inside the colSumsGapMat() function to keep it simpler.  Or if you
>>> _really_ want an external object to iterate over, memcpy it out.
>>>
>>> With really large objects, you may be stressing parts of the code that have
>>> not been stressed the same way.  If it breaks, you do get to keep both pieces.
>>>
>>> Dirk
>>>
>>> --
>>> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>>
>>
>>
>> --
>> Alessandro Mammana, PhD Student
>> Max Planck Institute for Molecular Genetics
>> Ihnestraße 63-73
>> D-14195 Berlin, Germany
>



-- 
Alessandro Mammana, PhD Student
Max Planck Institute for Molecular Genetics
Ihnestraße 63-73
D-14195 Berlin, Germany

