[Rcpp-devel] Possible regression in R-3.2.3 or Rcpp 0.12.3

Kevin Ushey kevinushey at gmail.com
Sat Jan 30 22:44:40 CET 2016


I think I know what's going on now. Effectively, we're 'evaluating'
the sugar proxy vector, getting a pointer to a (temporary,
unprotected) R data structure, and then storing that pointer. Later,
when we attempt to allocate the output vector (with `no_init`) the GC
runs and cleans up our pointer.

See:

    https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Subsetter.h#L147
    https://github.com/RcppCore/Rcpp/blob/master/inst/include/Rcpp/vector/Subsetter.h#L197

The 'best' fix here is to only re-use the indices for an actual
IntegerVector, and generate the indices from sugar proxies, but I'm
not exactly sure how to best handle that. It's also possible that if
we just allocated the output vector earlier (ie, before getting a
pointer to our subset proxy), that we'd avoid the R gc cleaning up our
vector.

FWIW, we should be able to reproduce this reliably when running under
`gctorture()`...

Kevin

On Sat, Jan 30, 2016 at 7:07 AM, Qiang Kou <qkou at umail.iu.edu> wrote:
> Another thing confused me is the segfault only happens when the
> IntegerVector is significantly long.
>
> Best,
>
> KK
>
> On Sat, Jan 30, 2016 at 2:57 AM, Kevin Ushey <kevinushey at gmail.com> wrote:
>>
>> Hmm, I have one thought: we try to re-use the indices from an
>> IntegerVector, but the type here is actually a sugar type:
>>
>>     Rcpp::SubsetProxy<13, Rcpp::PreserveStorage, 13, true,
>>     Rcpp::sugar::Minus_Vector_Primitive<13, true, Rcpp::Vector<13,
>>     Rcpp::PreserveStorage> > >::get_vec (this=<optimized out>)
>>
>> Ie, we access the indices with `INTEGER(x)`, but perhaps those indices
>> have not actually been properly materialized when we attempt to
>> perform the subset?
>>
>> But then, a simple test case with `x[y - 1]` with `x` and `y` both
>> being integer vectors seems to work just fine. So I am a bit confused.
>>
>> On Fri, Jan 29, 2016 at 3:16 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>> >
>> > On 29 January 2016 at 17:55, Qiang Kou wrote:
>> > | Hi, Paul, can you try my fork of Rcpp? You can install it by the line
>> > below:
>> > |
>> > | devtools::install_github("thirdwing/Rcpp", ref = "subsetter")
>> > |
>> > | This fixed the segfault on my Ubuntu machine.
>> >
>> > Yay. Nice work!
>> >
>> > | The difference can be found from [1].
>> >
>> > Nice and concise.
>> >
>> > | In subsetter, if an IntegerVector passed in, we will try to reuse it.
>> > This led
>> > | to a segfault in this case, which I don't know why.
>> > |
>> > | Dirk and Kevin, do you have any thoughts on it?
>> >
>> > Not really, but happy to give this the full reverse-dependency check
>> > treatment so that we can merge it.
>> >
>> > Dirk
>> >
>> >
>> > | Best wishes,
>> > |
>> > | KK
>> > |
>> > | [1] https://github.com/thirdwing/Rcpp/commit/
>> > | 216c5220bcb84778a656b3496d0f1803b973ef61
>> > |
>> > |
>> > | On Fri, Jan 29, 2016 at 3:00 PM, Qiang Kou <qkou at umail.iu.edu> wrote:
>> > |
>> > |
>> > |     Hi, Kevin, I was also trying to track this down yesterday.
>> > |
>> > |     From the debugging info below, indices_n is not equal to length of
>> > indices,
>> > |     which I don't quite understand.
>> > |
>> > |     Program received signal SIGSEGV, Segmentation fault.
>> > |
>> > |     0x00007ffff2ed5c4e in Rcpp::SubsetProxy<13, Rcpp::PreserveStorage,
>> > 13,
>> > |     true, Rcpp::sugar::Minus_Vector_Primitive<13, true,
>> > Rcpp::Vector<13,
>> > |     Rcpp::PreserveStorage> > >::get_vec
>> > (this=this at entry=0x7fffffff79a0)
>> > |
>> > |         at /usr/local/lib/R/site-library/Rcpp/include/Rcpp/vector/
>> > |     Subsetter.h:200
>> > |
>> > |     199             output[i] = lhs[ indices[i] ];
>> > |
>> > |     (gdb) p i
>> > |
>> > |     $1 = 33622
>> > |
>> > |     (gdb) p indices[i]
>> > |
>> > |     Cannot access memory at address 0x34c6e000
>> > |
>> > |     (gdb) p indices_n
>> > |
>> > |     $2 = 9594546
>> > |
>> > |
>> > |     On Fri, Jan 29, 2016 at 2:29 PM, Dirk Eddelbuettel
>> > <edd at debian.org> wrote:
>> > |
>> > |
>> > |         On 29 January 2016 at 11:27, Kevin Ushey wrote:
>> > |         | When I add some debug printing to the associated
>> > subscripting line
>> > |         | (https://github.com/awalker89/openxlsx/blob/
>> > |
>> > b92bb3acdd6ea759be928c298c6faeef2f26fa3e/src/cppFunctions.cpp#L2608),
>> > |         | I see:
>> > |         |
>> > |         |    colNumbers.size(): 98,03,150
>> > |         |    charCols.size(): 95,94,546
>> > |         |
>> > |         | It looks to me like the package is erroneously attempting to
>> > subset
>> > |         | vectors of different sizes, causing out-of-bounds reads.
>> > |
>> > |         Nice work.
>> > |
>> > |         | Unfortunately, Rcpp is not detecting or warning about
>> > this...
>> > |         |
>> > |         | Either way, I believe this is a bug in the openxlsx package,
>> > but Rcpp
>> > |         | should be checking / reporting this.
>> > |
>> > |         With (Rcpp)Armadillo you do have an option of turning this
>> > on/off. With
>> > |         Rcpp
>> > |         alone not quite.
>> > |
>> > |         Dirk
>> > |
>> > |         --
>> > |         http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>> > |         _______________________________________________
>> > |         Rcpp-devel mailing list
>> > |         Rcpp-devel at lists.r-forge.r-project.org
>> > |
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
>> > |
>> > |
>> > |
>> > |
>> > |     --
>> > |     Qiang Kou
>> > |     qkou at umail.iu.edu
>> > |     School of Informatics and Computing, Indiana University
>> > |
>> > |
>> > |
>> > |
>> > |
>> > | --
>> > | Qiang Kou
>> > | qkou at umail.iu.edu
>> > | School of Informatics and Computing, Indiana University
>> > |
>> >
>> > --
>> > http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
>
>
>
> --
> Qiang Kou
> qkou at umail.iu.edu
> School of Informatics and Computing, Indiana University
>


More information about the Rcpp-devel mailing list