[datatable-help] data.table segfaulting, need help verifying the reason

Chris Neff caneff at gmail.com
Tue Sep 10 20:51:35 CEST 2013


On Tue, Sep 10, 2013 at 2:02 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Nothing springs to mind.  Latest version v1.8.10 from CRAN right?  Or
> v1.8.11 on R-Forge?
>

Both. And 1.8.8.


>
> On this bit :
>
> > So somewhere these key columns think they are different lengths than
> > they really are, and when I try to access it I go into memory I shouldn't
> > so I segfault.  How can I verify this? Is there something about the DT I
> > can check to see what DT thinks these columns are?
>
> .Internal(inspect(DT)) reveals the internal structure including length and
> truelength on the column pointer vector as well as each column.
>
> But it's a really odd way of using data.table.  Iterating by row is going
> to kill performance;  data.table likes by column.
>

Trust me, I know this; this isn't my code :) I'm just the data.table guy who
helps debug. I am helping him find better ways, but I think we can agree
that it should at least not segfault.


I ran inspect on both versions of the data.table: the one that crashes,
built with rbindlist(apply(d,1,...)), and the one that doesn't, built with
rbindlist(lapply(1:nrow(d),...)). I changed the variable names and censored
out the values.

First, the one that fails (accessing either a$k1 or a$k2 segfaults):

> .Internal(inspect(a))
@2cc5be0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
  @3b643d0 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  ATTRIB:
    @ac6c20 02 LISTSXP g1c0 [MARK]
      TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
      @3ba6ad8 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
        @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
        @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
  @3b64e30 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  ATTRIB:
    @ac6cc8 02 LISTSXP g1c0 [MARK]
      TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
      @3ba6a68 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
        @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
        @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
  @3b65890 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    ...
  @1ff5850 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
  @1fc6600 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
  ...
ATTRIB:
  @21f6d48 02 LISTSXP g0c0 []
    TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
    @3efc1f0 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
      @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
      @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
      @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
      @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
      @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
      ...
    TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
    @2556908 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
    TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
    @2701b38 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
      @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
    TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @21f6e28 22 EXTPTRSXP g0c0 []






Second, the one that works (all values can be accessed fine):

> .Internal(inspect(a))
@45b4850 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
  @33a53a0 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  @33a5e00 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
    ...
  @33a6860 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
    ...
  @1ff10f0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
  @3a6d0d0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
  ...
ATTRIB:
  @276c360 02 LISTSXP g0c0 []
    TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
    @1fe5670 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
      @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
      @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
      @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
      @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
      @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
      ...
    TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
    @29cbf38 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
    TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
    @2d539a0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
      @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
      @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
    TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
    @276c440 22 EXTPTRSXP g0c0 []




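It looks to me like the difference is in the ATTRIB attached to k1 and k2 in
the first case: each of those columns carries a "names" attribute of length 2,
even though the column itself has length 326. I can't really parse this as
well as you can, though.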
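One way to look for that kind of mismatch without reading the whole dump is to compare, for each column, the column's length with the length of any "names" attribute it carries (in the first dump, k1 and k2 carry a names attribute of length 2 on columns of length 326). The helper below is a minimal base-R sketch; `column_name_report` is a hypothetical name, not a data.table function. It only reads `length()` and `attr(, "names")`, which I assume does not touch the column's elements, but it has not been tried against the corrupted object itself.

```r
# Hypothetical helper (not part of data.table or base R): for each column of a
# list, data.frame or data.table, report the column length and the length of
# the "names" attribute it carries (0 if it has none). A well-formed vector has
# either no names or names exactly as long as itself.
column_name_report <- function(DT) {
  len       <- vapply(DT, length, integer(1), USE.NAMES = FALSE)
  names_len <- vapply(DT, function(col) length(attr(col, "names")),
                      integer(1), USE.NAMES = FALSE)
  data.frame(column    = names(DT),
             len       = len,
             names_len = names_len,
             mismatch  = names_len > 0L & names_len != len,
             stringsAsFactors = FALSE)
}

# Usage on the problem table (assumed to still be called `a`):
# column_name_report(a)
```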



> If it really has to be by row, then DT[, fun(.SD,...), by=1:nrow(DT)]
> should be better than apply().
>
> Matthew
>
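A minimal sketch of that row-wise pattern, assuming the data.table package is installed; `row_fun`, the toy columns and the `row` grouping name are hypothetical. Writing the grouping vector as `by = list(row = ...)` only gives the group column a readable name; `by=1:nrow(DT)` as written above still makes one group per row.

```r
library(data.table)

DT <- data.table(k1 = c("a", "b", "c"), v1 = c(1, 2, 3))

# Hypothetical per-row function: receives a one-row .SD (a data.table) and
# returns a list, which data.table turns into the columns of the result.
row_fun <- function(sd) list(v1_doubled = sd$v1 * 2)

# One group per row; .SD holds that row's k1 and v1 columns.
res <- DT[, row_fun(.SD), by = list(row = seq_len(nrow(DT)))]
print(res)
```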
>
> On 10/09/13 18:47, Chris Neff wrote:
>
> Narrowing it down further,
>
>  a$x
>
>  segfaults and
>
>  a[,x]
>
>  segfaults but
>
>  a[,"x", with=FALSE]
>
>  doesn't.
>
>
> On Tue, Sep 10, 2013 at 1:32 PM, Chris Neff <caneff at gmail.com> wrote:
>
>> I'm pretty sure it is some issue with a column that thinks it is bigger
>> than it actually is.  I have tried, so far in vain, to make a reproducible
>> example that I can share.  I have one, but can't share it.
>>
>>  What happens is this:
>>
>>  A data.frame is made:
>>
>>  > d = data.frame(...)
>>
>>  Then I call apply over every row, calling a different function that
>> takes in a DT as well:
>>
>>  l = apply(d, 1, function(x) func(x[1], x[2], DT))
>>
>>  Each call returns a data.frame, so l is a list of data.frames. If I rbindlist this:
>>
>>  a = rbindlist(l)
>>
>>  I can print a just fine, and it shows all the data as normal. But
>> if I try to just do
>>
>>  a$x
>>
>>  where x is one of the columns that was a key in DT, it segfaults.  If I
>> ask for a column that was made by "func" and wasn't a column in DT, it
>> works fine.  If I ask for only the first 10 rows and then ask for x:
>>
>>  a[1:10]$x
>>
>>  it works fine.
>>
>>  So somewhere these key columns think they are different lengths than
>> they really are, and when I try to access it I go into memory I shouldn't
>> so I segfault.  How can I verify this? Is there something about the DT I
>> can check to see what DT thinks these columns are?
>>
>>
>>  Also, if I build the list with lapply instead of apply:
>>
>>  l = lapply(1:nrow(d), function(i) func(d[i,1], d[i,2], DT))
>>
>>  and rbindlist that, it works fine too.
>>
>>
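One difference between the two loops that may be relevant: `apply()` first coerces a data.frame to a matrix via `as.matrix()` (a character matrix when the columns have mixed types) and hands each row to the function as a named vector, so `x[1]` keeps a name. Indexing the data.frame directly inside `lapply()` gives unnamed values. The toy `d` below is hypothetical; it only shows where a "names" attribute like the one on k1 and k2 in the first dump could plausibly come from, not that this is the confirmed cause of the segfault.

```r
d <- data.frame(k1 = c("a", "b"), n = c(1, 2), stringsAsFactors = FALSE)

# apply(): d is coerced to a character matrix, and each row arrives as a
# named character vector, so x[1] carries the name "k1".
names_via_apply <- unname(apply(d, 1, function(x) names(x[1])))
class_via_apply <- unname(apply(d, 1, function(x) class(x[2])))

# lapply() over row indices: d[i, 1] is a plain, unnamed element of the column.
unnamed_via_lapply <- vapply(seq_len(nrow(d)),
                             function(i) is.null(names(d[i, 1])),
                             logical(1))

names_via_apply     # "k1" "k1"
class_via_apply     # "character" "character"  (n was numeric in d)
unnamed_via_lapply  # TRUE TRUE
```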
>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>