[datatable-help] data.table segfaulting, need help verifying the reason
Matthew Dowle
mdowle at mdowle.plus.com
Tue Sep 10 22:06:12 CEST 2013
Yes, it seems the key columns themselves carry a names attribute, and its
length (2) is inconsistent with the column length (326).
lapply(a,names) should reveal the "hidden" names
To remove them :
for (i in 1:ncol(a)) setattr(a[[i]],"names",NULL)
Then lapply(a,names) should be clear.
Then try again the things that segfaulted before.
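
A minimal sketch of that check-and-strip sequence on a toy table. The
table a, its columns and the helper has_names() are made up for
illustration, and the stray names are attached on purpose with setattr()
(at matching length, which is the easy case to construct) to mimic the
problem:

```r
library(data.table)

# Toy table standing in for the real one (hypothetical data).
a <- data.table(k1 = c("p", "q", "r"), v1 = 1:3)

# Attach names to one column by reference, to mimic the stray attribute.
setattr(a[["k1"]], "names", c("n1", "n2", "n3"))

# Hypothetical helper: TRUE for each column that carries a names attribute.
has_names <- function(dt) {
  vapply(dt, function(col) !is.null(names(col)), logical(1))
}

before <- has_names(a)

# Strip the names from every column by reference, as in the loop above.
for (i in seq_len(ncol(a))) setattr(a[[i]], "names", NULL)

after <- has_names(a)
```

Comparing before and after shows which columns were affected and that
none are once the loop has run.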
If this fixes it, we'll need to establish how the erroneous names got
in there.
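
One place worth checking, as a guess rather than a confirmed cause:
apply() hands each row to the function as a named vector, so x[1] carries
the column name along with it, whereas indexing by position as in the
lapply() version does not. A base R sketch with a made-up two-column d:

```r
# Hypothetical stand-in for the real data.frame.
d <- data.frame(k1 = c("a", "b"), k2 = c("x", "y"),
                stringsAsFactors = FALSE)

# apply() converts d to a matrix and passes each row as a vector whose
# names are the column names, so x[1] keeps a names attribute.
from_apply <- apply(d, 1, function(x) names(x[1]))

# Indexing the data.frame by row and column yields a plain value with
# no names attribute.
from_lapply <- lapply(seq_len(nrow(d)), function(i) names(d[i, 1]))
```

If func() stores x[1] in a column of its result without dropping the
name, that would be one way for a short names attribute to end up on a
column; unname(x[1]) or x[[1]] would avoid carrying it along.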
On 10/09/13 19:51, Chris Neff wrote:
>
>
>
> On Tue, Sep 10, 2013 at 2:02 PM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>
>
> Nothing springs to mind. Latest version v1.8.10 from CRAN right?
> Or v1.8.11 on R-Forge?
>
>
> Both. And 1.8.8.
>
>
> On this bit :
>
> > So somewhere these key columns think they are different lengths
> than they really are, and
> > when I try to access it I go into memory I shouldn't so I
> segfault. How can I verify this? Is
> > there something about the DT I can check to see what DT thinks
> these columns are?
>
> .Internal(inspect(DT)) reveals the internal structure including
> length and truelength on the column pointer vector as well as each
> column.
>
> But it's a really odd way of using data.table. Iterating by row is
> going to kill performance; data.table likes by column.
>
>
> Trust me I know this, this isn't my code :) I'm just the data.table
> guy who helps debug. I am helping him with better ways, but I think we
> can agree that it should at least not segfault.
>
>
> I ran inspect on the two versions of the data.table, the one that
> crashes that is made by doing rbindlist(apply(d,1,...)) and the one
> that doesn't that gets made by doing rbindlist(lapply(1:nrow(d),...)),
> and changed the variable names and censored out values.
>
> First the one that fails (accessing either a$k1 or a$k2 will segfault):
>
> > .Internal(inspect(a))
> @2cc5be0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
> @3b643d0 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> ...
> ATTRIB:
> @ac6c20 02 LISTSXP g1c0 [MARK]
> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
> @3ba6ad8 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
> @3b64e30 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> ...
> ATTRIB:
> @ac6cc8 02 LISTSXP g1c0 [MARK]
> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
> @3ba6a68 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)
> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
> @3b65890 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> ...
> @1ff5850 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
> @1fc6600 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
> ...
> ATTRIB:
> @21f6d48 02 LISTSXP g0c0 []
> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
> @3efc1f0 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
> @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
> @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
> @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
> ...
> TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
> @2556908 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
> TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
> @2701b38 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
> @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
> @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
> TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
> @21f6e28 22 EXTPTRSXP g0c0 []
>
>
>
>
>
>
> Secondly, the one that works (all values can be accessed fine):
>
> > .Internal(inspect(a))
> @45b4850 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)
> @33a53a0 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> ...
> @33a5e00 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"
> ...
> @33a6860 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"
> ...
> @1ff10f0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...
> @3a6d0d0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...
> ...
> ATTRIB:
> @276c360 02 LISTSXP g0c0 []
> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"
> @1fe5670 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)
> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"
> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"
> @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"
> @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"
> @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"
> ...
> TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"
> @29cbf38 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326
> TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"
> @2d539a0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
> @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"
> @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"
> TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"
> @276c440 22 EXTPTRSXP g0c0 []
>
>
>
>
> It looks to me like there are some differences in the ATTRIBs attached
> to k1 and k2 in the first case? I can't really parse this as well as
> you can.
>
> If it really has to be by row then DT[, fun(.SD,...),
> by=1:nrow(DT)] should be better than apply().
>
> Matthew
>
>
> On 10/09/13 18:47, Chris Neff wrote:
>> Narrowing it down further,
>>
>> a$x
>>
>> segfaults and
>>
>> a[,x]
>>
>> segfaults but
>>
>> a[,"x", with=FALSE]
>>
>> doesn't.
>>
>>
>> On Tue, Sep 10, 2013 at 1:32 PM, Chris Neff <caneff at gmail.com> wrote:
>>
>> I'm pretty sure it is some issue of a column that thinks it
>> is bigger than it actually is. I have tried, so far in vain,
>> to make a reproducible example that I can share. I have one,
>> but can't share it.
>>
>> What happens is this:
>>
>> A data.frame is made:
>>
>> > d = data.frame(...)
>>
>> Then I call apply over every row, calling a different
>> function that takes in a DT as well:
>>
>> l = apply(d, 1, function(x) func(x[1], x[2], DT))
>>
>> This returns a data.frame. If I rbindlist this:
>>
>> a = rbindlist(l)
>>
>> I can print a just fine, and it will show me all data like
>> normal. But if I try to just do
>>
>> a$x
>>
>> where x is one of the columns that was a key in DT, then it
>> segfaults. If I ask for a column that was made by "func" and
>> wasn't a column in DT, it works fine. If I ask for only the
>> first 10 rows and then ask for x:
>>
>> a[1:10]$x
>>
>> it works fine.
>>
>> So somewhere these key columns think they are different
>> lengths than they really are, and when I try to access it I
>> go into memory I shouldn't so I segfault. How can I verify
>> this? Is there something about the DT I can check to see what
>> DT thinks these columns are?
>>
>>
>> Also, if instead of apply when making the list, I do
>>
>> l = lapply(1:nrow(d), function(i) func(d[i,1],d[i,2],DT))
>>
>> and rbindlist that, it works fine too.
>>
>>
>>
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>