<div dir="ltr"><div><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Sep 10, 2013 at 2:02 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
Nothing springs to mind. Latest version v1.8.10 from CRAN right?
Or v1.8.11 on R-Forge?<br></div></div></blockquote><div><br></div><div>Both. And 1.8.8.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div>
<br>
On this bit :<div class="im"><br>
> So somewhere these key columns think they are different
lengths than they really are, and<br>
> when I try to access it I go into memory I shouldn't so I
segfault. How can I verify this? Is<br>
> there something about the DT I can check to see what DT
thinks these columns are?<br>
<br></div>
.Internal(inspect(DT)) reveals the internal structure including
length and truelength on the column pointer vector as well as each
column.<br>
<br>
But it's a really odd way of using data.table. Iterating by row
is going to kill performance; data.table likes by column.<br></div></div></blockquote><div><br></div><div>Trust me I know this, this isn't my code :) I'm just the data.table guy who helps debug. I am helping him with better ways, but I think we can agree that it should at least not segfault.</div>
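<div><br></div><div>For anyone following along, here is a minimal sketch of what that inspect call reports, using a made-up named vector (x is hypothetical, not the real table). The len and tl in the printed header are the length and truelength Matthew mentions, and an ATTRIB section appears when the vector carries attributes such as names:</div>
<pre>
```r
# Toy named character vector; a stand-in, not data from the thread.
x = c(k1 = "abc")

# Prints the STRSXP header with (len=..., tl=...) and an ATTRIB section
# listing the "names" attribute.
.Internal(inspect(x))

# The same facts through ordinary accessors.
len = length(x)
names_len = length(attr(x, "names"))
```
</pre>
<div>On a healthy vector, len and names_len agree; a names attribute whose length differs from the vector's own length is the kind of mismatch worth looking for in the dumps.</div>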
<div><br></div><div><br></div><div>I ran inspect on both versions of the data.table: the one that crashes, built with rbindlist(apply(d,1,...)), and the one that doesn't, built with rbindlist(lapply(1:nrow(d),...)). I changed the variable names and censored the values.</div>
<div><br></div><div>First the one that fails (accessing either a$k1 or a$k2 will segfault):</div><div><br></div><div><div>> .Internal(inspect(a))</div><div>@2cc5be0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)</div>
<div> @3b643d0 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)</div><div> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"</div><div> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"</div>
<div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div>
<div> ...</div><div> ATTRIB:</div><div> @ac6c20 02 LISTSXP g1c0 [MARK] </div><div> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"</div><div> @3ba6ad8 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)</div>
<div> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"</div><div> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"</div><div> @3b64e30 16 STRSXP g0c7 [NAM(2),ATT] (len=326, tl=0)</div><div>
@253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div>
<div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> ...</div><div> ATTRIB:</div><div> @ac6cc8 02 LISTSXP g1c0 [MARK] </div>
<div> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"</div><div> @3ba6a68 16 STRSXP g1c2 [MARK,NAM(2)] (len=2, tl=0)</div><div> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"</div><div>
@bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"</div><div> @3b65890 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)</div><div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div>
<div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div>
<div> ...</div><div> @1ff5850 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...</div><div> @1fc6600 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...</div><div> ...</div><div>ATTRIB:</div><div> @21f6d48 02 LISTSXP g0c0 [] </div>
<div> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"</div><div> @3efc1f0 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)</div><div> @184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"</div><div>
@bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"</div><div> @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"</div><div> @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"</div><div> @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"</div>
<div> ...</div><div> TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"</div><div> @2556908 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326</div><div> TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"</div>
<div> @2701b38 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)</div><div> @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"</div><div> @9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"</div>
<div> TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"</div><div> @21f6e28 22 EXTPTRSXP g0c0 [] </div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>
Second, the one that works (all values can be accessed fine):</div>
<div><br></div><div>> .Internal(inspect(a))</div><div>@45b4850 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=13, tl=100)</div><div> @33a53a0 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)</div><div> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"</div>
<div> @253e488 09 CHARSXP g1c3 [MARK,gp=0x20,ATT] "#########"</div><div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div>
<div> @253e3f8 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> ...</div><div> @33a5e00 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)</div><div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div>
<div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> @253e440 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div>
<div> @253e3b0 09 CHARSXP g1c3 [MARK,gp=0x20] "#########"</div><div> ...</div><div> @33a6860 16 STRSXP g0c7 [NAM(2)] (len=326, tl=0)</div><div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div>
<div> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> @24eeb08 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div>
<div> @24eeb68 09 CHARSXP g1c1 [MARK,gp=0x20] "#########"</div><div> ...</div><div> @1ff10f0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 3,3,3,3,3,...</div><div> @3a6d0d0 13 INTSXP g0c7 [NAM(2)] (len=326, tl=0) 2,1,2,1,3,...</div>
<div> ...</div><div>ATTRIB:</div><div> @276c360 02 LISTSXP g0c0 [] </div><div> TAG: @963418 01 SYMSXP g1c0 [MARK,gp=0x4000] "names"</div><div> @1fe5670 16 STRSXP g0c7 [NAM(2)] (len=13, tl=100)</div><div>
@184aed0 09 CHARSXP g1c3 [MARK,gp=0x21,ATT] "k1"</div><div> @bf8578 09 CHARSXP g1c2 [MARK,gp=0x21] "k2"</div><div> @108be30 09 CHARSXP g1c2 [MARK,gp=0x21] "v1"</div><div> @108be68 09 CHARSXP g1c2 [MARK,gp=0x21] "v2"</div>
<div> @108bf10 09 CHARSXP g1c2 [MARK,gp=0x21] "v3"</div><div> ...</div><div> TAG: @96d200 01 SYMSXP g1c0 [MARK,gp=0x4000] "row.names"</div><div> @29cbf38 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-326</div>
<div> TAG: @9638e8 01 SYMSXP g1c0 [MARK,gp=0x4000] "class"</div><div> @2d539a0 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)</div><div> @bf8460 09 CHARSXP g1c2 [MARK,gp=0x21] "data.table"</div><div>
@9f2688 09 CHARSXP g1c2 [MARK,gp=0x21,ATT] "data.frame"</div><div> TAG: @1e75218 01 SYMSXP g1c0 [MARK] ".internal.selfref"</div><div> @276c440 22 EXTPTRSXP g0c0 [] </div><div><br></div><div>
<br></div></div><div><br></div><div><br></div><div>It looks to me like the ATTRIBs attached to k1 and k2 differ in the first case? I can't really parse this as well as you can.</div><div><br></div><div> </div>
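<div>One hedged reading of the two dumps above: in the failing table, k1 and k2 (len=326) each carry a names attribute of len=2, while the working table's columns carry none. A possible source, assuming func simply passes its first two arguments through (func itself is not shown here), is that apply() hands each row to the callback as a named character vector, whereas indexing the data.frame by row and column gives unnamed values. A base-R sketch with a made-up d:</div>
<pre>
```r
# Hypothetical stand-in for the data.frame d described in the thread.
d = data.frame(k1 = c("a", "b"), k2 = c("x", "y"), stringsAsFactors = FALSE)

# apply() converts d to a character matrix and passes each row as a
# named vector, so x[1] still carries the name "k1".
from_apply = apply(d, 1, function(x) is.null(names(x[1])))

# Indexing the data.frame directly yields plain, unnamed values.
from_lapply = sapply(1:nrow(d), function(i) is.null(names(d[i, 1])))
```
</pre>
<div>Here from_apply is all FALSE (names present) and from_lapply is all TRUE (no names), which would match the ATT flag showing up only in the apply-built table.</div>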
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div>
If it really has to be by row then DT[, fun(.SD,...),
by=1:nrow(DT)] should be better than apply().<span class=""><font color="#888888"><br>
<br>
Matthew</font></span><div><div class="h5"><br>
<br>
On 10/09/13 18:47, Chris Neff wrote:<br>
</div></div></div>
<blockquote type="cite"><div><div class="h5">
<div dir="ltr">Narrowing it down further,
<div><br>
</div>
<div>a$x</div>
<div><br>
</div>
<div>segfaults and</div>
<div><br>
</div>
<div>a[,x]</div>
<div><br>
</div>
<div>segfaults but</div>
<div><br>
</div>
<div>a[,"x", with=FALSE]</div>
<div><br>
</div>
<div>doesn't.</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Sep 10, 2013 at 1:32 PM, Chris
Neff <span dir="ltr"><<a href="mailto:caneff@gmail.com" target="_blank">caneff@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">I'm pretty sure it is some issue of a column
that thinks it is bigger than it actually is. I have
tried, so far in vain, to make a reproducible example that
I can share. I have one, but can't share it.
<div>
<br>
</div>
<div>What happens is this: </div>
<div><br>
</div>
<div>A data.frame is made:</div>
<div><br>
</div>
<div>> d = data.frame(...)</div>
<div><br>
</div>
<div>Then I call apply over every row, calling a different
function that takes in a DT as well:</div>
<div><br>
</div>
<div>l = apply(d, 1, function(x) func(x[1], x[2], DT))</div>
<div><br>
</div>
<div>This returns a data.frame. If I rbindlist this:</div>
<div><br>
</div>
<div>a = rbindlist(l)</div>
<div><br>
</div>
<div>I can print a just fine, and it will show me all data
like normal. But if I try to just do </div>
<div><br>
</div>
<div>a$x</div>
<div><br>
</div>
<div>where x is one of the columns that was a key in DT, then it
segfaults. If I ask for a column that was made by
"func" and wasn't a column in DT, it works fine. If I
ask for only the first 10 rows and then ask for x:</div>
<div><br>
</div>
<div>a[1:10]$x</div>
<div><br>
</div>
<div>it works fine.</div>
<div><br>
</div>
<div>So somewhere these key columns think they are
different lengths than they really are, and when I try
to access it I go into memory I shouldn't so I segfault.
How can I verify this? Is there something about the DT
I can check to see what DT thinks these columns are?</div>
<div><br>
</div>
<div><br>
</div>
<div>Also, if instead of apply when making the list, I do</div>
<div><br>
</div>
<div>l = lapply(1:nrow(d), function(i)
func(d[i,1],d[i,2],DT))</div>
<div><br>
</div>
<div>and rbindlist that, it works fine too.<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset></fieldset>
<br>
</div></div><div class="im"><pre>_______________________________________________
datatable-help mailing list
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></pre>
</div></blockquote>
<br>
</div>
</blockquote></div><br></div></div>