<div>Frank,
</div><div><br></div><div>The fix is to call `unique(DT1)` instead of `unique.data.frame(DT1)`, because `unique` will dispatch to the "correct" `unique.data.table` method for DT1. </div><div><br></div><div>Now, as to why this is happening: data.table over-allocates its list of column pointers so that columns can be added by reference (you can read more about this, if you wish, in ?`:=`). That is, if you do:</div><div><br></div><div>DT1 <- data.table(1)</div><div><br></div><div>You've created 1 column, but data.table has allocated a vector of 100 column pointers (by default). You can see this with the function `truelength`.</div><div><br></div><div>truelength(DT1)</div><div>> 100</div><div><br></div><div>The problem with `unique.data.frame` is that this `truelength` is not maintained in the copy it returns. That is:</div><div><br></div><div>DT2 <- unique(DT1) # <~~~ correct way</div><div>DT3 <- unique.data.frame(DT1) # <~~~ incorrect way</div><div><br></div><div>truelength(DT2)</div><div>> 100</div><div>truelength(DT3)</div><div>> 0</div><div><br></div><div>So now we have a problem: the over-allocated memory is somehow "gone" after this copy. A `:=` after this would write to a memory location that isn't allocated, and that would normally lead to a segmentation fault (IIUC). </div><div><br></div><div>This is what happened with an earlier version of data.table in a similar context: setting the key of a data.table. In version 1.7.8, the key of a data.table was set by:</div><div><br></div><div>key(DT) <- …</div><div><br></div><div>This resulted in a "copy" that set the truelength to 0, so assigning by reference after this step led to a segmentation fault. 
This is why we now have the "setkey" function, and the more general "setattr" function, to assign things without R's copy getting in the way.</div><div><br></div><div>To catch this issue and fix it without a segmentation fault, the attribute ".internal.selfref" was designed. Basically, it detects these situations and takes a copy before assigning by reference. I can't find any documentation on "how" it's done, but the way I think of it is this: when you assign by reference, the existing .internal.selfref attribute (which is of class externalptr) is compared with the actual value of your data.table. If they match, everything's good; otherwise, a copy is made and the correct pointer is set as the attribute.</div><div><br></div><div>You can read about this in ?setkey. In short, use `unique`, which will call the correct (hidden) `unique.data.table` method. Hope this helps. If anything is ambiguous or I got something wrong, please point it out.</div><div><br></div><div>Arun</div><div><div><br></div></div>
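<div>P.S. A minimal sketch of the above, using only `unique`, `unique.data.frame`, `truelength`, `copy` and `:=`. The exact `truelength` values depend on your data.table version and its over-allocation setting, so no expected numbers are shown here. I am assuming that `copy()` over-allocates its result again, which is my understanding of how it behaves:</div>

```r
library(data.table)

DT1 <- data.table(1)

DT2 <- unique(DT1)             # dispatches to data.table's own unique method
DT3 <- unique.data.frame(DT1)  # calls the data.frame method directly

# Compare the number of allocated column slots with the number of columns.
print(c(length = length(DT2), truelength = truelength(DT2)))
print(c(length = length(DT3), truelength = truelength(DT3)))

# Take an explicit copy before assigning by reference, so that `:=`
# has over-allocated column slots to write into.
DT3 <- copy(DT3)
DT3[, gah := 1]
print(names(DT3))
```

<div>Taking the explicit copy is the route Frank mentioned; calling `unique(DT1)` in the first place avoids the issue altogether.</div>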
<p style="color: #A0A0A8;">On Wednesday, July 31, 2013 at 12:07 AM, Frank Erickson wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">
<span><div><div><div>I expect DT2 <- unique.data.frame(DT1) to be a new object, but get a warning about pointers, so apparently it is not...? </div><div><br></div><div>A short example:</div><div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px">
<div><div>DT1 <- data.table(1)</div></div><div><div>DT2 <- unique.data.frame(DT1)</div></div><div><div>DT2[,gah:=1]</div></div></blockquote><div><br></div><div>An example closer to my application, undoing a cartesian/cross join:</div>
<div><br></div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><div>DT1 <- CJ(A=0:1,B=1:6,D0=0:1,D=0:1)[D>=D0]</div></div><div><div>setkey(DT1,A)</div></div><div><div>DT2 <- unique.data.frame(DT1[,-which(names(DT1)%in%'B'),with=FALSE])</div>
</div><div><div>DT2[,gah:=1] # warning: I should have made a copy, apparently</div></div></blockquote><div><br></div><div>I'm fine with explicitly making a copy, of course, and don't really know anything about pointers. I just thought I'd bring it up.</div>
<div><br></div><div>--Frank</div>
</div><div><div>_______________________________________________</div><div>datatable-help mailing list</div><div><a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a></div><div><a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div></div></div></span>
</blockquote>