<div dir="ltr">As a fan of your work, I have always been curious whether you are aware of this behavior. I find it causes new users to make mistakes.<div><br></div><div><br></div><div><div>> dt = list()</div><div>> dt$x = 1:10</div>
<div>> dt$y = letters[10:1]</div><div>> dt = as.data.table(as.data.frame(dt))</div><div>> dt</div><div> x y</div><div> 1: 1 j</div><div> 2: 2 i</div><div> 3: 3 h</div><div> 4: 4 g</div><div> 5: 5 f</div>
<div> 6: 6 e</div><div> 7: 7 d</div><div> 8: 8 c</div><div> 9: 9 b</div><div>10: 10 a</div><div>> x0 = dt$x</div><div>> x1 = dt$x</div><div>> x0[1] = 11</div><div>> setkeyv(dt,"y")</div><div>> x0</div>
<div> [1] 11 2 3 4 5 6 7 8 9 10</div><div>> x1</div><div> [1] 10 9 8 7 6 5 4 3 2 1</div><div>> x1 == x0</div><div> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE</div></div><div><br>
</div><div><br></div><div>x0 and x1 are both assigned from dt$x at the same point, and since R data.frames do not behave this way, it lures people into thinking the two vectors are identical to each other and distinct from dt, as they would be with a data.frame. My theory is that they are not actually copied: they are promised. When element 1 of x0 is changed, that induces a copy distinct from dt$x, but x1 has had no operation performed on it, so it still refers to dt$x through its promise. Setting the key on dt reorders it, and since x1 still hasn't been evaluated, it now matches the new order of dt.</div>
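<div><br></div><div>One way to check whether x1 shares storage with dt$x, rather than being a separate copy, is to compare memory addresses. This is only a minimal sketch: it assumes a data.table version that exports address(), it builds dt directly with data.table() instead of going through as.data.frame, and the names shares_x0 and shares_x1 are made up for illustration. No output is shown because it may differ between versions.</div>

```r
library(data.table)

# Rebuild the example table (constructed directly for brevity).
dt = data.table(x = 1:10, y = letters[10:1])

x0 = dt$x
x1 = dt$x
x0[1] = 11   # modifying x0 forces R to allocate a separate vector

# address() reports where a vector lives in memory.
shares_x1 = identical(address(x1), address(dt$x))
shares_x0 = identical(address(x0), address(dt$x))
print(c(x1 = shares_x1, x0 = shares_x0))
```

<div>If x1 reports TRUE, it is the very same vector as the column, which would be consistent with a by-reference reorder of dt showing up in x1 as well.</div>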
<div><br></div><div>I found new users getting unpredictable results because they would use a data.table as if it were a data.frame and trigger this with sorts. If you think you captured a column of dt in a particular order by assigning it before the setkeyv, you will make a mistake. You don't really expect x1, assigned maybe a page of code earlier, to have its order changed by a setkeyv. You would if you thought in terms of C pointers and references, but in R you really don't think that way. Many R users don't even know what a pointer is.</div>
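<div><br></div><div>For anyone who needs the column in its original order after a later setkeyv, a possible workaround is to take an explicit copy with data.table's copy() at assignment time. This is a sketch under that assumption, not necessarily the recommended idiom; x_snapshot is a hypothetical name and dt is again built directly with data.table().</div>

```r
library(data.table)

dt = data.table(x = 1:10, y = letters[10:1])

x_snapshot = copy(dt$x)   # explicit deep copy, independent of dt
setkeyv(dt, "y")          # reorders dt by reference

print(x_snapshot)         # keeps the order it had when copied
print(dt$x)               # follows the new key order
```

<div>The cost is one extra copy of the column, which is what a data.frame user is implicitly expecting anyway.</div>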
<div><br></div><div><br></div><div>Thanks,</div><div>Jeremiah</div><div><br></div><div><div>> sessionInfo()</div><div>R version 3.0.1 (2013-05-16)</div><div>Platform: x86_64-unknown-linux-gnu (64-bit)</div><div><br></div>
<div>locale:</div><div> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C </div><div> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 </div><div> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 </div>
<div> [7] LC_PAPER=C LC_NAME=C </div><div> [9] LC_ADDRESS=C LC_TELEPHONE=C </div><div>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C </div><div><br></div>
<div>attached base packages:</div><div>[1] splines parallel stats graphics grDevices utils datasets </div><div>[8] methods base </div><div><br></div><div>other attached packages:</div><div>[1] locfit_1.5-9.1 edgeR_3.4.2 limma_3.18.13 </div>
<div>[4] data.table_1.9.2 GenomicRanges_1.14.4 XVector_0.2.0 </div><div>[7] IRanges_1.20.7 BiocGenerics_0.8.0 </div><div><br></div><div>loaded via a namespace (and not attached):</div><div>[1] grid_3.0.1 lattice_0.20-15 plyr_1.8.1 Rcpp_0.11.1 </div>
<div>[5] reshape2_1.4 stats4_3.0.1 stringr_0.6.2 tools_3.0.1 </div></div><div><br></div><div><br></div><div><br></div></div>