[datatable-help] Are you aware of this?
jeremiah rounds
roundsjeremiah at gmail.com
Sat Jun 14 07:23:10 CEST 2014
As a fan of your work I have always been curious if you are aware of this?
I find it causes new users to make mistakes.
> dt = list()
> dt$x = 1:10
> dt$y = letters[10:1]
> dt = as.data.table(as.data.frame(dt))
> dt
     x y
 1:  1 j
 2:  2 i
 3:  3 h
 4:  4 g
 5:  5 f
 6:  6 e
 7:  7 d
 8:  8 c
 9:  9 b
10: 10 a
> x0 = dt$x
> x1 = dt$x
> x0[1] = 11
> setkeyv(dt,"y")
> x0
[1] 11 2 3 4 5 6 7 8 9 10
> x1
[1] 10 9 8 7 6 5 4 3 2 1
> x1 == x0
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x0 and x1 are assigned at exactly the same time, and since R data.frames
do not behave this way, people are lured into thinking the two are identical
and distinct copies, as they would be with a data.frame. My theory is that
they are not actually copied: they are promised. When x0 has its index 1
changed, that induces a copy distinct from dt$x, but x1 has had no operation
on it, so it still refers to dt$x through its promise. Setting the key on dt
reorders it, and since x1 still hasn't been evaluated, it now matches the
new order of dt.
I have found new users getting unpredictable results because they try to
use a data.table as a data.frame and trigger this with sorts. If you
think you copied something out of dt in a particular order by doing the
assignment ahead of the setkeyv, you have made a mistake. You don't really
expect x1, assigned maybe a page of code above, to have its order changed by
a setkeyv. You would if you thought in terms of C pointers and references,
but in R you really don't think that way. Many R users don't even know what
a pointer is.
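For what it's worth, one way to sidestep this seems to be taking an explicit
copy of the column before reordering. The following is only a minimal sketch,
assuming data.table's copy() is used on the extracted vector. It builds the
table directly with data.table() rather than through as.data.frame(), which
is my simplification and not part of the session above.

```r
library(data.table)

# Rebuild a table like the one in the example (constructed directly here,
# which is an assumption made for brevity).
dt <- data.table(x = 1:10, y = letters[10:1])

# copy() makes a deep copy, so x1 does not share memory with the column dt$x.
x1 <- copy(dt$x)

# setkeyv() reorders dt by reference; the explicit copy keeps its own order.
setkeyv(dt, "y")

x1     # still in the order it had when it was assigned
dt$x   # now follows the key order of dt
```

With the explicit copy, x1 stays in its original order after the setkeyv,
which is what someone coming from data.frame would expect.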
Thanks,
Jeremiah
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] locfit_1.5-9.1 edgeR_3.4.2 limma_3.18.13
[4] data.table_1.9.2 GenomicRanges_1.14.4 XVector_0.2.0
[7] IRanges_1.20.7 BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] grid_3.0.1 lattice_0.20-15 plyr_1.8.1 Rcpp_0.11.1
[5] reshape2_1.4 stats4_3.0.1 stringr_0.6.2 tools_3.0.1