[datatable-help] Weird interaction between data.table and plyr, not sure which list to mail.
Chris Neff
caneff at gmail.com
Tue Aug 2 15:57:42 CEST 2011
Hi all,
There seems to be an issue where I can alter the ordering of a data
table and have it not lose its key, thereby screwing up future
output. Example:
library(data.table)
library(plyr)
x <- data.table(x=rep(1:10,2), y=rep(1:2,each=10))
key(x) <- "x"
x
       x y
 [1,]  1 1
 [2,]  1 2
 [3,]  2 1
 [4,]  2 2
 [5,]  3 1
 [6,]  3 2
 [7,]  4 1
 [8,]  4 2
 [9,]  5 1
[10,]  5 2
[11,]  6 1
[12,]  6 2
[13,]  7 1
[14,]  7 2
[15,]  8 1
[16,]  8 2
[17,]  9 1
[18,]  9 2
[19,] 10 1
[20,] 10 2
Now, if I want to find all the rows where x is 2, I do:
x[J(2)]
     x y
[1,] 2 1
[2,] 2 2
and it works fine. Let's say I want to reorder the rows for some
reason. I'll use the arrange function in plyr:
x <- arrange(x, y)
If I call key(x), it still reports "x" as the key. However, x now looks like:
x
       x y
 [1,]  1 1
 [2,]  2 1
 [3,]  3 1
 [4,]  4 1
 [5,]  5 1
 [6,]  6 1
 [7,]  7 1
 [8,]  8 1
 [9,]  9 1
[10,] 10 1
[11,]  1 2
[12,]  2 2
[13,]  3 2
[14,]  4 2
[15,]  5 2
[16,]  6 2
[17,]  7 2
[18,]  8 2
[19,]  9 2
[20,] 10 2
x is no longer sorted by its key. If I do something like x[J(2)], I get:
x[J(2)]
     x y
[1,] 2 1
The second row goes missing because data.table assumes the rows are
still sorted by the key, so after seeing the first 2 in x, and then
seeing something other than 2, it stops.
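
To make the mismatch easy to spot, here is a rough sketch of a check that compares the key a table reports against the actual row order. The helper name key_matches_order is made up for this illustration and is not part of data.table, and it only looks at the first key column. The stale key is simulated with setattr() rather than with arrange(), so the snippet does not depend on how any particular plyr version behaves:

```r
library(data.table)

# Hypothetical helper (not a data.table function): TRUE when the table has
# no key, or when its rows really are in nondecreasing order of the first
# key column.
key_matches_order <- function(dt) {
  k <- key(dt)
  if (is.null(k)) return(TRUE)
  !is.unsorted(dt[[k[1]]])
}

x <- data.table(x = rep(1:10, 2), y = rep(1:2, each = 10))
setkey(x, x)
key_matches_order(x)       # rows are sorted by x at this point

# Simulate the broken state: rows ordered by y, key attribute still "x".
stale <- x[order(y)]
setattr(stale, "sorted", "x")
key(stale)
key_matches_order(stale)   # detects that the key no longer matches the rows
```

A check like this could be run after any call into a package that does not know about data.table keys.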
I've noticed that some other cases have been prevented. For instance, if I do:
x[order(y)]
then the key gets set to NULL, as I would expect. But the arrange() case fails.
In the process of writing this email I've come to realize I shouldn't
be using arrange(x, y) but should instead be using x[order(y)] to begin
with, so some of this email is moot. But the main point still stands:
I've found a way to reorder the data.table while retaining the
original key, which breaks things.
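
For anyone who ends up with a table in this state, a possible workaround (a sketch only, assuming setkey() accepts NULL to clear a key, as current data.table versions do) is to drop the key explicitly and then set it again so the rows are physically re-sorted. The name reordered is just a stand-in for whatever table came back from the non-data.table function:

```r
library(data.table)

x <- data.table(x = rep(1:10, 2), y = rep(1:2, each = 10))
setkey(x, x)

# Reordering inside data.table itself: the key is dropped, as noted above.
by_y <- x[order(y)]
key(by_y)

# Stand-in for a table that was reordered outside data.table.
reordered <- copy(by_y)
setkey(reordered, NULL)   # clear any key the table may still claim to have
setkey(reordered, x)      # re-sort by x and record x as the key again
reordered[J(2L)]          # keyed lookup on the re-sorted table
```

Clearing the key first avoids relying on whatever setkey() does when the table already claims to be keyed on the same column.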
-Chris