[datatable-help] Weird interaction between data.table and plyr, not sure which list to mail.

Chris Neff caneff at gmail.com
Tue Aug 2 15:57:42 CEST 2011


Hi all,
There seems to be an issue where I can change the row order of a
data.table without it losing its key, which then breaks later lookups.
Example:

library(data.table)
x <- data.table(x = rep(1:10, 2), y = rep(1:2, each = 10))
setkey(x, x)  # sorts x by column x and marks x as the key
x

       x y
 [1,]  1 1
 [2,]  1 2
 [3,]  2 1
 [4,]  2 2
 [5,]  3 1
 [6,]  3 2
 [7,]  4 1
 [8,]  4 2
 [9,]  5 1
[10,]  5 2
[11,]  6 1
[12,]  6 2
[13,]  7 1
[14,]  7 2
[15,]  8 1
[16,]  8 2
[17,]  9 1
[18,]  9 2
[19,] 10 1
[20,] 10 2

Now, if I want to find all the rows where x is 2, I do:

x[J(2)]

     x y
[1,] 2 1
[2,] 2 2


That works fine. Now suppose I want to reorder the rows for some
reason. I'll use the arrange() function from plyr:

library(plyr)
x <- arrange(x, y)

If I call key(x), it still returns "x". However, x now looks like:
x
       x y
 [1,]  1 1
 [2,]  2 1
 [3,]  3 1
 [4,]  4 1
 [5,]  5 1
 [6,]  6 1
 [7,]  7 1
 [8,]  8 1
 [9,]  9 1
[10,] 10 1
[11,]  1 2
[12,]  2 2
[13,]  3 2
[14,]  4 2
[15,]  5 2
[16,]  6 2
[17,]  7 2
[18,]  8 2
[19,]  9 2
[20,] 10 2

The table is no longer sorted by its key. If I now run x[J(2)], I get:

x[J(2)]

     x y
[1,] 2 1

The second matching row (row 12, where x is 2 and y is 2) is missing
because data.table assumes the table is still sorted by its key: after
it finds the first 2 in column x and then sees a value that is not 2,
it stops looking.
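To make that concrete, here is a simplified base-R sketch of a keyed lookup that behaves as described above: it binary-searches for the first element >= the target, then scans forward only while the values still equal the target. This is an illustration under that assumption, not data.table's actual implementation, and `keyed_lookup` is a hypothetical name.

```r
# Simplified sketch (NOT data.table's real code) of a keyed lookup.
# Returns the row numbers matching `target`; correct only if `v` is sorted.
keyed_lookup <- function(v, target) {
  lo <- 1L
  hi <- length(v)
  first <- length(v) + 1L
  # Binary search for the first position whose value is >= target.
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (v[mid] >= target) {
      first <- mid
      hi <- mid - 1L
    } else {
      lo <- mid + 1L
    }
  }
  if (first > length(v) || v[first] != target) return(integer(0))
  # Trusting the sort order, stop at the first non-matching value.
  last <- first
  while (last < length(v) && v[last + 1L] == target) last <- last + 1L
  first:last
}

sorted_x   <- rep(1:10, each = 2)   # key column right after setkey()
unsorted_x <- rep(1:10, times = 2)  # key column after arrange(x, y)

keyed_lookup(sorted_x, 2)    # 3 4: both rows where x is 2
keyed_lookup(unsorted_x, 2)  # 2 only: row 12 is never examined
```

On the sorted column both matches come back; on the column left behind by arrange() only row 2 is returned, which is the same single row (x = 2, y = 1) printed above.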

I've noticed that some other cases are already guarded against. For instance, if I do:

x[order(y)]

then the key is set to NULL, as I would expect. The arrange() case,
however, is not caught.

While writing this email I realized I should have used x[order(y)]
instead of arrange(x, y) in the first place, so part of this email is
moot. The main point still stands, though: I've found a way to reorder
a data.table while it keeps its original key, which breaks keyed
lookups.
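Until this is handled, one defensive option is to check, after any reordering done outside data.table, that the rows are still in key order before relying on J() lookups. A minimal base-R sketch, assuming a key means ascending order of the key columns with earlier columns taking precedence; `key_still_sorted` is a hypothetical helper, not part of data.table or plyr.

```r
# Hypothetical helper: TRUE if the rows of `df` are in ascending order of
# the columns named in `key_cols` (earlier columns take precedence).
key_still_sorted <- function(df, key_cols) {
  if (length(key_cols) == 0L) return(TRUE)  # no key, nothing to check
  # `[[` extracts a column from both data.frames and data.tables.
  cols <- lapply(key_cols, function(k) df[[k]])
  # order() is stable, so rows already in order give exactly 1, 2, ..., n.
  identical(do.call(order, cols), seq_len(nrow(df)))
}

# Plain data.frame stand-ins for the table before and after arrange(x, y).
d        <- data.frame(x = rep(1:10, 2), y = rep(1:2, each = 10))
keyed    <- d[order(d$x), ]
arranged <- keyed[order(keyed$y), ]

key_still_sorted(keyed, "x")     # TRUE
key_still_sorted(arranged, "x")  # FALSE
```

With a data.table one would pass key(x) as `key_cols`; if the check returns FALSE, calling setkey() again re-sorts the table by its key.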

-Chris

