[datatable-help] Weird interaction between data.table and plyr, not sure which list to mail.

Matthew Dowle mdowle at mdowle.plus.com
Wed Aug 3 00:41:54 CEST 2011


That's a bug. Thanks Chris. Fixed, test added and committed to 1.6.3.

o   Invalid keys are no longer created when a non-data.table-aware
    package reorders the data; e.g.,
        setkey(DT,x,y)
        plyr::arrange(DT,y)       # same as DT[order(y)]
    This now drops the key to avoid incorrect results being
    returned the next time the invalid key is joined to. Thanks
    to Chris Neff for reporting this bug.
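
For anyone who wants to check the behaviour on their own install, here is a
minimal sketch. It assumes a reasonably recent data.table and uses DT[order(y)]
as a stand-in for the reorder, so plyr is not needed. The names by_y and hits
are just illustrative.

```r
library(data.table)

DT <- data.table(x = rep(1:10, 2), y = rep(1:2, each = 10))
setkey(DT, x)                 # sorts DT by x and marks x as the key
key(DT)                       # "x"

# Reordering through data.table's own [ drops the key,
# since the rows are no longer sorted by x.
by_y <- DT[order(y)]
key(by_y)                     # NULL

# Defensive step after a reorder done by any other code:
# re-key explicitly, which re-sorts the rows by x.
setkey(by_y, x)
hits <- by_y[J(2L)]           # keyed lookup for x == 2
nrow(hits)                    # both matching rows come back
```

Re-keying costs a sort, but it is the safe option whenever code outside
data.table may have touched the row order.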

Matthew

On Tue, 2011-08-02 at 09:57 -0400, Chris Neff wrote:
> Hi all,
> There seems to be an issue where I can alter the ordering of a data
> table and have it not lose its key, thereby screwing up future
> output. Example:
> 
> x <- data.table(x=rep(1:10,2), y=rep(1:2,each=10))
> key(x) <- "x"
> x
> 
>        x y
>  [1,]  1 1
>  [2,]  1 2
>  [3,]  2 1
>  [4,]  2 2
>  [5,]  3 1
>  [6,]  3 2
>  [7,]  4 1
>  [8,]  4 2
>  [9,]  5 1
> [10,]  5 2
> [11,]  6 1
> [12,]  6 2
> [13,]  7 1
> [14,]  7 2
> [15,]  8 1
> [16,]  8 2
> [17,]  9 1
> [18,]  9 2
> [19,] 10 1
> [20,] 10 2
> 
> Now, if I want to find all the elements where x is 2, I do
> 
> x[J(2)]
> 
>      x y
> [1,] 2 1
> [2,] 2 2
> 
> 
> and it works fine. Let's say I want to reorder the rows for some
> reason.  I'll use the arrange function in plyr:
> 
> x <- arrange(x, y)
> 
> If I call key(x), it still returns "x".  However, x now looks like:
> x
>        x y
>  [1,]  1 1
>  [2,]  2 1
>  [3,]  3 1
>  [4,]  4 1
>  [5,]  5 1
>  [6,]  6 1
>  [7,]  7 1
>  [8,]  8 1
>  [9,]  9 1
> [10,] 10 1
> [11,]  1 2
> [12,]  2 2
> [13,]  3 2
> [14,]  4 2
> [15,]  5 2
> [16,]  6 2
> [17,]  7 2
> [18,]  8 2
> [19,]  9 2
> [20,] 10 2
> 
> This is no longer sorted by its key.  If I do something like x[J(2)] I get:
> 
> x[J(2)]
> 
>      x y
> [1,] 2 1
> 
> The second row is missing because data.table assumes the rows are
> still sorted by x, so after seeing the first 2 in x, and then seeing
> something that is not 2, it stops.
> 
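To illustrate why a stale key loses rows, here is a rough base R sketch. It
assumes the keyed lookup behaves like a binary search over the key column;
match_run is a hypothetical helper written for this example, not data.table's
actual internals, and the row it lands on may differ from the output above.

```r
# Hypothetical helper: return the positions of the run of entries
# equal to `value`, found by binary search. It is only correct when
# `keycol` is sorted in ascending order.
match_run <- function(keycol, value) {
  bound <- function(strict) {
    lo <- 1L
    hi <- length(keycol) + 1L
    while (lo < hi) {
      mid <- (lo + hi) %/% 2L
      below <- if (strict) keycol[mid] <= value else keycol[mid] < value
      if (below) lo <- mid + 1L else hi <- mid
    }
    lo
  }
  first <- bound(FALSE)        # first position with keycol >= value
  last  <- bound(TRUE) - 1L    # last position with keycol <= value
  if (first > last) integer(0) else first:last
}

sorted_key    <- rep(1:10, each = 2)    # x as it looks when keyed
reordered_key <- rep(1:10, times = 2)   # x after sorting the rows by y

found_sorted    <- match_run(sorted_key, 2L)
found_reordered <- match_run(reordered_key, 2L)

length(found_sorted)      # 2: both rows with x == 2
length(found_reordered)   # 1: the search only finds one of them
```

The search itself is fine; it is the "sorted" assumption that no longer holds
once the rows have been reordered behind its back.
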
> I've noticed that some other cases have been prevented.  For instance if I do:
> 
> x[order(y)]
> 
> Then the key gets set to NULL, as I would expect. But the arrange() case fails.
> 
> In the process of writing this email I've come to realize I shouldn't
> be using arrange(x, y) but instead be using x[order(y)] to begin with,
> so some of this email is moot.  But the main point still exists that
> I've found a way that reorders the data.table while retaining the
> original key, therefore breaking things.
> 
> -Chris



