[datatable-help] Setting key when resulting order of table is not unique

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Jul 21 17:23:53 CEST 2011


Hi,

On Thu, Jul 21, 2011 at 11:02 AM, Alexander Peterhansl
<APeterhansl at gaincapital.com> wrote:
> Dear Data Table Help List,
>
> I am using data.table version 1.6 (with R version 2.12.2, 64-bit on Windows
> 7).  Suppose I have a table whose key does not give me a unique ordering.
> Then the output of the “roll” option will be arbitrary (i.e., it will depend
> on what one does between the two executions).  Is this something noteworthy?
>
> Please see output of the following:
>
>> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")
>
>> DT[J(1:3),roll=TRUE]  # output 1
>
>      A  B
> [1,] 1 b1
> [2,] 2 b3
> [3,] 2 b2
> [4,] 3 b2
>
>> key(DT)="B"           # change keys to do other stuff...
>> key(DT)="A"           # get back to key A
>> DT[J(1:3),roll=TRUE]  # output 2 does not match output 1
>      A  B
> [1,] 1 b1
> [2,] 2 b2
> [3,] 2 b3
> [4,] 3 b3
>
> (Also, as an aside, I get identical output in the two executions of
> DT[J(1:3),roll=TRUE] when I start with the table DT =
> data.table(A=c(1,2,2),B=c("b1","b2","b3"),key="A") instead.)
>
> I’m sure there must also be other reverberations—beyond the effect on the
> roll option.
>
> Any insight would be of interest.  Thank you.

I don't think it's all that surprising in this case.

The original "keying" on A does not take your B column into consideration here:

R> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")

But then when you set the key on "B", of course "b2" will have to be
rearranged to come before "b3".

After you set the key on your DT back to A, no rows move, because A is
already in order (1, 2, 2). Note that the reordering in data.table is
"stable" (I'm 95% sure on that; Matthew can verify), so rows that tie
on the key keep the relative order they had before the sort. Here that
is the b2, b3 order left behind by keying on B.
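For illustration, the sequence above can be mimicked with base R's order(), whose help page documents it as stable (except for method = "quick"). This is only a sketch, not data.table code: it assumes keying behaves like a stable sort of the rows and uses a plain data.frame as a stand-in for DT.

```r
# Plain data.frame standing in for DT (assumption: setting a key is
# modelled here as a stable sort of the rows on the key column).
DF <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"),
                 stringsAsFactors = FALSE)

# "Key on A": the two A == 2 rows keep their input order (b3, b2)
by_A <- DF[order(DF$A), ]
print(by_A$B)          # [1] "b1" "b3" "b2"

# "Key on B": b2 now has to move in front of b3
by_B <- by_A[order(by_A$B), ]
print(by_B$B)          # [1] "b1" "b2" "b3"

# "Key back on A": A is already sorted, and the stable sort leaves
# the tied rows in their current (b2, b3) order
by_A_again <- by_B[order(by_B$A), ]
print(by_A_again$B)    # [1] "b1" "b2" "b3"
```

Same key column, same rows, but the order of the ties differs between by_A and by_A_again, which is exactly what changes the rolled output.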

If it matters in your scenario that the rolled result doesn't change,
you can always set a compound key on DT before doing that
calculation:

R> key(DT) <- c('A', 'B')

Any way you shake it, if you roll with that compound key, then set the
key to just "B", then set it back to c("A", "B") and "roll" again,
your results will be the same.
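A similar base-R sketch shows why the compound key makes the rolled result reproducible. roll_lookup() is a hypothetical helper, not a data.table function: it uses findInterval() to pick the last row whose A is at or below the query value, which is the row a forward roll takes when there is no exact match. DF2 holds the same rows in the order left behind by keying on B.

```r
# Hypothetical helper (not data.table API): return B from the last row,
# in the table's current order, whose A is <= x. Assumes tab$A is
# sorted and x >= min(tab$A).
roll_lookup <- function(tab, x) {
  i <- findInterval(x, tab$A)  # number of A values <= x, i.e. last such row
  tab$B[i]
}

DF  <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"),
                  stringsAsFactors = FALSE)
DF2 <- DF[c(1, 3, 2), ]  # same rows, ties in the order left by keying on B

# Key on A only: the value rolled forward to 3 depends on the tie order
r_A1 <- roll_lookup(DF[order(DF$A), ], 3)     # "b2"
r_A2 <- roll_lookup(DF2[order(DF2$A), ], 3)   # "b3"

# Compound key c("A", "B"): ties on A are broken by B, so both agree
r_AB1 <- roll_lookup(DF[order(DF$A, DF$B), ], 3)     # "b3"
r_AB2 <- roll_lookup(DF2[order(DF2$A, DF2$B), ], 3)  # "b3"
```

With the key on A alone, the value rolled forward to A = 3 flips between b2 and b3, as in outputs 1 and 2 of the question; ordering by A and then B gives b3 both times.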

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

