[datatable-help] Setting key when resulting order of table is not unique

Matthew Dowle mdowle at mdowle.plus.com
Thu Jul 21 19:38:07 CEST 2011


On Thu, 2011-07-21 at 11:23 -0400, Steve Lianoglou wrote:
> Hi,
> 
> On Thu, Jul 21, 2011 at 11:02 AM, Alexander Peterhansl
> <APeterhansl at gaincapital.com> wrote:
> > Dear Data Table Help List,
> >
> > I am using data.table version 1.6 (with R version 2.12.2, 64-bit on Windows
> > 7).  Suppose I have a table whose key does not give me a unique ordering.
> > Then the output of the “roll” option will be arbitrary (i.e., it will depend
> > on what one does between the two executions).  Is this something noteworthy?
> >
> > Please see output of the following:
> >
> >> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")
> >
> >> DT[J(1:3),roll=TRUE]  # output 1
> >
> >         A  B
> > [1,] 1 b1
> > [2,] 2 b3
> > [3,] 2 b2
> > [4,] 3 b2
> >
> >> key(DT)="B"           # change keys to do other stuff...
> >> key(DT)="A"           # get back to key A
> >> DT[J(1:3),roll=TRUE]  # output 2 does not match output 1
> >         A  B
> > [1,] 1 b1
> > [2,] 2 b2
> > [3,] 2 b3
> > [4,] 3 b3
> >
> > (Also, as an aside, I get identical output in the two executions of
> > DT[J(1:3),roll=TRUE] when I start with the table DT =
> > data.table(A=c(1,2,2),B=c("b1","b2","b3"),key="A") instead.)
> >
> > I’m sure there must also be other reverberations—beyond the effect on the
> > roll option.
> >
> > Any insight would be of interest.  Thank you.
> 
> I don't think it's all that surprising in this case.
> 
> The original "keying" on A does not take your B column into consideration here:
> 
> R> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")
> 
> But then when you set the key on "B", of course "b2" will have to be
> rearranged to come before "b3".
> 
> After you set the key on your DT back to A, A itself is in order
> already (1,2,2) == (1,2,2) so no moving around happens. You should
> note that the reordering in data.table is "stable" (I'm 95% sure on
> that, Matthew can verify) so "ties" will appear in the same order as
> they did in the original input.
Yes, the sort is stable for ties. Have just committed changes to ?setkey
to make that clear now.
> 
> If it is important in your scenario that this doesn't change when you
> "roll", you can always set a compound key on DT prior to doing that
> calculation:
> 
> R> key(DT) <- c('A', 'B')
> 
> Any way you shake it, if you run your code, then set the key to just
> "B", then again to c("A", "B") to "roll" again, your results will be
> the same.
> 
Exactly. 2-column key seems like the ticket for Alex. You don't need to
join to all the columns of the key.

key(DT) = c("A","B")
DT[J(1:3),roll=TRUE]  # join to 1st column of key
DT[J(1:3,"b2")]       # join to both columns of key; "b2" is recycled
                      # by J() to match the length of 1:3 in this example
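
To see the whole thread in one go, here is a minimal sketch that replays Alex's
steps and then repeats them with the 2-column key. It assumes a data.table
version where setkey(DT, A, B) accepts several unquoted column names (used
here instead of key(DT) = ...); the names first, second, third and fourth are
only for illustration.

```r
library(data.table)

# Same data as in Alex's example: the ties on A arrive in the order b3, b2.
DT <- data.table(A = c(1, 2, 2), B = c("b1", "b3", "b2"))

# Key on A only: the sort is stable, so ties on A keep their current order.
setkey(DT, A)
first <- DT[J(1:3), roll = TRUE]

# Re-keying on B moves b2 ahead of b3; going back to A leaves that order alone.
setkey(DT, B)
setkey(DT, A)
second <- DT[J(1:3), roll = TRUE]

# Compound key: the order within each A is now fixed by B, not by history.
setkey(DT, A, B)
third <- DT[J(1:3), roll = TRUE]   # join to 1st column of key only

setkey(DT, B)
setkey(DT, A, B)
fourth <- DT[J(1:3), roll = TRUE]

# Compare the rolled results before and after the detour through key B.
print(identical(first$B, second$B))   # single key: may differ, as Alex saw
print(identical(third$B, fourth$B))   # compound key: should agree
```

The point of the second half is that with key (A, B) the detour through key B
can no longer change which row is last within A == 2, so the rolled value for
A == 3 stays put.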



