[datatable-help] Setting key when resulting order of table is not unique

Alexander Peterhansl APeterhansl at GAINCapital.com
Thu Jul 21 19:33:52 CEST 2011


Thank you for your reply.

Yes, it's best to enforce a unique ordering with an additional key column, as you said.  That works as expected, but the behavior does not seem to match what the help page says.

Example:
> DT = data.table(index1=c(1,2,2),index2=c(1,2,3),values=c("a","b","c"))
> key(DT) <- c("index1","index2")
> DT
     index1 index2 values
[1,]      1      1      a
[2,]      2      2      b
[3,]      2      3      c

> DT[J(1:3),roll=TRUE]
     index1 index2 values
[1,]      1      1      a
[2,]      2      2      b
[3,]      2      3      c
[4,]      3      3      c

The column being rolled here is index1.  But isn't index1 the first column of DT's key, not the last?

In the help pages -- help(data.table) -- the following is said about the "roll" option:
Applies to the last column of x's key, which is generally a date but can be any ordered variable, with gaps. When roll=TRUE if i's row matches to all but the last column of x's key, and the value of the last column falls in a gap (including after the last observation for that group), the prevailing value in x is rolled forward.

-Alex



-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com] 
Sent: Thursday, July 21, 2011 11:24 AM
To: Alexander Peterhansl
Cc: datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] Setting key when resulting order of table is not unique

Hi,

On Thu, Jul 21, 2011 at 11:02 AM, Alexander Peterhansl <APeterhansl at gaincapital.com> wrote:
> Dear Data Table Help List,
>
> I am using data.table version 1.6 (with R version 2.12.2, 64-bit on 
> Windows 7).  Suppose I have a table whose key does not give me a unique ordering.
> Then the output of the "roll" option will be arbitrary (i.e., it will 
> depend on what one does between the two executions).  Is this something noteworthy?
>
> Please see output of the following:
>
>> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")
>
>> DT[J(1:3),roll=TRUE]  # output 1
>
>      A  B
> [1,] 1 b1
> [2,] 2 b3
> [3,] 2 b2
> [4,] 3 b2
>
>> key(DT)="B"           # change keys to do other stuff...
>> key(DT)="A"           # get back to key A
>> DT[J(1:3),roll=TRUE]  # output 2 does not match output 1
>      A  B
> [1,] 1 b1
> [2,] 2 b2
> [3,] 2 b3
> [4,] 3 b3
>
> (Also, as an aside, I get identical output in the two executions of 
> DT[J(1:3),roll=TRUE] when I start with the table DT =
> data.table(A=c(1,2,2),B=c("b1","b2","b3"),key="A") instead.)
>
> I'm sure there must also be other reverberations beyond the effect on
> the roll option.
>
> Any insight would be of interest.  Thank you.

I don't think it's all that surprising in this case.

The original "keying" on A does not take your B column into consideration here:

R> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")

But then when you set the key on "B", of course "b2" will have to be rearranged to come before "b3".

After you set the key on your DT back to A, A itself is in order already (1,2,2) == (1,2,2) so no moving around happens. You should note that the reordering in data.table is "stable" (I'm 95% sure on that, Matthew can verify) so "ties" will appear in the same order as they did in the original input.
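A rough sketch of that effect in base R, assuming the reorder really is stable. It uses a plain data.frame with order() as a stand-in for the keyed table, so it only mimics what data.table does internally; the variable names are made up for illustration:

```r
# Base R stand-in for the keyed table (not data.table's own keying code).
# order(..., method = "radix") is a stable sort: ties keep their current order.
df <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"), stringsAsFactors = FALSE)

by_A_first <- df[order(df$A, method = "radix"), ]      # ties in A keep input order
by_B       <- df[order(df$B, method = "radix"), ]      # the "other stuff": reorder by B
by_A_again <- by_B[order(by_B$A, method = "radix"), ]  # back to A; A is already sorted

# The last row of the A == 2 group is the one a forward roll would pick up.
last_B_first <- tail(by_A_first$B[by_A_first$A == 2], 1)  # "b2"
last_B_again <- tail(by_A_again$B[by_A_again$A == 2], 1)  # "b3"
```

The two "last" values differ only because the detour through B changed the order of the tied rows, which matches the two outputs quoted above.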

If it is important in your scenario that this doesn't change when you "roll", you can always set a compound key on DT prior to doing that calculation:

R> key(DT) <- c('A', 'B')

Any way you shake it, if you run your code, then set the key to just "B", and then to c("A", "B") before you "roll" again, your results will be the same.
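The same idea as a small base R sketch, again with a data.frame and order() standing in for the keyed table. The helper order_by_AB is hypothetical and only mimics a key on c("A", "B"):

```r
# Hypothetical helper: order by A, then break ties in A using B.
order_by_AB <- function(d) d[order(d$A, d$B), ]

df <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"), stringsAsFactors = FALSE)

direct <- order_by_AB(df)                  # compound order straight away
via_B  <- order_by_AB(df[order(df$B), ])   # detour through an order on B first

# With B as a tie-breaker no ties remain, so the earlier row order cannot matter.
same <- identical(direct$B, via_B$B)       # TRUE
```

Since (A, B) is unique in this table, the final row order is fully determined by the two columns and not by whatever was done to the table beforehand.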

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

