[datatable-help] My real issue with numeric keys: two numeric keys don't seem to unique correctly.

Chris Neff caneff at gmail.com
Tue May 22 18:31:53 CEST 2012


Okay, I tried the latest dev version that was supposed to fix this issue,
but the problem is still there in a different form.  This was one hell of
an issue to nail down. An example:

> dt=data.table(x=rep(c(1,2), each=10), y=rnorm(20))
> setkeyv(dt,c("x","y"))

After this, dt is not properly sorted in the y column. This isn't just an
issue with your code. If you try is.unsorted (which you use in setkeyv),
it returns FALSE, so it thinks the data is sorted.
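
For anyone who wants to check the expected result without going through
data.table at all, here is a minimal base-R sketch. The names df, ord,
sorted and ok are only illustrative, and set.seed is there just to make the
run repeatable. It orders by x and then y with order(), and confirms that y
is non-decreasing within each value of x:

```r
# Base R only: build the same shape of data as the example above.
set.seed(1)  # assumption: any seed works, this just makes the run repeatable
df <- data.frame(x = rep(c(1, 2), each = 10), y = rnorm(20))

# Order by x first, then by y within each x.
ord <- order(df$x, df$y)
sorted <- df[ord, ]

# x should be sorted overall, and y should be sorted inside each x group.
ok <- !is.unsorted(sorted$x) &&
  all(tapply(sorted$y, sorted$x, function(v) !is.unsorted(v)))
print(ok)
```

A correctly keyed table should pass the same per-group check on its y column.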

Why this is happening is beyond me. I would have thought something
like is.unsorted wouldn't have such a glaring issue.

Could it be that some code was copied between is.unsorted in base and
your fastorder code?

-Chris

On Tue, May 15, 2012 at 12:55 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Interesting, thanks. Yes, please file a bug report.
>
>> Sorry for the last email; I realised I wasn't 100% sure what my issue
>> was. Here it is:
>>
>>
>>> dt=data.table(x=0.0,y=c(0,.1,0,.2,0))
>>> setkeyv(dt,c('x','y'))
>>
>>
>> After doing this, y is not sorted.  Note that dt has the row 0,0
>> repeated three times. I guess this comes from the following issue:
>>
>>
>>> dt
>>      x   y
>> [1,] 0 0.0
>> [2,] 0 0.1
>> [3,] 0 0.0
>> [4,] 0 0.2
>> [5,] 0 0.0
>>> unique(dt)
>>      x   y
>> [1,] 0 0.0
>> [2,] 0 0.1
>> [3,] 0 0.0
>> [4,] 0 0.2
>> [5,] 0 0.0
>>
>>
>> unique() does not detect the duplicated rows! This also means that
>>
>>> dt[,list(count=.N),by=c("x","y")]
>>
>> does not group the way it should.
>>
>> This seems to result from faulty logic in data.table:::fastorder.  It
>> sorts the last column, y, correctly, but when it then uses that result
>> to sort the x column, it returns the identity ordering, which clearly
>> doesn't make sense here.
>>
>> The final tidbit is that it seems to be caused by having two numeric
>> columns together.  If you change x to character:
>>
>>> dt$x=as.character(dt$x)
>>> unique(dt)
>>      x   y
>> [1,] 0 0.0
>> [2,] 0 0.1
>> [3,] 0 0.2
>>
>>
>> And everything works fine as it should.  Shall I file a bug report?
>>
>> -Chris
>
>

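For reference, the result expected from the quoted five-row example can be
sketched with base R alone, so it does not depend on any particular
data.table build. The names df, expected and counts are only illustrative,
and aggregate() stands in here for the by= grouping; it is not what
data.table does internally:

```r
# Base R only: the same five rows as in the quoted example.
df <- data.frame(x = 0.0, y = c(0, .1, 0, .2, 0))

# unique() on a data.frame should collapse the three (0, 0) rows into one.
expected <- unique(df)
print(expected)

# Counting rows per (x, y) pair, the base-R analogue of grouping by x and y.
counts <- aggregate(count ~ x + y,
                    data = transform(df, count = 1L),
                    FUN = sum)
print(counts)
```

Three distinct rows and a count of 3 for the (0, 0) pair are what unique(dt)
and the by=c("x","y") grouping should agree with once the keys behave.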