[datatable-help] My real issue with numeric keys: unique() doesn't seem to work correctly with two numeric keys.

Chris Neff caneff at gmail.com
Tue May 15 18:38:45 CEST 2012


Sorry about my last email; I realised I wasn't 100% sure what my issue
was. Here it is:


> dt=data.table(x=0.0,y=c(0,.1,0,.2,0))
> setkeyv(dt,c('x','y'))


After doing this, y is not sorted, even though x is constant. Note that
dt contains the row (0, 0) three times. I think this comes from the
following issue:


> dt
     x   y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.0
[4,] 0 0.2
[5,] 0 0.0
> unique(dt)
     x   y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.0
[4,] 0 0.2
[5,] 0 0.0


unique() does not detect the duplicated rows! This also means that

> dt[,list(count=.N),by=c("x","y")]

does not group the way it should: the three (0, 0) rows should form a
single group.
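As a cross-check of what the output should be, base R's duplicated() and aggregate() on a plain data.frame give the expected answer. This is a sketch that deliberately avoids data.table, so it shows the intended result rather than exercising the bug itself; the variable names are only for illustration.

```r
# Same data as above, held in a plain data.frame (no data.table involved)
df <- data.frame(x = 0.0, y = c(0, 0.1, 0, 0.2, 0))

# Rows 3 and 5 repeat row 1, so they should be flagged as duplicates
dup <- duplicated(df)
print(dup)

# Expected unique() result: three distinct (x, y) rows
uniq <- df[!dup, ]
print(uniq)

# Expected grouping: (0, 0) should form a single group of size 3
counts <- aggregate(count ~ x + y, data = transform(df, count = 1), FUN = sum)
print(counts)
```
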

This seems to result from faulty logic in data.table:::fastorder. It
sorts the last column, y, correctly, but when it then uses that
ordering to sort the x column, it returns the identity ordering, which
clearly doesn't make sense here.
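For reference, the expected behaviour can be sketched in base R: sort by the last key first, then re-sort stably by each earlier key, composing each new ordering with the previous one. This sketch assumes only that order() is stable (unresolved ties keep their original positions). It is not data.table's actual fastorder code, and the helper name multikey_order is hypothetical.

```r
# Hypothetical helper (not part of data.table): order rows by several
# key columns, processing the last (least significant) key first
multikey_order <- function(keys) {
  o <- seq_along(keys[[1]])            # start from the identity ordering
  for (k in rev(keys)) {
    # order() is stable, so ties keep the ordering from the later keys;
    # indexing o by the result composes the two permutations
    o <- o[order(k[o])]
  }
  o
}

x <- rep(0.0, 5)
y <- c(0, 0.1, 0, 0.2, 0)

o <- multikey_order(list(x, y))
print(o)                               # should match order(x, y)
print(data.frame(x = x[o], y = y[o]))
```

In this sketch the stable sort of x on its own does return 1:5, because x is constant; what matters is that this result is composed with the earlier y ordering (the o[...] step), giving c(1, 3, 5, 2, 4) rather than the identity.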

One final tidbit: the problem seems to occur only when both key columns
are numeric. If you change x to character:

> dt$x=as.character(dt$x)
> unique(dt)
     x   y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.2


then everything works as it should. Shall I file a bug report?

-Chris

