[datatable-help] My real issue with numeric keys: unique() doesn't seem to work correctly with two numeric keys.
Chris Neff
caneff at gmail.com
Tue May 15 18:38:45 CEST 2012
Sorry about my last email; I realised I wasn't 100% sure what my issue
was. Here it is:
> dt=data.table(x=0.0,y=c(0,.1,0,.2,0))
> setkeyv(dt,c('x','y'))
After doing this, y is not sorted. Note that dt contains the row (0, 0)
three times. I guess this comes from the following issue:
> dt
x y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.0
[4,] 0 0.2
[5,] 0 0.0
> unique(dt)
x y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.0
[4,] 0 0.2
[5,] 0 0.0
unique() does not detect the duplicated rows! This also means that
> dt[,list(count=.N),by=c("x","y")]
does not group the rows the way it should.
This seems to come from faulty logic in data.table:::fastorder. It
sorts the last column, y, correctly, but when it uses that order to sort
the x column, it returns the identity ordering, which clearly doesn't
make sense here.
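For comparison, here is a minimal sketch in plain base R (no data.table functions involved) of what a last-key-first, two-pass sort should produce on this data: order by y, then stably order by x, after which the duplicate rows sit next to each other and uniqueness only needs a comparison with the previous row. This is an illustration of the expected result, not data.table's actual fastorder code; the variable names are made up for the example, and the sketch compares values exactly.

```r
# Same toy data as above, as a plain data.frame.
df <- data.frame(x = 0.0, y = c(0, .1, 0, .2, 0))

# Pass 1: order by the LAST key column (y).
o <- order(df$y)

# Pass 2: order by the preceding key column (x), applied to the rows in
# their pass-1 order. base::order() leaves ties in their original relative
# position, so the y ordering survives within equal x values.
o <- o[order(df$x[o])]
sorted <- df[o, ]

# Once sorted, duplicate rows are adjacent: keep a row only if it differs
# from the row immediately before it (exact comparison on both keys).
n <- nrow(sorted)
is_new <- c(TRUE, sorted$x[-1] != sorted$x[-n] | sorted$y[-1] != sorted$y[-n])
uniq <- sorted[is_new, ]
print(uniq)
```

With a correct final pass, the order is rows 1, 3, 5, 2, 4 and three distinct rows remain, which is the same result the character-x version below gives.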
The final tidbit is that the problem seems to need two numeric key
columns together. If you change x to character:
> dt$x=as.character(dt$x)
> unique(dt)
x y
[1,] 0 0.0
[2,] 0 0.1
[3,] 0 0.2
and everything works as it should. Shall I file a bug report?
-Chris