[datatable-help] sorting on floating point column

Matthew Dowle mdowle at mdowle.plus.com
Tue Apr 30 16:09:25 CEST 2013

Hi, 

data.table sorts doubles within machine tolerance:

> sqrt(.Machine$double.eps)
[1] 1.490116e-08

i.e. numbers closer than this are considered equal.
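A minimal sketch of what "closer than this are considered equal" means, assuming an absolute difference is compared against the tolerance (as the sentence above suggests). It is written in Python rather than R, and the helper `within_tol` is hypothetical, not part of data.table:

```python
import math
import sys

# Same value as R's sqrt(.Machine$double.eps): both use IEEE 754 double epsilon.
tol = math.sqrt(sys.float_info.epsilon)
print(f"{tol:.6e}")  # 1.490116e-08


def within_tol(a, b, tol=tol):
    """Hypothetical helper: treat two doubles as equal if |a - b| < tol."""
    return abs(a - b) < tol


# Two values from Arun's output quoted below: about seven times apart,
# yet "equal" under an absolute tolerance of ~1.49e-08.
print(within_tol(1.401401e-09, 9.809810e-09))  # True
```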

Otherwise we wouldn't be able to do things like DT[.(3.14)].
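Why a lookup such as DT[.(3.14)] benefits from tolerance can be sketched without data.table: a stored key produced by arithmetic may differ in its last bits from the literal a user types. This Python snippet is only an illustration under assumptions: it uses the well-known 0.1 * 3 versus 0.3 case instead of 3.14, a plain list instead of a keyed table, and an absolute-tolerance comparison as described above.

```python
import math
import sys

TOL = math.sqrt(sys.float_info.epsilon)  # ~1.490116e-08, as in R

# A hypothetical key column whose values came out of arithmetic
# rather than being typed in as literals.
keys = [0.1 * 3, 2.0 / 3.0, 3.0]

lookup = 0.3
exact = [k for k in keys if k == lookup]                # bitwise-equal match only
tolerant = [k for k in keys if abs(k - lookup) < TOL]   # match within tolerance
print(exact)     # []
print(tolerant)  # [0.30000000000000004]
```

The exact match finds nothing because 0.1 * 3 is not the same double as the literal 0.3; the tolerant match finds it.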

I had a quick look: data.table:::ordernumtol takes a "tol" argument, but there is no option provided (yet) to change it. Do we need one?

The examples section of one of the help pages has an example that generates a series of numbers very close together using pi. Note that your numbers are both close together and very close to 0, so with an absolute tolerance of about 1.5e-08 many of them compare as equal to their neighbours.
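To see how values near 0 can come out in an order that looks wrong, here is a hypothetical Python sketch. It does not reproduce data.table:::ordernumtol. It assumes that values whose gap to their sorted neighbour is below the tolerance are chained into one tie group, and that rows inside a tie group keep their input order (a stable sort). The helpers `tolerance_groups` and `sort_with_tolerance` are illustrative names only.

```python
import math
import sys

# Absolute tolerance, matching R's sqrt(.Machine$double.eps) (~1.490116e-08).
TOL = math.sqrt(sys.float_info.epsilon)


def tolerance_groups(values, tol=TOL):
    """Hypothetical helper: assign each value a tie-group id.

    Values are visited in exact ascending order; a value joins the current
    group whenever its gap to the previous value is below tol, so a group
    can chain across a span much wider than tol itself.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    group = [0] * len(values)
    gid = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] >= tol:
            gid += 1
        group[cur] = gid
    return group


def sort_with_tolerance(values, tol=TOL):
    """Stable sort by tie-group id: rows in one group keep their input order."""
    group = tolerance_groups(values, tol)
    idx = sorted(range(len(values)), key=lambda i: group[i])
    return [values[i] for i in idx]


# A few values of column y taken from Arun's output quoted below.
y = [5.395395e-08, 6.956957e-08, 2.142142e-08,
     1.401401e-09, 9.809810e-09, 7.307307e-09]
print(sort_with_tolerance(y))
```

Here 1.401401e-09, 7.307307e-09, 9.809810e-09 and 2.142142e-08 chain into a single group (each consecutive gap is under 1.49e-08), so they keep their input order and 2.142142e-08 comes first. 5.395395e-08 and 6.956957e-08 are each more than the tolerance away from their neighbours, so they sort normally.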

Matthew

On 30.04.2013 14:52, Arunkumar Srinivasan wrote:

> Hi there,
> I just saw something strange when I was sorting a column of p-values.
> I checked the data.table bug tracker for the words "sort" and
> "floating point" and there were no hits for this case. There's a bug
> for "integer 64" sort on a column, though.
> So, here's a reproducible example. I'd be glad to file a bug if it is
> one, and to be corrected if it's something I am doing wrong.
>

> library(data.table)
> set.seed(45)
> dt <- data.table(x = sample(50), y = sample(c(seq(0, 1, length.out = 1000), 7000000:7000100), 50) / 1e7)
> head(dt)
>     x            y
> 1: 32 5.395395e-08
> 2: 16 6.956957e-08
> 3: 12 2.142142e-08
> 4: 18 5.855856e-08
> 5: 17 6.216216e-08
> 6: 14 5.025025e-08
> setkey(dt, "y")  # sort by column y
> head(dt, 10)
>      x            y
>  1: 47 1.401401e-09
>  2: 12 2.142142e-08
>  3: 24 1.391391e-08
>  4: 43 9.809810e-09   <~~~ obviously false
>  5:  1 2.932933e-08
>  6: 48 2.562563e-08
>  7: 49 1.891892e-08
>  8: 40 2.182182e-08
>  9:  9 7.307307e-09   <~~~ obviously false
> 10: 45 2.482482e-08
>
> Best, 
> Arun
