[datatable-help] sorting on a floating point column
frederik at ofb.net
Thu Jan 28 00:03:16 CET 2016
data.table 1.9.6
What's surprising is that sorting a list of floats wouldn't do the
obvious thing, and sort them exactly. Is it surprising that this would
be surprising?
Why do you want a minimal test case, when setNumericRounding explains
that the behavior I reported is intentional?
I now see that this is also documented in the data.table::order page.
So I guess it is already "documented visibly".
And setNumericRounding explains that it is slightly faster to ignore
the last two bytes, requiring fewer radix sort passes.
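The effect described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not data.table's code: the helper `drop_low_bytes` is hypothetical, and it simply zeroes the low two bytes of each double, whereas the library's actual rounding may differ in detail. The assumption is only that values whose keys agree after dropping those bytes are treated as equal by the sort, so they keep their input order.

```python
import struct

def drop_low_bytes(x: float, nbytes: int = 2) -> float:
    """Zero the lowest `nbytes` bytes of the IEEE-754 double `x`.

    Hypothetical helper for illustration only; truncation is a
    simplification of whatever rounding the library performs.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = ~((1 << (8 * nbytes)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]

def is_monotonic(xs):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

# Three distinct values that differ only in the last bits of the significand.
eps = 2.0 ** -52
values = [1.0 + 3 * eps, 1.0 + 1 * eps, 1.0 + 2 * eps]

# Exact comparison: the result is monotonic.
exact = sorted(values)

# Comparison on the truncated key: all three keys equal 1.0, and because
# sorted() is stable the values stay in input order, which is not monotonic.
rounded = sorted(values, key=drop_low_bytes)

print(is_monotonic(exact), is_monotonic(rounded))
```

With exact comparison the output is sorted; with the truncated key the three values count as ties and come back in their original order, which is the kind of non-monotonic result reported in this thread.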
I wanted to share my experience that this behavior is confusing. Thank
you at least for pointing me to your documentation.
Frederick
On Wed, Jan 27, 2016 at 10:13:44PM +0100, Arunkumar Srinivasan wrote:
> > This is following up on a thread from a couple years ago:
> > http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
>
> Things have changed A LOT! I suggest you keep up-to-date by reading the README about bug fixes and features from the github project page: https://github.com/Rdatatable/data.table
>
> > I ran into this problem myself, it took a bit of time to debug because it is so surprising.
>
> What’s surprising? Reproducible example please. data.table package version, R version as well please.
> Without that my best guess is for you to look at `?setNumericRounding`.
>
> --
> Arun
>
> On 27 January 2016 at 21:40:23, frederik at ofb.net (frederik at ofb.net) wrote:
>
> This is following up on a thread from a couple years ago:
>
> http://lists.r-forge.r-project.org/pipermail/datatable-help/2013-May/001689.html
>
> I ran into this problem myself, it took a bit of time to debug because
> it is so surprising.
>
> In my case, I was using order() to sort a list of floats.
>
> I expected the result to be monotonic but it wasn't!
>
> Then I found out that the problem was due to 'order' being part of the
> data.table library. By using base::order, I was able to get correct
> behavior.
>
> I don't understand why improperly ordering floating point data helps
> the data.table library accomplish anything, whether it is looking up
> keys or what.
>
> Also, it must be much slower to compare floats with a tolerance than
> to just compare them. I seem to recall that floats were designed so
> that normal comparison is quite fast.
>
> Please fix this bug, or at least document it more visibly.
>
> Thank you,
>
> Frederick Eaton