[datatable-help] Subsetting bug?

Gene Leynes gleynes+r at gmail.com
Fri Jan 11 18:21:38 CET 2013


Jim,

Thanks for your insight.  I forget about that one sometimes.

For future readers: this refers to the R FAQ,
http://cran.r-project.org/doc/FAQ/R-FAQ.html
not the data.table FAQ that is included in the data.table package.


7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are
integers and fractions whose denominator is a power of 2. Other numbers
have to be rounded to (typically) 53 binary digits accuracy. As a result,
two floating point numbers will not reliably be equal unless they have been
computed by the same algorithm, and not always even then. For example

     R> a <- sqrt(2)
     R> a * a == 2
     [1] FALSE
     R> a * a - 2
     [1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of
.Machine$double.eps ^ 0.5. If you want much greater accuracy than this you
will need to consider error propagation carefully.
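
As a small illustration of the FAQ's advice (a sketch, not part of the FAQ
text), the sqrt(2) example can be checked three ways. The variable names and
the hand-rolled tolerance below are my own choices for illustration.

```r
a <- sqrt(2)

# Exact comparison fails because a * a carries rounding error
exact_equal <- a * a == 2

# all.equal() returns TRUE or a character description of the difference,
# so wrap it in isTRUE() before using it in a condition
near_equal <- isTRUE(all.equal(a * a, 2))

# Hand-rolled absolute tolerance check (tol is an illustrative choice)
tol <- sqrt(.Machine$double.eps)
within_tol <- abs(a * a - 2) < tol

print(c(exact = exact_equal, all.equal = near_equal, manual = within_tol))
```

Note that all.equal() works on whole objects and returns a single result, so
it is not a drop-in replacement for == when subsetting row by row.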

For more information, see e.g. David Goldberg (1991), “What Every Computer
Scientist Should Know About Floating-Point Arithmetic”, *ACM Computing
Surveys*, *23/1*, 5–48, also available via
http://www.validlab.com/goldberg/paper.pdf.

To quote from “The Elements of Programming Style” by Kernighan and Plauger:

*10.0 times 0.1 is hardly ever 1.0*.
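
This also seems to answer my question about why the vector search works for
some quantiles and not others: seq(0, 1, .1) builds its values by arithmetic,
and only some of the results land on the same double as the typed constant.
A minimal sketch in base R (the 1e-8 tolerance is an arbitrary illustrative
choice, not a recommendation):

```r
x <- seq(0, 1, 0.1)   # values built by arithmetic
literals <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)  # typed constants

# Element-wise exact comparison: TRUE only where both routes
# round to the same double
exact_match <- x == literals
print(data.frame(literal = literals, exact_match = exact_match))

# Comparing within a tolerance treats every pair as equal
near_match <- abs(x - literals) < 1e-8
print(all(near_match))

# The same idea used for subsetting instead of x == 0.3
picked <- x[abs(x - 0.3) < 1e-8]
print(picked)
```

The same tolerance test could be used in place of quantile==.3 when
subsetting, although storing the quantile as an integer percentage avoids
the issue entirely.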




On Fri, Jan 11, 2013 at 9:27 AM, jim holtman <jholtman at gmail.com> wrote:

> this sounds like FAQ 7.31
>
>
> > x <- seq(0,1,.1)
>
> > print(x,digits = 20)
>  [1] 0.00000000000000000000 0.10000000000000000555 0.20000000000000001110 0.30000000000000004441
>  [5] 0.40000000000000002220 0.50000000000000000000 0.60000000000000008882 0.70000000000000006661
>  [9] 0.80000000000000004441 0.90000000000000002220 1.00000000000000000000
>
> try using:
>
>  quantile = seq(0,100,10)
>
> and then test for integer values
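
A minimal sketch of the suggestion above, assuming a plain data.frame rather
than data.table so that it runs without extra packages. The names pct,
values, quants and hit are made up for the illustration.

```r
set.seed(1)
values <- rnorm(1000)

# Store the quantile level as an integer percentage (0, 10, ..., 100)
pct <- as.integer(seq(0, 100, 10))

# Convert to a probability only when calling quantile()
quants <- data.frame(quantile = pct,
                     value = quantile(values, pct / 100))

# Integer comparison is exact, so the 30% row is found
hit <- quants[quants$quantile == 30L, ]
print(hit)
```
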
>
>
>
> On Fri, Jan 11, 2013 at 10:10 AM, Gene Leynes <gleynes+r at gmail.com> wrote:
> >
> > Yesterday I was having a problem subsetting based on a numeric key.  I had
> > some quantile data and I could get the 10% and 20%, but getting the 30%
> > failed.  I was using quantile==.1, quantile==.2, etc.
> >
> > Thanks to the FAQ I realize that I should be setting the key first and
> > using J to subset. Thanks to StackOverflow I now realize that using J to
> > subset on numeric keys fixes the problem.
> >
> > However, this doesn't explain why using a vector search would sometimes
> > work and sometimes fail.
> >
> > Thank you,
> >    Gene Leynes
> >
> >
> >>
> >> library(data.table)
> > data.table 1.8.6  For help type: help("data.table")
> >>
> >> set.seed(1)
> >>
> >> ## Make an example data table
> >> dat = data.table(
> > + index = 1:1e5,
> > + groups = sample(letters[1:3], 1e5, replace=TRUE),
> > + values = rnorm(1e5))
> >>
> >> ## Calculate some quantiles for each group
> >> dat_quants = dat[
> > +         i=TRUE,
> > +         j=list(
> > +             quantile = seq(0,1,.1),
> > +             value = quantile(values, seq(0,1,.1))),
> > +         keyby=groups]
> >>
> >> ## Print the 10% 20% and 30% quantiles... but 30% doesn't work
> >> dat_quants[quantile==.1, ]
> >    groups quantile     value
> > 1:      a      0.1 -1.284277
> > 2:      b      0.1 -1.280095
> > 3:      c      0.1 -1.291173
> >> dat_quants[quantile==.2, ]
> >    groups quantile      value
> > 1:      a      0.2 -0.8413631
> > 2:      b      0.2 -0.8397591
> > 3:      c      0.2 -0.8423560
> >> dat_quants[quantile==.3, ]
> > Empty data.table (0 rows) of 3 cols: groups,quantile,value
> >>
> >>
> >> ## Changing to character will allow all of them to work
> >> dat_quants$quantile = as.character(dat_quants$quantile)
> >>
> >
> >> sessionInfo()
> > R version 2.15.2 (2012-10-26)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] data.table_1.8.6 geneorama_1.0
> >>
> >>
> >
> >
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>