Jim,<div><br></div><div>Thanks for your insight. I forget about that one sometimes.</div><div><br></div><div>For future readers: this refers to the R FAQ at <a href="http://cran.r-project.org/doc/FAQ/R-FAQ.html">http://cran.r-project.org/doc/FAQ/R-FAQ.html</a>,</div>
<div>not the data.table FAQ that is included in the data.table package. The relevant entry is quoted below.</div><div><br></div><div><br></div><div><h3 class="section" style="background-color:white;color:rgb(102,102,102);font-family:monospace;font-size:large">
7.31 Why doesn't R think these numbers are equal?</h3><p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example</p>
<pre class="example"> R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16
</pre><p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">The function <code>all.equal()</code> compares two objects using a numeric tolerance of <code>.Machine$double.eps ^ 0.5</code>. If you want much greater accuracy than this you will need to consider error propagation carefully.</p>
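<p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">Applied to the quantile values from this thread, a tolerance-based comparison might look like the sketch below. This is only an illustration: the names <code>x</code>, <code>tol</code> and <code>pct</code> are made up for the example, and the tolerance is simply the <code>all.equal()</code> default mentioned above, not a recommendation from the FAQ.</p>
<pre class="example"># Illustrative sketch; x, tol and pct are hypothetical names, not from the FAQ
x <- seq(0, 1, 0.1)

# Exact comparison misses the 30% entry: x[4] is computed as 3 * 0.1,
# which is not the same double as the literal 0.3
x[4] == 0.3                      # FALSE
which(x == 0.3)                  # integer(0)

# all.equal() returns TRUE or a description of the difference,
# so wrap it in isTRUE() before using it as a condition
isTRUE(all.equal(x[4], 0.3))     # TRUE

# Vectorised version: keep entries within an absolute tolerance of the target
tol <- sqrt(.Machine$double.eps)
which(abs(x - 0.3) < tol)        # 4

# Alternative suggested in this thread: store percentages as integers
pct <- seq(0L, 100L, 10L)
which(pct == 30L)                # 4
</pre>
<p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">The same idea should carry over to a subset such as <code>abs(quantile - 0.3) < tol</code> in place of <code>quantile == 0.3</code>, assuming the column holds plain doubles.</p>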
<p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">For more information, see e.g. David Goldberg (1991), “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, <em>ACM Computing Surveys</em>, <strong>23/1</strong>, 5–48, also available via <a href="http://www.validlab.com/goldberg/paper.pdf">http://www.validlab.com/goldberg/paper.pdf</a>.</p>
<p style="margin-top:0.6ex;margin-bottom:1.2ex;font-family:'Times New Roman';font-size:medium">To quote from “The Elements of Programming Style” by Kernighan and Plauger:</p><blockquote style="font-family:'Times New Roman';font-size:medium">
<em>10.0 times 0.1 is hardly ever 1.0</em>.</blockquote></div><div><br></div><div><br><br><div class="gmail_quote">On Fri, Jan 11, 2013 at 9:27 AM, jim holtman <span dir="ltr"><<a href="mailto:jholtman@gmail.com" target="_blank">jholtman@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">this sounds like FAQ 7.31<br>
<br>
<br>
> x <- seq(0,1,.1)<br>
<br>
> print(x,digits = 20)<br>
[1] 0.00000000000000000000 0.10000000000000000555<br>
0.20000000000000001110 0.30000000000000004441<br>
[5] 0.40000000000000002220 0.50000000000000000000<br>
0.60000000000000008882 0.70000000000000006661<br>
[9] 0.80000000000000004441 0.90000000000000002220 1.00000000000000000000<br>
<br>
try using:<br>
<br>
quantile = seq(0,100,10)<br>
<br>
and then test for integer values<br>
<div><div class="h5"><br>
<br>
<br>
On Fri, Jan 11, 2013 at 10:10 AM, Gene Leynes <<a href="mailto:gleynes%2Br@gmail.com">gleynes+r@gmail.com</a>> wrote:<br>
><br>
> Yesterday I was having a problem subsetting based on a numeric key. I had<br>
> some quantile data and I could get the 10% and 20%, but getting the 30%<br>
> failed. I was using quantile==.1, quantile==.2, etc.<br>
><br>
> Thanks to the FAQ and StackOverflow, I now realize that I should set the<br>
> key first and then use J to subset; doing that on numeric keys fixes the<br>
> problem.<br>
><br>
> However, this doesn't explain why using a vector search would sometimes work<br>
> and sometimes fail.<br>
><br>
> Thank you,<br>
> Gene Leynes<br>
><br>
><br>
>><br>
>> library(data.table)<br>
> data.table 1.8.6 For help type: help("data.table")<br>
>><br>
>> set.seed(1)<br>
>><br>
>> ## Make an example data table<br>
>> dat = data.table(<br>
> + index = 1:1e5,<br>
> + groups = sample(letters[1:3], 1e5, replace=TRUE),<br>
> + values = rnorm(1e5))<br>
>><br>
>> ## Calculate some quantiles for each group<br>
>> dat_quants = dat[<br>
> + i=TRUE,<br>
> + j=list(<br>
> + quantile = seq(0,1,.1),<br>
> + value = quantile(values, seq(0,1,.1))),<br>
> + keyby=groups]<br>
>><br>
>> ## Print the 10% 20% and 30% quantiles... but 30% doesn't work<br>
>> dat_quants[quantile==.1, ]<br>
> groups quantile value<br>
> 1: a 0.1 -1.284277<br>
> 2: b 0.1 -1.280095<br>
> 3: c 0.1 -1.291173<br>
>> dat_quants[quantile==.2, ]<br>
> groups quantile value<br>
> 1: a 0.2 -0.8413631<br>
> 2: b 0.2 -0.8397591<br>
> 3: c 0.2 -0.8423560<br>
>> dat_quants[quantile==.3, ]<br>
> Empty data.table (0 rows) of 3 cols: groups,quantile,value<br>
>><br>
>><br>
>> ## Changing to character will allow all of them to work<br>
>> dat_quants$quantile = as.character(dat_quants$quantile)<br>
>><br>
><br>
>> sessionInfo()<br>
> R version 2.15.2 (2012-10-26)<br>
> Platform: x86_64-w64-mingw32/x64 (64-bit)<br>
><br>
> locale:<br>
> [1] LC_COLLATE=English_United States.1252<br>
> [2] LC_CTYPE=English_United States.1252<br>
> [3] LC_MONETARY=English_United States.1252<br>
> [4] LC_NUMERIC=C<br>
> [5] LC_TIME=English_United States.1252<br>
><br>
> attached base packages:<br>
> [1] stats graphics grDevices utils datasets methods base<br>
><br>
> other attached packages:<br>
> [1] data.table_1.8.6 geneorama_1.0<br>
>><br>
>><br>
><br>
><br>
><br>
><br>
</div></div>
<span class="HOEnZb"><font color="#888888"><br>
<br>
<br>
--<br>
Jim Holtman<br>
Data Munger Guru<br>
<br>
What is the problem that you are trying to solve?<br>
Tell me what you want to do, not how you want to do it.<br>
</font></span></blockquote></div><br></div>