<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p> </p>
<p>Hi Arun,</p>
<p>Hm, good point. Is data.table consistent with SQL already, for both == and !=, and so no change needed? And it was correct for Frank to be mistaken. Maybe just some more documentation and examples needed then. Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :</p>
<p>http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently</p>
<p>"na.rm = TRUE/FALSE" sounds good to me. I'd only considered nomatch before in the context of joins, not logical subsets.</p>
<p> </p>
<p>Thanks, Matthew</p>
<p> </p>
<p>On 10.06.2013 08:11, Arunkumar Srinivasan wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%"><!-- html ignored --><!-- head ignored --><!-- meta ignored -->
<div>
<div>Matthew,</div>
<div>Regarding your suggestion of changes regarding Frank's post here: <a style="color: #006ae3;" href="http://stackoverflow.com/a/17008872/559784">http://stackoverflow.com/a/17008872/559784</a> I find it a bit more confusing and frankly not like sql.</div>
<div>You wrote: <span style="line-height: 18px; text-align: left;">"If I haven't understood correctly feel free to correct, otherwise the change will get made eventually. It will need to be done in a way that considers compound expressions; e.g., </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">DT[colA=="foo" & colB!="bar"]</code><span style="line-height: 18px; text-align: left;"> should exclude rows with </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">NA</code><span style="line-height: 18px; text-align: left;"> in </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">colA</code><span style="line-height: 18px; text-align: left;"> but include rows where </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">colA</code><span style="line-height: 18px; text-align: left;"> is non-</span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">NA</code><span style="line-height: 18px; text-align: left;"> but </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">colB</code><span style="line-height: 18px; text-align: left;"> is </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">NA</code><span style="line-height: 18px; text-align: left;">. Similarly, </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">DT[colA!=colB]</code><span style="line-height: 18px; text-align: left;"> should include rows where either colA or colB is </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">NA</code><span style="line-height: 18px; text-align: left;"> but not both. And perhaps </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">DT[colA==colB]</code><span style="line-height: 18px; text-align: left;"> should include rows where both</span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">colA</code><span style="line-height: 18px; text-align: left;"> and </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">colB</code><span style="line-height: 18px; text-align: left;"> are </span><code style="margin: 0px; padding: 1px 5px; border: 0px; vertical-align: baseline; background-color: #eeeeee; line-height: 18px; text-align: left;">NA</code><span style="line-height: 18px; text-align: left;"> (which it doesn't currently, I believe)."</span></div>
<div><span style="line-height: 18px; text-align: left;"><br /></span></div>
<div style="text-align: left;">Even though sql (ex: sqldf) has a different way of handling NAs when compared to data.frame, it doesn't seem to find NA == NA. That is,</div>
<div style="text-align: left;">df <- data.frame(x = c(1:3,NA), y = c(NA,4:5,NA))</div>
<div style="text-align: left;">require(sqldf)</div>
<div style="text-align: left;">sqldf("select * from df where x == y")</div>
<div style="text-align: left;"># returns empty data.frame</div>
<div>sqldf("select * from df where x != y")</div>
<div>
<div> x y</div>
<div>1 2 4</div>
<div>2 3 5</div>
</div>
<div>
<div>That is, at least in sqldf package, NA is not == NA and NA is not != NA which is very much in coherence with R's default NA == NA and NA != NA (both giving NA). But I don't think they it's considered FALSE here. It just acts like the "subset" function where all entries that were evaluated to NAs are simply dropped. But with data.table philosophy NA != NA should be evaluated to TRUE, which I don't think (from what I meagrely understand from sql) is what sql does. Please correct me if I've got it wrong.</div>
<div>I think it is clearer and simpler if "NAs are just dropped" after evaluating logical expressions. It would be also easy to document this and easier to grasp, imho. This would also explain Frank's post for NA rows being removed. </div>
<div>And probably if there is more consensus an option for "na.rm = TRUE/FALSE" could be added?</div>
<div>Arun</div>
</div>
</div>
<div></div>
</blockquote>
<p> </p>
<div> </div>
</body></html>