<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">

<html><body>


<p>On 10.06.2013 09:53, Arunkumar Srinivasan wrote:</p>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>However, one inconsistency I find with the use of `!(x==.)` is this:</div>

<div>dt1 <- data.table(x = 0:4, y = 6:10)</div>

<div>> dt1[!(x)]</div>

<div>

<div>   x  y</div>

<div>1: 4 10</div>

</div>

<div>Not the correct result! If `!(x==.)` is equal to `x != .`, then the correct result should be the first row, shouldn't it?</div>

</blockquote>

<div>That result makes perfect sense to me. I don't think of !(x==.) as being the same as x!=. The ! is simply a prefix: it returns all the rows that wouldn't be returned if the ! prefix weren't there.</div>
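<div>To make the prefix reading concrete, here is a rough base R sketch of the complement logic. It is only an illustration, not the data.table source; <code>selected</code> and <code>complement</code> are names made up for the example:</div>

```r
x <- 0:4                        # the x column of dt1, used as row numbers in i
n <- length(x)                  # dt1 has 5 rows

# Rows that dt1[x] would return: the 0 is silently dropped, leaving 1:4.
selected <- seq_len(n)[x]

# Rows that dt1[!(x)] returns: everything not in `selected`.
complement <- seq_len(n)[-selected]

print(selected)    # 1 2 3 4
print(complement)  # 5
```

<div>So row 5 (the one with x = 4) is the only row left, which matches the output quoted above.</div>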

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>dt2 <- data.table(x = c(0,3,4,NA), y = c(NA,4,5,NA))</div>

<div>

<div>> dt2[!(x)] # ends up in an error</div>

<div>Error in seq_len(nrow(x))[-irows] : </div>

<div>  only 0's may be mixed with negative subscripts</div>

</div>

</blockquote>

<div>That needs to be fixed. But we're getting quite theoretical here and far away from common use cases. Why would we ever have the table's row numbers as a column of the table itself, and want to select the rows whose numbers are not mentioned in that column?</div>
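<div>For reference, the error itself can be reproduced in base R alone. A minimal sketch, assuming <code>irows</code> ends up holding the raw values of x, NA included (that is a reading of the error message, not something checked against the source); the NA-dropping step is just one possible fix, not the actual patch:</div>

```r
irows <- c(0, 3, 4, NA)         # the x column of dt2, used as row numbers
n <- length(irows)              # dt2 has 4 rows

# Negating gives c(0, -3, -4, NA); R refuses to mix NA with negative subscripts.
res <- tryCatch(seq_len(n)[-irows], error = function(e) "error")

# One possible fix (an assumption, not the actual patch): drop NAs first.
ok <- seq_len(n)[-irows[!is.na(irows)]]

print(res)  # "error"
print(ok)   # 1 2
```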

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>It ends up in an error because `NA` is not removed/replaced.</div>

<div>Running the same on data.frame gives the results it's supposed to.</div>

<div>

<div>Arun</div>

</div>

<p style="color: #a0a0a8;">On Monday, June 10, 2013 at 10:35 AM, Arunkumar Srinivasan wrote:</p>

<blockquote style="border-left-style: solid; border-width: 1px; margin-left: 0px; padding-left: 10px;">

<div>

<div>

<div>Hi Matthew,</div>

<div>My view (from the last reply) more or less reflects mnel's comments here: <a href="http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143">http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently#comment23317096_16240143</a></div>

<div>

<div>Pasted here for convenience:</div>

<div><code>data.table</code> is mimicking <code>subset</code> in its handling of <code>NA</code> values in logical <code>i</code> arguments. -- the only issue is the <code>!</code> prefix signifying a not-join, not the way one might expect. Perhaps the not join prefix could have been <code>NJ</code> not <code>!</code> to avoid this confusion -- this might be another discussion to have on the mailing list -- (I think it is a discussion worth having)</div>

<div>Arun</div>

</div>

<p style="color: #a0a0a8;">On Monday, June 10, 2013 at 10:28 AM, Arunkumar Srinivasan wrote:</p>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<div>

<div>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<p>Hm, good point.  Is data.table consistent with SQL already, for both == and !=, and so no change needed?  </p>

</div>

</blockquote>

<div>Yes, I believe it's already consistent with SQL. However, the documentation's current description of NA as being treated as FALSE is unnecessary and inaccurate, imho (please see below).</div>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<p>And it was correct for Frank to be mistaken.  </p>

</div>

</blockquote>

<div>Yes, it seems like he was mistaken.</div>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<p>Maybe just some more documentation and examples needed then.</p>

</div>

</blockquote>

<div>It'd be much more appropriate if the documentation said that subsetting in data.table mimics the "subset" function (in order to be in line with SQL) by dropping rows where the logical expression evaluates to NA. In the code I pasted a couple of posts back, the lines replacing NAs with FALSE are not necessary: `irows <- which(i)` makes clear that `which` is used to get the indices before subsetting, and `which` already drops NAs. This fits perfectly well with the interpretation of NA in data.table.</div>
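<div>A small base R sketch of the difference, as I understand it. The data mirrors dt2, a plain data.frame is used so that it runs without data.table, and the variable names are only for illustration:</div>

```r
df <- data.frame(x = c(0, 3, 4, NA), y = c(NA, 4, 5, NA))
i <- df$x == 3                  # FALSE TRUE FALSE NA

# Plain logical indexing keeps a row of NAs for the NA in i.
by_logical <- df[i, ]

# which() returns only the TRUE positions, so the NA is dropped ...
by_which <- df[which(i), ]

# ... which matches what subset() does.
by_subset <- subset(df, x == 3)

print(nrow(by_logical))  # 2
print(nrow(by_which))    # 1
print(nrow(by_subset))   # 1
```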

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<p>Are you happy that DT[!(x==.)] and DT[x!=.] do treat NA inconsistently? :</p>

<p><a href="http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently">http://stackoverflow.com/questions/16239153/dtx-and-dtx-treat-na-in-x-inconsistently</a></p>

</div>

</blockquote>

<div>Ha, I like the idea behind the use of () in evaluating expressions. It's another nice layer towards simplicity in data.table. But I still think equivalent logical operations should not give different results. If !(x == .) and x != . are indeed different, then I'd suggest replacing `!` with a more appropriate name, as it's very easy to get confused otherwise.</div>

<div>In essence, either !(x == .) must evaluate to (x != .) if the underlying meaning of these is the same, or the `!` in `!(x==.)` must be replaced with something more appropriate for what it's supposed to do. Personally, I prefer the former. It would greatly tighten the structure and consistency.</div>

<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">

<div>

<p>"na.rm = TRUE/FALSE" sounds good to me.  I'd only considered nomatch before in the context of joins, not logical subsets.</p>

</div>

</blockquote>

<div>Yes, I think this option would give more control when evaluating expressions in `i`, by providing both "subset"-style behaviour (the default) and the typical data.frame subsetting (na.rm = FALSE).</div>

<div>Best regards,</div>

<div>

<div>Arun</div>

</div>

</div>

</div>

</div>

</blockquote>

</div>

</div>

</blockquote>

</blockquote>


</body></html>