[datatable-help] indexing with NA i
Matthew Dowle
mdowle at mdowle.plus.com
Wed Jun 22 20:35:14 CEST 2011
Yes, it's by intent. Thanks for raising the documentation issue.
Also, a single NA (which is type logical, and is therefore recycled by
[.data.frame) is automatically coerced to NA_integer_ (which isn't recycled).
These are commented internally in [.data.table :
    if (is.logical(i)) {
        if (identical(i,NA)) i = NA_integer_
            # see DT[NA] thread re recycling of NA logical
        else i[is.na(i)] = FALSE
            # avoids DT[!is.na(ColA) & ColA==ColB]
    }
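To see the effect without reading the internals, the same logic can be mimicked in base R on a plain data.frame. This is only a sketch: normalise_i is a hypothetical helper name (it is not a function in data.table), and DF is made-up example data.

```r
# Made-up example data
DF <- data.frame(ColA = c(1, NA, 3), ColB = c(1, 2, 3))

# Hypothetical helper mirroring the snippet above (not part of data.table)
normalise_i <- function(i) {
    if (is.logical(i)) {
        if (identical(i, NA)) i <- NA_integer_   # single NA: avoid recycling
        else i[is.na(i)] <- FALSE                # NA inside logical i: treat as FALSE
    }
    i
}

# A bare NA is logical, so [.data.frame recycles it over every row
n_recycled <- nrow(DF[NA, ])                                 # 3 rows, all NA

# NA_integer_ is not recycled, so a single NA row comes back
n_single <- nrow(DF[normalise_i(NA), ])                      # 1 row, all NA

# NAs inside a longer logical vector are treated as FALSE
n_mixed <- nrow(DF[normalise_i(c(TRUE, NA, FALSE)), ])       # 1 row (the first)

# ColA == ColB is c(TRUE, NA, TRUE); the NA is dropped, not returned as an NA row
n_equal <- nrow(DF[normalise_i(DF$ColA == DF$ColB), ])       # 2 rows (1 and 3)
```

The first two calls show why the single-NA case needs special handling: only the type of the NA differs, yet the number of rows returned changes from 3 to 1.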
There was a bug fix in NEWS for v1.5 :
    o   DT[NA] now returns 1 row of NA rather than the whole table
        via standard NA logical recycling. A single NA logical is
        a special case and is now replaced by NA_integer_. Thanks
        to Branson Owen for highlighting the issue.
So I have just added 'Other than...' to ?data.table :
    integer and logical vectors work the same way they do in [.data.frame.
    Other than NAs in logical i are treated as FALSE, and a single NA
    logical is not recycled to match the number of rows, as it is in
    [.data.frame. Rather, a 1-row table containing NA for all columns is
    returned.
and an item to NEWS :
    o   ?data.table now documents that logical i is not quite
        the same as i in [.data.frame. NA are treated as FALSE,
        and DT[NA] returns 1 row of NA, unlike [.data.frame.
        Three points have been added to FAQ 2.17. Thanks to
        Johann Hibschman for highlighting.
The 3 items added (to the existing 9 items) in FAQ 2.17 :
    o   DT[NA] returns 1 row of NA, but DF[NA] is a copy of DF containing NA
        throughout. The user probably forgot that NA is type logical in R, and
        is therefore recycled by [.data.frame. Intention was probably
        DF[NA_integer_]. [.data.table does this automatically for convenience.
    o   DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but
        DF[c(TRUE,NA,FALSE)] returns NA rows for each NA.
    o   DT[ColA==ColB] == DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
Tests were already in place for DT[NA] (#206,#207) but I can't see one
for ColA==ColB where either contains NA; will add ...
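
For anyone wanting to check the three FAQ items side by side, something like the following should do. It is a sketch rather than the actual test code: DF and DT are made-up data, and it assumes the data.table package is installed and behaves as described in this thread.

```r
library(data.table)

# Made-up example data: ColA has an NA in row 2
DF <- data.frame(ColA = c(1, NA, 3), ColB = c(1, 2, 3))
DT <- as.data.table(DF)

# Item 1: a single NA is not recycled by [.data.table
dt_na <- nrow(DT[NA])                        # 1 row of NA
df_na <- nrow(DF[NA, ])                      # NA recycled over all 3 rows

# Item 2: NA inside logical i is treated as FALSE by [.data.table
dt_mixed <- nrow(DT[c(TRUE, NA, FALSE)])     # only row 1
df_mixed <- nrow(DF[c(TRUE, NA, FALSE), ])   # row 1 plus an NA row

# Item 3: no need to guard the comparison with !is.na() in data.table
dt_eq <- DT[ColA == ColB]
df_eq <- DF[!is.na(DF$ColA) & !is.na(DF$ColB) & DF$ColA == DF$ColB, ]
same_rows <- identical(dt_eq$ColA, df_eq$ColA) &&
             identical(dt_eq$ColB, df_eq$ColB)
```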
Matthew
On Wed, 2011-06-22 at 10:32 -0500, Johann Hibschman wrote:
> The documentation for data.table says that logical vectors "work the
> same way they do in '[.data.frame'". They do not, since NA entries are
> treated as FALSE, instead of generating an all-NA row. Similarly,
> expressions returning NA are treated as FALSE.
>
> Personally, I like this behavior much more than the default for
> data.frame. Is it by accident or by intent? More pragmatically, can I
> count on it to remain the same, barring of course a big soul-searching
> discussion on the mailing list?
>
> If it's by intent, maybe add a line to the documentation, "with the
> exception that NA logical values are treated as FALSE," or some such?
>
> -Johann