[datatable-help] indexing with NA i

Matthew Dowle mdowle at mdowle.plus.com
Wed Jun 22 20:35:14 CEST 2011


Yes, it's by intent. Thanks for raising the documentation issue.

Also, a single NA (which is of type logical, and is therefore recycled by
[.data.frame) is automatically coerced to NA_integer_ (which isn't recycled).

These are commented internally in [.data.table :

if (is.logical(i)) {
    if (identical(i,NA)) i = NA_integer_ 
    # see DT[NA] thread re recycling of NA logical
    else i[is.na(i)] = FALSE  
    # avoids DT[!is.na(ColA) & ColA==ColB]
}
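Pulling the two rules in that excerpt out into a standalone function makes it easier to see what they do to a few inputs. This is a minimal sketch, not data.table's own code. The name `normalize_logical_i` is hypothetical, and the function mirrors only the two lines quoted above.

```r
# Hypothetical helper (not part of data.table) that mirrors the two rules
# quoted from [.data.table for a logical row index i.
normalize_logical_i <- function(i) {
  if (is.logical(i)) {
    if (identical(i, NA)) {
      # A single logical NA becomes NA_integer_, so it selects one NA row
      # instead of being recycled to every row.
      i <- NA_integer_
    } else {
      # Any other NA in a logical index is treated as FALSE.
      i[is.na(i)] <- FALSE
    }
  }
  i
}

normalize_logical_i(NA)                  # NA_integer_
normalize_logical_i(c(TRUE, NA, FALSE))  # TRUE FALSE FALSE
normalize_logical_i(c(2L, NA))           # non-logical i is returned unchanged
```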

There was a bug fix in NEWS for v1.5 :
o   DT[NA] now returns 1 row of NA rather than the whole table
    via standard NA logical recycling. A single NA logical is
    a special case and is now replaced by NA_integer_. Thanks
    to Branson Owen for highlighting the issue.
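The recycling difference behind that fix can be seen in base R alone, without loading data.table. The sketch below uses a small made-up data.frame and indexes rows explicitly with `DF[i, ]`.

```r
# Base R only: a logical NA row index is recycled, an integer NA is not.
DF <- data.frame(ColA = 1:3, ColB = 4:6)

recycled <- DF[NA, ]           # logical NA recycled to all 3 rows
single   <- DF[NA_integer_, ]  # a single row of NA

nrow(recycled)  # 3
nrow(single)    # 1
```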


So I have just added 'Other than...' to ?data.table :

Integer and logical vectors work the same way they do in [.data.frame.
Other than that, NAs in logical i are treated as FALSE, and a single NA
logical is not recycled to match the number of rows, as it is in
[.data.frame. Rather, a 1-row table containing NA for all columns is
returned.

and an item to NEWS :

    o   ?data.table now documents that logical i is not quite
        the same as i in [.data.frame. NAs are treated as FALSE,
        and DT[NA] returns 1 row of NA, unlike [.data.frame.
        Three points have been added to FAQ 2.17. Thanks to
        Johann Hibschman for highlighting.


The 3 items added to the existing 9 items in FAQ 2.17 :

o   DT[NA] returns 1 row of NA, but DF[NA] is a copy of DF containing NA
throughout. The user probably forgot that NA is type logical in R, and
is therefore recycled by [.data.frame. Intention was probably
DF[NA_integer_]. [.data.table does this automatically for convenience.

o   DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but
DF[c(TRUE,NA,FALSE)] returns NA rows for each NA.

o   DT[ColA==ColB] == DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
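The second and third items can be checked in base R on a small made-up data.frame. In this sketch `i & !is.na(i)` stands in for data.table's "NA as FALSE" rule; it is an illustration, not a call into data.table. The logical index is four elements long here (one per row) so that no recycling muddies the comparison.

```r
# Base R only, on a small made-up data.frame.
DF <- data.frame(ColA = c(1L, 2L, NA, 4L), ColB = c(1L, NA, 3L, 4L))

# Second item: NA inside a logical i.
i <- c(TRUE, NA, FALSE, FALSE)
with_na  <- DF[i, ]                  # row 1 plus an all-NA row for the NA
as_false <- DF[i & !is.na(i), ]      # NA treated as FALSE: row 1 only

# Third item: ColA == ColB is NA wherever either column is NA.
cmp      <- DF$ColA == DF$ColB       # TRUE NA NA TRUE
naive    <- DF[cmp, ]                # rows 1 and 4, plus two all-NA rows
explicit <- DF[!is.na(DF$ColA) & !is.na(DF$ColB) & DF$ColA == DF$ColB, ]
dt_like  <- DF[cmp & !is.na(cmp), ]  # NA treated as FALSE

c(nrow(with_na), nrow(as_false))     # 2 1
c(nrow(naive), nrow(explicit))       # 4 2
identical(explicit, dt_like)         # TRUE
```

The explicit `!is.na()` guards work because `FALSE & NA` is `FALSE` in R, so any row where either column is NA drops out before the NA from `==` can propagate.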


Tests were already in place for DT[NA] (#206, #207), but I can't see one
for ColA==ColB where either column contains NA. I will add one.

Matthew



On Wed, 2011-06-22 at 10:32 -0500, Johann Hibschman wrote:
> The documentation for data.table says that logical vectors "work the
> same way they do in '[.data.frame'".  They do not, since NA entries are
> treated as FALSE, instead of generating an all-NA row.  Similarly,
> expressions returning NA are treated as FALSE.
> 
> Personally, I like this behavior much more than the default for
> data.frame.  Is it by accident or by intent?  More pragmatically, can I
> count on it to remain the same, barring of course a big soul-searching
> discussion on the mailing list?
> 
> If it's by intent, maybe add a line to the documentation, "with the
> exception that NA logical values are treated as FALSE," or some such?
> 
> -Johann
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
