[datatable-help] indexing with NA i

Matthew Dowle mdowle at mdowle.plus.com
Wed Jun 22 22:14:21 CEST 2011


Test added :

# Test x==y where either column contains NA.
DT = data.table(x=c(1,2,NA,3,4),y=c(0,2,3,NA,4),z=1:5)
test(279, DT[x==y,sum(z)], 7L)
# In data.frame the equivalent is :
# > DF = as.data.frame(DT)
# > DF[DF$x==DF$y,]
#       x  y  z
# 2     2  2  2
# NA   NA NA NA
# NA.1 NA NA NA
# 5     4  4  5
# > DF[!is.na(DF$x) & !is.na(DF$y) & DF$x==DF$y,]
#   x y z
# 2 2 2 2
# 5 4 4 5
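With a plain data.frame, a common base R way to get the same NA-as-FALSE result is to wrap the condition in which(), which returns only the positions that are TRUE and silently drops NA. A minimal sketch on the same data as test 279, using base R only (no data.table needed):

```r
# Same data as test 279, built as a plain data.frame
DF <- data.frame(x = c(1, 2, NA, 3, 4),
                 y = c(0, 2, 3, NA, 4),
                 z = 1:5)

cond <- DF$x == DF$y          # FALSE TRUE NA NA TRUE

# Logical i with NA: [.data.frame returns an all-NA row for each NA
nrow(DF[cond, ])              # 4: rows 2, NA, NA, 5

# which() keeps only the TRUE positions, dropping NA and FALSE
rows <- which(cond)           # c(2L, 5L)
DF[rows, ]
sum(DF$z[rows])               # 7, the same value test 279 expects
```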

On Wed, 2011-06-22 at 19:35 +0100, Matthew Dowle wrote:
> Yes, it's by intent. Thanks for raising the documentation issue.
> 
> Also, there is automatic coercion of NA (which is of type logical, and
> is therefore recycled by [.data.frame) to NA_integer_ (which doesn't recycle).
> 
> These are commented internally in [.data.table :
> 
> if (is.logical(i)) {
>     if (identical(i,NA)) i = NA_integer_ 
>     # see DT[NA] thread re recycling of NA logical
>     else i[is.na(i)] = FALSE  
>     # saves the user writing DT[!is.na(ColA) & ColA==ColB]
> }
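The two rules in the quoted snippet can be reproduced outside data.table to see their effect on an ordinary data.frame. This is a sketch only: `normalize_logical_i` is a hypothetical helper written for illustration, not a data.table function, and it simply mirrors the quoted branch logic.

```r
# Hypothetical helper (not part of data.table's API): mirrors the
# coercion rules quoted above from [.data.table, for illustration only.
normalize_logical_i <- function(i) {
  if (is.logical(i)) {
    if (identical(i, NA)) {
      i <- NA_integer_        # single NA: select 1 row of NA, no recycling
    } else {
      i[is.na(i)] <- FALSE    # any other NA in logical i becomes FALSE
    }
  }
  i                           # integer (and other) i passes through unchanged
}

DF <- data.frame(x = c(1, 2, NA, 3, 4), y = c(0, 2, 3, NA, 4), z = 1:5)

normalize_logical_i(NA)                  # NA_integer_
normalize_logical_i(c(TRUE, NA, FALSE))  # TRUE FALSE FALSE
nrow(DF[normalize_logical_i(NA), ])      # 1
DF[normalize_logical_i(DF$x == DF$y), ]  # rows 2 and 5 only
```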
> 
> There was a bug fix in NEWS for v1.5 :
> o   DT[NA] now returns 1 row of NA rather than the whole table
>     via standard NA logical recycling. A single NA logical is
>     a special case and is now replaced by NA_integer_. Thanks
>     to Branson Owen for highlighting the issue.
> 
> 
> So I have just added 'Other than...' to ?data.table :
> 
> integer and logical vectors work the same way they do in [.data.frame.
> Other than that, NAs in logical i are treated as FALSE, and a single NA
> logical is not recycled to match the number of rows (as it is in
> [.data.frame); rather, a 1-row table containing NA for all columns is
> returned.
> 
> and an item to NEWS :
> 
>     o   ?data.table now documents that logical i is not quite
>         the same as i in [.data.frame. NAs are treated as FALSE,
>         and DT[NA] returns 1 row of NA, unlike [.data.frame.
>         Three points have been added to FAQ 2.17. Thanks to
>         Johann Hibschman for highlighting.
> 
> 
> The 3 items added to the 9 existing items in FAQ 2.17 :
> 
> o   DT[NA] returns 1 row of NA, but DF[NA,] is a copy of DF containing NA
> throughout. The user probably forgot that NA is type logical in R, and
> is therefore recycled by [.data.frame. The intention was probably
> DF[NA_integer_,]. [.data.table does this automatically for convenience.
> 
> o   DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but
> DF[c(TRUE,NA,FALSE),] returns an all-NA row for each NA.
> 
> o   DT[ColA==ColB] is equivalent to DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
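The data.frame side of the three FAQ items can be checked directly in base R. A small sketch, using base R only; the column names ColA and ColB come from the FAQ, and the data values are made up for illustration:

```r
DF <- data.frame(ColA = c(1, NA, 3), ColB = c(1, 2, NA))

# Item 1: a bare NA is logical, so [.data.frame recycles it to every row
nrow(DF[NA, ])                  # 3 rows, all NA
nrow(DF[NA_integer_, ])         # 1 row of NA, the shape DT[NA] returns

# Item 2: each NA in a logical i yields an all-NA row
nrow(DF[c(TRUE, NA, FALSE), ])  # 2: row 1 plus one all-NA row

# Item 3: the explicit NA handling that DT[ColA==ColB] applies implicitly
DF[!is.na(DF$ColA) & !is.na(DF$ColB) & DF$ColA == DF$ColB, ]  # row 1 only
```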
> 
> 
> Tests were already in place for DT[NA] (#206, #207), but I can't see one
> for ColA==ColB where either contains NA; will add ...
> 
> Matthew
> 
> 
> 
> On Wed, 2011-06-22 at 10:32 -0500, Johann Hibschman wrote:
> > The documentation for data.table says that logical vectors "work the
> > same way they do in '[.data.frame'".  They do not, since NA entries are
> > treated as FALSE, instead of generating an all-NA row.  Similarly,
> > expressions returning NA are treated as FALSE.
> > 
> > Personally, I like this behavior much more than the default for
> > data.frame.  Is it by accident or by intent?  More pragmatically, can I
> > count on it to remain the same, barring of course a big soul-searching
> > discussion on the mailing list?
> > 
> > If it's by intent, maybe add a line to the documentation, "with the
> > exception that NA logical values are treated as FALSE," or some such?
> > 
> > -Johann
> > 
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help