[datatable-help] indexing with NA i
Matthew Dowle
mdowle at mdowle.plus.com
Wed Jun 22 22:14:21 CEST 2011
Test added :
# Test x==y where either column contains NA.
DT = data.table(x=c(1,2,NA,3,4),y=c(0,2,3,NA,4),z=1:5)
test(279, DT[x==y,sum(z)], 7L)
# In data.frame the equivalent is :
# > DF = as.data.frame(DT)
# > DF[DF$x==DF$y,]
#       x  y  z
# 2     2  2  2
# NA   NA NA NA
# NA.1 NA NA NA
# 5     4  4  5
# > DF[!is.na(DF$x) & !is.na(DF$y) & DF$x==DF$y,]
#   x y z
# 2 2 2 2
# 5 4 4 5
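
For anyone who has to stay in data.frame, here is a minimal sketch of two shorter ways to get the same two rows, using base R only. It relies on which() dropping NA positions and on subset() taking missing values in its condition as false. The variable names (eq, rows_which, rows_subset) are made up for illustration.

```r
# Base R only; same data as the test above.
DF <- data.frame(x = c(1, 2, NA, 3, 4), y = c(0, 2, 3, NA, 4), z = 1:5)

eq <- DF$x == DF$y             # FALSE TRUE NA NA TRUE

# which() returns only the positions that are TRUE, so the NAs are dropped.
rows_which <- DF[which(eq), ]

# subset() takes missing values in the condition as FALSE.
rows_subset <- subset(DF, x == y)

sum(rows_which$z)              # rows 2 and 5 only
sum(rows_subset$z)
```

Both give the same sum as DT[x==y,sum(z)] in test 279, without spelling out the !is.na() terms.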
On Wed, 2011-06-22 at 19:35 +0100, Matthew Dowle wrote:
> Yes, it's by intent. Thanks for raising the documentation issue.
>
> Also, a single NA (type logical, and therefore recycled by
> [.data.frame) is automatically coerced to NA_integer_ (which isn't
> recycled).
>
> These are commented internally in [.data.table :
>
>     if (is.logical(i)) {
>         if (identical(i,NA)) i = NA_integer_
>         # see DT[NA] thread re recycling of NA logical
>         else i[is.na(i)] = FALSE
>         # avoids DT[!is.na(ColA) & ColA==ColB]
>     }
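
To see what that branch does in isolation, here is a rough sketch that lifts it into a standalone function and applies the result to a plain data.frame. normalize_logical_i is a hypothetical name used only for illustration; it is not a data.table function, and the real code lives inside [.data.table.

```r
# Hypothetical helper, not part of data.table: mirrors the quoted branch.
normalize_logical_i <- function(i) {
  if (is.logical(i)) {
    if (identical(i, NA)) {
      i <- NA_integer_         # a single NA: ask for one all-NA row, no recycling
    } else {
      i[is.na(i)] <- FALSE     # NA inside a longer logical vector: drop that row
    }
  }
  i                            # non-logical i is returned unchanged
}

DF <- data.frame(x = c(1, 2, NA, 3, 4), y = c(0, 2, 3, NA, 4), z = 1:5)

nrow(DF[normalize_logical_i(NA), ])             # one row
nrow(DF[normalize_logical_i(DF$x == DF$y), ])   # rows where x==y is TRUE
```
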
>
> There was a bug fix in NEWS for v1.5 :
> o DT[NA] now returns 1 row of NA rather than the whole table
> via standard NA logical recycling. A single NA logical is
> a special case and is now replaced by NA_integer_. Thanks
> to Branson Owen for highlighting the issue.
>
>
> So I have just added 'Other than...' to ?data.table :
>
> integer and logical vectors work the same way they do in [.data.frame.
> Other than NAs in logical i are treated as FALSE, and a single NA
> logical is not recycled to match the number of rows, as it is in
> [.data.frame. Rather, a 1-row table containing NA for all columns is
> returned.
>
> and an item to NEWS :
>
> o ?data.table now documents that logical i is not quite
> the same as i in [.data.frame. NA are treated as FALSE,
> and DT[NA] returns 1 row of NA, unlike [.data.frame.
> Three points have been added to FAQ 2.17. Thanks to
> Johann Hibschman for highlighting.
>
>
> and the 3 items added (to the existing 9) in FAQ 2.17 :
>
> o DT[NA] returns 1 row of NA, but DF[NA] is a copy of DF containing NA
> throughout. The user probably forgot that NA is type logical in R, and
> is therefore recycled by [.data.frame. Intention was probably
> DF[NA_integer_]. [.data.table does this automatically for convenience.
>
> o DT[c(TRUE,NA,FALSE)] treats the NA as FALSE, but
> DF[c(TRUE,NA,FALSE)] returns NA rows for each NA.
>
> o DT[ColA==ColB] == DF[!is.na(ColA) & !is.na(ColB) & ColA==ColB,]
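
The data.frame side of the first two items can be checked in base R. This is a small sketch with a made-up three-row DF, using the row form of the subscript (with the comma).

```r
DF <- data.frame(a = 1:3, b = c("x", "y", "z"))

typeof(NA)                     # "logical"
nrow(DF[NA, ])                 # logical NA is recycled to every row
nrow(DF[NA_integer_, ])        # integer NA selects a single all-NA row
DF[c(TRUE, NA, FALSE), ]       # row 1, then an all-NA row for the NA
```
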
>
>
> Tests were already in place for DT[NA] (#206,#207) but I can't see one
> for ColA==ColB where either contains NA; will add ...
>
> Matthew
>
>
>
> On Wed, 2011-06-22 at 10:32 -0500, Johann Hibschman wrote:
> > The documentation for data.table says that logical vectors "work the
> > same way they do in '[.data.frame'". They do not, since NA entries are
> > treated as FALSE, instead of generating an all-NA row. Similarly,
> > expressions returning NA are treated as FALSE.
> >
> > Personally, I like this behavior much more than the default for
> > data.frame. Is it by accident or by intent? More pragmatically, can I
> > count on it to remain the same, barring of course a big soul-searching
> > discussion on the mailing list?
> >
> > If it's by intent, maybe add a line to the documentation, "with the
> > exception that NA logical values are treated as FALSE," or some such?
> >
> > -Johann
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help