[datatable-help] NA in joins
Arunkumar Srinivasan
aragorn168b at gmail.com
Thu Sep 18 21:34:00 CEST 2014
Thanks. It'd also be great if you could add an issue for adding the documentation.
On NA non-matching, yes, you could add an FR; there isn't one, to my recollection. However, much of this year has been spent on internal ordering and binary search, tweaking quite a lot of things. So I'd not be surprised if it is not attended to anytime soon.
Arun
From: Juan Manuel Truppia <jmtruppia at gmail.com>
Reply: Juan Manuel Truppia <jmtruppia at gmail.com>
Date: September 18, 2014 at 9:14:42 PM
To: Arunkumar Srinivasan <aragorn168b at gmail.com>
Cc: datatable-help at lists.r-forge.r-project.org <datatable-help at lists.r-forge.r-project.org>
Subject: Re: [datatable-help] NA in joins
It might help, especially where data.table is compared to SQL. However, I think that merge (and maybe [.data.table) should have an argument to avoid NA matching. Is there an FR already created for this? I can create one otherwise.
On Thu, Sep 18, 2014 at 4:00 PM, Arunkumar Srinivasan <aragorn168b at gmail.com> wrote:
In base R `NA` matches `NA` alone, and `NaN` matches `NaN` alone:
match(NA, c(1:5, NA))
# [1] 6
data.table matches, through binary search, in the same way, by design. And in `?match`, there's this line: "Exactly what matches what is to some extent a matter of definition." In some operations it may not make sense. But we do always consider Inf = Inf, -Inf = -Inf, NaN = NaN and NA = NA. Do you think it'd help to state this explicitly in `?data.table`?
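To make the base R side of this concrete, here is a small sketch using only `match()`. The vector `x` and the `pos_*` names are made up for illustration; the point is that each special value finds only its own kind:

```r
# Illustrative vector holding each of the special values once
x <- c(1, NA, NaN, Inf, -Inf)

# match() returns the position of the first match in x
pos_na   <- match(NA, x)    # NA matches the NA entry, not the NaN entry
pos_nan  <- match(NaN, x)   # NaN matches the NaN entry, not the NA entry
pos_inf  <- match(Inf, x)   # Inf matches Inf
pos_ninf <- match(-Inf, x)  # -Inf matches -Inf

print(c(pos_na, pos_nan, pos_inf, pos_ninf))
```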
Arun
From: Juan Manuel Truppia <jmtruppia at gmail.com>
Reply: Juan Manuel Truppia <jmtruppia at gmail.com>
Date: September 18, 2014 at 6:14:56 PM
To: datatable-help at lists.r-forge.r-project.org <datatable-help at lists.r-forge.r-project.org>
Subject: [datatable-help] NA in joins
Hi, this must have been discussed before, but I couldn't find anything.
In my opinion, NA shouldn't join with anything, including other NA (so as to mirror what we expect from SQL, where NULL doesn't join with NULL).
However, with data.table, NA matches other NA.
I.e., this should return an empty data.table:
data.table(idx = NA_real_, key = "idx")[data.table(idx = NA_real_, val = "a", key = "idx"), nomatch = 0]
Assuming that we can't change this behavior, would it be possible to add a parameter to avoid NA matching NA in [.data.table and merge?
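In the absence of such a parameter, one possible workaround is to drop the NA keys from the table passed as `i` before joining. This is only a sketch, not a data.table feature: the tables `x` and `i` and the result names below are made up for illustration, and it assumes NA keys can simply be discarded.

```r
library(data.table)

# Illustrative tables: each has one real key and one NA key
x <- data.table(idx = c(1, NA_real_), key = "idx")
i <- data.table(idx = c(1, NA_real_), val = c("b", "a"), key = "idx")

# Plain join: NA in i matches NA in x, as described in this thread
with_na <- x[i, nomatch = 0]

# Workaround: filter NA keys out of i first, so no NA is left to match
without_na <- x[i[!is.na(idx)], nomatch = 0]

print(with_na)
print(without_na)
```

The same filter can be applied to the inputs of `merge` when the NA rows are not needed in the result.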