[datatable-help] keys that dont match
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed May 4 19:00:46 CEST 2011
Hi,
On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
> Hi Steve,
>
> Sorry ... strange problem ... Don't know why that happened.
>
> http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb
It looks like your first email was sent to the @googlegroups.com
address (I didn't even know we had that set up), and the second one
came through the @lists.r-forge.r-project.
So (I guess) the first didn't come through because it was sent to the
wrong(?) list -- anyway, in the future you should send to the
@lists.r-forge... one.
> I had the question and my attempt to answer before someone says go read the
> manual :)
It looks like the answer you offered is reasonable, though.
In short -- the question was "How can I quickly tell which (keyed)
rows are in one data.table but not in another?"
As you mentioned, you can do this by joining using `[` -- in order to
do this easily, you could ensure that each data.table has a column
that isn't in the other.
For example, if you have data like so:
R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
R> dt2 <- data.table(a=c(1L, 3L, 5L, 10L), b=letters[c(1, 3, 5, 10)], key="a,b")
Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
fast (especially if one is just a subset of the other, like dt2 is to
dt1):
R> dt1[dt2]
a b
[1,] 1 a
[2,] 3 c
[3,] 5 e
[4,] 10 j
R> dt2[dt1]
a b
[1,] 1 a
[2,] 2 b
[3,] 3 c
[4,] 4 d
[5,] 5 e
[6,] 6 f
[7,] 7 g
[8,] 8 h
[9,] 9 i
[10,] 10 j
Adding some 'dummy' columns may help:
R> dt1$in.1 <- TRUE
R> dt2$in.2 <- TRUE
Then you can (easily) ask which rows are in dt1 that aren't in dt2:
R> dt2[dt1] ## nomatch=NA is the default
a b in.2 in.1
[1,] 1 a TRUE TRUE
[2,] 2 b NA TRUE
[3,] 3 c TRUE TRUE
[4,] 4 d NA TRUE
[5,] 5 e TRUE TRUE
[6,] 6 f NA TRUE
[7,] 7 g NA TRUE
[8,] 8 h NA TRUE
[9,] 9 i NA TRUE
[10,] 10 j TRUE TRUE
## or, in a more email-friendly format:
R> which(is.na(dt2[dt1]$in.2))
[1] 2 4 6 7 8 9
These are the row numbers of the rows in dt1 that aren't in dt2
(dt2[dt1] returns one row per row of dt1, in dt1's order).
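As a hedged aside: the same question can be sketched without the dummy
columns, assuming a more recent data.table release than the one this
thread was written against (one that supports the `!` not-join prefix
and the `on=` argument). The names `only.in.1` and `missing.idx` are
made up for illustration, and the integer literals in dt2 are my own
choice so that both `a` columns have the same type.

```r
library(data.table)

# Same example data as above.
dt1 <- data.table(a = 1:10, b = letters[1:10], key = c("a", "b"))
dt2 <- data.table(a = c(1L, 3L, 5L, 10L), b = letters[c(1, 3, 5, 10)],
                  key = c("a", "b"))

# Not-join: keep the rows of dt1 whose (a, b) pair has no match in dt2.
only.in.1 <- dt1[!dt2, on = c("a", "b")]
print(only.in.1)

# which=TRUE returns, for each row of dt1, the matching row number in dt2
# (NA where there is no match), so the NA positions are the dt1 rows
# that are missing from dt2.
missing.idx <- which(is.na(dt2[dt1, on = c("a", "b"), which = TRUE]))
print(missing.idx)
```

This is only a sketch; the dummy-column approach has the advantage of
showing both membership flags side by side in one table.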
HTH,
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact