[datatable-help] keys that dont match

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed May 4 19:00:46 CEST 2011


Hi,

On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
<santosh.srinivas at gmail.com> wrote:
> Hi Steve,
>
> Sorry ... strange problem ... Don't know why that happened.
>
> http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb

It looks like your first email was sent to the @googlegroups.com
address (I didn't even know we had that set up), and the second one
came through the @lists.r-forge.r-project.org one.

So (I guess) the first didn't come through because it was sent to the
wrong(?) list -- anyway, in the future you should send to the
@lists.r-forge... one.

> I had the question and my attempt to answer before someone says go read the
> manual :)

It looks like the answer you offered is reasonable, though.

In short -- the question was "How can I quickly tell which (keyed)
rows are in one data.table vs. another?"

As you mentioned, you can do this by joining using `[` -- in order to
do this easily, you could ensure that each data.table has a column
that isn't in the other.

For example, if you have data like so:


R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
R> dt2 <- data.table(a=c(1, 3, 5, 10), b=letters[c(1, 3, 5, 10)], key="a,b")

Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
fast (especially if one is just a subset of the other, like dt2 is of
dt1):

R> dt1[dt2]
      a b
[1,]  1 a
[2,]  3 c
[3,]  5 e
[4,] 10 j

R> dt2[dt1]
       a b
 [1,]  1 a
 [2,]  2 b
 [3,]  3 c
 [4,]  4 d
 [5,]  5 e
 [6,]  6 f
 [7,]  7 g
 [8,]  8 h
 [9,]  9 i
[10,] 10 j

Adding some 'dummy' columns may help:

R> dt1$in.1 <- TRUE
R> dt2$in.2 <- TRUE

Then you can (easily) ask which rows are in dt1 that aren't in dt2:

R> dt2[dt1] ## nomatch=NA is the default
       a b in.2 in.1
 [1,]  1 a TRUE TRUE
 [2,]  2 b   NA TRUE
 [3,]  3 c TRUE TRUE
 [4,]  4 d   NA TRUE
 [5,]  5 e TRUE TRUE
 [6,]  6 f   NA TRUE
 [7,]  7 g   NA TRUE
 [8,]  8 h   NA TRUE
 [9,]  9 i   NA TRUE
[10,] 10 j TRUE TRUE

## or, in a more email-friendly format:
R> which(is.na(dt2[dt1]$in.2))
[1] 2 4 6 7 8 9

These are the row numbers in dt1 that have no match in dt2.
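
For what it's worth, here is a minimal sketch of a more direct route that
skips the dummy columns. It assumes a recent data.table release that
supports the not-join `!` prefix and the `on=` argument (as far as I know,
both are newer than this thread), and the names `idx` and `only.in.1` are
just made up for the example:

```r
library(data.table)

# Same toy data as above, but with integer values in both tables so the
# join columns have matching types (an assumption of this sketch).
dt1 <- data.table(a = 1:10, b = letters[1:10], key = c("a", "b"))
idx <- c(1L, 3L, 5L, 10L)
dt2 <- data.table(a = idx, b = letters[idx], key = c("a", "b"))

# Not-join: keep the rows of dt1 whose (a, b) pair has no match in dt2.
only.in.1 <- dt1[!dt2, on = c("a", "b")]
print(only.in.1)
```

That should leave the six rows of dt1 with a = 2, 4, 6, 7, 8, 9, i.e. the
same rows that the which(is.na(...)) call picks out.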

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

