[datatable-help] keys that dont match
Matthew Dowle
mdowle at mdowle.plus.com
Sat May 7 09:03:45 CEST 2011
The original post from Santosh came through as a BCC. I guess
GoogleGroups did the BCC. Will need to do more investigation.
> Which are the rows in dt1 that aren't in dt2
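Another option may be a 'not join'; e.g., to get the rows of X that
have no match in Y:
X[-X[Y,which=TRUE,nomatch=0]]
or, to get just their row numbers:
seq_len(nrow(X))[-X[Y,which=TRUE,nomatch=0]]
(nomatch=0 keeps NAs for unmatched rows of Y out of the negative index.)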
Will add something to docs/wiki re 'not joins'.
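
A minimal sketch of such a not join, assuming X and Y are both keyed on
a column a (the two tables below are made up for illustration). It also
guards the case where nothing matches, since a negative index of length
zero would select no rows rather than all of them:

```r
library(data.table)

# Hypothetical example tables, both keyed on column a
X <- data.table(a = 1:10, b = letters[1:10], key = "a")
Y <- data.table(a = c(1L, 3L, 5L, 10L), key = "a")

# Row numbers of X that have a match in Y;
# nomatch = 0 drops rows of Y that have no partner in X
matched <- X[Y, which = TRUE, nomatch = 0]

# X[-integer(0)] would return zero rows, so handle "no matches" explicitly
not_in_Y <- if (length(matched)) X[-matched] else X

not_in_Y$a   # 2 4 6 7 8 9
```

The same row numbers come from seq_len(nrow(X))[-matched] if only the
positions are needed.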
Matthew
On Wed, 2011-05-04 at 13:00 -0400, Steve Lianoglou wrote:
> Hi,
>
> On Wed, May 4, 2011 at 12:23 PM, Santosh Srinivas
> <santosh.srinivas at gmail.com> wrote:
> > Hi Steve,
> >
> > Sorry ... strange problem ... Don't know why that happened.
> >
> > http://groups.google.com/group/datatable/browse_thread/thread/51a0387e95d37feb
>
> It looks like your first email was sent to the @googlegroups.com
> address (I didn't even know we had that set up), and the second one
> came through the @lists.r-forge.r-project.
>
> So (I guess) the first didn't come through because it was sent to the
> wrong(?) list -- anyway, in the future you should send to the
> @lists.r-forge... one.
>
> > I had the question and my attempt to answer before someone says go read the
> > manual :)
>
> It looks like the answer you offered is reasonable, though.
>
> In short -- the question was "How can I quickly tell which (keyed)
> rows are in one data.table but not in another?"
>
> As you mentioned, you can do this by joining using `[` -- in order to
> do this easily, you could ensure that each data.table has a column
> that isn't in the other.
>
> For example, if you have data like so:
>
>
> R> dt1 <- data.table(a=1:10, b=letters[1:10], key="a,b")
> R> dt2 <- data.table(a=c(1L, 3L, 5L, 10L), b=letters[c(1, 3, 5, 10)], key="a,b")
>
> Doing either `dt1[dt2]` or `dt2[dt1]` doesn't get you anywhere too
> fast, especially if one is just a subset of the other (as dt2 is of
> dt1):
>
> R> dt1[dt2]
>       a b
> [1,]  1 a
> [2,]  3 c
> [3,]  5 e
> [4,] 10 j
>
> R> dt2[dt1]
>        a b
>  [1,]  1 a
>  [2,]  2 b
>  [3,]  3 c
>  [4,]  4 d
>  [5,]  5 e
>  [6,]  6 f
>  [7,]  7 g
>  [8,]  8 h
>  [9,]  9 i
> [10,] 10 j
>
> Adding some 'dummy' columns may help:
>
> R> dt1$in.1 <- TRUE
> R> dt2$in.2 <- TRUE
>
> Then you can (easily) ask which rows are in dt1 that aren't in dt2:
>
> R> dt2[dt1] ## nomatch=NA is the default
>        a b in.2 in.1
>  [1,]  1 a TRUE TRUE
>  [2,]  2 b   NA TRUE
>  [3,]  3 c TRUE TRUE
>  [4,]  4 d   NA TRUE
>  [5,]  5 e TRUE TRUE
>  [6,]  6 f   NA TRUE
>  [7,]  7 g   NA TRUE
>  [8,]  8 h   NA TRUE
>  [9,]  9 i   NA TRUE
> [10,] 10 j TRUE TRUE
>
> ## or, in a more email-friendly format:
> R> which(is.na(dt2[dt1]$in.2))
> [1] 2 4 6 7 8 9
>
> Which are the rows in dt1 that aren't in dt2
>
> HTH,
> -steve
>
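
The dummy-column approach quoted above can also hand back the unmatched
rows themselves rather than their positions. A rough sketch, reusing the
dt1/dt2 example from the quoted message; it assumes a data.table version
that provides `:=`, and the name only.in.dt1 is made up for illustration:

```r
library(data.table)

# Same data as in the quoted example, with integer keys in both tables
dt1 <- data.table(a = 1:10, b = letters[1:10], key = c("a", "b"))
dt2 <- data.table(a = c(1L, 3L, 5L, 10L), b = letters[c(1, 3, 5, 10)],
                  key = c("a", "b"))

# Marker column that exists only in dt2
dt2[, in.2 := TRUE]

# One row per row of dt1; in.2 is NA where dt2 has no matching key
joined <- dt2[dt1]

# Keep only the rows of dt1 that have no partner in dt2
only.in.dt1 <- joined[is.na(in.2)]

only.in.dt1$a   # 2 4 6 7 8 9
```

Only dt2 needs the marker column here, because the join already returns
one row for every row of dt1.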