[datatable-help] Joins in data.table

Fri Jul 16 09:04:10 CEST 2010

Question 1 -- Why get a full row of NAs with x[y]?  Bug?
Question 2 -- What is an easy way to do the equivalent of VLOOKUP in Excel?

====================================

Question 1 -- Why get a full row of NAs with x[y]?  Bug?

I was experimenting with x[y] syntax and I don't understand the use case for getting NA with nomatch=NA.

x <- data.table( a=c("a","b","c"), b=1:3, key="a" )
y <- data.table( a=c("b","d","e"), d=c(8,9,10) )

x[y] returns:

        a  b
[1,]    b  2
[2,] <NA> NA
[3,] <NA> NA

What is the point of getting an entire row of NA?  I don't even know what the NA row corresponds to in data.table y.  If I got the values in column "a", then it would make more sense.

I would expect:

     a  b
[1,] b  2
[2,] d NA
[3,] e NA

Am I misunderstanding this join?

(I do realize that nomatch=0 will prevent the NA rows.)

Also, is there still a plan to allowing access to y's columns within the "j" expression as indicated in FAQ 1.10?
   x[y, d]

(I did ask this before and I believe Matthew said that it is in the works.  I just want to make sure I didn't misunderstand.)

Otherwise, the whole x[y] join is only a subset.  (I do understand that I can use merge() to do a full merge.)

====================================

Question 2 -- What is an easy way to do the equivalent of VLOOKUP in Excel?

Now, suppose both my data.tables (x and y) had a lot of columns in it.  I want to add a new column "d" to x.  The value "d" will be from the data.table y such that x$a == y$a.

What is the easiest way for me to do it?  Do I have to merge() both the data.tables and remove the unwanted columns of y from the result?

If the feature in the FAQ 1.10 were implemented, could I could do
   x[y, list(.SD,d)]
or something to that effect?

Regards,
Harish