[datatable-help] Joins in data.table
Harish
harishv_99 at yahoo.com
Fri Jul 16 09:04:10 CEST 2010
Question 1 -- Why get a full row of NAs with x[y]? Bug?
Question 2 -- What is an easy way to do the equivalent of VLOOKUP in Excel?
====================================
Question 1 -- Why get a full row of NAs with x[y]? Bug?
I was experimenting with x[y] syntax and I don't understand the use case for getting NA with nomatch=NA.
x <- data.table( a=c("a","b","c"), b=1:3, key="a" )
y <- data.table( a=c("b","d","e"), d=c(8,9,10) )
x[y] returns:
a b
[1,] b 2
[2,] <NA> NA
[3,] <NA> NA
What is the point of getting an entire row of NA? I don't even know what the NA row corresponds to in data.table y. If I got the values in column "a", then it would make more sense.
I would expect:
a b
[1,] b 2
[2,] d NA
[3,] e NA
Am I misunderstanding this join?
(I do realize that nomatch=0 will prevent the NA rows.)
Also, is there still a plan to allowing access to y's columns within the "j" expression as indicated in FAQ 1.10?
x[y, d]
(I did ask this before and I believe Matthew said that it is in the works. I just want to make sure I didn't misunderstand.)
Otherwise, the whole x[y] join is only a subset. (I do understand that I can use merge() to do a full merge.)
====================================
Question 2 -- What is an easy way to do the equivalent of VLOOKUP in Excel?
Now, suppose both my data.tables (x and y) had a lot of columns in it. I want to add a new column "d" to x. The value "d" will be from the data.table y such that x$a == y$a.
What is the easiest way for me to do it? Do I have to merge() both the data.tables and remove the unwanted columns of y from the result?
If the feature in the FAQ 1.10 were implemented, could I could do
x[y, list(.SD,d)]
or something to that effect?
Regards,
Harish
More information about the datatable-help
mailing list