[datatable-help] merge/join/match

Arunkumar Srinivasan aragorn168b at gmail.com
Fri May 3 17:45:24 CEST 2013


(Third attempt: I'm growing tired of this 40KB message taking over half an hour to reach me! :) )

Gabor,

About the behaviour of X[Y]:

The current definition of X[Y] is "it's a join looking up X's rows using Y as an index". By this definition, the output of X[Y] is very much justified, I think. Y is just used as an index. To me it feels similar to, say, X[8] (which gives NA, NA with the same column names as X). 
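
To make that comparison concrete, here is a minimal sketch (not part of the original exchange). It assumes the data.table package is installed; the key value "d" is a hypothetical one, chosen only because it does not occur in X, and printed output may differ between data.table versions.

```r
library(data.table)

X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")

# Row 8 does not exist in the 7-row table: the result is a single row of NAs
# that keeps X's column names.
by_row <- X[8]

# "d" is not a key value of X (hypothetical lookup): the result is a single
# row that carries the looked-up value under X's column name, with foo as NA.
by_key <- X[data.table(y = "d")]

print(by_row)
print(by_key)
```

In both cases the index selects nothing in X, and the result keeps X's column names.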

Another thought that occurs to me is, say, in this example:

    X <- data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")
    Y <- data.table(y=c("b"), bar=c(4))
    X[Y]

Here again, you query for Y's y values in X's key column and join the columns of X and Y. This time there is no Y value for which X gives NA, so the data comes from both "X" and "Y" (as opposed to the case "d" you showed, where the data comes just from "Y"). In this case, should the column be named "x" or "y"? Always "x" makes sense to me, and Y[X] would give "y" instead. However, I am not that good with SQL joins, so I may very well have missed your point here.
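
A short sketch of the naming question, under the assumption that data.table is installed and that Y is given a key on "y" so that Y[X] is a keyed join as well (the `setkey` call is an addition for illustration, not something from the thread):

```r
library(data.table)

X <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y <- data.table(y = c("b"), bar = c(4))
setkey(Y, y)  # assumption: key Y so that Y[X] can look up Y's rows by key

xy <- X[Y]  # look up X's rows using Y: join column takes X's name
yx <- Y[X]  # look up Y's rows using X: join column takes Y's name

print(names(xy))
print(names(yx))
print(c(nrow(xy), nrow(yx)))
```

The two results differ in row count as well as in column name, since each one returns a row per row of the table used as the index.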


Regarding `merge`:

    x <- as.data.frame(X)
    y <- as.data.frame(Y)

    merge(x, y, by.x="x", by.y="y", all=TRUE) # --- (1)
    merge(y, x, by.x="y", by.y="x", all=TRUE) # --- (2)

(1) always gives the column name "x" and (2) always "y". So do X[Y] and Y[X] respectively, except that the operations X[Y] and Y[X] are not identical (whereas the two merges return the same rows). So I don't see a dissimilarity here. Again, I may have misunderstood your point, and would love to be corrected if so.
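
The naming claim for `merge` can be checked with base R alone. This is only a sketch: the data frames are rebuilt directly here rather than converted from X and Y, so that the snippet runs without data.table.

```r
# Plain data frames with the same contents as X and Y in the example above.
x <- data.frame(x = c("a","a","b","b","b","c","c"), foo = 1:7)
y <- data.frame(y = "b", bar = 4)

m1 <- merge(x, y, by.x = "x", by.y = "y", all = TRUE)  # (1)
m2 <- merge(y, x, by.x = "y", by.y = "x", all = TRUE)  # (2)

# The key column is named after the first argument's by-column.
print(names(m1))
print(names(m2))

# With all = TRUE both calls return one row per row of x.
print(c(nrow(m1), nrow(m2)))
```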

About the case `"nomatch"`, I agree with you that the name could be changed to avoid confusion with R's `match`. Maybe "missing = NA" and "missing = 0" would make more sense?
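
For reference, a minimal sketch of what the existing `nomatch` argument does. "missing" above is only a proposed name, so the code uses `nomatch`; the table Y2 and its "d" row are hypothetical, and data.table is assumed to be installed.

```r
library(data.table)

X  <- data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y2 <- data.table(y = c("b", "d"), bar = c(4, 5))  # "d" has no match in X

keep_na <- X[Y2, nomatch = NA]  # the default: unmatched "d" kept, foo is NA
drop_it <- X[Y2, nomatch = 0]   # unmatched "d" dropped from the result

print(keep_na)
print(drop_it)
```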

Best regards, 
Arun

