<div dir="ltr">A correction: the parameter is called "nomatch", not "match".<div><br></div><div style>This use case seems like something a user shouldn't really do; in an ideal world, both tables would be keyed by a column of the same name.</div>
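<div style><br></div><div style>A minimal sketch of that same-name setup, reusing the X and Y from the quoted example below. Renaming with setnames() and keying with setkey() is just one way to get there, and the variable name "out" is only for illustration:</div>
<pre>
```r
library(data.table)

X = data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y = data.table(y = c("b","c","d"), bar = c(4, 2, 3))

# Give Y's join column the same name as X's key, then key Y on it.
setnames(Y, "y", "x")
setkey(Y, x)

# Both tables are now keyed by a column called x, so the name of the
# join column in the result is unambiguous.
out = X[Y]
names(out)  # "x" "foo" "bar"
```
</pre>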
<div style><br></div><div style>As is, my view is that data.table is correcting the user's mistake of naming the column in Y "y" instead of "x", so the output makes sense, and I don't see the need to complicate the behavior by adding more cases one has to work through to figure out what the output columns will be. It is similar to asking for X[J(c("b", "c", "d"))]: you wouldn't want an anonymous column there, would you?<br>
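<br>A quick sketch of that J() case, using the same X as in the quoted example; the name "res" is made up for illustration. The values passed to J() have no column name of their own, and the join column in the result takes its name from X's key:<br>
<pre>
```r
library(data.table)

X = data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")

# J() builds an unnamed lookup table; the join column of the result
# is labelled with X's key name rather than left anonymous.
res = X[J(c("b", "c", "d"))]
names(res)  # "x" "foo"
```
</pre>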
</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 3, 2013 at 6:18 AM, Gabor Grothendieck <span dir="ltr">&lt;<a href="mailto:ggrothendieck@gmail.com" target="_blank">ggrothendieck@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I am moving this discussion which started with mdowle to the list.<br>
<br>
Consider this example slightly modified from the data.table FAQ:<br>
<br>
> X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")<br>
> Y = data.table(y=c("b","c","d"), bar=c(4,2,3))<br>
> out <- X[Y]; out<br>
x foo bar<br>
1: b 3 4<br>
2: b 4 4<br>
3: b 5 4<br>
4: c 6 2<br>
5: c 7 2<br>
6: d NA 3<br>
<br>
Note that the first column of the output is labelled x even though the<br>
data to produce it comes from y, e.g. "d" in out$x is not in X$x but<br>
does appear in Y$y so clearly the data is coming from y as opposed to<br>
x. In SQL terms, the above would be written:<br>
<br>
select Y.y as x, ...<br>
<br>
and the need to rename the first column of out suggests that there<br>
may be a deeper problem here.<br>
<br>
Here are some ideas to address this (they would require changes to data.table):<br>
<br>
- the default of X[Y,, match=NA] would be changed to a default of<br>
X[Y,,match=0] so that it corresponds to the defaults in R's merge and<br>
in SQL joins.<br>
<br>
- the column name of the first column in the example above would be<br>
changed to y if match=0 but be left at x if match=NA. In the case<br>
that match=0 (the proposed new default) x and y are equal so the first<br>
column can be validly labelled as x but in the case that match=NA they<br>
are not so y would be used as the column name.<br>
<br>
- the name match= does seem a bit misleading since R's match only<br>
matches one item in the target whereas in data.table match matches<br>
many if mult="all" and that is the default. Perhaps some thought<br>
should be given to a name change here?<br>
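<br>
For reference, the two behaviours discussed in the list above can be compared<br>
side by side. This is only a sketch: it uses the actual argument name, nomatch<br>
(see the correction at the top of this message), and the variable names are<br>
made up for illustration. merge() is called with by.x/by.y because the two<br>
key columns are named differently.<br>
<pre>
```r
library(data.table)

X = data.table(x = c("a","a","b","b","b","c","c"), foo = 1:7, key = "x")
Y = data.table(y = c("b","c","d"), bar = c(4, 2, 3))

# Current default (nomatch = NA): every row of Y is kept, so "d"
# appears with foo = NA.
outer_like = X[Y]

# nomatch = 0 drops rows of Y that have no match in X.
inner_like = X[Y, nomatch = 0]

# merge() does an inner join by default (all = FALSE).
merged = merge(X, Y, by.x = "x", by.y = "y")

nrow(outer_like)  # 6
nrow(inner_like)  # 5
nrow(merged)      # 5
```
</pre>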
<br>
The above would seem to correspond more closely to R's merge and SQL<br>
join defaults. Any use cases or other comments?<br>
<br>
--<br>
Statistics & Software Consulting<br>
GKX Group, GKX Associates Inc.<br>
tel: 1-877-GKX-GROUP<br>
email: ggrothendieck at <a href="http://gmail.com" target="_blank">gmail.com</a><br>
</blockquote></div><br></div>