<div dir="ltr">Yeah, I disagree with this view. I don't think [.data.table should pursue compatibility with merge.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 3, 2013 at 11:54 AM, Gabor Grothendieck <span dir="ltr"><<a href="mailto:ggrothendieck@gmail.com" target="_blank">ggrothendieck@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I think that from the viewpoint of compatibility and convenience it<br>

would be best to implement all.x and all.y and not rely on swapping X<br>

and Y.  SQLite did something like this (they implemented left join but<br>

not right join based on the idea that all you have to do is swap join<br>

arguments) but the problem with it is that it adds a layer of mental<br>

specification effort if the actual problem is better stated in the<br>

unsupported orientation.<br>
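For illustration, here is a minimal base-R sketch of the orientation point. It uses plain data.frame copies of the FAQ example quoted further down and base merge() rather than data.table; the object names inner, left, right, swapped and full are only illustrative. The right join written with all.y=TRUE returns the same rows as the left join with the arguments swapped, but the key column takes its name from whichever table is passed first.<br>

```r
# Plain data.frame copies of the FAQ example quoted in this thread (base R only, no data.table).
X <- data.frame(x = c("a", "a", "b", "b", "b", "c", "c"), foo = 1:7)
Y <- data.frame(y = c("b", "c", "d"), bar = c(4, 2, 3))

# Inner join: only keys present in both tables ("b" and "c").
inner <- merge(X, Y, by.x = "x", by.y = "y")

# Left join: every row of X is kept; the "a" rows get bar = NA.
left <- merge(X, Y, by.x = "x", by.y = "y", all.x = TRUE)

# Right join: every row of Y is kept; the "d" row gets foo = NA.
# These are the same rows as the X[Y] output shown in the FAQ example.
right <- merge(X, Y, by.x = "x", by.y = "y", all.y = TRUE)

# The same right join written as a left join with the arguments swapped:
# same rows, but the key column is now named "y" and the column order differs.
swapped <- merge(Y, X, by.x = "y", by.y = "x", all.x = TRUE)

# Full outer join: all keys from both tables.
full <- merge(X, Y, by.x = "x", by.y = "y", all = TRUE)

print(sapply(list(inner = inner, left = left, right = right,
                  swapped = swapped, full = full), nrow))
print(names(right))
print(names(swapped))
```

Running it should report 5, 7, 6, 6 and 8 rows, with the key column named x in right but y in swapped.<br>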

<div class="HOEnZb"><div class="h5"><br>

On Fri, May 3, 2013 at 12:49 PM, Eduard Antonyan<br>

<<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>> wrote:<br>

> Arun, it only needs the addition of something like X[Y, keep.all = TRUE]; all of<br>

> the other merge options already exist as either X[Y] or Y[X] with or without<br>

> nomatch = 0/NA.<br>

><br>

><br>

> On Fri, May 3, 2013 at 11:45 AM, Arunkumar Srinivasan<br>

> <<a href="mailto:aragorn168b@gmail.com">aragorn168b@gmail.com</a>> wrote:<br>

>><br>

>> Gabor,<br>

>><br>

>> Very true. I suppose your request is that x[i], where `i` is a<br>
>> data.table, should have the same set of options as R's base `merge`<br>
>> function, such as all.y=TRUE, all.x=TRUE or all=TRUE. I like the idea by itself.<br>
>> However, I can't think of a good way to do this. I find the<br>
>> syntax X[Y, all.x=TRUE] weird and not very sensible. That is, even though<br>
>> X[Y] is equal to Y[X, all.y=TRUE] or X[Y, all.x=TRUE] (ignoring the<br>
>> reordered columns), the latter two seem redundant to me (maybe<br>
>> it's because I am used to this syntax).<br>

>><br>

>> Arun<br>

>><br>

>> On Friday, May 3, 2013 at 5:57 PM, Gabor Grothendieck wrote:<br>

>><br>

>> In my last post it should have read:<br>

>><br>

>> That X[Y] is not the same as Y[X] is analogous to the fact that<br>

>> merge(X, Y, all.y=TRUE) is not the same as merge(Y, X, all.y=TRUE)<br>

>><br>

>> On Fri, May 3, 2013 at 11:55 AM, Gabor Grothendieck<br>

>> <<a href="mailto:ggrothendieck@gmail.com">ggrothendieck@gmail.com</a>> wrote:<br>

>><br>

>> Assuming same-named keys, then these are all the same except possibly<br>

>> for row and column order:<br>

>><br>

>> X[Y,,nomatch=0]<br>

>> Y[X,,nomatch=0]<br>

>> merge(X, Y)<br>

>> merge(Y, X)<br>

>><br>

>> That X[Y] is not the same as Y[X] is analogous to the fact that<br>

>> merge(X, Y, all.x=TRUE) is not the same as merge(Y, X, all.x=TRUE)<br>

>><br>

>> On Fri, May 3, 2013 at 11:46 AM, Arunkumar Srinivasan<br>

>> <<a href="mailto:aragorn168b@gmail.com">aragorn168b@gmail.com</a>> wrote:<br>

>><br>

>> Gabor,<br>

>><br>

>> X[Y] and Y[X] are not necessarily the same operations (meaning, they don't<br>

>> *have* to give the same output). However, merge(X,Y) and merge(Y,X) *have*<br>

>> to provide the same output (except for the column order and names). In that<br>
>> sense, a join is a bit different from a merge, no?<br>

>><br>

>> Arun<br>

>><br>

>> On Friday, May 3, 2013 at 5:36 PM, Gabor Grothendieck wrote:<br>

>><br>

>> Yes, except that is not really what happens, since match() only matches<br>
>> one row, whereas with mult="all" (the default) all rows are matched,<br>
>> which is not really matching in the sense of match(). The current<br>
>> naming confuses matching with joining, and it's really the latter that<br>
>> is being done.<br>

>><br>

>> Regarding the existence of merge: the advantage of [ is that it will<br>
>> automatically take only the columns needed, so merge is not really<br>
>> equivalent to [ in all respects. Furthermore, having to use different<br>
>> constructs for different types of merge seems awkward.<br>

>><br>

>><br>

>> On Fri, May 3, 2013 at 11:27 AM, Eduard Antonyan<br>

>> <<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>> wrote:<br>

>><br>

>> Btw, here is how I think about the "nomatch" name: normally X[Y]<br>
>> tries to match rows of Y with rows of X, and then "nomatch" tells it what to<br>
>> do when there is *no match*.<br>

>><br>

>><br>

>> On Fri, May 3, 2013 at 10:23 AM, Eduard Antonyan<br>

>> <<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>> wrote:<br>

>><br>

>><br>

>> To clarify: that behavior is already implemented in merge (more<br>
>> specifically, merge.data.table). I don't really have a view on having it in<br>
>> X[Y] as well. I don't like all.x and all.y as the names, since there are no<br>
>> params named 'x' and 'y' in [.data.table (as opposed to merge), but some<br>
>> param that would do a full outer join could certainly be added.<br>

>><br>

>><br>

>> On Fri, May 3, 2013 at 10:09 AM, Gabor Grothendieck<br>

>> <<a href="mailto:ggrothendieck@gmail.com">ggrothendieck@gmail.com</a>> wrote:<br>

>><br>

>><br>

>> Yes, sorry. It's nomatch=, which presumably derives from the parameter<br>
>> of the same name in the match() function. If the idea of the nomatch=<br>
>> name was to leverage existing argument names in R, then I would<br>
>> prefer all.y= in place of nomatch=, to be consistent with merge(), since<br>
>> we are really merging/joining rather than just matching. That would<br>
>> also allow extension to all types of join by adding an all.x= argument<br>
>> too.<br>

>><br>

>> On Fri, May 3, 2013 at 10:59 AM, Eduard Antonyan<br>

>> <<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>> wrote:<br>

>><br>

>> I would prefer nomatch=0 as a default though, simply because that's what I<br>
>> do most of the time :)<br>

>><br>

>><br>

>> On Fri, May 3, 2013 at 9:57 AM, Eduard Antonyan<br>

>> <<a href="mailto:eduard.antonyan@gmail.com">eduard.antonyan@gmail.com</a>> wrote:<br>

>><br>

>><br>

>> A correction: the param is called "nomatch", not "match".<br>
>><br>
>> This use case seems like something a user shouldn't really do; in an ideal<br>
>> world you should have both tables keyed by a column with the same name.<br>
>><br>
>> As is, my view is that data.table is correcting the user's mistake of<br>
>> naming the column in Y "y" instead of "x", so the output makes sense, and<br>
>> I don't see the need to complicate the behavior by adding more cases one<br>
>> has to go through to figure out what the output columns would be. It is<br>
>> similar to asking for X[J(c("b", "c", "d"))]: you wouldn't want an anonymous<br>
>> column there, would you?<br>

>><br>

>><br>

>><br>

>> On Fri, May 3, 2013 at 6:18 AM, Gabor Grothendieck<br>

>> <<a href="mailto:ggrothendieck@gmail.com">ggrothendieck@gmail.com</a>> wrote:<br>

>><br>

>><br>

>> I am moving this discussion, which started with mdowle, to the list.<br>

>><br>

>> Consider this example slightly modified from the data.table FAQ:<br>

>><br>

>> X = data.table(x=c("a","a","b","b","b","c","c"), foo=1:7, key="x")<br>

>> Y = data.table(y=c("b","c","d"), bar=c(4,2,3))<br>

>> out <- X[Y]; out<br>

>><br>

>> x foo bar<br>

>> 1: b 3 4<br>

>> 2: b 4 4<br>

>> 3: b 5 4<br>

>> 4: c 6 2<br>

>> 5: c 7 2<br>

>> 6: d NA 3<br>

>><br>

>> Note that the first column of the output is labelled x even though the<br>
>> data to produce it comes from y. For example, "d" in out$x is not in X$x but<br>
>> does appear in Y$y, so clearly the data is coming from y as opposed to<br>
>> x. In terms of SQL, the above would be written:<br>
>><br>
>> select Y.y as x, ...<br>
>><br>
>> and the need to rename the first column of out suggests that there<br>
>> may be a deeper problem here.<br>

>><br>

>> Here are some ideas to address this (they would require changes to<br>

>> data.table):<br>

>><br>

>> - the default of X[Y,, match=NA] would be changed to a default of<br>

>> X[Y,,match=0] so that it corresponds to the defaults in R's merge and<br>

>> in SQL joins.<br>

>><br>

>> - the column name of the first column in the example above would be<br>
>> left at x if match=0 but changed to y if match=NA. In the case<br>
>> that match=0 (the proposed new default), x and y are equal, so the first<br>
>> column can validly be labelled x, but in the case that match=NA they<br>
>> are not, so y would be used as the column name.<br>

>><br>

>> - the name match= does seem a bit misleading, since R's match only<br>
>> matches one item in the target, whereas in data.table match matches<br>
>> many if mult="all", and that is the default. Perhaps some thought<br>
>> should be given to a name change here?<br>

>><br>

>> The above would seem to correspond more closely to R's merge and SQL<br>

>> join defaults. Any use cases or other comments?<br>

>><br>

>> --<br>

>> Statistics & Software Consulting<br>

>> GKX Group, GKX Associates Inc.<br>

>> tel: 1-877-GKX-GROUP<br>

>> email: ggrothendieck at <a href="http://gmail.com" target="_blank">gmail.com</a><br>

>> _______________________________________________<br>

>> datatable-help mailing list<br>

>> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>

>><br>

>><br>

>><br>

>> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>

>><br>

>><br>

>><br>

>><br>


>><br>

>><br>

>><br>

>><br>



>><br>

>><br>

>><br>

>><br>


>><br>

>><br>

>><br>

>><br>


>><br>

>><br>

><br>

<br>

<br>

<br>


</div></div></blockquote></div><br></div>