[datatable-help] indexing with nomatch=0

Arunkumar Srinivasan aragorn168b at gmail.com
Sat May 4 13:47:13 CEST 2013


Hmm, I see what you mean. The `i` in `all.i = TRUE/FALSE` (together with taking TRUE/FALSE rather than 0/NA) distinguishes the behaviour of X[Y] from that of "merge" clearly enough that users should not fall into the "unexpected output" scenario.

I vote for this change, if there's one :).
Arun
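
For readers following the thread, here is a minimal sketch of the current behaviour that the proposal would rename. The tables X and Y and their columns are made up for illustration, and `all.i` itself is only a proposal, so it is not used here. The sketch assumes a data.table version that supports `on=` (so no keys need to be set); newer releases also accept `nomatch = NULL` in place of `nomatch = 0`.

```r
library(data.table)

# Illustrative tables (not from the thread)
X <- data.table(id = c("a", "b", "c"), x = 1:3)
Y <- data.table(id = c("b", "c", "d"), y = c(20, 30, 40))

# nomatch = NA: every row of Y is kept; "d" has no match in X,
# so its x value is NA
outer_i <- X[Y, on = "id", nomatch = NA]

# nomatch = 0: rows of Y without a match in X are dropped
inner_i <- X[Y, on = "id", nomatch = 0]

# The merge() calls that the thread compares these to
m_all  <- merge(X, Y, by = "id", all.y = TRUE)
m_none <- merge(X, Y, by = "id", all.y = FALSE)

print(outer_i)
print(inner_i)
```

With Y already sorted by id, the join results and the merge() results contain the same rows in the same order, which is the correspondence that a TRUE/FALSE argument would make explicit.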


On Saturday, May 4, 2013 at 1:40 PM, Gabor Grothendieck wrote:

> I am not sure but I think that could be handled as a separate issue if
> it becomes important. By using all.i= it makes it sufficiently
> different from all.y= that users won't expect the same default and
> further they will not necessarily expect that there be an all argument
> for the left participant in the merge.
> 
> On Sat, May 4, 2013 at 7:35 AM, Arunkumar Srinivasan
> <aragorn168b at gmail.com> wrote:
> > Gabor,
> > Both points I agree with. It brings enough clarity and consistency to the
> > syntax.
> > Does this mean that you don't mind X[Y] not having all functionalities of
> > `merge`? Because this takes care of the confusion of `nomatch` but still
> > does not do all merges, iiuc.
> > 
> > Arun
> > 
> > On Saturday, May 4, 2013 at 1:26 PM, Gabor Grothendieck wrote:
> > 
> > The proposal at this point would be:
> > 
> > 1. nomatch= would be replaced by all.i= such that
> > X[Y,,nomatch=NA] is the same as X[Y,,all.i=TRUE]
> > X[Y,,nomatch=0] is the same as X[Y,,all.i=FALSE]
> > nomatch= would be deprecated and ultimately removed.
> > 
> > Note that #1 is simple to implement as it only involves changing names
> > and values of arguments and does not really change any behavior;
> > however, it's easier to think about because X[Y,,all.i=Z] now has the
> > same behavior as merge(X, Y, all.y=Z) and so can be quickly understood
> > by anyone who knows merge in R. In contrast nomatch= did not even
> > have the same meaning as in match() since match matches the first
> > occurrence whereas with mult="all", the default, matching in
> > data.table matches all occurrences. Note that the default of merge's
> > all.y= is all.y=FALSE but the default of all.i= is all.i=TRUE in order
> > that the default behave as indices do. Also note that this solves the
> > problem that nomatch= can only be 0 or NA since a logical can only
> > have two non-NA values anyway.
> > 
> > 2. If Y were a numeric index vector then all.i= would have the same
> > effect as if Y were a data.table with Y as its column and is merged
> > with the row numbers of X. e.g. X[1:4,,all.i=FALSE] would be the
> > same as X[1:3] if X only had 3 rows since 4 does not match a row
> > number of X and is dropped because all.i=FALSE. If Y were a numeric
> > vector with negative values it would be converted to one with positive
> > values in such a way as to have the established meaning and then the
> > same strategy is applied. If Y were logical then it's recycled, giving
> > YY and the same strategy is applied to which(YY). This description is
> > intended to be conceptual and the actual internal mechanism could be
> > different.
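
The conceptual description in #2 can be sketched as follows. This is only an illustration under stated assumptions: `index_as_merge()` is a hypothetical helper, not part of data.table, and `all.i` exists here only as an argument of that helper. It normalises the index to positive row numbers and then, when `all.i = FALSE`, drops the ones that do not match a row number of X.

```r
library(data.table)

DT <- data.table(a = letters[1:3])

# Hypothetical helper, not a data.table function: treats the index as if it
# were merged with the row numbers of X.
index_as_merge <- function(X, idx, all.i = TRUE) {
  rows <- seq_len(nrow(X))
  if (is.logical(idx)) {
    # logical index: recycle to nrow(X), then take which()
    idx <- which(rep_len(idx, nrow(X)))
  } else if (any(idx < 0)) {
    # negative index: convert to the positive rows it stands for
    idx <- setdiff(rows, -idx)
  }
  if (!all.i) {
    # all.i = FALSE: drop index values that match no row number of X
    idx <- idx[idx %in% rows]
  }
  X[idx]
}

print(index_as_merge(DT, 1:4, all.i = TRUE))   # 4 is kept and yields an NA row
print(index_as_merge(DT, 1:4, all.i = FALSE))  # 4 is dropped
```

The filtering is done before subsetting so that the sketch does not depend on how any particular data.table version treats nomatch= for row-number indices.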
> > 
> > Thus #2 allows one to think of **all** i indexing as merging rather
> > than as multiple separate concepts (which I believe is consistent with
> > the original intention of data.table).
> > 
> > On Fri, May 3, 2013 at 8:02 PM, Eduard Antonyan
> > <eduard.antonyan at gmail.com> wrote:
> > 
> > I think I like this proposal - maybe you should write up a few examples of
> > what current behavior is, vs the proposed behavior.
> > 
> > 
> > On Fri, May 3, 2013 at 6:54 PM, Gabor Grothendieck
> > <ggrothendieck at gmail.com> wrote:
> > 
> > 
> > data.table is supposed to generalize indexing and although not
> > explicitly stated the generalization seems to be that indexing is
> > merging with the row numbers so there is indeed merging going on and
> > that merging should respect nomatch= for consistency.
> > 
> > On Fri, May 3, 2013 at 6:54 PM, Eduard Antonyan
> > <eduard.antonyan at gmail.com> wrote:
> > 
> > There is no join'ing happening here, thus nomatch=0 has no effect.
> > 
> > 
> > On Fri, May 3, 2013 at 5:52 PM, Gabor Grothendieck
> > <ggrothendieck at gmail.com> wrote:
> > 
> > 
> > The definition of DT was left out by mistake. It should be:
> > 
> > DT <- data.table(a=letters[1:3])
> > 
> > 
> > On Fri, May 3, 2013 at 6:50 PM, Gabor Grothendieck
> > <ggrothendieck at gmail.com> wrote:
> > 
> > Consider this example:
> > 
> > DT[1:4,,nomatch=0]
> > 
> >     a
> > 1:  a
> > 2:  b
> > 3:  c
> > 4: NA
> > 
> > Should it not return only the first 3 rows? It seems to be ignoring
> > the nomatch=0.
> > 
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> > 
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

