[datatable-help] indexing with nomatch=0

Gabor Grothendieck ggrothendieck at gmail.com
Sat May 4 13:40:41 CEST 2013


I am not sure, but I think that could be handled as a separate issue if
it becomes important.  Using the name all.i= makes it sufficiently
different from all.y= that users won't expect the same default, and
further they will not necessarily expect there to be an all argument
for the left participant in the merge.

On Sat, May 4, 2013 at 7:35 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Gabor,
> I agree with both points. They bring enough clarity and consistency to
> the syntax.
> Does this mean that you don't mind X[Y] not having all the functionality
> of `merge`? This takes care of the confusion around `nomatch` but still
> does not do all merges, iiuc.
>
> Arun
>
> On Saturday, May 4, 2013 at 1:26 PM, Gabor Grothendieck wrote:
>
> The proposal at this point would be:
>
> 1. nomatch= would be replaced by all.i= such that
> X[Y,,nomatch=NA] is the same as X[Y,,all.i=TRUE]
> X[Y,,nomatch=0] is the same as X[Y,,all.i=FALSE]
> nomatch= would be deprecated and ultimately removed.
>
> Note that #1 is simple to implement as it only involves changing names
> and values of arguments and does not really change any behavior;
> however, it's easier to think about because X[Y,,all.i=Z] now has the
> same behavior as merge(X, Y, all.y=Z) and so can be quickly understood
> by anyone who knows merge in R. In contrast, nomatch= does not even
> have the same meaning as in match(), since match() matches the first
> occurrence whereas with mult="all", the default, matching in
> data.table matches all occurrences. Note that the default of merge's
> all.y= is all.y=FALSE but the default of all.i= is all.i=TRUE, so
> that the default behaves as ordinary indexing does. Also note that this
> solves the problem that nomatch= can only be 0 or NA, since a logical
> can only have two non-NA values anyway.
>
> 2. If Y is a numeric index vector, all.i= would have the same
> effect as if Y were a one-column data.table merged with the row
> numbers of X. For example, X[1:4,,all.i=FALSE] would be the same as
> X[1:3] if X had only 3 rows, since 4 does not match a row number of X
> and is dropped because all.i=FALSE. If Y is a numeric vector with
> negative values, it would be converted to one with positive values in
> a way that preserves the established meaning, and then the same
> strategy is applied. If Y is logical, it is recycled giving YY, and the
> same strategy is applied to which(YY). This description is intended
> to be conceptual; the actual internal mechanism could be different.
>
> Thus #2 allows one to think of **all** i indexing as merging rather
> than as multiple separate concepts (which I believe is consistent with
> the original intention of data.table).
>
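A minimal base-R sketch of the conceptual rule in #1 and #2, assuming a plain data.frame stands in for a data.table. The helper index_as_merge() and its all.i argument are hypothetical, written only to illustrate the proposal; they are not part of data.table, and data.table's internals may work differently. Since row numbers are unique, matching i against 1..nrow(X) with match() returns the same rows a merge would, while keeping the order of i.

```r
# Hypothetical helper (NOT a data.table function): sketches "i indexing is
# merging with the row numbers of X" for numeric or logical i.
# Assumes i contains no NA values.
index_as_merge <- function(X, i, all.i = TRUE) {
  n <- nrow(X)
  if (is.logical(i)) {
    # Logical i: recycle to nrow(X), then keep the positions that are TRUE.
    i <- which(rep_len(i, n))
  } else {
    i <- as.integer(i)
    i <- i[i != 0L]  # zeros select nothing, as in base R
    if (length(i) > 0L && all(i < 0L)) {
      # Negative i: convert to the positive row numbers it keeps.
      i <- setdiff(seq_len(n), -i)
    } else if (any(i < 0L)) {
      stop("cannot mix positive and negative row numbers")
    }
  }
  # Join i against the row numbers 1..n. Row numbers are unique, so
  # match() finds the same rows a merge would and preserves i's order.
  pos <- match(i, seq_len(n))          # NA where i is not a row number of X
  if (!all.i) pos <- pos[!is.na(pos)]  # all.i = FALSE: drop unmatched i
  X[pos, , drop = FALSE]
}

X <- data.frame(a = letters[1:3], stringsAsFactors = FALSE)
print(index_as_merge(X, 1:4))                 # 4 rows; row 4 is all NA
print(index_as_merge(X, 1:4, all.i = FALSE))  # rows 1-3 only
print(index_as_merge(X, -1))                  # rows 2 and 3
print(index_as_merge(X, c(TRUE, FALSE)))      # recycled to T,F,T: rows 1, 3
```

In this sketch all.i=TRUE reproduces the four-row result of the DT[1:4] example, and all.i=FALSE drops the unmatched row number 4. Using match() rather than merge() is a choice made for the sketch: merge() sorts its result by the key by default, whereas indexing should return rows in the order given by i.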
>
> On Fri, May 3, 2013 at 8:02 PM, Eduard Antonyan
> <eduard.antonyan at gmail.com> wrote:
>
> I think I like this proposal; maybe you should write up a few examples of
> the current behavior vs. the proposed behavior.
>
>
> On Fri, May 3, 2013 at 6:54 PM, Gabor Grothendieck <ggrothendieck at gmail.com>
> wrote:
>
>
> data.table is supposed to generalize indexing and, although it is not
> explicitly stated, the generalization seems to be that indexing is
> merging with the row numbers. So there is indeed merging going on, and
> that merging should respect nomatch= for consistency.
>
> On Fri, May 3, 2013 at 6:54 PM, Eduard Antonyan
> <eduard.antonyan at gmail.com> wrote:
>
> There is no joining happening here, so nomatch=0 has no effect.
>
>
> On Fri, May 3, 2013 at 5:52 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com>
> wrote:
>
>
> The definition of DT was left out by mistake. It should be:
>
> library(data.table)
> DT <- data.table(a=letters[1:3])
>
>
> On Fri, May 3, 2013 at 6:50 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>
> Consider this example:
>
> DT[1:4,,nomatch=0]
>
> a
> 1: a
> 2: b
> 3: c
> 4: NA
>
> Should it not return only the first 3 rows? It seems to be ignoring
> the nomatch=0.
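For comparison, a minimal base-R sketch using a data.frame in place of DT (an assumption for illustration; it does not exercise data.table itself). Base data.frame indexing also returns an all-NA row for a row number beyond nrow(), which is consistent with the reading that numeric i follows ordinary indexing rules rather than a join; filtering i explicitly is one way to get the 3-row result.

```r
# Base R analogue of the DT[1:4] example: a data.frame stands in for DT.
X <- data.frame(a = letters[1:3], stringsAsFactors = FALSE)
i <- 1:4

# Row number 4 does not exist, so base indexing returns an all-NA row
# for it: 4 rows in total, like the DT[1:4,,nomatch=0] output above.
print(X[i, , drop = FALSE])

# Keeping only row numbers that exist (assumes i is positive) gives the
# 3-row result the question expects, i.e. what all.i=FALSE would do.
print(X[i[i <= nrow(X)], , drop = FALSE])
```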
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


