[datatable-help] indexing with nomatch=0

Gabor Grothendieck ggrothendieck at gmail.com
Sat May 4 13:26:18 CEST 2013


The proposal at this point would be:

1. nomatch= would be replaced by all.i= such that
     X[Y,,nomatch=NA] is the same as X[Y,,all.i=TRUE]
     X[Y,,nomatch=0] is the same as X[Y,,all.i=FALSE]
nomatch= would be deprecated and ultimately removed.

Note that #1 is simple to implement as it only involves changing names
and values of arguments and does not really change any behavior;
however, it is easier to think about because X[Y,,all.i=Z] now has the
same behavior as merge(X, Y, all.y=Z) and so can be quickly understood
by anyone who knows merge in R.  In contrast, nomatch= did not even
have the same meaning as in match(), since match() returns only the
first occurrence whereas with mult="all", the default, matching in
data.table returns all occurrences.  Note that the default of merge's
all.y= is all.y=FALSE but the default of all.i= would be all.i=TRUE so
that the default behaves as ordinary indices do.  Also note that this
solves the problem that nomatch= can only be 0 or NA, since a logical
has only two non-NA values anyway.
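
To make the correspondence in #1 concrete, here is a minimal sketch
using only arguments that exist today (nomatch= and merge's all.y=).
The tables X and Y and their columns are made up for illustration, and
all.i= appears only in comments because it is the proposed name, not
an existing data.table argument.

```r
library(data.table)

# Illustrative tables (not from the thread): "d" in Y has no match in X.
X <- data.table(k = c("a", "b", "c"), x = 1:3, key = "k")
Y <- data.table(k = c("b", "d"), y = c(10, 20), key = "k")

# Current spelling. Under the proposal these would be written
# X[Y, , all.i = TRUE] and X[Y, , all.i = FALSE] (hypothetical).
keep_all_i   <- X[Y, nomatch = NA]  # one row per row of Y; "d" gets x = NA
drop_nomatch <- X[Y, nomatch = 0]   # only the rows of Y that match X

# The merge() calls the proposal compares them with.
m_all_y <- merge(X, Y, by = "k", all.y = TRUE)
m_inner <- merge(X, Y, by = "k", all.y = FALSE)
```

With these inputs keep_all_i and m_all_y should hold the same rows, as
should drop_nomatch and m_inner.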

2. If Y were a numeric index vector then all.i= would have the same
effect as if Y were a one-column data.table holding those values,
merged with the row numbers of X.  E.g. X[1:4,,all.i=FALSE] would be
the same as X[1:3] if X only had 3 rows, since 4 does not match a row
number of X and is dropped because all.i=FALSE.  If Y were a numeric
vector with negative values it would first be converted to one with
positive values in such a way as to keep the established meaning, and
then the same strategy would be applied.  If Y were logical then it
would be recycled, giving YY, and the same strategy would be applied
to which(YY).  This description is intended to be conceptual; the
actual internal mechanism could be different.
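
The conceptual description in #2 can be emulated with current
data.table features.  This is only a sketch: index_all_i() is a
hypothetical helper, not part of data.table, and it assumes X has no
column named rn already and that the index contains no NAs.

```r
library(data.table)

# Hypothetical helper emulating the proposed X[i, , all.i = ...] for a
# vector i by joining i against the row numbers of X.
index_all_i <- function(X, i, all.i = TRUE) {
  n <- nrow(X)
  if (is.logical(i)) {
    i <- which(rep_len(i, n))            # recycle to nrow(X), keep TRUE positions
  } else if (any(i < 0)) {
    i <- seq_len(n)[i]                   # negative values: usual "drop these rows"
  }
  Xr <- copy(X)[, rn := seq_len(n)]      # X plus an explicit row-number column
  Yi <- data.table(rn = as.integer(i))   # the index as a one-column table
  out <- Xr[Yi, on = "rn", nomatch = if (all.i) NA else 0L]
  out[, rn := NULL][]                    # drop the helper column again
}

DT <- data.table(a = letters[1:3])
res_all  <- index_all_i(DT, 1:4, all.i = TRUE)   # 4 rows, the last one NA
res_drop <- index_all_i(DT, 1:4, all.i = FALSE)  # 3 rows, same as DT[1:3]
```

The join with on= keeps the rows in the order given by the index, which
is why it is used here instead of merge(), which sorts by default.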

Thus #2 allows one to think of **all** i indexing as merging rather
than as multiple separate concepts (which I believe is consistent with
the original intention of data.table).

On Fri, May 3, 2013 at 8:02 PM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:
> I think I like this proposal - maybe you should write up a few examples of
> what current behavior is, vs the proposed behavior.
>
>
> On Fri, May 3, 2013 at 6:54 PM, Gabor Grothendieck <ggrothendieck at gmail.com>
> wrote:
>>
>> data.table is supposed to generalize indexing and although not
>> explicitly stated the generalization seems to be that indexing is
>> merging with the row numbers so there is indeed merging going on and
>> that merging should respect nomatch= for consistency.
>>
>> On Fri, May 3, 2013 at 6:54 PM, Eduard Antonyan
>> <eduard.antonyan at gmail.com> wrote:
>> > There is no join'ing happening here, thus nomatch=0 has no effect.
>> >
>> >
>> > On Fri, May 3, 2013 at 5:52 PM, Gabor Grothendieck
>> > <ggrothendieck at gmail.com>
>> > wrote:
>> >>
>> >> The definition of DT was left out by mistake.  It should be:
>> >>
>> >> DT <- data.table(a=letters[1:3])
>> >>
>> >>
>> >> On Fri, May 3, 2013 at 6:50 PM, Gabor Grothendieck
>> >> <ggrothendieck at gmail.com> wrote:
>> >> > Consider this example:
>> >> >
>> >> >> DT[1:4,,nomatch=0]
>> >> >     a
>> >> > 1:  a
>> >> > 2:  b
>> >> > 3:  c
>> >> > 4: NA
>> >> >
>> >> > Should it not return only the first 3 rows?  It seems to be ignoring
>> >> > the nomatch=0.
>> >> >
>> >> > --
>> >> > Statistics & Software Consulting
>> >> > GKX Group, GKX Associates Inc.
>> >> > tel: 1-877-GKX-GROUP
>> >> > email: ggrothendieck at gmail.com
>> >>
>> >
>> >
>>
>
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

