[datatable-help] unkey when I use rbind and/or warn when I try a broken key

Frank Erickson FErickson at psu.edu
Sun Oct 13 23:17:37 CEST 2013


Okay, posted. Thanks, Ed. --Frank


On Sun, Oct 13, 2013 at 1:54 PM, Eduard Antonyan
<eduard.antonyan at gmail.com> wrote:

> Frank,
>
> Great examples!
>
> 1) it's a bug, please file a report
>
> 2-3) those sound like good FRs to me
>
> Ed
>
>
> On Sat, Oct 12, 2013 at 10:40 PM, Frank Erickson <FErickson at psu.edu> wrote:
>
>> Quick follow-up: I should use rbindlist, which unsets the key.
>>
>> yy <-
>> rbindlist(list(setnames(data.table('No','NON',0L),names(DT)),DT,list('Extra','XTR',3L)))
>>
>> but maybe an rbind.data.table could be made that behaves better (in terms
>> of key maintenance) than the rbind.data.frame that is apparently called. I
>> guess this is related to my earlier thread on using unique.data.frame, in
>> that sense.
>>
>> My takeaway is: Bad things happen when creating data.tables using
>> functions designed for data.frames.
>>
>> --Frank
>>
>>
>> On Sat, Oct 12, 2013 at 11:20 PM, Frank Erickson <FErickson at psu.edu> wrote:
>>
>>> So, I recently did something like this:
>>>
>>> DT <- data.table(name=c('Guff','Aw'),id=101:102,id2=1:2,key='id')
>>> y   <- rbind(list('No','NON',0L),DT,list('Extra','XTR',3L))
>>> x   <- data.table(id=as.character(101:102),z=1:2,key='id')
>>>
>>> Those rows I added on do not belong in the positions I pasted them into,
>>> so when I tried...
>>>
>>> options(datatable.verbose=TRUE)
>>> x[y,newcol:=name]
>>>
>>> ...it failed, silently.
>>>
>>> I'm guessing it saw the invalid key column in y and then proceeded to
>>> merge by y's column order instead. Because "name" comes before "id" (the
>>> column I thought was my key), no matches are found and newcol is not
>>> created. This is very, very confusing to see. Even with verbose on, I see
>>> no mention of "assigned to zero rows of x" or "matched on zero groups in y".
>>>
>>> I've got several problems with how this worked:
>>>
>>> (1) y should not inherit DT's key when I rbind it, or I should get a
>>> warning when rbinding a keyed data.table suggesting a better approach (that
>>> I clearly do not know about yet...?).
>>>
>>> (2) I really don't like the silent failure to assign to or create
>>> newcol. Warnings are nice.
>>>
>>> (3) It failed because DT1 (y in my example) had an invalid key (i.e., a
>>> "sorted" attribute on which it is not actually sorted). When I merge
>>> DT2[DT1] (x[y] above) and it is found that DT1's key is invalid, I'd like
>>> to see (3a) a warning and (3b) a message telling me explicitly that it's
>>> merging on column order instead.
>>>
>>> Note that there's a nice warning message when I reset the key:
>>>
>>> setkey(y,id)
>>> # Warning message:
>>> # In setkeyv(x, cols, verbose = verbose) :
>>> #   Already keyed by this key but had invalid row order, key rebuilt. If
>>> #   you didn't go under the hood please let datatable-help know so the root
>>> #   cause can be fixed.
>>>
>>> What do you all think? Also, is there a right or safe way to do rbinding?
>>>
>>> Thanks,
>>>
>>> Frank
>>>
>>
>>
>
>
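One possible answer to the "right or safe way to do rbinding" question above, offered as a minimal sketch rather than the package authors' recommendation: build the combined table with rbindlist() (which, as noted in the follow-up, leaves the result unkeyed), then set the key explicitly and name the join column. Assumptions not in the original messages: id is stored as character throughout so that every piece has matching column types, the extra rows are wrapped in data.table() with column names, and the installed data.table version supports the on= argument.

```r
library(data.table)

# Keyed table shaped like DT in the thread, but with a character id
# (an assumption, so that all bound pieces share column types).
DT <- data.table(name = c("Guff", "Aw"), id = c("101", "102"),
                 id2 = 1:2, key = "id")

# rbindlist() builds a new table and does not carry over DT's key.
yy <- rbindlist(list(
  data.table(name = "No",    id = "NON", id2 = 0L),
  DT,
  data.table(name = "Extra", id = "XTR", id2 = 3L)
))
key_after_bind <- key(yy)   # no key is claimed at this point

# Sort and key explicitly, so the "sorted" attribute matches the row order.
setkey(yy, id)

# Name the join column instead of relying on whatever key yy carries.
x <- data.table(id = c("101", "102"), z = 1:2, key = "id")
x[yy, newcol := i.name, on = "id"]
```

Because the join column is named with on=, the result does not depend on y's key being valid, which sidesteps the silent mismatch described in the first message.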