[datatable-help] unique.data.frame should create a copy, right?

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Aug 13 22:24:04 CEST 2013


Thanks for the suggestions, folks.

Matthew: do you have a preference?

-steve

On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
<saporta at scarletmail.rutgers.edu> wrote:
> Steve,
>
> I like your suggestion a lot.  I can see putting column specification to
> good use.
>
> As for the argument name, perhaps
>    'use.columns'
>
> And where a value of NULL or FALSE will yield the same results as
> `unique.data.frame`
>
>     use.columns=key(x)   # default behavior
>     use.columns=c("col1name", "col7name")   #etc
>     use.columns=NULL
>
>
> Thanks as always,
> Rick
>
>
>
> On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>>
>> Hi folks,
>>
>> I actually want to revisit the fix I made here.
>>
>> Instead of having `use.key` in the signature of unique.data.table (and
>> duplicated.data.table), which is currently:
>>
>> function(x,
>>              incomparables=FALSE,
>>              tolerance=.Machine$double.eps ^ 0.5,
>>              use.key=TRUE, ...)
>>
>> How about we switch out use.key for a parameter that specifies the
>> column names to use in the uniqueness check, which defaults to key(x)
>> to keep backwards compatibility?
>>
>> For argument's sake (like that?), let's call this parameter `columns`
>> (by.columns? with.columns? whatever) so:
>>
>> function(x,
>>              incomparables=FALSE,
>>              tolerance=.Machine$double.eps ^ 0.5,
>>              columns=key(x), ...)
>>
>> Then:
>>
>> (1) leaving it alone is the backward-compatible behavior;
>> (2) Perhaps setting it to NULL will use all columns, and make it
>> equivalent to unique.data.frame (also the same when x has no key); and
>> (3) setting it to any other combo of columns uses those columns as the
>> uniqueness key and filters the rows (only) out of x accordingly.
>>
>> What do you folks think? Personally I think this is better on all
>> counts than just specifying whether to use the key or not, and the only
>> question in my mind is the name of the argument -- happy to hear other
>> world views, however, so don't be shy.
>>
>> Thanks,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Computational Biologist
>> Bioinformatics and Computational Biology
>> Genentech
>
>
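
For reference, the three behaviors proposed in the quoted message above
could be sketched roughly as follows in base R. This is only an
illustration under assumptions, not data.table's implementation:
`unique_by` is a hypothetical helper, plain data.frames stand in for
data.tables, and a made-up "key" attribute stands in for key(x). The
NULL/FALSE handling follows the suggestion in the quoted reply.

```r
# Hypothetical helper (NOT part of data.table): drop duplicate rows of a
# data.frame, using only `columns` for the uniqueness check.
# attr(x, "key") is a made-up stand-in for data.table's key(x).
unique_by <- function(x, columns = attr(x, "key")) {
  # NULL or FALSE (or no "key" attribute): compare on all columns,
  # which matches what unique.data.frame does.
  if (is.null(columns) || identical(columns, FALSE)) {
    columns <- names(x)
  }
  # Keep the first row of each distinct combination; all columns are returned.
  x[!duplicated(x[, columns, drop = FALSE]), , drop = FALSE]
}

df <- data.frame(id  = c(1, 1, 2, 2),
                 grp = c("a", "a", "b", "c"),
                 val = c(10, 11, 12, 13))

keyed <- df
attr(keyed, "key") <- "id"   # pretend `id` is the key

unique_by(keyed)                             # (1) default: de-duplicate on the "key"
unique_by(keyed, columns = NULL)             # (2) all columns
unique_by(keyed, columns = c("id", "grp"))   # (3) an arbitrary set of columns
```

Only rows are filtered in each case; the returned object keeps every
column of the input, whichever columns were used for the comparison.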



-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

