[datatable-help] unique.data.frame should create a copy, right?

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Aug 12 19:51:28 CEST 2013


Hi folks,

I actually want to revisit the fix I made here.

Right now the signature of unique.data.table (and
duplicated.data.table) carries a `use.key` argument:

function(x,
             incomparables=FALSE,
             tolerance=.Machine$double.eps ^ 0.5,
             use.key=TRUE, ...)

How about we switch out use.key for a parameter that specifies the
column names to use in the uniqueness check, and which defaults to
key(x) to keep backwards compatibility?

For argument's sake (like that?), let's call this parameter `columns`
(by.columns? with.columns? whatever) so:

function(x,
             incomparables=FALSE,
             tolerance=.Machine$double.eps ^ 0.5,
             columns=key(x), ...)

Then:

(1) leaving it alone is the backward compatible behavior;
(2) perhaps setting it to NULL will use all columns, and make it
equivalent to unique.data.frame (also the same when x has no key); and
(3) setting it to any other combo of columns uses those columns as the
uniqueness key and filters only the rows of x accordingly (all columns
of x are still returned).
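
To make the three cases concrete, here is a rough sketch in base R on a
plain data.frame. The helper name `unique_by_columns` is made up for
illustration and is not part of data.table. It also defaults `columns`
to NULL rather than key(x), since a data.frame has no key, so it only
demonstrates cases (2) and (3):

```r
# Hypothetical helper (not a data.table function): mimics the proposed
# `columns` argument on a plain data.frame.
unique_by_columns <- function(x, columns = NULL) {
  # NULL means "use every column", i.e. the unique.data.frame behavior
  if (is.null(columns)) columns <- names(x)
  # Flag rows whose combination of values in `columns` was already seen
  keep <- !duplicated(x[, columns, drop = FALSE])
  # Only rows are filtered; every column of x is returned
  x[keep, , drop = FALSE]
}

df <- data.frame(a = c(1, 1, 2, 2),
                 b = c("x", "x", "y", "z"),
                 v = 1:4)

unique_by_columns(df)               # all columns: keeps all 4 rows
unique_by_columns(df, "a")          # keeps rows 1 and 3
unique_by_columns(df, c("a", "b"))  # keeps rows 1, 3 and 4
```

In the real method the default would be key(x), so a keyed table would
behave as it does today when the argument is left alone.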

What do you folks think? Personally I think this is better on all
counts than just specifying whether to use the key or not, and the only
question in my mind is the name of the argument -- happy to hear other
world views, however, so don't be shy.

Thanks,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

