[datatable-help] unique.data.frame should create a copy, right?

Arunkumar Srinivasan aragorn168b at gmail.com
Thu Aug 15 22:57:08 CEST 2013


Awesome! Thanks, Steve. 

Arun


On Thursday, August 15, 2013 at 2:30 AM, Steve Lianoglou wrote:

> Hi all,
> 
> As I needed this sooner than I had expected, I just committed this
> change. It's in svn revision 889.
> 
> I chose 'by.columns' as the parameter name -- seemed to make more
> sense to me, and using the shorthand interactively saves a letter,
> e.g.: unique(dt, by=c('some', 'columns')) ;-)
> 
> Here's the note from the NEWS file:
> 
> o "Uniqueness" tests can now specify arbitrary combinations of
> columns to use to test for duplicates. `by.columns` parameter added to
> unique.data.table and duplicated.data.table. This allows the user to
> test for uniqueness using any combination of columns in the
> data.table, where previously the user only had the option to use the
> keyed columns (if keyed) or all columns (if not). The default behavior
> sets `by.columns=key(dt)` to maintain backward compatibility. See
> man/duplicated.Rd and tests 986:991 for more information. Thanks to
> Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful
> discussions.
> 
> Should work as advertised assuming my unit tests weren't too simplistic.
> 
> Cheers,
> 
> -steve
> 
> 
> 
> 
> On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
> > Thanks for the suggestions, folks.
> > 
> > Matthew: do you have a preference?
> > 
> > -steve
> > 
> > On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta
> > <saporta at scarletmail.rutgers.edu> wrote:
> > > Steve,
> > > 
> > > I like your suggestion a lot. I can see putting column specification to
> > > good use.
> > > 
> > > As for the argument name, perhaps
> > > 'use.columns'
> > > 
> > > with a value of NULL or FALSE yielding the same results as
> > > `unique.data.frame`:
> > > 
> > > use.columns=key(x) # default behavior
> > > use.columns=c("col1name", "col7name") #etc
> > > use.columns=NULL
> > > 
> > > 
> > > Thanks as always,
> > > Rick
> > > 
> > > 
> > > 
> > > On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou
> > > <mailinglist.honeypot at gmail.com> wrote:
> > > > 
> > > > Hi folks,
> > > > 
> > > > I actually want to revisit the fix I made here.
> > > > 
> > > > Currently the signature of unique.data.table (and
> > > > duplicated.data.table) includes `use.key`:
> > > > 
> > > > function(x,
> > > >          incomparables=FALSE,
> > > >          tolerance=.Machine$double.eps ^ 0.5,
> > > >          use.key=TRUE, ...)
> > > > 
> > > > How about we swap use.key for a parameter that specifies the
> > > > column names to use in the uniqueness check, and that defaults to
> > > > key(x) to keep backward compatibility?
> > > > 
> > > > For argument's sake (like that?), let's call this parameter `columns`
> > > > (by.columns? with.columns? whatever) so:
> > > > 
> > > > function(x,
> > > >          incomparables=FALSE,
> > > >          tolerance=.Machine$double.eps ^ 0.5,
> > > >          columns=key(x), ...)
> > > > 
> > > > Then:
> > > > 
> > > > (1) leaving it alone is the backward compatible behavior;
> > > > (2) Perhaps setting it to NULL will use all columns, and make it
> > > > equivalent to unique.data.frame (also the same when x has no key); and
> > > > (3) setting it to any other combo of columns uses those columns as the
> > > > uniqueness key and filters the rows (only) out of x accordingly.
> > > > 
> > > > What do you folks think? Personally I think this is better on all
> > > > accounts than just specifying to use the key or not and the only
> > > > question in my mind is the name of the argument -- happy to hear other
> > > > world views, however, so don't be shy.
> > > > 
> > > > Thanks,
> > > > -steve
> > > > 
> > > > --
> > > > Steve Lianoglou
> > > > Computational Biologist
> > > > Bioinformatics and Computational Biology
> > > > Genentech
> > > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 
> 
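
For anyone following along, here is a minimal sketch of what the change
discussed in this thread allows, assuming the data.table package is
installed. The table `dt` and its column names are made up for
illustration. The calls spell the argument `by`, as in the interactive
shorthand quoted above; at this revision that relies on partial matching
against `by.columns`, while later released versions name the argument
`by` outright.

```r
library(data.table)

# Hypothetical example data; rows 1 and 2 are identical in every column.
dt <- data.table(
  id    = c(1L, 1L, 2L, 2L, 3L),
  group = c("a", "a", "a", "b", "b"),
  value = c(10, 10, 20, 20, 30)
)
setkey(dt, id)

# Uniqueness judged on an arbitrary combination of columns.
u_pair <- unique(dt, by = c("group", "value"))

# Uniqueness judged on the key columns, stated explicitly.
u_key <- unique(dt, by = key(dt))

# Uniqueness judged on every column, as unique.data.frame would do.
u_all <- unique(dt, by = names(dt))

# duplicated() takes the same argument and returns one logical per row.
dup_pair <- duplicated(dt, by = c("group", "value"))
```

Passing `by` explicitly in every call avoids depending on the default,
which the NEWS note above gives as `key(dt)` for this revision.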
