<div>
Awesome! Thanks, Steve.
</div>
<div><div><br></div><div>Arun</div><div><br></div></div>
<p style="color: #A0A0A8;">On Thursday, August 15, 2013 at 2:30 AM, Steve Lianoglou wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px;">
<span><div><div><div>Hi all,</div><div><br></div><div>As I needed this sooner than I had expected, I just committed this</div><div>change. It's in svn revision 889.</div><div><br></div><div>I chose 'by.columns' as the parameter name -- seemed to make more</div><div>sense to me, and using the shorthand interactively saves a letter,</div><div>e.g.: unique(dt, by=c('some', 'columns')) ;-)</div><div><br></div><div>Here's the note from the NEWS file:</div><div><br></div><div>o "Uniqueness" tests can now specify arbitrary combinations of</div><div>columns to use to test for duplicates. `by.columns` parameter added to</div><div>unique.data.table and duplicated.data.table. This allows the user to</div><div>test for uniqueness using any combination of columns in the</div><div>data.table, where previously the user only had the option to use the</div><div>keyed columns (if keyed) or all columns (if not). The default behavior</div><div>sets `by.columns=key(dt)` to maintain backward compatibility. See</div><div>man/duplicated.Rd and tests 986:991 for more information. 
Thanks to</div><div>Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful</div><div>discussions.</div><div><br></div><div>Should work as advertised assuming my unit tests weren't too simplistic.</div><div><br></div><div>Cheers,</div><div><br></div><div>-steve</div><div><br></div><div><br></div><div><br></div><div><br></div><div>On Tue, Aug 13, 2013 at 1:24 PM, Steve Lianoglou</div><div><<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>> wrote:</div><blockquote type="cite"><div><div>Thanks for the suggestions, folks.</div><div><br></div><div>Matthew: do you have a preference?</div><div><br></div><div>-steve</div><div><br></div><div>On Mon, Aug 12, 2013 at 11:12 AM, Ricardo Saporta</div><div><<a href="mailto:saporta@scarletmail.rutgers.edu">saporta@scarletmail.rutgers.edu</a>> wrote:</div><blockquote type="cite"><div><div>Steve,</div><div><br></div><div>I like your suggestion a lot. I can see putting column specification to</div><div>good use.</div><div><br></div><div>As for the argument name, perhaps</div><div> 'use.columns'</div><div><br></div><div>And where a value of NULL or FALSE will yield same results as</div><div>`unique.data.frame`</div><div><br></div><div> use.columns=key(x) # default behavior</div><div> use.columns=c("col1name", "col7name") #etc</div><div> use.columns=NULL</div><div><br></div><div><br></div><div>Thanks as always,</div><div>Rick</div><div><br></div><div><br></div><div><br></div><div>On Mon, Aug 12, 2013 at 1:51 PM, Steve Lianoglou</div><div><<a href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>> wrote:</div><blockquote type="cite"><div><div><br></div><div>Hi folks,</div><div><br></div><div>I actually want to revisit the fix I made here.</div><div><br></div><div>Instead of having `use.key` in the signature to unique.data.table (and</div><div>duplicated.data.table) to be:</div><div><br></div><div>function(x,</div><div> incomparables=FALSE,</div><div> 
tolerance=.Machine$double.eps ^ 0.5,</div><div> use.key=TRUE, ...)</div><div><br></div><div>How about we switch out use.key for a parameter that specifies the</div><div>column names to use in the uniqueness check, which defaults to key(x)</div><div>to keep backwards compatibility?</div><div><br></div><div>For argument's sake (like that?), let's call this parameter `columns`</div><div>(by.columns? with.columns? whatever) so:</div><div><br></div><div>function(x,</div><div> incomparables=FALSE,</div><div> tolerance=.Machine$double.eps ^ 0.5,</div><div> columns=key(x), ...)</div><div><br></div><div>Then:</div><div><br></div><div>(1) leaving it alone is the backward-compatible behavior;</div><div>(2) perhaps setting it to NULL will use all columns, and make it</div><div>equivalent to unique.data.frame (also the same when x has no key); and</div><div>(3) setting it to any other combo of columns uses those columns as the</div><div>uniqueness key and filters the rows (only) out of x accordingly.</div><div><br></div><div>What do you folks think? 
Personally, I think this is better on all</div><div>accounts than just specifying to use the key or not, and the only</div><div>question in my mind is the name of the argument -- happy to hear other</div><div>world views, however, so don't be shy.</div><div><br></div><div>Thanks,</div><div>-steve</div><div><br></div><div>--</div><div>Steve Lianoglou</div><div>Computational Biologist</div><div>Bioinformatics and Computational Biology</div><div>Genentech</div></div></blockquote></div></blockquote></div></blockquote><div>_______________________________________________</div><div>datatable-help mailing list</div><div><a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a></div><div><a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></div></div></div></span>
</blockquote>
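<div>For anyone who wants to try it, here is a minimal sketch of the usage described in the NEWS note. It assumes a data.table build that includes this change; the table <code>dt</code> and its columns are made up for illustration, and <code>by=</code> relies on partial matching of <code>by.columns</code> as Steve mentions. The argument name and its default may differ in other versions.</div>
<pre><code class="language-r">library(data.table)

# Made-up example table, keyed on column a (illustration only)
dt &lt;- data.table(a = c(1, 1, 2), b = c("x", "y", "y"), key = "a")

# Default per the NEWS note above: uniqueness is judged on key(dt), i.e. column a
unique(dt)

# Judge uniqueness on an explicit combination of columns instead
unique(dt, by = c("a", "b"))

# Same idea for duplicated(): flag rows that repeat an earlier value of b
duplicated(dt, by = "b")
</code></pre>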
<div>
<br>
</div>