<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Oh, good point. <br>
<br>
How about putting 'by' first in those situations:<br>
<br>
> DT = data.table(A=rep(1:3,2),B=1:2)<br>
> unique(by="A",DT)<br>
A B<br>
1: 1 1<br>
2: 2 2<br>
3: 3 1<br>
> unique(by="B",DT)<br>
A B<br>
1: 1 1<br>
2: 2 2<br>
> <br>
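<br>
This works because R matches arguments supplied by name first and only then fills the remaining formals by position, so DT still binds to x. Below is a minimal sketch of that. It assumes a data.table build whose unique() and duplicated() methods accept by= as discussed in this thread, and it uses the same toy table as above:<br>
<pre>
```r
library(data.table)

# Same toy table as above: A = 1 2 3 1 2 3, B = 1 2 1 2 1 2
DT = data.table(A = rep(1:3, 2), B = 1:2)

# Named arguments are matched before positional ones, so both calls
# bind DT to x and "A" to by; only the reading order differs.
u1 = unique(by = "A", DT)
u2 = unique(DT, by = "A")
stopifnot(isTRUE(all.equal(u1, u2)))

# by can name any combination of columns, not only the key
print(duplicated(DT, by = "B"))            # FALSE FALSE TRUE TRUE TRUE TRUE
print(nrow(unique(DT, by = c("A", "B"))))  # 6: every (A, B) pair is distinct
```
</pre>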
<br>
On 27/09/13 20:09, Ricardo Saporta wrote:<br>
</div>
<blockquote
cite="mid:CAE7Aa4Q+K+DDNipRtjTk2vgoHC+ziKLFqigQ7PKrUm8QEBTE-g@mail.gmail.com"
type="cite">
<div dir="ltr">Steve, not to beat a dead horse on the "what to
name the new parameter" discussion, but I'm wondering what
your/others' thoughts are on using something other than 'by'.
Maybe even "uby".
<div>
<br>
</div>
<div>Or perhaps we can have a synonym in the function
definition: </div>
<div> .. function(........ , by=uby, uby)</div>
<div><br>
</div>
<div>The reason I bring this up is that, as I begin to use this
and read over my own code, I realize that it takes a
lot of visual parsing to tell whether the "by" in a
complex call belongs to "[.data.table" or to
"unique.data.table".</div>
<div><br>
</div>
<div> </div>
<div class="gmail_extra">Cheers, </div>
<div class="gmail_extra">Rick<br>
<br>
<br>
<div class="gmail_quote">On Tue, Aug 27, 2013 at 1:23 PM,
Steve Lianoglou <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:mailinglist.honeypot@gmail.com"
target="_blank">mailinglist.honeypot@gmail.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Last
update here :-)<br>
<br>
After more hemming and hawing, I've changed the name of
the new<br>
parameter added to duplicated.data.table and
unique.data.table from<br>
`by.columns` to just `by`, as it is (more or less) the
same idea as<br>
the `by` in dt[i, j, by, ...].<br>
<br>
Sorry for any inconvenience this causes if you've been working
off of the<br>
development version.<br>
<br>
-steve<br>
<br>
<br>
On Thu, Aug 15, 2013 at 9:35 PM, Ricardo Saporta<br>
<div class="HOEnZb">
<div class="h5"><<a moz-do-not-send="true"
href="mailto:saporta@scarletmail.rutgers.edu">saporta@scarletmail.rutgers.edu</a>>
wrote:<br>
> Steve, great stuff!!<br>
> Thanks for making that happen.<br>
><br>
> Rick<br>
><br>
><br>
> On Wed, Aug 14, 2013 at 8:30 PM, Steve Lianoglou<br>
> <<a moz-do-not-send="true"
href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>>
wrote:<br>
>><br>
>> Hi all,<br>
>><br>
>> As I needed this sooner than I had expected,
I just committed this<br>
>> change. It's in svn revision 889.<br>
>><br>
>> I chose 'by.columns' as the parameter name
-- it seemed to make more<br>
>> sense to me, and using the shorthand
interactively saves a letter,<br>
>> e.g.: unique(dt, by=c('some', 'columns')) ;-)<br>
>><br>
>> Here's the note from the NEWS file:<br>
>><br>
>> o "Uniqueness" tests can now specify
arbitrary combinations of<br>
>> columns to use to test for duplicates.
`by.columns` parameter added to<br>
>> unique.data.table and duplicated.data.table.
This allows the user to<br>
>> test for uniqueness using any combination of
columns in the<br>
>> data.table, where previously the user only
had the option to use the<br>
>> keyed columns (if keyed) or all columns (if
not). The default behavior<br>
>> sets `by.columns=key(dt)` to maintain
backward compatibility. See<br>
>> man/duplicated.Rd and tests 986:991 for more
information. Thanks to<br>
>> Arunkumar Srinivasan, Ricardo Saporta, and
Frank Erickson for useful<br>
>> discussions.<br>
>><br>
>> Should work as advertised assuming my unit
tests weren't too simplistic.<br>
>><br>
>> Cheers,<br>
>><br>
>> -steve<br>
>><br>
>><br>
>><br>
>><br>
>> On Tue, Aug 13, 2013 at 1:24 PM, Steve
Lianoglou<br>
>> <<a moz-do-not-send="true"
href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>>
wrote:<br>
>> > Thanks for the suggestions, folks.<br>
>> ><br>
>> > Matthew: do you have a preference?<br>
>> ><br>
>> > -steve<br>
>> ><br>
>> > On Mon, Aug 12, 2013 at 11:12 AM,
Ricardo Saporta<br>
>> > <<a moz-do-not-send="true"
href="mailto:saporta@scarletmail.rutgers.edu">saporta@scarletmail.rutgers.edu</a>>
wrote:<br>
>> >> Steve,<br>
>> >><br>
>> >> I like your suggestion a lot. I can
see putting column specification<br>
>> >> to good use.<br>
>> >><br>
>> >> As for the argument name, perhaps<br>
>> >> 'use.columns',<br>
>> >><br>
>> >> where a value of NULL or FALSE
would yield the same results as<br>
>> >> `unique.data.frame`:<br>
>> >><br>
>> >> use.columns=key(x) # default
behavior<br>
>> >> use.columns=c("col1name",
"col7name") #etc<br>
>> >> use.columns=NULL<br>
>> >><br>
>> >><br>
>> >> Thanks as always,<br>
>> >> Rick<br>
>> >><br>
>> >><br>
>> >><br>
>> >> On Mon, Aug 12, 2013 at 1:51 PM,
Steve Lianoglou<br>
>> >> <<a moz-do-not-send="true"
href="mailto:mailinglist.honeypot@gmail.com">mailinglist.honeypot@gmail.com</a>>
wrote:<br>
>> >>><br>
>> >>> Hi folks,<br>
>> >>><br>
>> >>> I actually want to revisit the
fix I made here.<br>
>> >>><br>
>> >>> Instead of having `use.key` in
the signature of unique.data.table (and<br>
>> >>> duplicated.data.table), i.e.:<br>
>> >>><br>
>> >>> function(x,<br>
>> >>>
incomparables=FALSE,<br>
>> >>>
tolerance=.Machine$double.eps ^ 0.5,<br>
>> >>> use.key=TRUE, ...)<br>
>> >>><br>
>> >>> How about we switch out use.key
for a parameter that specifies the<br>
>> >>> column names to use in the
uniqueness check, which defaults to key(x)<br>
>> >>> to keep backwards compatibility.<br>
>> >>><br>
>> >>> For argument's sake (like
that?), let's call this parameter `columns`<br>
>> >>> (by.columns? with.columns?
whatever) so:<br>
>> >>><br>
>> >>> function(x,<br>
>> >>>
incomparables=FALSE,<br>
>> >>>
tolerance=.Machine$double.eps ^ 0.5,<br>
>> >>> columns=key(x),
...)<br>
>> >>><br>
>> >>> Then:<br>
>> >>><br>
>> >>> (1) leaving it alone is the
backward-compatible behavior;<br>
>> >>> (2) Perhaps setting it to NULL
will use all columns, and make it<br>
>> >>> equivalent to unique.data.frame
(also the same when x has no key); and<br>
>> >>> (3) setting it to any other
combo of columns uses those columns as the<br>
>> >>> uniqueness key and filters the
rows (only) out of x accordingly.<br>
>> >>><br>
>> >>> What do you folks think?
Personally I think this is better on all<br>
>> >>> counts than just specifying whether to
use the key or not, and the only<br>
>> >>> question in my mind is the name
of the argument -- happy to hear other<br>
>> >>> world views, however, so don't
be shy.<br>
>> >>><br>
>> >>> Thanks,<br>
>> >>> -steve<br>
>> >>><br>
>> >>> --<br>
>> >>> Steve Lianoglou<br>
>> >>> Computational Biologist<br>
>> >>> Bioinformatics and Computational
Biology<br>
>> >>> Genentech<br>
>> >><br>
>> >><br>
>> ><br>
>><br>
><br>
><br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
datatable-help mailing list
<a class="moz-txt-link-abbreviated" href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a>
<a class="moz-txt-link-freetext" href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a></pre>
</blockquote>
<br>
</body>
</html>