[datatable-help] unique.data.frame should create a copy, right?

Steve Lianoglou lianoglou.steve at gene.com
Thu Aug 1 18:58:07 CEST 2013


Hi,

On Thu, Aug 1, 2013 at 12:27 AM, Arunkumar Srinivasan
<aragorn168b at gmail.com> wrote:
> Steve,
>
> Yes, exactly. If you didn't have to subset the data.table as in your example,

Not sure what subsetting (or not) has to do with it, but ...

> the equivalent operation would be to set the key of DT1 to NULL and then
> doing `unique` and storing it in DT2 and then setting the key back to "A" on
> DT1.
>
> And it'd be nice to be able to do: `unique(DT1, usekey=FALSE)` or something
> like that so that we don't have to NULL and set the key of DT1.

Ask and you shall receive :-)

I added a `use.key=TRUE` parameter to unique.data.table and
duplicated.data.table, available as of SVN revision 888. Passing
`use.key=FALSE` runs the relevant functions on the data.table as if it
were not keyed at all.

R> DT1 <- CJ(A=0:1,B=1:6,D0=0:1,D=0:1)[D>=D0]
R> setkey(DT1,A)
R> DT2 <- unique.data.frame(DT1[,-which(names(DT1)%in%'B'),with=FALSE])
R> dt2 <- unique(DT1[,-which(names(DT1) %in% 'B'),with=FALSE], use.key=FALSE)
R> all.equal(DT2, dt2, check.attributes=FALSE)
[1] TRUE

The all.equal test will fail when check.attributes is TRUE because
dt2 is still keyed by 'A'.

R> key(DT1)
[1] "A"

R> key(DT2)
NULL

R> key(dt2)
[1] "A"
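
For comparison, the manual workaround described in the quoted message
(drop the key on DT1, call unique, then restore the key) could be
sketched roughly as below. This is only an illustration: the names
`old_key` and `keep` are made up here, and it assumes that
`setkey(DT1, NULL)` is available to clear the key in the data.table
version in use.

```r
library(data.table)

# Same example table as above, keyed by A.
DT1 <- CJ(A = 0:1, B = 1:6, D0 = 0:1, D = 0:1)[D >= D0]
setkey(DT1, A)

# Remember the current key, then drop it so unique() is not key-driven.
old_key <- key(DT1)          # hypothetical helper variable
setkey(DT1, NULL)

# De-duplicate on every remaining column after removing B.
keep <- setdiff(names(DT1), "B")   # hypothetical helper variable
DT2 <- unique(DT1[, keep, with = FALSE])

# Put the original key back on DT1.
setkeyv(DT1, old_key)
```

Unlike dt2 above, a DT2 built this way carries no key, since DT1 was
unkeyed at the moment it was subset.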

Hope that covers it,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech

