[datatable-help] behavior of unique on data.tables with strings

Steven C. Bagley steven.bagley at gmail.com
Tue Dec 27 04:33:51 CET 2011


In data.table 1.7.7: 

The function unique works on data.tables (without keys) that have factor columns, but not on ones that have character (string) columns: duplicate rows are not removed. Setting the key makes unique work, but only because it converts the strings to factors. I can't tell from the documentation whether this is the intended behavior. (The documentation does say that key columns can't be characters/strings.) It would be nice if unique worked without converting strings to factors, because of the conversion cost on very large data.tables, but maybe this isn't possible.

--Steve

> library(data.table)
> foo1=as.data.table(data.frame(a=c("1", "1"), b=c(2,2)))
> foo1
     a b
[1,] 1 2
[2,] 1 2
> str(foo1)
Classes ‘data.table’ and 'data.frame':	2 obs. of  2 variables:
 $ a: Factor w/ 1 level "1": 1 1
 $ b: num  2 2
> unique(foo1)
     a b
[1,] 1 2
> foo2=as.data.table(data.frame(a=c("1", "1"), b=c(2,2), stringsAsFactors=FALSE))
> foo2
     a b
[1,] 1 2
[2,] 1 2
> str(foo2)
Classes ‘data.table’ and 'data.frame':	2 obs. of  2 variables:
 $ a: chr  "1" "1"
 $ b: num  2 2
> unique(foo2)
     a b
[1,] 1 2
[2,] 1 2
> setkey(foo2, a)
> str(foo2)
Classes ‘data.table’ and 'data.frame':	2 obs. of  2 variables:
 $ a: Factor w/ 1 level "1": 1 1
 $ b: num  2 2
 - attr(*, "sorted")= chr "a"
> unique(foo2)
     a b
[1,] 1 2
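
For comparison, a minimal base-R sketch of deduplicating the same rows while keeping column a as character. It uses a plain data.frame, not data.table, and was not checked against data.table 1.7.7. unique() on a data.frame compares whole rows, character columns included, so no factor conversion is needed. The object names d, u1 and u2 are illustrative only.

```r
# Base-R sketch (plain data.frame, not data.table): remove duplicate rows
# without converting the character column to a factor.
d <- data.frame(a = c("1", "1"), b = c(2, 2), stringsAsFactors = FALSE)

# unique.data.frame compares entire rows, character columns included
u1 <- unique(d)

# Equivalent form: keep only rows that do not repeat an earlier row
u2 <- d[!duplicated(d), , drop = FALSE]

str(u1)  # column a is still chr, not Factor
```

The cost is a copy into a data.frame, so for very large tables this only sidesteps the factor conversion, not the memory overhead.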

