[datatable-help] behavior of unique on data.tables with strings
Steven C. Bagley
steven.bagley at gmail.com
Tue Dec 27 04:33:51 CET 2011
In data.table 1.7.7:
The function unique works on data.tables (without keys) that have factor columns, but not on those with character (string) columns. In the latter case, unique only removes the duplicate rows after a key has been set, and setting the key converts the strings to factors. I can't tell from the documentation whether this is intended behavior. (The documentation does say that keys can't be characters/strings.) It would be nice if unique worked without converting strings to factors, because that conversion is costly on very large data.tables, but maybe this isn't possible.
--Steve
> library(data.table)
> foo1=as.data.table(data.frame(a=c("1", "1"), b=c(2,2)))
> foo1
a b
[1,] 1 2
[2,] 1 2
> str(foo1)
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ a: Factor w/ 1 level "1": 1 1
$ b: num 2 2
> unique(foo1)
a b
[1,] 1 2
> foo2=as.data.table(data.frame(a=c("1", "1"), b=c(2,2), stringsAsFactors=FALSE))
> foo2
a b
[1,] 1 2
[2,] 1 2
> str(foo2)
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ a: chr "1" "1"
$ b: num 2 2
> unique(foo2)
a b
[1,] 1 2
[2,] 1 2
> setkey(foo2, a)
> str(foo2)
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
$ a: Factor w/ 1 level "1": 1 1
$ b: num 2 2
- attr(*, "sorted")= chr "a"
> unique(foo2)
a b
[1,] 1 2
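For comparison, one possible workaround is a minimal sketch that assumes base R's duplicated() on a plain data.frame copy is acceptable here. This is my assumption, not something the data.table documentation confirms. It drops duplicate rows from an unkeyed table with a character column, without setting a key and without turning the strings into factors. Note that it goes through as.data.frame(), so on very large tables it may not be cheaper than the conversion it avoids. It only shows that the character column can be deduplicated as-is.

```r
library(data.table)

# Same table as foo2 in the transcript: character column a, no key set.
foo2 <- as.data.table(data.frame(a = c("1", "1"), b = c(2, 2),
                                 stringsAsFactors = FALSE))

# Assumed workaround (not from the thread): flag duplicate rows with base R's
# duplicated() on a data.frame copy, which compares character columns
# directly, so no key and no character-to-factor conversion is needed.
keep <- !duplicated(as.data.frame(foo2))

# Keep the first occurrence of each row; column a stays character.
foo2_unique <- foo2[keep]
```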