[datatable-help] behavior of unique on data.tables with strings

Chris Neff caneff at gmail.com
Tue Jan 3 11:48:18 CET 2012


I can confirm that I get the same behavior Steven does, on 64-bit Linux
with data.table 1.7.8.  So 64-bit sounds like the culprit?
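
For reference, the tolerance Matthew mentions below is the
.Machine$double.eps ^ 0.5 value that both sessions print. Here is a minimal
base R sketch of what a tolerance-based comparison means. The values x and y
are hypothetical, and this only illustrates the idea; it is not data.table's
actual implementation.

```r
# Tolerance used in the thread: square root of machine epsilon
tol <- .Machine$double.eps ^ 0.5

# Two hypothetical values that differ only by floating-point noise
x <- 2
y <- 2 + 1e-10

exactly_equal <- x == y            # strict comparison
within_tol <- abs(x - y) < tol     # tolerance-based comparison

print(c(exact = exactly_equal, tolerant = within_tol))
```

Since the b column in the report holds identical 2's, tolerance seems
unlikely to be what differs between the 32-bit and 64-bit results.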

On 3 January 2012 03:01, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Ok, thanks. Please file a bug report (mentioning that it might be a 64-bit
> and/or Mac-only problem), so it's not forgotten. I'm trying to fix Chris's
> crash, so I will have to come back to this one ...
>
> On Mon, 2012-01-02 at 20:13 -0800, Steven C. Bagley wrote:
>> It still happens. (I deleted R and all packages, then reinstalled just to check.)
>>
>> test.data.table() completes without errors.
>>
>> Here's the session info.
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] data.table_1.7.7
>>
>> > .Machine$double.eps ^ 0.5
>> [1] 1.490116e-08
>>
>> --Steve
>>
>> On Jan 2, 2012, at 3:27 PM, Matthew Dowle wrote:
>>
>> > Thanks for the nice report. Oddly though, it seems to work ok for me
>> > both in 1.7.7 and latest 1.7.8.
>> >
>> > $ R --vanilla
>> > R version 2.14.1 (2011-12-22)
>> > Platform: i686-pc-linux-gnu (32-bit)
>> >> require(data.table)
>> > Loading required package: data.table
>> > data.table 1.7.7  For help type: help("data.table")
>> >> foo2=as.data.table(data.frame(a=c("1", "1"), b=c(2,2), stringsAsFactors=FALSE))
>> >> unique(foo2)
>> >     a b
>> > [1,] 1 2
>> >> str(foo2)
>> > Classes ‘data.table’ and 'data.frame':      2 obs. of  2 variables:
>> > $ a: chr  "1" "1"
>> > $ b: num  2 2
>> >> .Machine$double.eps ^ 0.5
>> > [1] 1.490116e-08
>> >
>> > Could you rerun and confirm please. If you are 64bit, please include
>> > sessionInfo(). I've included tolerance as a long shot - the numeric 2's
>> > are considered equal by data.table's unique() using tolerance. Perhaps
>> > that part is not working for you. Does test.data.table() work? It should
>> > test unique and tolerance fairly thoroughly. Otherwise I can't think why
>> > the character column isn't liked by unique, should be ok.
>> >
>> > A fast unique for character columns is a good feature request, please
>> > could you add to the tracker. That is now possible to implement as we
>> > now have fast character methods.
>> >
>> > Matthew
>> >
>> > On Mon, 2011-12-26 at 19:33 -0800, Steven C. Bagley wrote:
>> >> In data.table 1.7.7:
>> >>
>> >> The function unique works for data.tables (without keys) that have factor columns, but not if they have character (string) columns. In the latter case, setting the key converts the strings to factors, after which unique works. I can't figure out from the documentation whether this is the intended behavior. (The documentation does say that keys can't be characters/strings.) It would be nice if unique worked without having to convert strings to factors, because of the conversion cost in very large data.tables, but maybe this isn't possible.
>> >>
>> >> --Steve
>> >>
>> >>> library(data.table)
>> >>> foo1=as.data.table(data.frame(a=c("1", "1"), b=c(2,2)))
>> >>> foo1
>> >>     a b
>> >> [1,] 1 2
>> >> [2,] 1 2
>> >>> str(foo1)
>> >> Classes ‘data.table’ and 'data.frame':     2 obs. of  2 variables:
>> >> $ a: Factor w/ 1 level "1": 1 1
>> >> $ b: num  2 2
>> >>> unique(foo1)
>> >>     a b
>> >> [1,] 1 2
>> >>> foo2=as.data.table(data.frame(a=c("1", "1"), b=c(2,2), stringsAsFactors=FALSE))
>> >>> foo2
>> >>     a b
>> >> [1,] 1 2
>> >> [2,] 1 2
>> >>> str(foo2)
>> >> Classes ‘data.table’ and 'data.frame':     2 obs. of  2 variables:
>> >> $ a: chr  "1" "1"
>> >> $ b: num  2 2
>> >>> unique(foo2)
>> >>     a b
>> >> [1,] 1 2
>> >> [2,] 1 2
>> >>> setkey(foo2, a)
>> >>> str(foo2)
>> >> Classes ‘data.table’ and 'data.frame':     2 obs. of  2 variables:
>> >> $ a: Factor w/ 1 level "1": 1 1
>> >> $ b: num  2 2
>> >> - attr(*, "sorted")= chr "a"
>> >>> unique(foo2)
>> >>     a b
>> >> [1,] 1 2
>> >> _______________________________________________
>> >> datatable-help mailing list
>> >> datatable-help at lists.r-forge.r-project.org
>> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >
>> >
>>
>
>

