[datatable-help] Auto-convert characters to factors when setting keys?

mdowle at mdowle.plus.com
Tue May 25 10:46:11 CEST 2010


Steve,

Try data.table(df) rather than as.data.table(df). I think the vignettes
and examples use data.table(df), but let us know if you find otherwise.
Currently, as.data.table() only changes the class of the object; it does
no checks and no conversion.

Agreed, something needs to be tidied up here. Thanks for reporting it.
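
For what it's worth, the error message in the report comes down to
storage mode, and that part can be checked in base R without data.table
at all. A minimal sketch (the variable names are only for illustration):

```r
# Base R only; no data.table functions are called here.
x_chr <- c("B", "A", "C")
x_fac <- factor(x_chr)

storage.mode(x_chr)   # "character"
storage.mode(x_fac)   # "integer": a factor is integer codes plus levels
levels(x_fac)         # the labels that the integer codes index into
```

That is why converting the column to a factor first lets the key be set.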

All,

Any preferences among the following options?

1. Change as.data.table to use data.table. It already does when
keep.rownames=TRUE but not when FALSE.  If a user really wants a raw class
change they can use class(x)="data.table" directly. No change to
data.table or setkey.  Since ?as.data.table is an alias to ?data.table
this would be consistent.

2. Change data.table and setkey. Only convert character to factor at the
point of setkey.  That may prevent radix being used for an ad hoc by on
character columns that are not in the key. Would we then want to do
auto-conversion in ad hoc by too?  No change to as.data.table.

3. Steve's suggestion. Change setkey. Catch character columns in setkey
and auto convert them to factor at that point. No change to data.table or
as.data.table.

4. Change ?as.data.table to say it's a class change only, and to use
data.table() if checks and auto-conversion of character to factor are
required. No code changes.

5. Another solution.
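
To make option 3 a bit more concrete, here is a rough base R sketch of the
kind of conversion setkey could do before sorting. The helper name
chars_to_factors and its cols and warn arguments are made up for
illustration; this is not data.table code, and it works on a plain
data.frame so it does not depend on any data.table version.

```r
# Hypothetical helper for illustration; not a data.table function.
# Converts the named character columns of x to factors, leaving
# all other columns untouched.
chars_to_factors <- function(x, cols = names(x), warn = FALSE) {
  for (col in cols) {
    if (is.character(x[[col]])) {
      if (warn) warning("converting character column '", col, "' to factor")
      x[[col]] <- factor(x[[col]])
    }
  }
  x
}

df <- data.frame(a = sample(LETTERS[1:10]), b = 1:10,
                 stringsAsFactors = FALSE)
df2 <- chars_to_factors(df, cols = "a")
is.factor(df2$a)     # TRUE
is.integer(df2$b)    # TRUE, non-character columns are left alone
```

Whether the conversion should warn, as Steve suggests, is left as a switch
here (warn = TRUE) rather than decided either way.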

Any views?  I currently lean towards option 1.

Matthew


> Hi all,
>
> Would it make sense to autoconvert characters to factors when setting
> a character column as a key?
>
> For example, when converting a data.frame to a data.table, picking a
> character column as a key throws an error:
>
> R> df <- data.frame(a=as.character(sample(LETTERS[1:10])), b=1:10,
> stringsAsFactors=F)
> R> dt <- as.data.table(df)
> R> key(dt) <- 'a'
> Error in setkey("x", value) :
>   All keyed columns must be storage mode integer
>
> Converting dt$a to a factor before setting the key works fine, as
> expected:
>
> R> dt$a <- factor(dt$a)
> R> key(dt) <- 'a'
>
> So, I'm wondering if it would make sense to auto convert character
> columns to factors when calling `key` on these columns ... maybe with
> a warning, perhaps ... ?
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>



