[datatable-help] Auto-convert characters to factors when setting keys?

Rob Forler rforler at uchicago.edu
Tue May 25 15:00:51 CEST 2010


What is the real value of having a class-only change anyway? The common use
case is that people want to be able to use characters/dates as factors, so I
would say as.data.table should convert these automatically, as data.table
does. as.data.table is confusing if its behavior is not consistent with
data.table's.
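
A rough sketch, in plain R, of the kind of automatic conversion being
discussed. This is not data.table's code: `chars_to_factors` is a made-up
name used only for illustration. It also shows why the factor conversion in
the quoted example satisfies the "storage mode integer" check: a factor is
stored as integer codes, while a character vector is not.

```r
# Hypothetical helper (not part of data.table): convert every character
# column of a data.frame to a factor and leave all other columns untouched.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  for (nm in names(df)[is_chr]) {
    df[[nm]] <- factor(df[[nm]])
  }
  df
}

df <- data.frame(a = sample(LETTERS[1:10]), b = 1:10,
                 stringsAsFactors = FALSE)

storage.mode(df$a)    # "character"

df2 <- chars_to_factors(df)
storage.mode(df2$a)   # "integer"
class(df2$a)          # "factor"
```

Whether a step like this belongs in as.data.table, in data.table, or in
setkey is exactly the question the options below are about.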

Thanks,
Rob

On Tue, May 25, 2010 at 3:46 AM, <mdowle at mdowle.plus.com> wrote:

> Steve,
>
> Try data.table(df) rather than as.data.table(df). In the vignettes and
> examples I think data.table(df) is used but let us know otherwise.
> as.data.table() is a raw convert of the class only, currently.
>
> Agreed something needs to be tidied up here. Thanks for reporting it.
>
> All,
>
> Any preferences among the following options?
>
> 1. Change as.data.table to use data.table. It already does when
> keep.rownames=TRUE but not when FALSE.  If a user really wants a raw class
> change they can use class(x)="data.table" directly. No change to
> data.table or setkey.  Since ?as.data.table is an alias to ?data.table
> this would be consistent.
>
> 2. Change data.table and setkey. Only convert character to factor at the
> point of setkey.  That may prevent radix being used for an ad hoc by on
> character columns that are not in the key. Would we then want to do
> auto-conversion in ad hoc by too?  No change to as.data.table.
>
> 3. Steve's suggestion. Change setkey. Catch character columns in setkey
> and auto convert them to factor at that point. No change to data.table or
> as.data.table.
>
> 4. Change ?as.data.table to say it's a class change only, and to use
> data.table() if checks and auto-conversion of character to factor are
> required. No code changes.
>
> 5. Another solution.
>
> Any views?  I currently lean towards option 1.
>
> Matthew
>
>
> > Hi all,
> >
> > Would it make sense to autoconvert characters to factors when setting
> > a character column as a key?
> >
> > For example, when converting a data.frame to a data.table, picking a
> > character column as a key throws an error:
> >
> > R> df <- data.frame(a=as.character(sample(LETTERS[1:10])), b=1:10,
> >                     stringsAsFactors=FALSE)
> > R> dt <- as.data.table(df)
> > R> key(dt) <- 'a'
> > Error in setkey("x", value) :
> >   All keyed columns must be storage mode integer
> >
> > Converting dt$a to a factor before setting the key works fine, as
> > expected:
> >
> > R> dt$a <- factor(dt$a)
> > R> key(dt) <- 'a'
> >
> > So, I'm wondering if it would make sense to auto convert character
> > columns to factors when calling `key` on these columns ... maybe with
> > a warning, perhaps ... ?
> >
> > Thanks,
> > -steve
> >
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> >  | Memorial Sloan-Kettering Cancer Center
> >  | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >