[datatable-help] Auto-convert characters to factors when setting keys?
Short, Tom
TShort at epri.com
Tue May 25 15:15:45 CEST 2010
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> On Behalf Of mdowle at mdowle.plus.com
> Sent: Tuesday, May 25, 2010 04:46
> To: Steve Lianoglou
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Auto-convert characters to
> factors when setting keys?
>
> Steve,
>
> Try data.table(df) rather than as.data.table(df). I think the
> vignettes and examples use data.table(df), but let us know if
> they don't.
> as.data.table() currently does only a raw conversion of the class.
>
> Agreed something needs to be tidied up here. Thanks for reporting it.
>
> All,
>
> Any preferences among the following options?
>
> 1. Change as.data.table to use data.table. It already does
> when keep.rownames=TRUE but not when FALSE. If a user really
> wants a raw class change, they can use class(x)="data.table"
> directly. No change to data.table or setkey. Since
> ?as.data.table is an alias to ?data.table this would be consistent.
>
> 2. Change data.table and setkey. Only convert character to
> factor at the point of setkey. That may prevent the radix sort
> from being used for an ad hoc by on character columns that are
> not in the key. Would we then want auto-conversion in ad hoc
> by too? No change to as.data.table.
>
> 3. Steve's suggestion. Change setkey. Catch character columns
> in setkey and auto convert them to factor at that point. No
> change to data.table or as.data.table.
>
> 4. Change ?as.data.table to say it's a class change only, and to use
> data.table() if checks and auto-conversion of character to
> factor is required. No code changes.
>
> 5. Another solution.
>
I lean towards #4 and also maybe #3. It's nice to be able to "raw"
convert back and forth between data tables and data frames, and
as.data.table seems useful for that. A direct class assignment is okay,
but a data frame also needs a row.names attribute. I tend not to like
auto-conversions.
A couple of utility functions to do in-place raw conversions would be
useful:
  setdf(d)  # changes class to "data.frame", creates the "row.names"
            # attribute, possibly removes the "sorted" attribute
  setdt(d)  # changes class to "data.table", possibly deletes "row.names"
Both would avoid a copy. I haven't needed them enough to write them yet.
This might be something to consider if we're making a change related to
conversions.
- Tom