[datatable-help] Auto-convert characters to factors when setting keys?

Short, Tom TShort at epri.com
Tue May 25 15:15:45 CEST 2010


> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of mdowle at mdowle.plus.com
> Sent: Tuesday, May 25, 2010 04:46
> To: Steve Lianoglou
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Auto-convert characters to 
> factors when setting keys?
> 
> Steve,
> 
> Try data.table(df) rather than as.data.table(df). In the 
> vignettes and examples I think data.table(df) is used but let 
> us know otherwise.
> as.data.table() is a raw convert of the class only, currently.
> 
> Agreed something needs to be tidied up here. Thanks for reporting it.
> 
> All,
> 
> Any preferences on the following options?
> 
> 1. Change as.data.table to use data.table. It already does 
> when keep.rownames=TRUE but not when FALSE.  If a user really 
> wants a raw class change they can use class(x)="data.table" 
> directly. No change to data.table or setkey.  Since 
> ?as.data.table is an alias to ?data.table this would be consistent.
> 
> 2. Change data.table and setkey. Only convert character to 
> factor at the point of setkey.  That may prevent radix being 
> used for an ad hoc by on character columns that are not in 
> the key. Would we then want to do auto-conversion in ad hoc 
> by too?  No change to as.data.table.
> 
> 3. Steve's suggestion. Change setkey. Catch character columns 
> in setkey and auto convert them to factor at that point. No 
> change to data.table or as.data.table.
> 
> 4. Change ?as.data.table to say it's a class change only, and to use
> data.table() if checks and auto-conversion of character to 
> factor are required. No code changes.
> 
> 5. Another solution.
> 

I lean towards #4 and also maybe #3. It's nice to be able to "raw"
convert back and forth between data tables and data frames, and
as.data.table seems useful for that. A direct class assignment is okay,
but a data frame also needs a row.names attribute. I tend not to like
autoconversions.
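
To make option #3 concrete, here is a minimal sketch of what "catch
character columns and convert them to factor" could look like. It uses
base R only and a plain data frame as a stand-in; the function name
key_cols_to_factor and its arguments are hypothetical and are not part
of data.table or of setkey.

```r
# Hypothetical helper (an assumption for illustration, not data.table code):
# convert only the character columns named in `cols` to factors,
# leaving every other column untouched.
key_cols_to_factor <- function(d, cols) {
  for (col in cols) {
    if (is.character(d[[col]])) {
      d[[col]] <- factor(d[[col]])
    }
  }
  d
}

df <- data.frame(id = c("b", "a", "c"), v = 1:3, stringsAsFactors = FALSE)
keyed <- key_cols_to_factor(df, "id")
str(keyed)
```

Under option #4 nothing like this would run automatically; the user
would make the conversion explicitly before setting the key.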

A couple of utility functions to do in-place raw conversions would be
useful:

setdf(d) # changes class to "data.frame", creates the "row.names" attribute,
         # possibly removes the "sorted" attribute
setdt(d) # changes class to "data.table", possibly deletes "row.names"

Converting in place avoids a copy. I haven't needed these functions
enough to write them yet, but they might be worth considering if we're
making a change related to conversions.
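
As a rough illustration of the attribute changes involved, here is a
minimal base R sketch. The names raw_to_df and raw_to_dt are
hypothetical, chosen so they are not mistaken for the proposed
setdf/setdt. The single class "data.table" follows the
class(x)="data.table" form quoted above. This version returns a
modified object rather than converting in place, so R may still copy;
a true in-place version would likely need lower-level code.

```r
# Hypothetical helpers; names and details are assumptions for illustration.

# Raw conversion to a data frame: drop the "sorted" key marker,
# add the row.names attribute a data frame requires, set the class.
raw_to_df <- function(d) {
  x <- unclass(d)
  n <- if (length(x)) length(x[[1L]]) else 0L
  attr(x, "sorted") <- NULL
  attr(x, "row.names") <- .set_row_names(n)
  class(x) <- "data.frame"
  x
}

# Raw conversion to a data table: drop row.names, set the class.
# No checks and no character-to-factor conversion are done.
raw_to_dt <- function(d) {
  x <- unclass(d)
  attr(x, "row.names") <- NULL
  class(x) <- "data.table"
  x
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
dt <- raw_to_dt(df)
back <- raw_to_df(dt)
stopifnot(is.data.frame(back), nrow(back) == 3L)
```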

- Tom