[datatable-help] Auto-convert characters to factors when settings keys?

mdowle at mdowle.plus.com mdowle at mdowle.plus.com
Mon Jun 28 11:51:02 CEST 2010


Thanks Steve,

There are two types of users of data.table: those who know they are using
it, and those who don't. This change is for the latter. If a base function
that only accepts data.frame, such as subset, does dt[,c('b','c')] inside
it, then that now works and returns the columns, not c('b','c').

For example, at the end of subset.data.frame there is :

    x[r, vars, drop = drop]

and that will use [.data.frame even though x is a data.table.  subset is a
base function and as such isn't a data.table-aware caller.

However, when a user (such as you or me) uses data.table directly, we can
still use its features such as i and j expressions of column names, joins
using x[y][z] syntax, etc.  No changes there.  If we want dt[,c("b","c")]
to return the columns, then we will still have to convert to data.frame,
as before.  That's because we work in our user workspace, which is
data.table aware.

It depends on where [.data.table was called from.

It's just so that packages (e.g. ggplot) and base functions can work with
data.table more easily now, without removing any of the advantages of the
[.data.table syntax, and without requiring conversion.
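
As a rough illustration of the "without requiring conversion" part, here is
a minimal sketch. It assumes a data.table version that includes the
inheritance change; rows_in_frame is a made-up stand-in for any function
written with only data.frame in mind, and the values in column c are fixed
rather than drawn with sample(). It only shows that the class test passes,
not how [ behaves inside such a function.

```r
library(data.table)

# Same shape as the table in the quoted example, with fixed values
# instead of sample() so the result is reproducible
dt <- data.table(a = 1:5, b = letters[1:5], c = c(18L, 11L, 2L, 50L, 96L))

# S3 inheritance: "data.table" comes first, "data.frame" second
class(dt)

# Code that guards its input with is.data.frame() lets a data.table through
is.data.frame(dt)

# Hypothetical stand-in for a function that only accepts data.frame
rows_in_frame <- function(x) {
  stopifnot(is.data.frame(x))
  nrow(x)
}

# No as.data.frame(dt) needed before the call
n <- rows_in_frame(dt)
```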

Hopefully that makes more sense now?

Matthew



> Hi,
>
> On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>> I went back to try again with S3 inheritance, discussed further up in
>> this thread.  Just committed as it seems, so far, to work.
>>
>> * data.table now inherits from data.frame i.e. class =
>> c("data.table","data.frame")
>> * is.data.frame() now returns TRUE for data.table
>> * data.table should now be compatible with functions and packages that
>> _only_ accept data.frame.
>
> Perhaps I lost the point of this conversation somewhere along the way,
> but this change makes it *technically* compatible since a data.table
> passes an is.data.frame test, but it doesn't work in ways that are
> perfectly acceptable for some function accepting a data.frame to work,
> ie:
>
> R> library(data.table)
> R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
> R> dt[,c('b', 'c')]
> [1] "b" "c"
>
> instead of
>
> R> df <- as.data.frame(dt)
> R> df[,c('b', 'c')]
>   b  c
> 1 a 18
> 2 b 11
> 3 c  2
> 4 d 50
> 5 e 96
>
> Unless you were talking about making more changes to make a data.table
> act more like a data.frame, I'm not sure allowing the user to ignore
> the differences between data.table/frames is really a win/win
> situation.
>
> Sorry if I missed something.
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



