[datatable-help] data.table inherits from data.frame [was: Auto-convert characters to factors when settings keys?]

mdowle at mdowle.plus.com mdowle at mdowle.plus.com
Mon Jun 28 12:03:17 CEST 2010


The subject of this thread had become misleading too, so I am changing it now.
Was: Auto-convert characters to factors when settings keys?


Thanks Steve,

There are two types of users of data.table: those who know they are using
it, and those who don't. This change is for the latter. If a base function
that only accepts data.frame, such as subset, does dt[,c('b','c')] inside
it, then that now works and returns the columns, not c('b','c').

For example, at the end of subset.data.frame there is :

    x[r, vars, drop = drop]

and that will use [.data.frame even though x is a data.table.  subset is a
base function and as such isn't a data.table-aware user.
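
A minimal sketch of the effect, assuming a data.table version that includes
the inheritance change. The column values are made up for illustration, and
which `[` method subset() reaches internally may differ between versions, so
the sketch only looks at the result:

```r
library(data.table)

# Illustrative data; values of c are fixed here rather than sampled
dt <- data.table(a = 1:5, b = letters[1:5], c = c(18L, 11L, 2L, 50L, 96L))

# The inheritance described in this thread: a data.table is also a data.frame
print(class(dt))
print(is.data.frame(dt))

# A data.frame-style call: keep rows where a > 2 and select columns by name
res <- subset(dt, a > 2, select = c("b", "c"))
print(names(res))
print(nrow(res))
```

The caller gets the two columns back without converting to data.frame first.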

However, when a user (such as you or me) uses data.table, then we can use
its features such as i and j expressions of column names, joins using
x[y][z] syntax, etc.  No changes there.  If we want dt[,c("b","c")] to
return the columns, then we will still have to convert to data.frame, as
before.  That's because we work in our user workspace, which is data.table
aware.
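
As a hedged illustration of what data.table-aware code can write to get the
columns, here are three spellings that avoid the bare dt[,c("b","c")] form
discussed in this thread. The data is made up, and the `with = FALSE` form is
an addition not mentioned above; it assumes a data.table version whose
[.data.table accepts the `with` argument:

```r
library(data.table)

# Illustrative data; values of c are fixed here rather than sampled
dt <- data.table(a = 1:5, b = letters[1:5], c = c(18L, 11L, 2L, 50L, 96L))

# j as an expression of column names, evaluated inside dt
by_expr <- dt[, list(b, c)]

# j as a character vector, with = FALSE to request selection by name
by_name <- dt[, c("b", "c"), with = FALSE]

# Converting first gives plain data.frame indexing, as described above
by_df <- as.data.frame(dt)[, c("b", "c")]

print(names(by_expr))
print(names(by_name))
print(class(by_df))
```

The first two keep the result as a data.table; the third returns a plain
data.frame.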

It depends on where [.data.table was called from.

It's just so that packages (e.g. ggplot), and base functions, can work with
data.table more easily now, without removing any of the advantages of the
[.data.table syntax, and without requiring conversion.

Hopefully that makes more sense now?

Matthew


> Hi,
>
> On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>> I went back to try again with S3 inheritance, discussed further up in
>> this thread.  Just committed as it seems, so far, to work.
>>
>> * data.table now inherits from data.frame i.e. class =
>> c("data.table","data.frame")
>> * is.data.frame() now returns TRUE for data.table
>> * data.table should now be compatible with functions and packages that
>> _only_ accept data.frame.
>
> Perhaps I lost the point of this conversation somewhere along the way,
> but this change makes it *technically* compatible since a data.table
> passes an is.data.frame test, but it doesn't work in ways that are
> perfectly acceptable for some function accepting a data.frame to work,
> ie:
>
> R> library(data.table)
> R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
> R> dt[,c('b', 'c')]
> [1] "b" "c"
>
> instead of
>
> R> df <- as.data.frame(dt)
> R> df[,c('b', 'c')]
>   b  c
> 1 a 18
> 2 b 11
> 3 c  2
> 4 d 50
> 5 e 96
>
> Unless you were talking about making more changes to make a data.table
> act more like a data.frame, I'm not sure allowing the user to ignore
> the differences between data.table/frames is really a win/win
> situation.
>
> Sorry if I missed something.
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>






