[datatable-help] data.table inherits from data.frame [was:Auto-convert characters to factors when settings keys?]

mdowle at mdowle.plus.com mdowle at mdowle.plus.com
Mon Jun 28 19:33:09 CEST 2010


Yes, if the function uses data.frame() to create the object it returns,
then it should return a data.frame, e.g. melt (untested).  If it just calls
"[", or similar, on the object, then it should keep it as a data.table, e.g.
subset (tested).  The main thing, though, is that they work, so that other
packages using those functions also work, e.g. ggplot.
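
A quick way to check which case a given function falls into is to look at
the class of what comes back. This is only a sketch, assuming a data.table
version where the inheritance change is in place; the object names are
made up for illustration:

```r
library(data.table)

dt <- data.table(a = 1:5, b = letters[1:5])

# data.table inherits from data.frame, so both checks pass
class(dt)
is.data.frame(dt)

# subset() only indexes the object, so the class should be kept
kept <- subset(dt, a > 2)
is.data.table(kept)

# data.frame() builds a fresh object, so the data.table class is dropped
rebuilt <- data.frame(dt)
is.data.table(rebuilt)
```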

For returning results back to a data.table aware environment though ...

i) We could add data.table methods for melt, reshape, etc. These could
either use the data.table equivalent syntax and efficiency, easing users
into it (think someone mentioned that before), or these methods could call
the package's function returning a data.frame, and then make the class
change before returning data.table.  Is it just/mainly reshape people
would want data.table methods for?

ii) Or, possibly, try this in .GlobalEnv :

   data.frame = function(...) as.data.table(base::data.frame(...))

and those functions, such as melt, may well then return data.table
(untested).  If that works, then it could be made more flexible :

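   data.frame = function(...) {
      # callers that are not data.table aware (e.g. melt) get a data.table;
      # data.table aware callers get the true data.frame they asked for
      if (!cendta()) as.data.table(base::data.frame(...))
      else base::data.frame(...)
   }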

so when users such as you or I call data.frame(), we get the true
data.frame we really do want, but non-data.table-aware environments would
create a data.table for us (while still using [.data.frame on them). We
would need to export cendta if users were to do that; in the meantime try
data.table:::cendta().
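
For option (i), the variant that calls the package's function and changes
the class before returning could be sketched roughly as below. returning_dt
and reshape_dt are hypothetical names, not part of data.table, and
stats::reshape() is only a stand-in for melt, cast, etc.:

```r
library(data.table)

# Hypothetical helper, not part of data.table: wrap a function that
# returns a data.frame so that its result is converted on the way out.
returning_dt <- function(f) {
  function(...) as.data.table(f(...))
}

# stats::reshape() is used here only as a stand-in for melt, cast, etc.
reshape_dt <- returning_dt(stats::reshape)

wide <- data.frame(id = 1:2, x1 = c(5, 6), x2 = c(7, 8))
long <- reshape_dt(wide, direction = "long", idvar = "id",
                   varying = c("x1", "x2"), v.names = "x",
                   timevar = "time", times = 1:2)

is.data.table(long)
```

This keeps the package's own logic untouched and only changes the class at
the end, so it gains none of the data.table efficiency inside the call.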

Anyway ... way further than I expected to go. If anyone feels like testing
that, I would be interested to hear if it works! Thanks for the interest
and question.

Matthew


> Harish, you'll still (usually) get back a data.frame in that situation.
> There may be a situation involving just manipulation of the original where
> the function keeps the object as a data.table. Auto-converting return
> values would be a challenge.
>
> - Tom
>
>
>
>
>> -----Original Message-----
>> From: datatable-help-bounces at lists.r-forge.r-project.org
>> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
>> On Behalf Of Harish
>> Sent: Monday, June 28, 2010 11:57
>> To: datatable-help at lists.r-forge.r-project.org
>> Subject: Re: [datatable-help] data.table inherits from
>> data.frame [was:Auto-convert characters to factors when
>> settings keys?]
>>
>> Removing the requirement for explicit conversions simplifies
>> things a bit.  Thanks.
>>
>> You commented on arguments passed into a function.  What
>> happens to return values?  For example, the reshape functions
>> (reshape(), cast(), melt(), etc.) return a data.frame.  Are
>> those automatically treated as data.table in a workspace
>> that is data.table aware?
>>
>>
>> Harish
>>
>>
>> --- On Mon, 6/28/10, mdowle at mdowle.plus.com
>> <mdowle at mdowle.plus.com> wrote:
>>
>> > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> > Subject: [datatable-help] data.table inherits from data.frame [was:
>> > Auto-convert characters to factors when settings keys?]
>> > To: mailinglist.honeypot at gmail.com
>> > Cc: datatable-help at lists.r-forge.r-project.org
>> > Date: Monday, June 28, 2010, 3:03 AM
>> >
>> > The subject of this thread got misleading too. Changing that now.
>> > Was: Auto-convert characters to factors when settings keys?
>> >
>> >
>> > Thanks Steve,
>> >
>> > There are two types of users of data.table. Ones who know they are
>> > using it, and ones that don't. This change is for the latter. If a
>> > base function that only accepts data.frame, such as subset for
>> > example, does dt[,c('b','c')] inside it, then that actually does work
>> > and returns the columns, not c('b','c').
>> >
>> > For example, at the end of subset.data.frame there is :
>> >
>> >     x[r, vars, drop = drop]
>> >
>> > and that will use [.data.frame even though x is data.table.  subset is
>> > a base function and as such isn't a data.table aware user.
>> >
>> > However, when a user (such as you or I) uses data.table, then we can
>> > use its features such as i and j expressions of column names, joins
>> > using x[y][z] syntax, etc.  No changes there.  If we want
>> > dt[,c("b","c")] to return the columns, then we will still have to
>> > convert to data.frame, as before.  That's because we work in our user
>> > workspace which is data.table aware.
>> >
>> > It depends on where [.data.table was called from.
>> >
>> > It's just so that packages (e.g. ggplot), and base functions, can work
>> > with data.table more easily now, without removing any of the
>> > advantages of the [.data.table syntax, and without requiring
>> > conversion.
>> >
>> > Makes more sense now hopefully ?
>> >
>> > Matthew
>> >
>> >
>> > > Hi,
>> > >
>> > > On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle
>> > > <mdowle at mdowle.plus.com> wrote:
>> > >> I went back to try again with S3 inheritance, discussed further up in
>> > >> this thread.  Just committed as it seems, so far, to work.
>> > >>
>> > >> * data.table now inherits from data.frame i.e. class =
>> > >> c("data.table","data.frame")
>> > >> * is.data.frame() now returns TRUE for data.table
>> > >> * data.table should now be compatible with functions and packages that
>> > >> _only_ accept data.frame.
>> > >
>> > > Perhaps I lost the point of this conversation somewhere along the way,
>> > > but this change makes it *technically* compatible since a data.table
>> > > passes an is.data.frame test, but it doesn't work in ways that are
>> > > perfectly acceptable for some function accepting a data.frame to work,
>> > > ie:
>> > >
>> > > R> library(data.table)
>> > > R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
>> > > R> dt[,c('b', 'c')]
>> > > [1] "b" "c"
>> > >
>> > > instead of
>> > >
>> > > R> df <- as.data.frame(dt)
>> > > R> df[,c('b', 'c')]
>> > >   b  c
>> > > 1 a 18
>> > > 2 b 11
>> > > 3 c  2
>> > > 4 d 50
>> > > 5 e 96
>> > >
>> > > Unless you were talking about making more changes to make a data.table
>> > > act more like a data.frame, I'm not sure allowing the user to ignore
>> > > the differences between data.table/frames is really a win/win
>> > > situation.
>> > >
>> > > Sorry if I missed something.
>> > > -steve
>> > >
>> > > --
>> > > Steve Lianoglou
>> > > Graduate Student: Computational Systems Biology
>> > >  | Memorial Sloan-Kettering Cancer Center
>> > >  | Weill Medical College of Cornell University
>> > > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> > >
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > datatable-help mailing list
>> > datatable-help at lists.r-forge.r-project.org
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >
>>
>>
>>



