[datatable-help] data.table inherits from data.frame [was:Auto-convert characters to factors when settings keys?]

Harish harishv_99 at yahoo.com
Thu Jul 1 06:37:46 CEST 2010


Matthew,

Option 2 -- 

I have been installing the binaries; I did not realize I could install from source.  I used to check which revision contained the changes (via R-Forge) and then wait for the corresponding binaries to be published.  Hence my responses to fixes were delayed.

I missed the data.frame() call.  I mistakenly ended up looking at the reshape() code rather than the melt() code.

==== I tried this ====

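library(data.table)
library(reshape)   # assumed source of melt(), per the thread

# Mask data.frame() in .GlobalEnv so that it returns a data.table
data.frame = function(...) as.data.table(base::data.frame(...))
DT <- data.table( A=letters[1:3], B=11:22, C=1:2 )
class( melt( DT ) )             # does melt() pick up the wrapper?
class( data.frame( a=1:5 ) )    # direct call from .GlobalEnv
test.data.table()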

==== Output ====

Using A as id variables
[1] "data.frame"                 # output of melt()
[1] "data.table" "data.frame"    # output of data.frame I created
All 170 tests in test.data.table() completed ok in 11.166sec

================

As you can see, melt() still returned a data.frame.  However, calling data.frame() directly from .GlobalEnv did go through the conversion function and produced a data.table.

Conceptually, I understand why you expect the conversion function to be executed inside melt().  However, I am not sure why it is not being executed.
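
One likely explanation, offered as a sketch rather than a diagnosis: R resolves a function's free variables starting from the environment the function was defined in.  For a function that lives in a package namespace, the search goes through that namespace, its imports, and the base namespace before it reaches .GlobalEnv, so a data.frame defined in .GlobalEnv would not be seen from there.  The example below does not use reshape at all; f_global and f_pkg are hypothetical names, the "wrapped" class is only a marker, and the stats namespace merely stands in for a package namespace.

```r
# Mask data.frame() with a wrapper that tags its result with a marker class.
data.frame <- function(...) {
  out <- base::data.frame(...)
  class(out) <- c("wrapped", class(out))
  out
}

# Defined here, so its enclosing environment is where the wrapper lives.
f_global <- function() data.frame(a = 1:3)

# Same body, but re-homed into a package namespace (stats as a stand-in).
# Lookup now goes: namespace -> imports -> base namespace -> .GlobalEnv,
# and base::data.frame is found before the wrapper is ever reached.
f_pkg <- function() data.frame(a = 1:3)
environment(f_pkg) <- asNamespace("stats")

print(class(f_global()))  # [1] "wrapped"    "data.frame"
print(class(f_pkg()))     # [1] "data.frame"
```

If that is what happens with melt(), a wrapper in .GlobalEnv cannot change what package code calls; the conversion would have to happen on the return value instead.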


Option 1 -- I'd be happy to help in any way.
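
For reference, the second variant of Option 1 described in the quoted message below (a method that calls the existing data.frame function and then changes the class before returning) might look roughly like this.  It is only a sketch with stand-ins so that it runs without reshape or data.table: reshape_long is a hypothetical generic playing the role of melt, and "mytable" is a hypothetical class playing the role of data.table.

```r
# Hypothetical generic standing in for melt().
reshape_long <- function(x, ...) UseMethod("reshape_long")

# Stand-in for the package's data.frame method: it builds its result with
# data.frame(), so any extra class on the input is lost.
reshape_long.data.frame <- function(x, ...) {
  data.frame(
    variable = rep(names(x), each = nrow(x)),
    value    = unlist(x, use.names = FALSE)
  )
}

# Method for the subclass: delegate to the data.frame method, then restore
# the subclass on the way out.
reshape_long.mytable <- function(x, ...) {
  out <- NextMethod()
  class(out) <- c("mytable", class(out))
  out
}

x <- structure(data.frame(a = 1:2, b = 3:4), class = c("mytable", "data.frame"))
print(class(reshape_long(x)))  # [1] "mytable"    "data.frame"
```

This keeps the existing implementation untouched; it does not gain any of the efficiency a native data.table method could offer.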


Harish


--- On Wed, 6/30/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:

> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> Subject: Re: [datatable-help] data.table inherits from data.frame   [was:Auto-convert characters to factors when settings keys?]
> To: "Harish" <harishv_99 at yahoo.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Date: Wednesday, June 30, 2010, 3:28 AM
> > Option ii -- creating a wrapper around the data.frame creation function --
> > does not work with melt() when I tested it.  When I glanced through the
> > code, I did not see any calls to data.frame().  It looks like it uses
> > rbind() and cbind() to create the new data.frame.
> 
> Just to check a few things then ...
> melt.data.frame contains a call to data.frame on the line after the do.call.
> What's the error you get?
> Does test.data.table() work ok - how many tests does it run?
> How do you install the latest version from r-forge - using type='source',
> or using the binary, which can take a few days on r-forge depending on your
> platform?
> 
> > Option i -- I'm not sure what this entails.  What is the advantage over
> > having wrapper functions?  Reshape-type functions interest me at the
> > moment.
> 
> It involves me or Tom, or someone, adding those methods to the data.table
> package. The advantage of the wrapper around data.frame() is just that we
> don't have to i.e. a quick fix-all. Probably though, we'd want to
> implement those methods in a more efficient way in data.table, rather than
> dispatch to reshape to do that as a data.frame.
> 
> I'll give it some more thought. There were some other threads on this list
> along these lines too, need to review them again ...
> 
> >
> >
> > Harish
> >
> >
> > --- On Mon, 6/28/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > wrote:
> >
> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> >> Subject: Re: [datatable-help] data.table inherits from data.frame
> >> [was:Auto-convert characters to factors when settings keys?]
> >> To: "Short, Tom" <TShort at epri.com>
> >> Cc: "Harish" <harishv_99 at yahoo.com>,
> >> datatable-help at lists.r-forge.r-project.org
> >> Date: Monday, June 28, 2010, 10:33 AM
> >> Yes if the function uses data.frame() to create the object it returns,
> >> then it should return a data.frame e.g. melt (untested).  If it just calls
> >> "[", or similar, on the object then it should keep it as data.table e.g.
> >> subset (tested).  The main thing is they work though, so that other
> >> packages using those functions also work e.g. ggplot.
> >>
> >> For returning results back to a data.table aware environment though ...
> >>
> >> i) We could add data.table methods for melt, reshape, etc. These could
> >> either use the data.table equivalent syntax and efficiency, easing users
> >> into it (think someone mentioned that before), or these methods could call
> >> the package's function returning a data.frame, and then make the class
> >> change before returning data.table.  Is it just/mainly reshape people
> >> would want data.table methods for?
> >>
> >> ii) Or, possibly, try this in .GlobalEnv :
> >>
> >>    data.frame = function(...) as.data.table(base::data.frame(...))
> >>
> >> and those functions, such as melt, may well then return data.table
> >> (untested).  IF that works, then it could be made more flexible :
> >>
> >>    data.frame = function(...) {
> >>       if (cendta()) as.data.table(base::data.frame(...))
> >>       else base::data.frame(...)
> >>    }
> >>
> >> so users such as you or I, know we really do want a true data.frame, but
> >> non-data.table-aware environments would really create data.table for us
> >> (while still using [.data.frame on them). We would need to export cendta
> >> if users were to do that, in the meantime try data.table:::cendta().
> >>
> >> Anyway ... way further than I expected to go. If anyone feels like testing
> >> that, would be interested if it works! Thanks for the interest and
> >> question.
> >>
> >> Matthew
> >>
> >>
> >> > Harish, you'll still (usually) get back a data.frame in that situation.
> >> > There may be a situation involving just manipulation of the original where
> >> > the function keeps the object as a data.table. Auto-converting return
> >> > values would be a challenge.
> >> >
> >> > - Tom
> >> >
> >> >
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: datatable-help-bounces at lists.r-forge.r-project.org
> >> >> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> >> >> On Behalf Of Harish
> >> >> Sent: Monday, June 28, 2010 11:57
> >> >> To: datatable-help at lists.r-forge.r-project.org
> >> >> Subject: Re: [datatable-help] data.table inherits from
> >> >> data.frame [was:Auto-convert characters to factors when
> >> >> settings keys?]
> >> >>
> >> >> Removing the requirement for explicit conversions simplifies
> >> >> things a bit.  Thanks.
> >> >>
> >> >> You commented on arguments passed into a function.  What
> >> >> happens to return values?  For example, the reshape functions
> >> >> (reshape(), cast(), melt(), etc.) return a data.frame.  Are
> >> >> those automatically treated as data.table in a workspace
> >> >> that is data.table aware?
> >> >>
> >> >>
> >> >> Harish
> >> >>
> >> >>
> >> >> --- On Mon, 6/28/10, mdowle at mdowle.plus.com
> >> >> <mdowle at mdowle.plus.com> wrote:
> >> >>
> >> >> > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> >> >> > Subject: [datatable-help] data.table inherits from data.frame [was:
> >> >> > Auto-convert characters to factors when settings keys?]
> >> >> > To: mailinglist.honeypot at gmail.com
> >> >> > Cc: datatable-help at lists.r-forge.r-project.org
> >> >> > Date: Monday, June 28, 2010, 3:03 AM
> >> >> >
> >> >> > The subject of this thread got misleading too. Changing that now.
> >> >> > Was: Auto-convert characters to factors when settings keys?
> >> >> >
> >> >> >
> >> >> > Thanks Steve,
> >> >> >
> >> >> > There are two types of users of data.table. Ones who know they are
> >> >> > using it, and ones that don't. This change is for the latter. If a
> >> >> > base function that only accepts data.frame, such as subset for
> >> >> > example, does dt[,c('b','c')] inside it, then that actually does work
> >> >> > and returns the columns, not c('b','c').
> >> >> >
> >> >> > For example, at the end of subset.data.frame there is :
> >> >> >
> >> >> >     x[r, vars, drop = drop]
> >> >> >
> >> >> > and that will use [.data.frame even though x is data.table.  subset is
> >> >> > a base function and as such isn't a data.table aware user.
> >> >> >
> >> >> > However, when a user (such as you or I) uses data.table, then we can
> >> >> > use its features such as i and j expressions of column names, joins
> >> >> > using x[y][z] syntax, etc.  No changes there.  If we want
> >> >> > dt[,c("b","c")] to return the columns, then we will still have to
> >> >> > convert to data.frame, as before.  That's because we work in our user
> >> >> > workspace which is data.table aware.
> >> >> >
> >> >> > It depends on where [.data.table was
> called
> >> from.
> >> >> >
> >> >> > It's just so that packages (e.g. ggplot), and base functions, can work
> >> >> > with data.table more easily now, without removing any of the
> >> >> > advantages of the [.data.table syntax, and without requiring
> >> >> > conversion.
> >> >> >
> >> >> > Makes more sense now hopefully ?
> >> >> >
> >> >> > Matthew
> >> >> >
> >> >> >
> >> >> > > Hi,
> >> >> > >
> >> >> > > On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle
> >> >> > > <mdowle at mdowle.plus.com> wrote:
> >> >> > >> I went back to try again with S3 inheritance, discussed further up in
> >> >> > >> this thread.  Just committed as it seems, so far, to work.
> >> >> > >>
> >> >> > >> * data.table now inherits from data.frame i.e. class = c("data.table","data.frame")
> >> >> > >> * is.data.frame() now returns TRUE for data.table
> >> >> > >> * data.table should now be compatible with functions and packages that
> >> >> > >> _only_ accept data.frame.
> >> >> > >
> >> >> > > Perhaps I lost the point of this conversation somewhere along the way,
> >> >> > > but this change makes it *technically* compatible since a data.table
> >> >> > > passes an is.data.frame test, but it doesn't work in ways that are
> >> >> > > perfectly acceptable for some function accepting a data.frame to work,
> >> >> > > ie:
> >> >> > >
> >> >> > > R> library(data.table)
> >> >> > > R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
> >> >> > > R> dt[,c('b', 'c')]
> >> >> > > [1] "b" "c"
> >> >> > >
> >> >> > > instead of
> >> >> > >
> >> >> > > R> df <- as.data.frame(dt)
> >> >> > > R> df[,c('b', 'c')]
> >> >> > >   b  c
> >> >> > > 1 a 18
> >> >> > > 2 b 11
> >> >> > > 3 c  2
> >> >> > > 4 d 50
> >> >> > > 5 e 96
> >> >> > >
> >> >> > > Unless you were talking about making more changes to make a data.table
> >> >> > > act more like a data.frame, I'm not sure allowing the user to ignore
> >> >> > > the differences between data.table/frames is really a win/win
> >> >> > > situation.
> >> >> > >
> >> >> > > Sorry if I missed something.
> >> >> > > -steve
> >> >> > >
> >> >> > > --
> >> >> > > Steve Lianoglou
> >> >> > > Graduate Student: Computational Systems Biology
> >> >> > >  | Memorial Sloan-Kettering Cancer Center
> >> >> > >  | Weill Medical College of Cornell University
> >> >> > > Contact Info: http://cbio.mskcc.org/~lianos/contact
> >> >> > >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > datatable-help mailing list
> >> >> > datatable-help at lists.r-forge.r-project.org
> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >
> 
> 
> 


      

