[datatable-help] data.table inherits from data.frame [was:Auto-convert characters to factors when settings keys?]

mdowle at mdowle.plus.com mdowle at mdowle.plus.com
Thu Jul 1 11:29:00 CEST 2010


I too only recently realised you can install from source. On Linux it's
built-in, but on Windows you need the Rtools toolset installed.

Thanks for reporting back. Interesting. I'm not sure why melt()
doesn't see the data.frame wrapper. Could you paste that email into a
feature request on the tracker, please, so we don't forget?
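A likely explanation, assuming melt() is defined in a package that has a namespace: R resolves a free variable such as data.frame lexically, starting from the environment the calling function was defined in. A namespace's lookup chain reaches base, where the real data.frame lives, before it reaches .GlobalEnv, so a data.frame() defined in .GlobalEnv is only picked up by code that was itself defined there. In the sketch below, pkg_env (a plain environment whose parent is baseenv()) is a hypothetical stand-in for a namespace, and an extra class "wrapped" stands in for as.data.table(), so it runs without reshape or data.table.

```r
# Stand-in for a package namespace: its enclosing chain reaches base first.
pkg_env <- new.env(parent = baseenv())
# A "package" function that builds its result with data.frame()
evalq(make_df <- function() data.frame(a = 1:3), pkg_env)

# The same function defined at top level, so it is enclosed by .GlobalEnv
make_df_global <- function() data.frame(a = 1:3)

# Override data.frame in .GlobalEnv ("wrapped" plays the role of data.table)
data.frame <- function(...) {
  res <- base::data.frame(...)
  class(res) <- c("wrapped", class(res))
  res
}

inside  <- class(pkg_env$make_df())  # "data.frame": base::data.frame is found first
outside <- class(make_df_global())   # "wrapped" "data.frame": the override is found
print(inside)
print(outside)

rm(data.frame)  # drop the override again
```

The same lookup rule is why the top-level data.frame(a=1:5) call in Harish's test did come back as a data.table.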

Btw, it seems you have ggplot loaded: the vast majority of that 11 second
test time is the 2 new ggplot tests.  I don't have it loaded, so those
tests are skipped to save time.  Not loading it is just an option when
developing the package and running test.data.table() quite a lot.
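Skipping optional tests when a package is not loaded is what keeps the run short here. A minimal sketch of that idea, assuming "loaded" means attached with library(); run_optional() is a hypothetical helper, not the code test.data.table() actually uses.

```r
# Hypothetical helper: run an optional test only when `pkg` is attached
# (i.e. on the search path); otherwise skip it and report the skip.
run_optional <- function(pkg, test) {
  if (paste0("package:", pkg) %in% search()) {
    test()
    return(TRUE)
  }
  message("Skipping ", pkg, " tests: package not loaded")
  FALSE
}

# Only exercised after library(ggplot2); skipped (and cheap) otherwise
run_optional("ggplot2", function() stopifnot(is.function(ggplot2::ggplot)))
```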

Option 1 - the task is basically to create melt.data.table,
reshape.data.table, etc., probably implemented in data.table syntax, and
add them to data.table.R (or perhaps to a new file reshape.R, which would
be tidier). If that's something you'd like to do, then click to join
the project; I'll approve you, and you'll have commit rights to the code,
the documentation and the www page, all in one place.
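The simplest variant of such a method, mentioned in the 28 June message quoted below, reuses the existing data.frame method and changes the class before returning; implementing it in data.table syntax would be the more efficient route. A minimal sketch of that simpler variant via S3 dispatch and NextMethod(). Everything here is hypothetical: melt is a toy generic standing in for reshape's, melt.data.frame is a toy reshaping, and the class "dtlike" stands in for data.table so the example runs without either package.

```r
# Toy generic standing in for reshape's melt() (hypothetical)
melt <- function(data, ...) UseMethod("melt")

# Toy data.frame method: stacks the non-id columns into variable/value pairs,
# building its result with data.frame()
melt.data.frame <- function(data, id.vars, ...) {
  measure <- setdiff(names(data), id.vars)
  n <- nrow(data)
  data.frame(
    data[rep(seq_len(n), times = length(measure)), id.vars, drop = FALSE],
    variable = rep(measure, each = n),
    value = unlist(data[measure], use.names = FALSE),
    row.names = NULL
  )
}

# Hypothetical aware method: reuse the data.frame method, then hand back
# the caller's class ("dtlike" stands in for data.table)
melt.dtlike <- function(data, ...) {
  res <- NextMethod()   # dispatches to melt.data.frame
  class(res) <- class(data)
  res
}

x <- structure(data.frame(A = c("a", "b"), B = 1:2, C = 3:4),
               class = c("dtlike", "data.frame"))
long <- melt(x, id.vars = "A")
class(long)  # "dtlike" "data.frame"
```

Because "dtlike" comes before "data.frame" in the class vector, dispatch finds melt.dtlike first, and NextMethod() falls through to the data.frame method.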

Matthew


> Matthew,
>
> Option 2 --
>
> I have been installing the binaries; I did not realize there was an option
> for me to install from source.  I used to look at the revision (Rev) that
> contained the changes on R-Forge and wait for the corresponding binaries to
> be published.  Hence my responses to fixes were delayed.
>
> I missed the data.frame creation.  I mistakenly ended up looking at
> reshape() code rather than melt().
>
> ==== I tried this ====
>
> data.frame = function(...) as.data.table(base::data.frame(...))
> DT <- data.table( A=letters[1:3], B=11:22, C=1:2 )
> class( melt( DT ) )
> class( data.frame( a=1:5 ) )
> test.data.table()
>
> ==== Output ====
>
> Using A as id variables
> [1] "data.frame"                 # output of melt()
> [1] "data.table" "data.frame"    # output of data.frame I created
> All 170 tests in test.data.table() completed ok in 11.166sec
>
> ================
>
> You can see that the melt() still returned a data.frame.  However,
> creating a data.frame in .GlobalEnv used the conversion function to create
> a data.table.
>
> Conceptually, I understand why you are thinking that the conversion
> function should be executed.  However, I am not sure why it is not being
> executed.
>
>
> Option 1 -- I'd be happy to help in any way.
>
>
> Harish
>
>
> --- On Wed, 6/30/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> wrote:
>
>> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> Subject: Re: [datatable-help] data.table inherits from data.frame
>> [was:Auto-convert characters to factors when settings keys?]
>> To: "Harish" <harishv_99 at yahoo.com>
>> Cc: datatable-help at lists.r-forge.r-project.org
>> Date: Wednesday, June 30, 2010, 3:28 AM
>>
>> > Option ii -- creating a wrapper around the data.frame creation function --
>> > does not work with melt() when I tested it.  When I glanced through the
>> > code, I did not see any calls to data.frame().  It looks like it uses
>> > rbind() and cbind() to create the new data.frame.
>>
>> Just to check a few things then ...
>> melt.data.frame contains a call to data.frame on the line after the do.call.
>> What's the error you get?
>> Does test.data.table() work ok - how many tests does it run?
>> How do you install the latest version from r-forge: using type='source',
>> or using the binary, which can take a few days on r-forge depending on your
>> platform?
>>
>> > Option i -- I'm not sure what this entails.  What is the advantage over
>> > having wrapper functions?  Reshape-type functions interest me at the
>> > moment.
>>
>> It involves me or Tom, or someone, adding those methods to the data.table
>> package. The advantage of the wrapper around data.frame() is just that we
>> don't have to, i.e. it's a quick fix-all. Probably, though, we'd want to
>> implement those methods more efficiently in data.table, rather than
>> dispatch to reshape to do that as a data.frame.
>>
>> I'll give it some more thought. There were some other threads on this list
>> along these lines too; I need to review them again ...
>>
>> >
>> >
>> > Harish
>> >
>> >
>> > --- On Mon, 6/28/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> > wrote:
>> >
>> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> >> Subject: Re: [datatable-help] data.table inherits from data.frame
>> >> [was:Auto-convert characters to factors when settings keys?]
>> >> To: "Short, Tom" <TShort at epri.com>
>> >> Cc: "Harish" <harishv_99 at yahoo.com>,
>> >> datatable-help at lists.r-forge.r-project.org
>> >> Date: Monday, June 28, 2010, 10:33 AM
>> >>
>> >> Yes, if the function uses data.frame() to create the object it returns,
>> >> then it should return a data.frame, e.g. melt (untested).  If it just
>> >> calls "[", or similar, on the object, then it should keep it as a
>> >> data.table, e.g. subset (tested).  The main thing is that they work,
>> >> though, so that other packages using those functions also work, e.g.
>> >> ggplot.
>> >>
>> >> For returning results back to a data.table-aware environment, though ...
>> >>
>> >> i) We could add data.table methods for melt, reshape, etc. These could
>> >> either use the equivalent data.table syntax and efficiency, easing users
>> >> into it (I think someone mentioned that before), or these methods could
>> >> call the package's function returning a data.frame, and then make the
>> >> class change before returning a data.table.  Is it just/mainly reshape
>> >> that people would want data.table methods for?
>> >>
>> >> ii) Or, possibly, try this in .GlobalEnv :
>> >>
>> >>    data.frame = function(...) as.data.table(base::data.frame(...))
>> >>
>> >> and those functions, such as melt, may well then return data.table
>> >> (untested).  IF that works, then it could be made more flexible :
>> >>
>> >>    data.frame = function(...) {
>> >>       if (cendta()) as.data.table(base::data.frame(...))
>> >>       else base::data.frame(...)
>> >>    }
>> >>
>> >> so that users such as you or I, who know we really do want a true
>> >> data.frame, get one, but non-data.table-aware environments would really
>> >> create a data.table for us (while still using [.data.frame on it). We
>> >> would need to export cendta if users were to do that; in the meantime
>> >> try data.table:::cendta().
>> >>
>> >> Anyway ... that's way further than I expected to go. If anyone feels
>> >> like testing that, I would be interested to hear whether it works!
>> >> Thanks for the interest and question.
>> >>
>> >> Matthew
>> >>
>> >>
>> >> > Harish, you'll still (usually) get back a data.frame in that situation.
>> >> > There may be a situation involving just manipulation of the original
>> >> > where the function keeps the object as a data.table.  Auto-converting
>> >> > return values would be a challenge.
>> >> >
>> >> > - Tom
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: datatable-help-bounces at lists.r-forge.r-project.org
>> >> >> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
>> >> >> On Behalf Of Harish
>> >> >> Sent: Monday, June 28, 2010 11:57
>> >> >> To: datatable-help at lists.r-forge.r-project.org
>> >> >> Subject: Re: [datatable-help] data.table inherits from data.frame
>> >> >> [was:Auto-convert characters to factors when settings keys?]
>> >> >>
>> >> >> Removing the requirement for explicit conversions simplifies things a
>> >> >> bit.  Thanks.
>> >> >>
>> >> >> You commented on arguments passed into a function.  What happens to
>> >> >> return values?  For example, the reshape functions (reshape(), cast(),
>> >> >> melt(), etc.) return a data.frame.  Are those automatically treated as
>> >> >> data.table in a workspace that is data.table aware?
>> >> >>
>> >> >>
>> >> >> Harish
>> >> >>
>> >> >>
>> >> >> --- On Mon, 6/28/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> >> >> wrote:
>> >> >>
>> >> >> > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> >> >> > Subject: [datatable-help] data.table inherits from data.frame [was:
>> >> >> > Auto-convert characters to factors when settings keys?]
>> >> >> > To: mailinglist.honeypot at gmail.com
>> >> >> > Cc: datatable-help at lists.r-forge.r-project.org
>> >> >> > Date: Monday, June 28, 2010, 3:03 AM
>> >> >> >
>> >> >> > The subject of this thread got misleading too. Changing that now.
>> >> >> > Was: Auto-convert characters to factors when settings keys?
>> >> >> >
>> >> >> >
>> >> >> > Thanks Steve,
>> >> >> >
>> >> >> > There are two types of users of data.table: ones who know they are
>> >> >> > using it, and ones who don't. This change is for the latter. If a
>> >> >> > base function that only accepts data.frame, such as subset, does
>> >> >> > dt[,c('b','c')] inside it, then that actually does work and returns
>> >> >> > the columns, not c('b','c').
>> >> >> >
>> >> >> > For example, at the end of subset.data.frame there is :
>> >> >> >
>> >> >> >     x[r, vars, drop = drop]
>> >> >> >
>> >> >> > and that will use [.data.frame even though x is a data.table.  subset
>> >> >> > is a base function and as such isn't a data.table-aware user.
>> >> >> >
>> >> >> > However, when a user (such as you or I) uses data.table, we can use
>> >> >> > its features, such as i and j expressions of column names, joins
>> >> >> > using x[y][z] syntax, etc.  No changes there.  If we want
>> >> >> > dt[,c("b","c")] to return the columns, then we will still have to
>> >> >> > convert to data.frame, as before.  That's because we work in our user
>> >> >> > workspace, which is data.table aware.
>> >> >> >
>> >> >> > It depends on where [.data.table was called from.
>> >> >> >
>> >> >> > It's just so that packages (e.g. ggplot) and base functions can work
>> >> >> > with data.table more easily now, without removing any of the
>> >> >> > advantages of the [.data.table syntax, and without requiring
>> >> >> > conversion.
>> >> >> >
>> >> >> > Makes more sense now, hopefully?
>> >> >> >
>> >> >> > Matthew
>> >> >> >
>> >> >> >
>> >> >> > > Hi,
>> >> >> > >
>> >> >> > > On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle <mdowle at mdowle.plus.com>
>> >> >> > > wrote:
>> >> >> > >> I went back to try again with S3 inheritance, discussed further up
>> >> >> > >> in this thread.  Just committed, as it seems, so far, to work.
>> >> >> > >>
>> >> >> > >> * data.table now inherits from data.frame, i.e.
>> >> >> > >>   class = c("data.table","data.frame")
>> >> >> > >> * is.data.frame() now returns TRUE for data.table
>> >> >> > >> * data.table should now be compatible with functions and packages
>> >> >> > >>   that _only_ accept data.frame.
>> >> >> > >
>> >> >> > > Perhaps I lost the point of this conversation somewhere along the way,
>> >> >> > > but this change makes it *technically* compatible, since a data.table
>> >> >> > > passes an is.data.frame test, but it doesn't work in ways that are
>> >> >> > > perfectly acceptable for some function accepting a data.frame to
>> >> >> > > expect, i.e.:
>> >> >> > >
>> >> >> > > R> library(data.table)
>> >> >> > > R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
>> >> >> > > R> dt[,c('b', 'c')]
>> >> >> > > [1] "b" "c"
>> >> >> > >
>> >> >> > > instead of
>> >> >> > >
>> >> >> > > R> df <- as.data.frame(dt)
>> >> >> > > R> df[,c('b', 'c')]
>> >> >> > >   b  c
>> >> >> > > 1 a 18
>> >> >> > > 2 b 11
>> >> >> > > 3 c  2
>> >> >> > > 4 d 50
>> >> >> > > 5 e 96
>> >> >> > >
>> >> >> > > Unless you were talking about making more changes to make a data.table
>> >> >> > > act more like a data.frame, I'm not sure allowing the user to ignore
>> >> >> > > the differences between data.table/frames is really a win/win
>> >> >> > > situation.
>> >> >> > >
>> >> >> > > Sorry if I missed something.
>> >> >> > > -steve
>> >> >> > >
>> >> >> > > --
>> >> >> > > Steve Lianoglou
>> >> >> > > Graduate Student: Computational Systems Biology
>> >> >> > >  | Memorial Sloan-Kettering Cancer Center
>> >> >> > >  | Weill Medical College of Cornell University
>> >> >> > > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >> >> > >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > datatable-help mailing list
>> >> >> > datatable-help at lists.r-forge.r-project.org
>> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >> >> >
>
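The rule described in the 28 June message quoted above ("It depends on where [.data.table was called from") can be sketched in base R alone. This is a hypothetical toy, not data.table's implementation: a class "mytable" stands in for data.table, and its [ method uses topenv(parent.frame()) to decide whether the caller is the user's workspace (.GlobalEnv, treated as aware) or code inside a namespace, such as base's subset.data.frame (treated as unaware). The aware branch mimics the 2010 behaviour in Steve's example, where dt[,c('b','c')] returned c("b","c").

```r
# Toy class "mytable" standing in for data.table (hypothetical, illustration only).
# The [ method looks at the *calling* environment to choose its semantics.
`[.mytable` <- function(x, i, j, ...) {
  caller_top <- topenv(parent.frame())
  if (!identical(caller_top, globalenv())) {
    # Unaware caller (e.g. base::subset.data.frame): plain data.frame rules
    if (nargs() < 3L) return(`[.data.frame`(x, i))
    return(`[.data.frame`(x, i, j, ...))
  }
  # Aware caller (the user's workspace): evaluate j with the columns in scope,
  # so a character vector such as c("b", "c") comes back as itself (toy rule; i ignored)
  eval(substitute(j), x, parent.frame())
}

dt <- structure(
  data.frame(a = 1:5, b = letters[1:5], c = c(18, 11, 2, 50, 96)),
  class = c("mytable", "data.frame")
)

aware   <- dt[, c("b", "c")]              # called from .GlobalEnv: "b" "c"
unaware <- subset(dt, select = c(b, c))   # subset.data.frame does x[r, vars, drop = drop]
print(aware)
print(names(unaware))
is.data.frame(dt)                         # TRUE: it inherits from data.frame
```

Run at top level, aware holds the character vector while unaware is an ordinary two-column result. That is the split the thread describes between data.table-aware users and base functions such as subset.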



