[datatable-help] data.table inherits from data.frame [was:Auto-convert characters to factors when settings keys?]

mdowle at mdowle.plus.com mdowle at mdowle.plus.com
Wed Jun 30 12:28:53 CEST 2010


> Option ii -- creating a wrapper around the data.frame creation function --
> does not work with melt() when I tested it.  When I glanced through the
> code, I did not see any calls to data.frame().  It looks like it uses
> rbind() and cbind() to create the new data.frame.

Just to check a few things then ...
melt.data.frame contains a call to data.frame on the line after the do.call.
What's the error you get?
Does test.data.table() work OK, and how many tests does it run?
How did you install the latest version from R-Forge: using type='source',
or using the binary, which can take a few days to appear on R-Forge
depending on your platform?
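
One possible reason a wrapper defined in .GlobalEnv has no effect, sketched
under assumptions: a function that lives in a package namespace looks up
data.frame through its own namespace, its imports and the base namespace
before it ever reaches .GlobalEnv, so it would find base::data.frame first.
The sketch uses only base R. The names f, g and the "masked" class are
hypothetical, and f is placed in the stats namespace merely to mimic a
packaged function such as melt.data.frame; this has not been checked against
melt itself.

```r
# A masking wrapper, defined where the user works (normally .GlobalEnv).
# The "masked" class is a hypothetical stand-in for data.table.
data.frame <- function(...) {
  structure(base::data.frame(...), class = c("masked", "data.frame"))
}

# g is an ordinary user-level function, so it finds the wrapper above.
g <- function() class(data.frame(a = 1))

# f mimics a function inside a package: its enclosing environment is a
# namespace, whose lookup chain reaches the base namespace before .GlobalEnv.
f <- function() class(data.frame(a = 1))
environment(f) <- asNamespace("stats")

in_global    <- g()   # resolved via the wrapper
in_namespace <- f()   # resolved via base::data.frame

print(in_global)
print(in_namespace)

rm(data.frame)        # remove the mask again
```

If the two printed classes differ, the wrapper is visible only to user-level
code, which would be consistent with melt() still returning a plain
data.frame.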

> Option i -- I'm not sure what this entails.  What is the advantage over
> having wrapper functions?  Reshape-type functions interest me at the
> moment.

It involves me or Tom, or someone, adding those methods to the data.table
package. The advantage of the wrapper around data.frame() is just that we
don't have to write them, i.e. it is a quick fix-all. Probably though, we'd
want to implement those methods in a more efficient way in data.table, rather
than dispatch to reshape to do that as a data.frame.
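
A minimal sketch of the simpler variant of option i, assuming the data.table
package is installed: call the existing data.frame function, then convert its
result before returning. as_dt_result is a hypothetical helper name, not part
of data.table, and stats::reshape stands in here for the reshape package's
melt.

```r
library(data.table)

# Hypothetical helper: run a function that expects and returns a data.frame,
# then convert the result back to data.table before handing it to the user.
as_dt_result <- function(f, x, ...) {
  as.data.table(f(as.data.frame(x), ...))
}

# Small wide table with two measurements per id.
wide <- data.table(id = 1:2, x.1 = c(5, 6), x.2 = c(7, 8))

# stats::reshape returns a data.frame; the helper restores the data.table class.
long <- as_dt_result(stats::reshape, wide,
                     direction = "long", idvar = "id",
                     varying = c("x.1", "x.2"), sep = ".")

print(class(long))
print(long)
```

This only changes the class of the result. A native data.table method would
avoid the round trip through data.frame, which is the more efficient route
mentioned above.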

I'll give it some more thought. There were some other threads on this list
along these lines too; I need to review them again ...

>
>
> Harish
>
>
> --- On Mon, 6/28/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> wrote:
>
>> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> Subject: Re: [datatable-help] data.table inherits from data.frame
>> [was:Auto-convert characters to factors when settings keys?]
>> To: "Short, Tom" <TShort at epri.com>
>> Cc: "Harish" <harishv_99 at yahoo.com>,
>> datatable-help at lists.r-forge.r-project.org
>> Date: Monday, June 28, 2010, 10:33 AM
>> Yes if the function uses data.frame() to create the object it returns,
>> then it should return a data.frame e.g. melt (untested).  If it just calls
>> "[", or similar, on the object then it should keep it as data.table e.g.
>> subset (tested).  The main thing is they work though, so that other
>> packages using those functions also work e.g. ggplot.
>>
>> For returning results back to a data.table aware environment though ...
>>
>> i) We could add data.table methods for melt, reshape, etc. These could
>> either use the data.table equivalent syntax and efficiency, easing users
>> into it (think someone mentioned that before), or these methods could call
>> the package's function returning a data.frame, and then make the class
>> change before returning data.table.  Is it just/mainly reshape people
>> would want data.table methods for?
>>
>> ii) Or, possibly, try this in .GlobalEnv :
>>
>>    data.frame = function(...) as.data.table(base::data.frame(...))
>>
>> and those functions, such as melt, may well then return data.table
>> (untested).  IF that works, then it could be made more flexible :
>>
>>    data.frame = function(...) {
>>       if (cendta()) as.data.table(base::data.frame(...))
>>       else base::data.frame(...)
>>    }
>>
>> so that users such as you or I, who know we really do want a true
>> data.frame, would get one, but non-data.table-aware environments would
>> really create data.table for us (while still using [.data.frame on them).
>> We would need to export cendta if users were to do that; in the meantime
>> try data.table:::cendta().
>>
>> Anyway ... way further than I expected to go. If anyone feels like testing
>> that, would be interested if it works! Thanks for the interest and
>> question.
>>
>> Matthew
>>
>>
>> > Harish, you'll still (usually) get back a data.frame in that situation.
>> > There may be a situation involving just manipulation of the original where
>> > the function keeps the object as a data.table. Auto-converting return
>> > values would be a challenge.
>> >
>> > - Tom
>> >
>> >
>> >
>> >
>> >> -----Original Message-----
>> >> From: datatable-help-bounces at lists.r-forge.r-project.org
>> >> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
>> >> On Behalf Of Harish
>> >> Sent: Monday, June 28, 2010 11:57
>> >> To: datatable-help at lists.r-forge.r-project.org
>> >> Subject: Re: [datatable-help] data.table inherits from
>> >> data.frame [was:Auto-convert characters to factors when
>> >> settings keys?]
>> >>
>> >> Removing the requirement for explicit conversions simplifies
>> >> things a bit.  Thanks.
>> >>
>> >> You commented on arguments passed into a function.  What
>> >> happens to return values?  For example, the reshape functions
>> >> (reshape(), cast(), melt(), etc.) return a data.frame.  Are
>> >> those automatically treated as data.table in a workspace
>> >> that is data.table aware?
>> >>
>> >>
>> >> Harish
>> >>
>> >>
>> >> --- On Mon, 6/28/10, mdowle at mdowle.plus.com
>> >> <mdowle at mdowle.plus.com> wrote:
>> >>
>> >> > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
>> >> > Subject: [datatable-help] data.table inherits from data.frame [was:
>> >> > Auto-convert characters to factors when settings keys?]
>> >> > To: mailinglist.honeypot at gmail.com
>> >> > Cc: datatable-help at lists.r-forge.r-project.org
>> >> > Date: Monday, June 28, 2010, 3:03 AM
>> >> >
>> >> > The subject of this thread got misleading too. Changing that now.
>> >> > Was: Auto-convert characters to factors when settings keys?
>> >> >
>> >> >
>> >> > Thanks Steve,
>> >> >
>> >> > There are two types of users of data.table. Ones who know they are
>> >> > using it, and ones that don't. This change is for the latter. If a
>> >> > base function that only accepts data.frame, such as subset for
>> >> > example, does dt[,c('b','c')] inside it, then that actually does work
>> >> > and returns the columns, not c('b','c').
>> >> >
>> >> > For example, at the end of subset.data.frame there is :
>> >> >
>> >> >     x[r, vars, drop = drop]
>> >> >
>> >> > and that will use [.data.frame even though x is data.table.  subset is
>> >> > a base function and as such isn't a data.table aware user.
>> >> >
>> >> > However, when a user (such as you or I) uses data.table, then we can
>> >> > use its features such as i and j expressions of column names, joins
>> >> > using x[y][z] syntax, etc.  No changes there.  If we want
>> >> > dt[,c("b","c")] to return the columns, then we will still have to
>> >> > convert to data.frame, as before.  That's because we work in our user
>> >> > workspace which is data.table aware.
>> >> >
>> >> > It depends on where [.data.table was called from.
>> >> >
>> >> > It's just so that packages (e.g. ggplot), and base functions, can work
>> >> > with data.table more easily now, without removing any of the
>> >> > advantages of the [.data.table syntax, and without requiring
>> >> > conversion.
>> >> >
>> >> > Makes more sense now hopefully ?
>> >> >
>> >> > Matthew
>> >> >
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > On Sun, Jun 27, 2010 at 6:47 PM, Matthew Dowle
>> >> > > <mdowle at mdowle.plus.com> wrote:
>> >> > >> I went back to try again with S3 inheritance, discussed further up in
>> >> > >> this thread.  Just committed as it seems, so far, to work.
>> >> > >>
>> >> > >> * data.table now inherits from data.frame i.e. class =
>> >> > >> c("data.table","data.frame")
>> >> > >> * is.data.frame() now returns TRUE for data.table
>> >> > >> * data.table should now be compatible with functions and packages that
>> >> > >> _only_ accept data.frame.
>> >> > >
>> >> > > Perhaps I lost the point of this conversation somewhere along the way,
>> >> > > but this change makes it *technically* compatible since a data.table
>> >> > > passes an is.data.frame test, but it doesn't work in ways that are
>> >> > > perfectly acceptable for some function accepting a data.frame to work,
>> >> > > ie:
>> >> > >
>> >> > > R> library(data.table)
>> >> > > R> dt <- data.table(a=1:5, b=letters[1:5], c=sample(1:100, 5))
>> >> > > R> dt[,c('b', 'c')]
>> >> > > [1] "b" "c"
>> >> > >
>> >> > > instead of
>> >> > >
>> >> > > R> df <- as.data.frame(dt)
>> >> > > R> df[,c('b', 'c')]
>> >> > >   b  c
>> >> > > 1 a 18
>> >> > > 2 b 11
>> >> > > 3 c  2
>> >> > > 4 d 50
>> >> > > 5 e 96
>> >> > >
>> >> > > Unless you were talking about making more changes to make a data.table
>> >> > > act more like a data.frame, I'm not sure allowing the user to ignore
>> >> > > the differences between data.table/frames is really a win/win
>> >> > > situation.
>> >> > >
>> >> > > Sorry if I missed something.
>> >> > > -steve
>> >> > >
>> >> > > --
>> >> > > Steve Lianoglou
>> >> > > Graduate Student: Computational Systems Biology
>> >> > >  | Memorial Sloan-Kettering Cancer Center
>> >> > >  | Weill Medical College of Cornell University
>> >> > > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > datatable-help mailing list
>> >> > datatable-help at lists.r-forge.r-project.org
>> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >
>>
>>
>>
>
>
>
>



