[datatable-help] Auto-convert characters to factors when setting keys?

Short, Tom TShort at epri.com
Tue May 25 18:24:06 CEST 2010


I don't know much about S4, but I think it'd be tough to make it work in
practice. If a package doesn't have a namespace, or the user loads
functions with source(), those functions would still end up dispatching
to [.data.table, and you'd probably get confusing errors.

I've thought about a lite version of data.table that operates on data
frames. You'd have to get rid of [.data.table and expose the same
functionality under some other name (pick dtbl, which acts like a twisted
combination of with() and subset()), so instead of:

  dt[, sum(x), by = a]

It'd be:

  dtbl(dt, j = sum(x), by = a)
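To make the dtbl() idea concrete, here is a minimal base-R sketch that works on a plain data.frame. It is hypothetical: dtbl() is only the name proposed above, and the argument handling (i as a logical row filter, j as one expression returning a single value per group, by as one bare column name) is an assumption for illustration, not data.table's actual semantics.

```r
# Hypothetical sketch of the proposed dtbl(): base R only, plain data.frame.
# Assumptions: i is a logical row filter, j returns one value per group,
# and by is a single bare column name.
dtbl <- function(x, i, j, by) {
  stopifnot(is.data.frame(x))
  env <- parent.frame()
  # Row selection, subset()-style: i sees the columns of x; NA counts as FALSE.
  if (!missing(i)) {
    rows <- eval(substitute(i), x, env)
    x <- x[!is.na(rows) & rows, , drop = FALSE]
  }
  if (missing(j)) return(x)
  j_expr <- substitute(j)
  # No grouping: evaluate j once, with()-style, over the filtered frame.
  if (missing(by)) return(eval(j_expr, x, env))
  # Grouping: evaluate j separately within each group of the by column.
  by_name <- deparse(substitute(by))
  groups <- split(x, x[[by_name]], drop = TRUE)
  vals <- lapply(groups, function(g) eval(j_expr, g, env))
  # Group labels come back as character (the names of the split list);
  # the result column is called V1, mimicking data.table's default name.
  out <- data.frame(names(groups), unlist(vals, use.names = FALSE))
  names(out) <- c(by_name, "V1")
  out
}

df <- data.frame(a = c("p", "q", "p"), x = 1:3)
dtbl(df, j = sum(x), by = a)   # a = "p", "q"; V1 = 4, 2
dtbl(df, a == "p", sum(x))     # 4
```

Evaluating i and j with the data frame as the environment is what gives the "twisted with()/subset()" feel; a real version would also need multi-column by, named j results and joins.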

An advantage of this approach is that it helps newcomers and eases folks
into the more complicated indexing/merging/subsetting that data tables
make possible.

You could still have setkey operate on data frames. A big downside is
that it'd be hard to keep the key in sync if the user re-orders the data
frame. One option is to have a very light [.data.table that just calls
[.data.frame and zaps the key.
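A rough base-R sketch of that "very light [ that zaps the key" option follows. The class name keyed_df, the helper set_df_key() and the "key" attribute are hypothetical stand-ins, not data.table's setkey() or its internals; the point is only that any subsetting or reordering through [ falls back to [.data.frame and discards the key.

```r
# Hypothetical sketch: keyed_df, set_df_key() and the "key" attribute are
# illustrative names only, not data.table's setkey() or its internals.

# Sort a data.frame by the given columns and record them as its key.
set_df_key <- function(df, ...) {
  cols <- c(...)
  stopifnot(is.data.frame(df), length(cols) > 0, all(cols %in% names(df)))
  # unname() stops column names from being matched to order()'s own
  # arguments (e.g. a column called "decreasing").
  ord <- do.call(order, unname(as.list(df[cols])))
  df <- df[ord, , drop = FALSE]
  rownames(df) <- NULL
  attr(df, "key") <- cols
  class(df) <- unique(c("keyed_df", class(df)))
  df
}

# The light `[` method: strip the key and subclass, then let
# [.data.frame do the actual subsetting.
`[.keyed_df` <- function(x, ...) {
  attr(x, "key") <- NULL
  class(x) <- setdiff(class(x), "keyed_df")
  x[...]
}

df <- data.frame(g = c("b", "a", "b"), x = c(3, 1, 2))
kd <- set_df_key(df, "g", "x")
attr(kd, "key")               # "g" "x"
attr(kd[kd$x > 1, ], "key")   # NULL: the subset no longer claims a key
```

This does not catch every way the order can change: an assignment such as kd$x <- rev(kd$x) goes through $<- rather than [, so it would leave a stale key in place, which is the sync problem described above.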

- Tom


> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of mdowle at mdowle.plus.com
> Sent: Tuesday, May 25, 2010 11:23
> To: Steve Lianoglou
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Auto-convert characters to 
> factors when settings keys?
> 
> Perhaps a radical suggestion ... should/could we move to S4?
> Only user code, or packages that call data.table, would see
> the [.data.table signature; otherwise the object would appear
> to be a data.frame to packages that require data.frame. I'm not
> sure if this is possible or not - I did try once.
> 
> I also tried to inherit data.table from data.frame in S3 using
> class(df)=c("data.table","data.frame"). As long as a
> package that only works with data.frame uses is.data.frame()
> or inherits(), and not class(x)=="data.frame", it gets
> past that point, but then [.data.table is still dispatched
> rather than [.data.frame. I'm thinking S4 signatures might be
> another option if they can be local. The advantage is that no
> class conversion would then be needed.
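The S3 behaviour described in that paragraph is easy to reproduce in current R with a throwaway class. The class name mytable and its deliberately trivial [ method are hypothetical stand-ins for data.table, chosen so the snippet runs without the package installed.

```r
# Hypothetical class "mytable" standing in for data.table; its `[` method
# just reports that it was called.
`[.mytable` <- function(x, ...) "dispatched to [.mytable"

df <- data.frame(a = 1:3)
class(df) <- c("mytable", "data.frame")

is.data.frame(df)                   # TRUE: passes is.data.frame() checks
inherits(df, "data.frame")          # TRUE: passes inherits() checks
identical(class(df), "data.frame")  # FALSE: a strict class test fails
class(df) == "data.frame"           # c(FALSE, TRUE): length 2, not a single flag
df[1, ]                             # "dispatched to [.mytable", not [.data.frame
```

The last line is the sticking point: S3 picks the method for the first class in the vector, so code that only knows about data.frame still ends up in the subclass's [ method. Note also that a literal class(x)=="data.frame" inside if() returns a length-2 logical; older R warns and uses the first element (FALSE), and R 4.2.0 and later signal an error.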
> 
> Above is from memory a few years back so may well be wrong. 
> Plus R has moved on in the meantime.
> 
> Tom, do you have a good example to hand where the
> data.table<=>data.frame conversion causes grief (syntax as
> well as speed) that we could use for testing?
> 
> This also links to the not-yet-implemented .SDF object. The
> way I was thinking of doing that wouldn't allow use of both .SD
> and .SDF in the same j, but it would be fast. It would be nicer
> not to need .SDF, though, if S4 works.
> 
> 
> > On Tue, May 25, 2010 at 9:39 AM, Short, Tom <TShort at epri.com> wrote:
> >
> >>> I guess I don't understand why you'd want to make setdf and setdt 
> >>> instead of using the as.data.frame/as.data.table functions?
> >>>
> >>> Isn't the as.* more idiomatic S3-OOized R?
> >>
> >> You're right, it's not very standard for R, but if I have a 9-GB
> >> data table, I may not want the memory consumption of keeping both
> >> copies around. Because it's nonstandard, maybe it shouldn't be in
> >> data.table, but it's worth discussing.
> >
> > Ahh ... good point.
> >
> > I see their use case now ;-)
> >
> > +1 for having something like these so people who know what they're
> > doing can use them.
> >
> > --
> > Steve Lianoglou
> > Graduate Student: Computational Systems Biology
> >  | Memorial Sloan-Kettering Cancer Center
> >  | Weill Medical College of Cornell University
> > Contact Info: http://cbio.mskcc.org/~lianos/contact
> >
> 
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 

