[datatable-help] Auto-convert characters to factors when settings keys?

Matthew Dowle mdowle at mdowle.plus.com
Mon Jun 28 00:47:49 CEST 2010


I went back to try again with S3 inheritance, discussed further up in
this thread.  Just committed as it seems, so far, to work.

* data.table now inherits from data.frame i.e. class =
c("data.table","data.frame")
* is.data.frame() now returns TRUE for data.table
* data.table should now be compatible with functions and packages that
_only_ accept data.frame.
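
As a rough base-R illustration of what the inheritance buys (toy_table and
needs_data_frame are made-up names for this sketch, not part of data.table):

```r
# Toy stand-in for a data.table: a data.frame with an extra leading class.
# 'toy_table' is a hypothetical class name used only for this sketch.
tt <- structure(data.frame(a = 1:6, b = 6:11, grp = 1:2),
                class = c("toy_table", "data.frame"))

# A function that only accepts data.frame, as many packages do.
needs_data_frame <- function(x) {
  stopifnot(is.data.frame(x))
  nrow(x)
}

class(tt)             # "toy_table" "data.frame"
is.data.frame(tt)     # TRUE, because is.data.frame() checks inheritance
needs_data_frame(tt)  # accepted without any conversion
```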


Before change (v 1.4.1)
=======================
> DT = data.table(a=1:6,b=6:11,grp=1:2)

> ggplot(DT,aes(a,b))+geom_point()
Error: ggplot2 doesn't know how to deal with data of class data.table

> mean(DT)
[1] NA
Warning message:
In mean.default(DT) : argument is not numeric or logical: returning NA


After change (v 1.5 on r-forge)
===============================
> ggplot(DT,aes(a,b))+geom_point()  # works

> # works even though .SD is a data.table; no need for .SDF or as.data.frame
> DT[, print(ggplot(.SD,aes(a,b))+geom_point()), by=grp]

> mean(DT)
  a   b grp 
3.5 8.5 1.5 
> 

So hopefully no longer any need for conversion of DT or .SD to
data.frame, saving time and memory. If this works, we don't need to
implement .SDF and things become simpler. For example, there is no
mean.data.table method needed. The mean is working there via dispatch to
mean.data.frame.
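
The fall-through can be sketched in base R with a made-up generic. Here
col_means, col_means.data.frame and toy_table are hypothetical names chosen
for illustration; the point is only that no toy_table method is defined, so
S3 dispatch moves on to the data.frame method:

```r
# Hypothetical generic with a data.frame method only.
col_means <- function(x, ...) UseMethod("col_means")

col_means.data.frame <- function(x, ...) {
  # One mean per column, returned as a named numeric vector.
  vapply(x, mean, numeric(1))
}

# Toy subclass of data.frame (stand-in for a data.table in this sketch).
tt <- structure(data.frame(a = 1:6, b = 6:11, grp = 1:2),
                class = c("toy_table", "data.frame"))

# No col_means.toy_table exists, so dispatch reaches col_means.data.frame.
res <- col_means(tt)
res
```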

The implementation uses topenv() to find the package from which
[.data.table was immediately called. If that is .GlobalEnv, data.table
itself, or any package that calls require(data.table), then the caller is
considered 'data.table aware' and [.data.table continues. Otherwise (e.g.
when called from base or ggplot) it redirects to [.data.frame. Other
methods are handled similarly.
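
A minimal sketch of such a check, not the committed code: caller_is_aware
and its aware_pkgs argument are hypothetical, and how packages that call
require(data.table) are actually detected is not shown here, so the sketch
simply takes a vector of package names as an assumption.

```r
# Hypothetical helper: decide whether the calling code is 'aware'.
# caller_env would normally be parent.frame() inside the method.
caller_is_aware <- function(caller_env, aware_pkgs = "data.table") {
  # Walk up the enclosures to the top-level environment of the caller
  # (the global environment or a package namespace).
  top <- topenv(caller_env)
  if (identical(top, globalenv())) return(TRUE)
  if (isNamespace(top)) {
    # Assumption: awareness is represented as a list of package names.
    return(unname(getNamespaceName(top)) %in% aware_pkgs)
  }
  FALSE
}

caller_is_aware(globalenv())           # code run at the prompt
caller_is_aware(asNamespace("stats"))  # a package not on the list
```

A method using a check like this could return NextMethod() when the caller
is not aware, which hands the call on to the data.frame method.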

Conversion to data.frame may still be a valid thing to do of course, and
code that does that should still work.

If anyone has time to test and report back, much appreciated. Then we
can decide whether to go with this, or not.

Matthew



On Tue, 2010-05-25 at 12:12 -0700, Short, Tom wrote:
> > -----Original Message-----
> > From: datatable-help-bounces at lists.r-forge.r-project.org 
> > [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> > On Behalf Of mdowle at mdowle.plus.com
> > Sent: Tuesday, May 25, 2010 13:10
> > To: Short, Tom
> > Cc: datatable-help at lists.r-forge.r-project.org
> > Subject: Re: [datatable-help] Auto-convert characters to 
> > factors when settings keys?
> > 
> > 
> > I'm not up to speed with S4 either. Hm.
> > 
> > I don't mind the dtbl approach other than it gets ugly when 
> > compounding statements :
> > 
> >    dt[where,select,by][having][join,...][where][join,...]...
> > 
> > translates to
> > 
> >    dtbl(dtbl(dtbl(dtbl(dtbl(dt,where,select,by),having),join),where),join)
> > 
> > Wouldn't be a problem if we could allow both and let the user choose.
> 
> Good point. That's messy. We could have a dtbl that worked on either
> data tables or data frames and a [.data.table that worked just with
> data.tables. Or, we could skip the whole idea:)
> 
> - Tom



