[datatable-help] Subsetting that behaves right for both data frames and data.tables?
Chris Neff
caneff at gmail.com
Wed Jul 20 15:01:39 CEST 2011
Mainly it is that I am writing some library functions that I and a few
others may be using. I don't want those functions to have to depend on
data.table because I don't want it to need to be installed for a purpose
that has nothing to do with it. But I use data.tables as input. Here is a
pseudo-example:
MyFunc <- function(data, numerator.var, denominator.var)
{
  # data.frame syntax: sort rows by the numerator column (note the trailing comma)
  data <- data[order(data[, numerator.var]), ]
  data$metric <- data[, numerator.var] / data[, denominator.var]
  data$cum.metric <- cumsum(data$metric)
  return(data)
}
I make this example to show that I need to preserve the whole data variable
the whole way through and return a modified version. If I do
data <- as.data.frame(data)
as the first line of that function, then I lose the keys in a potential
data.table that is passed in. If I use
data <- as.data.table(data)
and change the subsetting to be data.table compliant, then I am forcing
someone to have a whole package loaded for something that can be done in the
base language fine. There must be an agnostic way to do this. Apparently
subset doesn't do it either if keys get lost.
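
One candidate, as a sketch only: pull columns out with [[ ]] rather than [, name]. Both classes are lists underneath, so [[ ]] should not depend on data.table's j semantics. MyFuncAgnostic is a made-up name, I have only run this on a plain data.frame, and I am not assuming a data.table key survives the reorder.

```r
# Sketch of a class-agnostic variant of MyFunc (hypothetical name).
# Assumption: columns are fetched with [[ ]], which is list-style access
# and does not rely on how `[` interprets its j argument.
MyFuncAgnostic <- function(data, numerator.var, denominator.var)
{
  # Sort rows by the numerator column; x[i, ] selects whole rows
  data <- data[order(data[[numerator.var]]), ]
  data$metric <- data[[numerator.var]] / data[[denominator.var]]
  data$cum.metric <- cumsum(data$metric)
  return(data)
}

# Small base-R check; nothing beyond base R is needed to run it
DF <- data.frame(x = c(3, 1, 2), y = c(6, 4, 8))
res <- MyFuncAgnostic(DF, "x", "y")
print(res)
```

No coercion happens inside the function, so whatever class comes in is the class that goes through the row subset and the column assignments.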
-Chris
On 20 July 2011 08:48, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Hi Chris,
>
> If you're writing a package and don't want to worry if someone passes your
> package a data.table, then don't worry; just use data.frame syntax and
> your non-datatable-aware package will work fine.
>
> If you're writing your own code you're in control of, just embrace the
> data.table ;)
>
> If you're writing a function in an environment which is data.table aware,
> but you want your function to accept either data.frame or data.table, then
> at the beginning of your function just do :
>
> f = function(x) {
>     x = as.data.table(x)
>     # proceed with data.table syntax
> }
>
> or
>
> f = function(x) {
>     x = as.data.frame(x)
>     # proceed with data.frame syntax
> }
>
> Some of the CRAN packages that depend on data.table are doing that, I
> think.
>
> In R itself it is common practice to coerce arguments to a common type and
> then proceed with the appropriate syntax for that type. Consider that
> matrix syntax is different syntax to data.frame syntax. You often see
> as.classiwant() at the beginning of functions, or switches depending on
> the type of object.
>
> Remember that is.data.frame() is TRUE for both data.frame and data.table,
> but is.data.table() is TRUE only for data.table. as.data.table() does
> nothing if x is already a data.table, and is an efficient class change if
> x is a data.frame. Is efficiency the issue?
>
> Does that help? If not, more info about the problem will be needed please.
>
> Matthew
>
>
> > I'm used to seeing the column names at the bottom of the column too, but
> > that is only if the data.table is long enough. My example was too short
> > for that, so I made the same sort of mistake you did :(
> >
> > Okay, that is a way, but is it a good way? Not sure...
> >
> > 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
> >
> >> Sorry my mistake -- subset does return a data.table.
> >> (I was using as an example a data.table with 100 rows, and stupidly
> >> using the fact that it printed the whole thing rather than only the
> >> first 10 rows as my criterion for whether it worked or not, forgetting
> >> that print.data.table does print up to 100 rows. I feel a bit stupid.)
> >>
> >> Why doesn't it work for you if that is the case?
> >>
> >> library(data.table)
> >> DF <- data.frame(a=1:200, b=1:10)
> >> DT <- as.data.table(DF)
> >> subDT <- subset(DT, select=a)
> >> class(subDT)   # check the class of the subset, not of DT
> >> subDF <- subset(DF, select=a)
> >> class(subDF)
> >> identical(as.data.frame(subDT), subDF)
> >>
> >>
> >>
> >> On Wed, Jul 20, 2011 at 12:50 PM, Chris Neff <caneff at gmail.com> wrote:
> >>
> >>> Yeah I realized that myself.
> >>>
> >>> Another one: the function "with" doesn't seem to do what I want... but
> >>> at least it is consistent!
> >>>
> >>>
> >>> 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
> >>>
> >>>> Sorry --
> >>>>
> >>>> subset() was a poor idea, as it will return a data.frame even if the
> >>>> argument is a data.table..
> >>>>
> >>>>
> >>>>
> >>>> 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
> >>>>
> >>>>> Hi--
> >>>>>
> >>>>> You can use the subset() command with the select= option; not sure
> >>>>> it's the best solution, though.
> >>>>>
> >>>>> Timothee
> >>>>>
> >>>>>
> >>>>> On Wed, Jul 20, 2011 at 12:26 PM, Chris Neff <caneff at gmail.com> wrote:
> >>>>>
> >>>>>> I have a function where I pass a data frame and some variable names
> >>>>>> to calculate statistics on. However, I am at a loss as to how to write
> >>>>>> it correctly so that both data.frame and data.table work with it. If I
> >>>>>> have:
> >>>>>>
> >>>>>> DF = data.frame(x=1:10,y=2:11,z=3:12)
> >>>>>>
> >>>>>> DT = data.table(DF)
> >>>>>>
> >>>>>> var.names = c("x","y")
> >>>>>>
> >>>>>>
> >>>>>> I can do the following things to subset:
> >>>>>>
> >>>>>> DT[,var.names,with=FALSE]
> >>>>>> DF[,var.names]
> >>>>>>
> >>>>>>
> >>>>>> but of course DT[,var.names] won't give me back what I want, and
> >>>>>> DF[,var.names,with=FALSE] returns an error because with doesn't
> >>>>>> exist there.
> >>>>>> So how do I do this?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -Chris
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> datatable-help mailing list
> >>>>>> datatable-help at lists.r-forge.r-project.org
> >>>>>>
> >>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>
>