[datatable-help] Subsetting that behaves right for both data frames and data.tables?

Matthew Dowle mdowle at mdowle.plus.com
Wed Jul 20 14:48:38 CEST 2011


Hi Chris,

If you're writing a package and don't want to worry if someone passes your
package a data.table, then don't worry; just use data.frame syntax and
your non-datatable-aware package will work fine.

If you're writing your own code that you're in control of, just embrace
the data.table ;)

If you're writing a function in an environment which is data.table aware,
but you want your function to accept either data.frame or data.table, then
at the beginning of your function just do :

myfunction = function(x) {
    x = as.data.table(x)   # coerce once, up front
    # proceed with data.table syntax
}

or

myfunction = function(x) {
    x = as.data.frame(x)   # coerce once, up front
    # proceed with data.frame syntax
}

Some of the CRAN packages that depend on data.table are doing that, I think.
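
As a rough sketch of the first variant applied to the example from the
original question: col_means below is a made-up name, and taking column
means is only an assumed stand-in for "calculate statistics".

```r
library(data.table)

# Hypothetical helper (name and statistic are illustrative only).
col_means = function(x, var.names) {
    x = as.data.table(x)                  # accept data.frame or data.table
    sub = x[, var.names, with = FALSE]    # data.table syntax from here on
    sapply(sub, mean)                     # named numeric vector, one entry per column
}

DF = data.frame(x = 1:10, y = 2:11, z = 3:12)
DT = data.table(DF)
var.names = c("x", "y")

col_means(DF, var.names)   # x = 5.5, y = 6.5
col_means(DT, var.names)   # same result
```

The caller never has to know which class was passed in; the coercion on
the first line is the only place the difference is handled.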

In R itself it is common practice to coerce arguments to a common type and
then proceed with the appropriate syntax for that type.  Consider that
matrix syntax is different from data.frame syntax. You often see
as.classiwant() at the beginning of functions, or switches depending on
the type of object.
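
The "switch on the type" alternative might look something like this.
select_cols is a made-up name, and this is only a sketch of one way to
branch, not something data.table provides. Unlike coercing, it hands back
the same class it was given.

```r
library(data.table)

# Hypothetical helper: pick columns by name, keeping the input's class.
select_cols = function(x, var.names) {
    if (is.data.table(x)) {
        # test this first: a data.table also passes is.data.frame()
        x[, var.names, with = FALSE]
    } else if (is.data.frame(x)) {
        # drop = FALSE keeps a data.frame even for a single column
        x[, var.names, drop = FALSE]
    } else {
        stop("x must be a data.frame or a data.table")
    }
}

DF = data.frame(x = 1:10, y = 2:11, z = 3:12)
DT = data.table(DF)

class(select_cols(DF, c("x", "y")))
class(select_cols(DT, c("x", "y")))
```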

Remember that is.data.frame() is TRUE for both data.frame and data.table,
but is.data.table() is TRUE only for data.table.  as.data.table() does
nothing if x is already a data.table, and is an efficient class change if
x is a data.frame.  Is efficiency the issue?
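
A quick way to check the class tests for yourself (DF and DT here are
just throwaway objects for illustration):

```r
library(data.table)

DF = data.frame(x = 1:3)
DT = as.data.table(DF)

is.data.frame(DF)   # TRUE
is.data.frame(DT)   # TRUE, a data.table is also a data.frame
is.data.table(DF)   # FALSE
is.data.table(DT)   # TRUE

class(DT)           # "data.table" "data.frame"
```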

Does that help?  If not, more info about the problem will be needed please.

Matthew


> I'm used to seeing the column names at the bottom of the column too, but
> that is only if the data.table is long enough. My example was too short
> for that, so I made the same sort of mistake you did :(
>
> Okay, that is a way, but is it a good way? Not sure...
>
> 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
>
>> Sorry my mistake -- subset does return a data.table.
>> (I was using as an example a data.table with 100 rows, and stupidly
>> using the fact that it printed the whole thing rather than only the
>> first 10 rows as my criterion for whether it worked or not, forgetting
>> that print.data.table does print up to 100 rows. I feel a bit stupid.)
>>
>> Why doesn't it work for you if that is the case?
>>
>> DF <- data.frame(a=1:200, b=1:10)
>> DT <- as.data.table(DF)
>> subDT <- subset(DT, select=a)
>> class(subDT)
>> subDF <- subset(DF, select=a)
>> class(subDF)
>> identical(as.data.frame(subDT), subDF)
>>
>>
>>
>> On Wed, Jul 20, 2011 at 12:50 PM, Chris Neff <caneff at gmail.com> wrote:
>>
>>> Yeah I realized that myself.
>>>
>>> Another one: the function "with" doesn't seem to do what I want... but at
>>> least it is consistent!
>>>
>>>
>>> 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
>>>
>>>> Sorry --
>>>>
>>>> subset() was a poor idea, as it will return a data.frame even if the
>>>> argument is a data.table..
>>>>
>>>>
>>>>
>>>> 2011/7/20 Timothée Carayol <timothee.carayol at gmail.com>
>>>>
>>>>> Hi--
>>>>>
>>>>> You can use the subset() command with the select= option; not sure it's
>>>>> the best solution, though.
>>>>>
>>>>> Timothee
>>>>>
>>>>>
>>>>> On Wed, Jul 20, 2011 at 12:26 PM, Chris Neff <caneff at gmail.com> wrote:
>>>>>
>>>>>> I have a function where I pass a data frame and some variable names to
>>>>>> calculate statistics on. However, I am at a loss as to how to write it
>>>>>> correctly so that both data.frame and data.table work with it. If I
>>>>>> have:
>>>>>>
>>>>>> DF = data.frame(x=1:10,y=2:11,z=3:12)
>>>>>>
>>>>>> DT = data.table(DF)
>>>>>>
>>>>>> var.names = c("x","y")
>>>>>>
>>>>>>
>>>>>> I can do the following things to subset:
>>>>>>
>>>>>> DT[,var.names,with=FALSE]
>>>>>> DF[,var.names]
>>>>>>
>>>>>>
>>>>>> but of course DT[,var.names] won't give me back what I want, and
>>>>>> DF[,var.names,with=FALSE] returns an error because with doesn't
>>>>>> exist there.
>>>>>> So how do I do this?
>>>>>>
>>>>>> Thanks,
>>>>>> -Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> datatable-help mailing list
>>>>>> datatable-help at lists.r-forge.r-project.org
>>>>>>
>>>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>



