[datatable-help] select * and getting the full sub data.table/frame

Akhil Behl akhil at igidr.ac.in
Thu Jan 17 18:25:43 CET 2013


Hey David,

I thought your problem may have been a typo, but I realized that it is
in fact a subtle difference between the way data.table and data.frame
work.

One must provide unquoted names in the `j' expression for a
data.table, i.e. one can say x.dt[ , y] but not x.dt[ , "y"]. The
quoted form is evaluated as an ordinary R expression, so it returns
just the string "y", and hence the error.
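
A minimal sketch of the difference, assuming the same toy data as in
the quoted session below (y.vec and y.quoted are names made up for
illustration). What the quoted form returns depends on the data.table
version: the 1.8.x series behaves as described above, while I believe
later releases return a one-column data.table instead, so the sketch
only inspects that result rather than relying on it:

```r
library(data.table)

# Same toy data as in the quoted session: x alternates a, b; y is 1..10
x.dt <- data.table(x = rep(letters[1:2], 5), y = 1:10)

# Unquoted name in j: y is looked up as a column, giving a plain vector
y.vec <- x.dt[ , y]
print(y.vec)

# Quoted name in j: result is version dependent (see the note above),
# so just look at its structure
y.quoted <- x.dt[ , "y"]
str(y.quoted)
```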

There are tricks around it, like using with=FALSE, or using the
list-style notation x.dt[["y"]], which works just as it does for a
data.frame. But once again, you will find such examples and
explanations of idiomatic data.table expressions in the vignettes.
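
A rough sketch of both workarounds, again on the toy data from the
quoted session (y.col, y.vec and first.rows are illustrative names, not
anything from the package). The last lines apply the head(d, 1)
function from your message to each group through .SD, which is one way
to get at the full subset, under the assumption that you do not need
the grouping column inside the function:

```r
library(data.table)

x.dt <- data.table(x = rep(letters[1:2], 5), y = 1:10)

# with=FALSE: j is treated as column names, giving a one-column data.table
y.col <- x.dt[ , "y", with = FALSE]

# List-style extraction, as for a data.frame: gives a plain vector
y.vec <- x.dt[["y"]]

# Whole per-group subset: .SD holds the group's rows, minus the by column
f <- function(d) head(d, 1)
first.rows <- x.dt[ , f(.SD), by = x]
print(first.rows)
```

Note that .SD leaves out the `by' column itself; data.table adds it
back to the result, so first.rows still has both x and y.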

--
ASB.

On Thu, Jan 17, 2013 at 10:42 PM, David Bellot <david.bellot at gmail.com> wrote:
> Hi Matthew,
>
> I did read the introduction, but I wasn't sure how to write it.
> Hence my question.
>
> In fact, I would agree if the function were sum(sqrt(y)), but in my case
> I would like to do something like
>
> f <- function(d)  head(d,1)
>
> It's a small example for the sake of simplicity, just to illustrate that I
> really want to have access to the full sub data.frame (the d variable) and
> not just one column.
>
> Best,
> David
>
> On Thu, Jan 17, 2013 at 5:07 PM, Matthew Dowle <mdowle at mdowle.plus.com>
> wrote:
>>
>>
>> Akhil,
>>
>> Kind of, but defining :
>>
>> my.func <- function (d) {
>>     sum(sqrt(d[["y"]]))
>> }
>>
>> followed by
>>
>> x.dt[ , my.func(.SD), by=x]
>>
>> isn't very data.table'ish. In fact the
>> advice is to avoid .SD if possible, for speed.
>>
>> We'd forget my.func, and just do :
>>
>> x.dt[, sum(sqrt(y)), by=x]
>>
>> That is how we recommend it to be used, and
>> allows data.table to optimize the query (which
>> use of .SD may prevent).
>>
>> David - have you read the introduction vignette and have
>> you worked through example(data.table) at the prompt?
>>
>> Matthew
>>
>>
>>
>> On 17.01.2013 16:53, Akhil Behl wrote:
>>>
>>> If I am not wrong, you are looking for `.SD'. In fact you can put in
>>> the exact function you were throwing at ddply earlier. There are other
>>> special names like .SD that you can find in the data.table FAQs.
>>>
>>> Let's see:
>>> R> require(plyr)
>>> Loading required package: plyr
>>>
>>> R> require(data.table)
>>> Loading required package: data.table
>>> data.table 1.8.7  For help type: help("data.table")
>>>
>>> R> x.df <- data.frame(x=letters[1:2], y=1:10)
>>> R> x.dt <- data.table(x.df)
>>> R>
>>> R> my.func <- function (d) { # Define a function on the subset
>>> + sum(sqrt(d[["y"]]))
>>> + }
>>> R>
>>> R> # The plyr way:
>>> R> ddply(x.df, "x", my.func) -> ans.plyr
>>> R>
>>> R> # The data.table way:
>>> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
>>> R>
>>> R> ans.plyr
>>>   x       V1
>>> 1 a 10.61387
>>> 2 b 11.85441
>>>
>>> R> ans.dt
>>>    x       V1
>>> 1: a 10.61387
>>> 2: b 11.85441
>>>
>>> For more help, try this on an R prompt:
>>>
>>> R> vignette('datatable-faq')
>>>
>>> --
>>> ASB.
>>>
>>> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <david.bellot at gmail.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've been looking all around the web without a clear answer to this
>>>> trivial problem. I'm sure I'm not looking where I should:
>>>>
>>>> In fact, I want to replace my use of ddply from the plyr package with
>>>> data.table. One of my main uses is to group a big data.frame by a
>>>> group of variables and do something with each sub data.frame:
>>>>
>>>> ddply( my_df, my_grouping_var, function (d)   { do something with d } )
>>>> ----> d is a data.frame again
>>>>
>>>> and it's slow on a big data.frame.
>>>>
>>>>
>>>> However, I don't really understand how to redo the same thing with a
>>>> data.table. Basically if "j" in a data.table is equivalent to the select
>>>> clause in SQL, then how do I do SELECT * FROM etc...
>>>>
>>>> I want to be able to pass a function like in ddply that will receive not
>>>> only a few columns but the full subset that is selected by the "by"
>>>> clause.
>>>>
>>>> Thanks...
>>>> Best,
>>>> David
>>>>
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> datatable-help at lists.r-forge.r-project.org
>>>>
>>>>
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>
>

