[datatable-help] select * and getting the full sub data.table/frame

David Bellot david.bellot at gmail.com
Thu Jan 17 18:12:01 CET 2013


Hi Matthew,

I did read the introduction, but I wasn't sure how to write this,
hence my question.

I do agree when the function is something like sum(sqrt(y)), but in my case
I would like to do something like

f <- function(d)  head(d,1)

It's a small example for the sake of simplicity, just to illustrate that I
really want to have access to the full sub data.frame (the d variable) and
not just one column.
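
To make this concrete, here is a minimal sketch of what I am after, assuming the toy data from Akhil's example quoted below and using the .SD symbol he mentions. The names x.dt, x and y come from that example; rebuilding the table with rep() is my own assumption so the snippet runs on its own.

```r
library(data.table)

# Toy data equivalent to the quoted example: two groups, ten rows.
x.dt <- data.table(x = rep(letters[1:2], times = 5), y = 1:10)

# f receives the whole per-group subset, not a single column.
f <- function(d) head(d, 1)

# .SD is the sub data.table for the current group. It holds every
# column except the grouping column(s) named in `by`.
ans <- x.dt[, f(.SD), by = x]
print(ans)
```

Note that d here does not contain the grouping column x itself; data.table adds it back to the result.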

Best,
David

On Thu, Jan 17, 2013 at 5:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

>
> Akhil,
>
> Kind of, but defining :
>
> my.func <- function (d) {
>     sum(sqrt(d[["y"]]))
> }
>
> followed by
>
> x.dt[ , my.func(.SD), by=x]
>
> isn't very data.table'ish. In fact the
> advice is to avoid .SD if possible, for speed.
>
> We'd forget my.func, and just do :
>
> x.dt[, sum(sqrt(y)), by=x]
>
> That is how we recommend it to be used, and
> allows data.table to optimize the query (which
> use of .SD may prevent).
>
> David - have you read the introduction vignette and have
> you worked through example(data.table) at the prompt?
>
> Matthew
>
>
>
> On 17.01.2013 16:53, Akhil Behl wrote:
>
>> If I am not wrong, you are looking for `.SD'. In fact you can put in
>> the exact function you were throwing at ddply earlier. There are other
>> special names like .SD that you can find in the data.table FAQs.
>>
>> Let's see:
>> R> require(plyr)
>> Loading required package: plyr
>>
>> R> require(data.table)
>> Loading required package: data.table
>> data.table 1.8.7  For help type: help("data.table")
>>
>> R> x.df <- data.frame(x=letters[1:2], y=1:10)
>> R> x.dt <- data.table(x.df)
>> R>
>> R> my.func <- function (d) { # Define a function on the subset
>> + sum(sqrt(d[["y"]]))
>> + }
>> R>
>> R> # The plyr way:
>> R> ddply(x.df, "x", my.func) -> ans.plyr
>> R>
>> R> # The data.table way:
>> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
>> R>
>> R> ans.plyr
>>   x       V1
>> 1 a 10.61387
>> 2 b 11.85441
>>
>> R> ans.dt
>>    x       V1
>> 1: a 10.61387
>> 2: b 11.85441
>>
>> For more help, try this on an R prompt:
>>
>> R> vignette('datatable-faq')
>>
>> --
>> ASB.
>>
>> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <david.bellot at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've been looking all around the web without a clear answer to this
>>> trivial problem. I'm sure I'm not looking where I should:
>>>
>>> In fact, I want to replace my use of ddply from the plyr package with
>>> data.table. One of my main uses is to group a big data.frame by a set of
>>> variables and do something on each sub data.frame:
>>>
>>> ddply( my_df, my_grouping_var, function (d)   { do something with d } )
>>> ----> d is a data.frame again
>>>
>>> and it's slow on big data.frames.
>>>
>>>
>>> However, I don't really understand how to redo the same thing with a
>>> data.table. Basically if "j" in a data.table is equivalent to the select
>>> clause in SQL, then how do I do SELECT * FROM etc...
>>>
>>> I want to be able to pass a function like in ddply that will receive not
>>> only a few columns but the full subset that is selected by the "by"
>>> clause.
>>>
>>> Thanks...
>>> Best,
>>> David
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> datatable-help at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>