[datatable-help] select * and getting the full sub data.table/frame

David Bellot david.bellot at gmail.com
Thu Jan 17 18:08:20 CET 2013


Wow! You just saved me hours of computation. Now I can get rid of all my
ddply calls!
Many thanks!

May I ask something else? In your function you use the notation
d[["y"]]. I tried d[ , "y" ] instead and got the error message
"Non-numeric argument to mathematical function".

However, if I use either notation inside sqrt directly on the command
line, it works.

So in this specific case, what is the difference between d[["y"]] and
d[, "y"]?

Many thanks again for your help.

Best,
David
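
A minimal sketch of the likely cause, assuming the same toy data as in the quoted reply below. With a data.frame (which is what ddply passes to the function), d[, "y"] drops to a plain numeric vector. With a data.table (which is what .SD is), the j argument is evaluated as an expression rather than treated as a column index. In data.table versions of that era, such as 1.8.7, a quoted string in j appears to have simply returned the string itself, so sqrt() received "y"; releases from 1.9.8 onward are understood to return a one-column data.table instead. The version-dependent call is therefore left without an expected result.

```r
library(data.table)

# Same toy data as in the quoted reply
x.df <- data.frame(x = letters[1:2], y = 1:10)
x.dt <- data.table(x.df)

# [[ ]] extracts the column as a plain vector for both classes
is.numeric(x.df[["y"]])   # TRUE
is.numeric(x.dt[["y"]])   # TRUE

# data.frame: selecting a single column with [ , "y"] drops to a vector
is.numeric(x.df[, "y"])   # TRUE

# data.table: j is evaluated as an expression, so the result of a quoted
# name depends on the installed version (assumption: the string "y" in old
# releases, a one-column data.table in newer ones)
class(x.dt[, "y"])

# An unquoted column name in j returns the column as a vector
is.numeric(x.dt[, y])     # TRUE
```

Because d[["y"]] behaves the same way for data.frame and data.table, it is the safer choice in a function meant to work with both ddply and .SD.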

On Thu, Jan 17, 2013 at 4:53 PM, Akhil Behl <akhil at igidr.ac.in> wrote:

> If I am not wrong, you are looking for `.SD'. In fact you can put in
> the exact function you were throwing at ddply earlier. There are other
> special names like .SD that you can find in the data.table FAQs.
>
> Let's see:
> R> require(plyr)
> Loading required package: plyr
>
> R> require(data.table)
> Loading required package: data.table
> data.table 1.8.7  For help type: help("data.table")
>
> R> x.df <- data.frame(x=letters[1:2], y=1:10)
> R> x.dt <- data.table(x.df)
> R>
> R> my.func <- function (d) { # Define a function on the subset
> + sum(sqrt(d[["y"]]))
> + }
> R>
> R> # The plyr way:
> R> ddply(x.df, "x", my.func) -> ans.plyr
> R>
> R> # The data.table way:
> R> x.dt[ , my.func(.SD), by=x] -> ans.dt
> R>
> R> ans.plyr
>   x       V1
> 1 a 10.61387
> 2 b 11.85441
>
> R> ans.dt
>    x       V1
> 1: a 10.61387
> 2: b 11.85441
>
> For more help, try this on an R prompt:
>
> R> vignette('datatable-faq')
>
> --
> ASB.
>
> On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <david.bellot at gmail.com>
> wrote:
> > Hi,
> >
> > I've been looking all around the web without a clear answer to this
> > trivial problem. I'm sure I'm not looking where I should:
> >
> > in fact, I want to replace my use of ddply from the plyr package with
> > data.table. One of my main uses is to group a big data.frame by a set of
> > variables and do something on each sub data.frame:
> >
> > ddply( my_df, my_grouping_var, function (d)   { do something with d } )
> > ----> d is a data.frame again
> >
> > and it's slow on a big data.frame.
> >
> >
> > However, I don't really understand how to do the same thing with a
> > data.table. Basically, if "j" in a data.table is equivalent to the select
> > clause in SQL, then how do I do SELECT * FROM etc...
> >
> > I want to be able to pass a function, as in ddply, that will receive not
> > only a few columns but the full subset that is selected by the "by"
> > clause.
> >
> > Thanks...
> > Best,
> > David
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help