<font face="arial,helvetica,sans-serif">Wow! You just saved me hours of computation. Now I can get rid of all my ddply calls!<br>Many thanks!<br><br>May I ask for something else? In your function you use the notation d[["y"]]. I tried to use d[ , "y"] instead and got the error message "Non-numeric argument to mathematical function".<br>
<br>However, if I apply sqrt to either form directly on the command line, it works.<br><br>So in this specific case, what's the difference between using d[["y"]] and d[ , "y"]?<br>
<br>Many thanks again for your help.<br><br>Best,<br>David</font><br>
<div class="gmail_quote">On Thu, Jan 17, 2013 at 4:53 PM, Akhil Behl <span dir="ltr"><<a href="mailto:akhil@igidr.ac.in" target="_blank">akhil@igidr.ac.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If I am not mistaken, you are looking for `.SD'. Inside j, .SD is the data.table<br>
holding the current group's rows (all columns except the "by" columns), so you<br>
can pass it the exact function you were giving to ddply earlier. There are other<br>
special symbols like .SD that are described in the data.table FAQ.<br>
<br>
Let's see:<br>
R> require(plyr)<br>
Loading required package: plyr<br>
<br>
R> require(data.table)<br>
Loading required package: data.table<br>
data.table 1.8.7 For help type: help("data.table")<br>
<br>
R> x.df <- data.frame(x=letters[1:2], y=1:10)<br>
R> x.dt <- data.table(x.df)<br>
R><br>
R> my.func <- function (d) { # Define a function on the subset<br>
+ sum(sqrt(d[["y"]]))<br>
+ }<br>
R><br>
R> # The plyr way:<br>
R> ddply(x.df, "x", my.func) -> ans.plyr<br>
R><br>
R> # The data.table way:<br>
R> x.dt[ , my.func(.SD), by=x] -> ans.dt<br>
R><br>
R> ans.plyr<br>
x V1<br>
1 a 10.61387<br>
2 b 11.85441<br>
<br>
R> ans.dt<br>
x V1<br>
1: a 10.61387<br>
2: b 11.85441<br>
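A base-R sketch of why d[["y"]] is the safer choice inside my.func: a data.frame (and a data.table, which extends it) is a list of columns, and `[[` simply extracts one element of that list, whereas `[` is dispatched to the class's own method. The block uses only base R and recreates x.df from the transcript; it does not load data.table. My understanding, not verified against 1.8.7 here, is that `[.data.table` in the 1.8.x series evaluates j as an expression, so d[ , "y"] on .SD evaluates to the string "y" and sqrt() then fails with "Non-numeric argument to mathematical function"; d$y and d[ , y] avoid that.<br>

```r
# Base R only: compare the two indexing forms on the x.df used in the transcript.
x.df <- data.frame(x = letters[1:2], y = 1:10)

is.list(x.df)                           # TRUE: a data.frame is a list of columns

col.dbl <- x.df[["y"]]                  # `[[` extracts the list element: an integer vector
col.sub <- x.df[ , "y"]                 # `[` with the default drop = TRUE: also a vector here
col.df  <- x.df[ , "y", drop = FALSE]   # `[` with drop = FALSE: a one-column data.frame

identical(col.dbl, col.sub)             # TRUE for a plain data.frame
class(col.df)                           # "data.frame"

# my.func from the transcript relies only on `[[`, which extracts the column
# the same way for any list-based object, including the .SD data.table.
my.func <- function(d) sum(sqrt(d[["y"]]))
my.func(x.df[x.df$x == "a", ])          # rows with x == "a" have y = 1, 3, 5, 7, 9
```

On a plain data.frame both forms agree, which is why the command-line test succeeded; the difference only shows up once d is a data.table.<br>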
<br>
For more help, try this at the R prompt:<br>
<br>
R> vignette('datatable-faq')<br>
<br>
--<br>
ASB.<br>
<div><div class="h5"><br>
On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <<a href="mailto:david.bellot@gmail.com">david.bellot@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
> I've searched all over the web without finding a clear answer to this trivial<br>
> problem. I'm sure I'm not looking in the right place:<br>
><br>
> In fact, I want to replace my use of ddply from the plyr package with<br>
> data.table. One of my main uses is to group a big data.frame by a set of<br>
> variables and do something with each sub-data.frame:<br>
><br>
> ddply( my_df, my_grouping_var, function (d) { do something with d } )<br>
> ----> d is a data.frame again<br>
><br>
> and it's slow on big data.frames.<br>
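For comparison, the pattern described above (split the data.frame by a grouping variable, hand each sub-data.frame to a function, stack the results) can be sketched in base R. split_apply_combine, my_df and the column g are hypothetical names for illustration; this is not how plyr or data.table implement grouping, and it assumes the function returns a one-row data.frame or a named list per group.<br>

```r
# Hypothetical helper: a minimal base-R stand-in for ddply(df, var, f).
# Assumes f(d) returns a one-row data.frame or a named list for each group.
split_apply_combine <- function(df, var, f) {
  pieces  <- split(df, df[[var]])                      # one sub-data.frame per group
  results <- lapply(pieces, function(d) as.data.frame(f(d)))  # f sees a full data.frame
  out <- do.call(rbind, results)                       # stack the per-group rows
  out <- data.frame(group = names(pieces), out, row.names = NULL)
  names(out)[1] <- var                                 # restore the grouping column name
  out
}

my_df <- data.frame(g = rep(c("a", "b"), times = 5), y = 1:10)
res <- split_apply_combine(my_df, "g",
                           function(d) list(n = nrow(d), total = sum(d$y)))
print(res)
```

In data.table the same shape is written x.dt[ , f(.SD), by = x], as in the transcript earlier in this thread, where the grouping is done internally instead of by materialising each sub-data.frame with split().<br>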
><br>
><br>
> However, I don't really understand how to do the same thing with a<br>
> data.table. Basically, if "j" in a data.table is equivalent to the SELECT<br>
> clause in SQL, then how do I do SELECT * FROM ...?<br>
><br>
> I want to be able to pass a function, as with ddply, that receives not<br>
> only a few columns but the full subset selected by the "by" clause.<br>
><br>
> Thanks...<br>
> Best,<br>
> David<br>
><br>
</div></div>> _______________________________________________<br>
> datatable-help mailing list<br>
> <a href="mailto:datatable-help@lists.r-forge.r-project.org">datatable-help@lists.r-forge.r-project.org</a><br>
> <a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote></div><br>