<font><font face="arial,helvetica,sans-serif">Hi Matthew,<br><br><font><font>I did read the introduction, but I wasn't sure how to write it. Hence my question.<br><br><font><font><font><font>In fact, I would agree if the function were just sum(sqrt(y)), but in my case I would like to do something like<br>
<br><font><font><font>f <- function(d) head(d,1)<br><br><font>It's a small example for the sake of simplicity, just to illustrate that I really want access to the full sub data.frame (the d variable) and not just one column.<br>
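<br>A minimal sketch of that use case, assuming .SD (suggested further down the thread) is the per-group subset handed to the function. The toy table and its extra column z are made up for illustration:<br>

```r
library(data.table)

# Toy data (hypothetical): two groups, ten rows, two value columns.
x.dt <- data.table(x = rep(c("a", "b"), times = 5), y = 1:10, z = 10:1)

# The function from the message: it receives the whole per-group subset.
f <- function(d) head(d, 1)

# .SD holds every column of the current group except the grouping column x.
first.rows <- x.dt[, f(.SD), by = x]
print(first.rows)
```

Here d contains both y and z for each group, so f is not limited to a single column.<br>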
<br><font>Best,<br><font>David</font><br></font></font></font></font></font></font></font></font></font></font></font></font></font><br><div class="gmail_quote">On Thu, Jan 17, 2013 at 5:07 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com" target="_blank">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Akhil,<br>
<br>
Kind of, but defining :<br>
<br>
my.func <- function (d) {<br>
sum(sqrt(d[["y"]]))<br>
}<br>
<br>
followed by<br>
<br>
x.dt[ , my.func(.SD), by=x]<br>
<br>
isn't very data.table'ish. In fact the<br>
advice is to avoid .SD if possible, for speed.<br>
<br>
We'd forget my.func, and just do :<br>
<br>
x.dt[, sum(sqrt(y)), by=x]<br>
<br>
That is how we recommend it to be used, and<br>
allows data.table to optimize the query (which<br>
use of .SD may prevent).<br>
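<br>
As a rough sketch of the comparison, assuming the same toy data as in Akhil's example: both forms should give the same answer, and when .SD really is needed, .SDcols can restrict it to the columns the function uses. The variable names via.sd, via.sdcols and direct are only illustrative:<br>

```r
library(data.table)

# Same shape as the thread's example: x alternates a, b and y runs 1..10.
x.dt <- data.table(x = rep(c("a", "b"), times = 5), y = 1:10)

my.func <- function(d) sum(sqrt(d[["y"]]))

# Passing the whole per-group subset through .SD.
via.sd <- x.dt[, my.func(.SD), by = x]

# Same call, but .SD is limited to the one column the function reads.
via.sdcols <- x.dt[, my.func(.SD), by = x, .SDcols = "y"]

# Referring to the column directly in j, as recommended above.
direct <- x.dt[, sum(sqrt(y)), by = x]

print(direct)
```

All three return one row per group with the result in column V1.<br>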
<br>
David - have you read the introduction vignette and have<br>
you worked through example(data.table) at the prompt?<span class="HOEnZb"><font color="#888888"><br>
<br>
Matthew</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
<br>
On 17.01.2013 16:53, Akhil Behl wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If I am not wrong, you are looking for `.SD'. In fact you can put in<br>
the exact function you were throwing at ddply earlier. There are other<br>
special names like .SD that you can find in the data.table FAQs.<br>
<br>
Let's see:<br>
R> require(plyr)<br>
Loading required package: plyr<br>
<br>
R> require(data.table)<br>
Loading required package: data.table<br>
data.table 1.8.7 For help type: help("data.table")<br>
<br>
R> x.df <- data.frame(x=letters[1:2], y=1:10)<br>
R> x.dt <- data.table(x.df)<br>
R><br>
R> my.func <- function (d) { # Define a function on the subset<br>
+ sum(sqrt(d[["y"]]))<br>
+ }<br>
R><br>
R> # The plyr way:<br>
R> ddply(x.df, "x", my.func) -> ans.plyr<br>
R><br>
R> # The data.table way:<br>
R> x.dt[ , my.func(.SD), by=x] -> ans.dt<br>
R><br>
R> ans.plyr<br>
x V1<br>
1 a 10.61387<br>
2 b 11.85441<br>
<br>
R> ans.dt<br>
x V1<br>
1: a 10.61387<br>
2: b 11.85441<br>
<br>
For more help, try this on an R prompt:<br>
<br>
R> vignette('datatable-faq')<br>
<br>
--<br>
ASB.<br>
<br>
On Thu, Jan 17, 2013 at 9:49 PM, David Bellot <<a href="mailto:david.bellot@gmail.com" target="_blank">david.bellot@gmail.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
I've been looking all around the web without a clear answer to this trivial<br>
problem. I'm sure I'm not looking where I should:<br>
<br>
In fact, I want to replace my use of ddply from the plyr package with<br>
data.table. One of my main uses is to group a big data.frame by a set of<br>
variables and do something with each sub data.frame:<br>
<br>
ddply( my_df, my_grouping_var, function (d) { do something with d } )<br>
----> d is a data.frame again<br>
<br>
and it's slow on a big data.frame.<br>
<br>
<br>
However, I don't really understand how to redo the same thing with a<br>
data.table. Basically if "j" in a data.table is equivalent to the select<br>
clause in SQL, then how do I do SELECT * FROM etc...<br>
<br>
I want to be able to pass a function like in ddply that will receive not<br>
only a few columns but the full subset that is selected by the "by" clause.<br>
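<br>
One possible shape for this, sketched under the assumption that .SD (named in the replies) plays the role of the d argument in ddply. The table my_dt, its columns and summarise.group are hypothetical names used only for illustration:<br>

```r
library(data.table)

# Hypothetical data: one grouping column g and two value columns u and v.
my_dt <- data.table(g = c("a", "a", "b", "b", "b"),
                    u = c(1, 2, 3, 4, 5),
                    v = c(10, 20, 30, 40, 50))

# Hypothetical per-group function: d is the full subset for one group,
# so it can use any combination of the non-grouping columns.
summarise.group <- function(d) {
  list(n.rows = nrow(d),
       cols   = paste(names(d), collapse = ","),
       total  = sum(d$u * d$v))
}

# Returning a list from j yields one output column per list element.
res <- my_dt[, summarise.group(.SD), by = g]
print(res)
```

The cols column shows which columns the function actually received for each group.<br>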
<br>
Thanks...<br>
Best,<br>
David<br>
<br>
_______________________________________________<br>
datatable-help mailing list<br>
<a href="mailto:datatable-help@lists.r-forge.r-project.org" target="_blank">datatable-help@lists.r-forge.r-project.org</a><br>
<br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help</a><br>
</blockquote>
</blockquote>
</div></div></blockquote></div><br>