<font><font face="arial,helvetica,sans-serif">Hi,<br><br><font><font><font><font><font><font>I've been searching all around the web without finding a clear answer to this seemingly trivial problem. I'm sure I'm just not looking in the right place.<br>
<br><font><font>Specifically, I want to replace my use of ddply from the plyr package with data.table. One of my main uses is to group a big data.frame by a set of variables and do something with each sub-data.frame:<br>
<br><font><font><font><font><font><font><font><font><font>ddply(my_df, .(my_grouping_var), function(d) { do something with d }) ----> d is a data.frame again<br><br>and it's slow on a big data.frame.<br><br><br>However, I don't really understand how to do the same thing with a data.table. If "j" in a data.table is the equivalent of the SELECT clause in SQL, how do I write the equivalent of SELECT * for each group?<br>
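<br>For concreteness, here is a minimal reproducible sketch of the pattern. The toy data, the column names "g", "x" and "y", and the helper "summarise_group" are all hypothetical, and the sketch assumes the plyr and data.table packages are installed. The data.table line relies on .SD, data.table's special symbol holding the subset of rows for the current group, which appears to be the closest analogue of the "d" that ddply passes in:<br>

```r
library(plyr)
library(data.table)

# Hypothetical toy data: one grouping column (g) and two value columns
my_df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50)
)

# Hypothetical per-group function: it receives a whole sub-table,
# not just a few selected columns
summarise_group <- function(d) {
  data.frame(n = nrow(d), x_sum = sum(d$x), xy_mean = mean(d$x * d$y))
}

# plyr: d is the sub-data.frame for one value of g
res_plyr <- ddply(my_df, "g", summarise_group)

# data.table: .SD is the sub-table for the current group; the
# data.frame returned by the function is stacked, with g prepended
my_dt <- as.data.table(my_df)
res_dt <- my_dt[, summarise_group(.SD), by = g]

print(res_plyr)
print(res_dt)
```

One difference worth noting: .SD does not contain the grouping column g itself (its value for the current group is available in .BY), whereas the d passed by ddply still includes it, so a function that reads d$g would need adjusting.<br>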
<br><font><font><font><font><font>I want to be able to pass a function, as with ddply, that receives not just a few columns but the whole subset selected by the "by" clause.<br><br><font><font><font>Thanks...<br>
<font>Best,<br>David</font><br></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font></font>