Thanks.<div><br></div><div>Here is the bigger picture.</div><div>There are about 2 million records. They need to be grouped by person ID.<div>When we group them, we want to obtain a string in which each group's values are sorted and concatenated.</div>
<div><br></div><div>For example:</div><div><br></div><div>ID, V1</div><div>--- ---</div><div>1, 2</div><div>1, 1</div><div>2, 8</div><div>2, 3</div><div>2, 5</div><div>2, 2</div><div><br></div>
<div>should become</div><div><br></div><div>ID, Gr_V1</div><div>--- -----</div><div>1, 1,2</div><div>2, 2,3,5,8</div><div><br></div><div>The number of people is about 1,007 K</div><div><br></div><div>I am giving examples because (1) I cannot copy-paste the code and (2) the data and the problem are classified.</div>
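<div><br></div><div>For illustration, the example above could be written in data.table roughly as follows. This is only a sketch on the toy values: the table name DT is made up, the column names ID, V1 and Gr_V1 are taken from the example, and nothing here says how it performs on 2 million rows.</div>
<pre>
```r
library(data.table)

# Toy values copied from the example; DT is a made-up name, not the real table.
DT = data.table(ID = c(1, 1, 2, 2, 2, 2),
                V1 = c(2, 1, 8, 3, 5, 2))

# For each ID, sort the values and join them into one comma-separated string.
res = DT[, list(Gr_V1 = paste(sort(V1), collapse = ",")), by = ID]
print(res)
```
</pre>
<div>Here sort() runs once per group, so with about a million groups it is called about a million times.</div>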
<div>All of these computations are performed on secure machines disconnected from the Internet.</div><div>Using R is not a requirement. Many databases can handle the above using SQL.</div><div>However, these questions came up because I saw data.table while browsing on the Internet</div>
<div>and thought I could give it a try in order to avoid using SQL.<br><br><div class="gmail_quote">On Fri, May 6, 2011 at 6:37 PM, Matthew Dowle <span dir="ltr"><<a href="mailto:mdowle@mdowle.plus.com">mdowle@mdowle.plus.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Steve H,<br>
How much is 'much better' and 'much longer' please? And on how many<br>
rows/GB? What is the bigger picture, and why are you concatenating<br>
strings together and using paste() at all?<br>
Guess 1: you can include the x column in your key; e.g. setkey(grp,x),<br>
then there would be no need to sort(x) again.<br>
Guess 2: sorting character can be slow. Hence we don't allow character<br>
columns in keys (yet); data.table converts character to factor.<br>
But, ideally, more information at a higher level would help us to help.<br>
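A rough sketch of Guess 1 on toy values, assuming the grouping column is called id (a made-up name, as are the values). With both columns in the key, the rows within each group are already in order, so sort() can be dropped:<br>
<pre>
```r
library(data.table)

# Hypothetical stand-in for the table called grp in this thread.
grp = data.table(id = c(1L, 1L, 2L, 2L, 2L, 2L),
                 x  = c(2L, 1L, 8L, 3L, 5L, 2L))

# Key on id, then x: rows are reordered by id, and by x within each id.
setkey(grp, id, x)

# Each group's rows are already sorted, so paste() alone is enough.
out = grp[, list(xs = paste(x, collapse = ",")), by = id]
print(out)
```
</pre>
Whether this is faster than calling sort() inside each group would need timing on the real data.<br>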
<font color="#888888">Matthew<br>
</font><div><div></div><div class="h5"><br>
<br>
On Fri, 2011-05-06 at 12:16 -0700, Steve Harman wrote:<br>
> Related to this, RMySQL performs much better<br>
> (using GROUP BY and functions such as GROUP_CONCAT, which lets you<br>
> order the values and choose a separator too).<br>
><br>
> So, I would recommend using them if you want grouping with sorting.<br>
><br>
> On May 6, 2:36 pm, Steve Harman <<a href="mailto:stvhar...@gmail.com">stvhar...@gmail.com</a>> wrote:<br>
> > Hello !<br>
> > When grouping using data.table, mean and sum functions applied within<br>
> > groups work well but if we use sort(x) function it takes much longer.<br>
> ><br>
> > I would like to do first sort(x) and put it inside paste such as<br>
> > paste(sort(x),collapse=",")<br>
> > I was wondering if there is a more efficient or effective way of<br>
> > doing this?<br>
> ><br>
> > thanks in advance,<br>
> ><br>
> > Steve<br>
> > _______________________________________________<br>
> > datatable-help mailing list<br>
> > datatable-h...@lists.r-forge.r-project.org<br>
> > https://<a href="http://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatabl." target="_blank">lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatabl.</a>..<br>
<br>
<br>
</div></div></blockquote></div><br></div></div>