[datatable-help] Grouping with sort
Matthew Dowle
mdowle at mdowle.plus.com
Sat May 7 22:28:47 CEST 2011
Hi Steve H,
Please read "Describe the goal, not the step" here:
http://www.catb.org/~esr/faqs/smart-questions.html
Matthew
On Sat, 2011-05-07 at 01:50 -0400, Steve Harman wrote:
> Thanks.
>
> Here is the bigger picture.
> There are about 2 million records. They need to be grouped by person
> ID. When we group them, we want to obtain, for each ID, a string in
> which that ID's values are sorted and concatenated.
>
>
> For example
>
>
> ID, V1
> --- ---
> 1, 2
> 1, 1
> 2, 8
> 2, 3
> 2, 5
> 2, 2
>
>
> should become
>
>
> ID, Gr_V1
> --- -----
> 1, 1,2
> 2, 2,3,5,8
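A minimal base-R sketch of the goal above, using only the toy data from the example (the names `df`, `gr` and `res` are hypothetical). Instead of calling sort() inside every group, it orders the whole table once by ID and then V1, and relies on split() keeping each group's rows in that order. This is a swapped-in technique, not code from the thread, and it implies nothing about timing on the ~2 million-row table.

```r
# Toy data from the example above (hypothetical names; real table has ~2M rows).
df <- data.frame(ID = c(1L, 1L, 2L, 2L, 2L, 2L),
                 V1 = c(2L, 1L, 8L, 3L, 5L, 2L))

# Sort the whole table once: by ID, then by V1 within each ID.
df <- df[order(df$ID, df$V1), ]

# split() keeps the current row order inside each group,
# so each group's values are already ascending and no per-group sort() is needed.
gr <- vapply(split(df$V1, df$ID), paste, character(1), collapse = ",")

# Rebuild a two-column result matching the requested ID / Gr_V1 layout.
res <- data.frame(ID = as.integer(names(gr)), Gr_V1 = unname(gr),
                  stringsAsFactors = FALSE)
print(res)
```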
>
>
> The number of people (distinct IDs) is about 1,007K.
>
>
> I am giving examples because (1) I cannot copy-paste code and (2) the
> data and the problem are classified. All of these computations are
> performed on secure machines disconnected from the Internet.
> Using R is not a requirement; many databases can handle the above
> using SQL. However, these questions came up because I saw data.table
> while browsing the Internet and thought I could give it a try in order
> to avoid using SQL.
>
> On Fri, May 6, 2011 at 6:37 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> Steve H,
> How much is 'much better' and 'much longer', please? And on how many
> rows/GB? What is the bigger picture, and why are you concatenating
> strings together and using paste() at all?
> Guess 1: you can include the x column in your key, e.g. setkey(grp,x);
> then there would be no need to sort(x) again.
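A sketch of Guess 1 applied to the toy example, assuming the data.table package is installed and assuming `grp` and `x` in the email stand for a table and its value column (called `DT` and `V1` here, hypothetically, with the grouping column ID kept first in the key). Keying on both ID and V1 sorts the table once, so each group's rows arrive already ordered and paste() needs no sort().

```r
library(data.table)

# Hypothetical toy table mirroring the ID / V1 example.
DT <- data.table(ID = c(1L, 1L, 2L, 2L, 2L, 2L),
                 V1 = c(2L, 1L, 8L, 3L, 5L, 2L))

# Key on the grouping column AND the value column:
# setkey() reorders DT by reference, by ID and then by V1 within each ID.
setkey(DT, ID, V1)

# Within each ID group the rows are already in ascending V1 order,
# so paste() can concatenate them directly, without a per-group sort().
res <- DT[, list(Gr_V1 = paste(V1, collapse = ",")), by = ID]
print(res)
```

Because setkey() reorders DT by reference, DT itself stays sorted by (ID, V1) after this call.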
> Guess 2: sorting character vectors can be slow. Hence we don't allow
> character columns in keys (yet); data.table converts character to
> factor.
> But, ideally, more information at a higher level would help us to help.
> Matthew
>
>
>
> On Fri, 2011-05-06 at 12:16 -0700, Steve Harman wrote:
> > On a related note, RMySQL performs much better here (using GROUP BY
> > and functions such as GROUP_CONCAT, which also lets you specify the
> > order and a separator).
> >
> > So I would recommend using them if you want grouping with sorting.
> >
> > On May 6, 2:36 pm, Steve Harman <stvhar... at gmail.com> wrote:
> > > Hello!
> > > When grouping with data.table, mean and sum functions applied within
> > > groups work well, but if we use the sort(x) function it takes much
> > > longer.
> > >
> > > I would like to first sort(x) and put it inside paste, such as
> > > paste(sort(x),collapse=","). I was wondering if there is a more
> > > efficient or effective way of doing this?
> > >
> > > thanks in advance,
> > >
> > > Steve
> > > _______________________________________________
> > > datatable-help mailing list
> > > datatable-h... at lists.r-forge.r-project.org
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatabl...