[datatable-help] Fwd: Stackoverflow thread comparing merge times

Tom Short tshort.rlists at gmail.com
Tue Dec 7 20:37:33 CET 2010


Forgot to reply to the list...

On Tue, Dec 7, 2010 at 2:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Does anyone have time to see if this post uses data.table correctly :
>
> http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r

I don't have enough time to do it justice, but on my system I get the following:

> system.time(aggregate <- aggregate(d[c("x", "y")], d[c("g1", "g2")], mean))
  user  system elapsed
  6.72    0.08    6.65
> system.time(dt1 <- dt[, list(x=mean(x), y=mean(y)), by = "g1,g2"])
  user  system elapsed
  3.95    0.02    3.87
> system.time(dt2 <- dt[, list(x=.Internal(mean(x)), y=.Internal(mean(y))), by = "g1,g2"])
  user  system elapsed
  0.12    0.01    0.19

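This is a "many groups" case, so the per-group overhead of calling mean()
presumably dominates, which would explain why the .Internal(mean(x))
version is so much faster.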
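
For anyone who wants to try this locally, here is a rough sketch of a
comparable setup. The data is made up: the row count, seed, and number of
group levels are my assumptions, not the data from the Stack Overflow post.
It only checks that aggregate() and the data.table grouping give the same
group means; wrap either call in system.time() to compare timings.

```r
library(data.table)

set.seed(1)
n <- 5e4                                  # hypothetical size, chosen to run quickly
d <- data.frame(
  g1 = sample(1:200, n, replace = TRUE),  # two grouping columns, so many
  g2 = sample(1:200, n, replace = TRUE),  # small (g1, g2) groups
  x  = rnorm(n),
  y  = rnorm(n)
)
dt <- data.table(d)

# Base R: group means of x and y for each (g1, g2) combination
agg <- aggregate(d[c("x", "y")], d[c("g1", "g2")], mean)

# data.table: same computation, grouped with by=
dt1 <- dt[, list(x = mean(x), y = mean(y)), by = "g1,g2"]

# Line the two results up by group and confirm that they agree
m <- merge(agg, as.data.frame(dt1), by = c("g1", "g2"),
           suffixes = c(".agg", ".dt"))
stopifnot(nrow(m) == nrow(agg), nrow(agg) == nrow(dt1))
stopifnot(isTRUE(all.equal(m$x.agg, m$x.dt)))
stopifnot(isTRUE(all.equal(m$y.agg, m$y.dt)))
```

I left .Internal() out of the sketch on purpose: it is not meant to be
called from user code, so treat the third timing above as an illustration
of call overhead rather than something to rely on.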

- Tom

