[datatable-help] Fwd: Stackoverflow thread comparing merge times

Matthew Dowle mdowle at mdowle.plus.com
Sun Dec 19 15:26:29 CET 2010


I just posted to that Stack Overflow thread, showing a worst, better and
best use of data.table. I hope that gets the point across. I'm sure plyr
can be used better too, so I sent Hadley the link to the thread.

New feature requests for data.table (any thoughts, anyone?):
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1230&group_id=240&atid=978
https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1231&group_id=240&atid=978

Matthew


On Tue, 2010-12-07 at 14:37 -0500, Tom Short wrote:
> Forgot to reply to the list...
> 
> On Tue, Dec 7, 2010 at 2:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> >
> > Does anyone have time to see if this post uses data.table correctly :
> >
> > http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r
> 
> Not enough time to do it justice. On my system, I get the following:
> 
> > system.time(aggregate <- aggregate(d[c("x", "y")], d[c("g1", "g2")], mean))
>   user  system elapsed
>   6.72    0.08    6.65
> > system.time(dt1 <- dt[, list(x=mean(x), y=mean(y)), by = "g1,g2"])
>   user  system elapsed
>   3.95    0.02    3.87
> > system.time(dt2 <- dt[, list(x=.Internal(mean(x)), y=.Internal(mean(y))), by = "g1,g2"])
>   user  system elapsed
>   0.12    0.01    0.19
> 
> This is a "many groups" case.
> 
> - Tom
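
For anyone who wants to try the comparison above, here is a minimal sketch
of a "many groups" setup. The row count, the number of distinct g1/g2
values and the random data are assumptions chosen for illustration; they
are not the data from the Stack Overflow post, so the timings will not
match the ones quoted.

```r
library(data.table)

set.seed(1)
n <- 1e5                                  # hypothetical row count, not from the thread
d <- data.frame(
  g1 = sample(1:500, n, replace = TRUE),  # hypothetical group sizes: many small groups
  g2 = sample(1:500, n, replace = TRUE),
  x  = rnorm(n),
  y  = rnorm(n)
)
dt <- as.data.table(d)

# Base R: group means of x and y via aggregate()
t_agg <- system.time(
  agg <- aggregate(d[c("x", "y")], d[c("g1", "g2")], mean)
)

# data.table: the same group means using by=
t_dt <- system.time(
  dt1 <- dt[, list(x = mean(x), y = mean(y)), by = "g1,g2"]
)

print(t_agg)
print(t_dt)

# Both approaches should produce one row per observed (g1, g2) pair
stopifnot(nrow(agg) == nrow(dt1))
```

The third timing in the transcript replaces mean(x) with .Internal(mean(x)),
which skips the S3 dispatch that mean() performs on every group. That
overhead adds up when there are many small groups. .Internal() is not
meant to be called from user code, so treat it as a diagnostic for where
the time goes rather than something to rely on.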