[datatable-help] Stackoverflow thread comparing merge times

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Dec 7 20:36:36 CET 2010


Hi,

On Tue, Dec 7, 2010 at 2:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Does anyone have time to see if this post uses data.table correctly :
>
> http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r
>
> The dt[, colMeans(cbind(x, y)), by="g1,g2"] bit looks wrong to me. Is
> that why it takes 131 seconds vs 2.73 for sqldf? Shouldn't it be
> dt[,list(mean(x),mean(y)),by="g1,g2"]?
>
> And also the y2= bit of dt1[dt2,list(x,y1,y2=dt2$y2)] looks odd.

I don't know what's wrong with me today, but running this part of the
given example "the obvious way" makes data.table throw an error, and
I can't see what I'm (obviously?) doing wrong:

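library(data.table)

set.seed(123)
N <- 1e5
d1 <- data.frame(x=sample(N,N), y1=rnorm(N))
d2 <- data.frame(x=sample(N,N), y2=rnorm(N))

# Convert both data.frames to data.tables keyed on x
d1 <- data.table(d1, key="x")
d2 <- data.table(d2, key="x")

# This call triggers the error shown below
merge(d1, d2, by="x")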

Error in x[, key, with = FALSE] : incorrect number of dimensions

What am I missing?
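
For comparison, the same join can also be written with data.table's
keyed X[Y] syntax instead of merge(). This is only a minimal sketch,
assuming both tables are keyed on x as above and a data.table version
that accepts key= in the data.table() constructor; the name `joined`
is just illustrative and not from the original post:

```r
library(data.table)

set.seed(123)
N <- 1e5

# Build the two keyed tables directly (same data layout as the example above)
d1 <- data.table(x = sample(N, N), y1 = rnorm(N), key = "x")
d2 <- data.table(x = sample(N, N), y2 = rnorm(N), key = "x")

# Keyed join: for each row of d2, look up the row of d1 with the same key x
joined <- d1[d2]

# Each x is a permutation of 1..N, so every row of d2 finds exactly one match
print(dim(joined))
print(head(joined))
```

Whether this form avoids the error above is something I have not
confirmed; it is only meant as a second way to express the join.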
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

