[datatable-help] Stackoverflow thread comparing merge times
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Dec 7 20:36:36 CET 2010
Hi,
On Tue, Dec 7, 2010 at 2:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
>
> Does anyone have time to see if this post uses data.table correctly:
>
> http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r
>
> The dt[, colMeans(cbind(x, y)), by="g1,g2"] bit looks wrong to me. Is
> that why it takes 131 seconds vs 2.73 seconds for sqldf? Shouldn't it be
> dt[,list(mean(x),mean(y)),by="g1,g2"]?
>
> And also the y2= bit of dt1[dt2,list(x,y1,y2=dt2$y2)] looks odd.
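To make the difference between the two grouped-mean forms concrete, here is a minimal sketch on small made-up data. It assumes a current data.table release. The columns g1, g2, x, y and the result names mx and my are hypothetical stand-ins for the Stack Overflow example. The colMeans(cbind(x, y)) form returns a length-2 vector per group, so it produces a "long" single column V1 with two rows per group. The list(...) form produces one row per group with one column per mean. Whether the per-group cbind() also accounts for the 131 seconds is not something this sketch measures.

```r
library(data.table)

# Hypothetical data: two grouping columns and two numeric columns.
set.seed(1)
n <- 1000
dt <- data.table(g1 = sample(1:10, n, replace = TRUE),
                 g2 = sample(1:10, n, replace = TRUE),
                 x  = rnorm(n),
                 y  = rnorm(n))

# Form used in the post: cbind() builds a 2-column matrix for every group,
# and colMeans() returns a length-2 vector, so each group contributes
# 2 rows to a single result column V1.
a <- dt[, colMeans(cbind(x, y)), by = "g1,g2"]

# Suggested form: list() gives one row per group and one column per mean
# (mx and my are illustrative names, not from the thread).
b <- dt[, list(mx = mean(x), my = mean(y)), by = "g1,g2"]

nrow(a) == 2 * nrow(b)   # TRUE: a is "long", b is "wide"
```

Both results list groups in order of first appearance, so the odd rows of a$V1 line up with b$mx and the even rows with b$my.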
I don't know what's wrong with me today, but running this part of the
given example "the obvious way" makes data.table throw an error, and
I'm not sure what I'm (obviously?) doing wrong:
library(data.table)
set.seed(123)
N <- 1e5
d1 <- data.frame(x=sample(N,N), y1=rnorm(N))
d2 <- data.frame(x=sample(N,N), y2=rnorm(N))
d1 <- data.table(d1, key="x")
d2 <- data.table(d2, key="x")
merge(d1, d2, by="x")
Error in x[, key, with = FALSE] : incorrect number of dimensions
What am I missing?
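For comparison, here is a sketch of the same merge next to an equivalent keyed join, assuming a current data.table release. The 2010 version discussed here raised the error above. This sketch does not show when or why that changed. The data are built directly as keyed data.tables, which with set.seed(123) should give the same values as the reproducer. The join form also touches the dt2$y2 question: inside X[i, j], the columns of i are visible in j, and the i. prefix names them explicitly. By contrast, dt2$y2 is evaluated as the whole y2 column of the second table rather than the rows matched by the join.

```r
library(data.table)

# Same data as the reproducer, built directly as keyed data.tables.
set.seed(123)
N <- 1e5
d1 <- data.table(x = sample(N, N), y1 = rnorm(N), key = "x")
d2 <- data.table(x = sample(N, N), y2 = rnorm(N), key = "x")

# merge() on data.tables dispatches to merge.data.table().
m <- merge(d1, d2, by = "x")

# Keyed join: for each row of d2, find the row of d1 with the same key.
# i.y2 refers to the y2 values of the matched rows of d2 (the i table),
# not to the whole d2$y2 column.
jn <- d1[d2, list(x, y1, y2 = i.y2)]
```

Because x is a permutation of 1:N in both tables, every key matches, so both results have N rows sorted by x and should hold identical values.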
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact