[datatable-help] Stackoverflow thread comparing merge times

Matthew Dowle mdowle at mdowle.plus.com
Tue Jan 18 00:34:29 CET 2011


Steve,
The "x" problem you found in merge has now been fixed, in 1.5.2.
There was a subtle issue here, which new FAQs 2.12 and 2.13 cover.
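
For reference, a minimal sketch of the case that used to fail, assuming
data.table 1.5.2 or later is installed. It rebuilds the two tables from the
quoted example below, keyed on "x", and checks that merge() and the d1[d2]
join return the same rows. The names m and j are only for illustration.

```r
library(data.table)

set.seed(123)
N <- 1e5

# Same construction as in the quoted example: x is a permutation of 1..N,
# so every key value appears exactly once in each table.
d1 <- data.table(data.frame(x = sample(N, N), y1 = rnorm(N)), key = "x")
d2 <- data.table(data.frame(x = sample(N, N), y2 = rnorm(N)), key = "x")

m <- merge(d1, d2, by = "x")  # the call that raised the error before the fix
j <- d1[d2]                   # the keyed join, looking up d2's rows in d1

# Both results should hold one row per key value, with matching columns.
stopifnot(
  nrow(m) == N,
  identical(m$x, j$x),
  identical(m$y1, j$y1),
  identical(m$y2, j$y2)
)
```

If the stopifnot() call returns silently, the two forms agree on this data.
On a version with the bug, the merge() line itself is expected to stop with
the "incorrect number of dimensions" error.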
Matthew


On Wed, 2010-12-08 at 09:07 +0000, Matthew Dowle wrote:
> Bug raised for this one :
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1229&group_id=240&atid=975
> 
> The "x" issue was fixed in [.data.table. As far as I remember it wasn't a
> specific fix; it went away when the internal scoping was tidied up and made
> more robust. Maybe this is a new one in merge.data.table when it calls
> [.data.table.
> 
> I don't use merge() btw, preferring d1[d2] syntax instead which may explain 
> why this got missed.
> 
> Matthew
> 
> "Tom Short" <tshort.rlists at gmail.com> wrote in message 
> news:AANLkTint_CMfpQC_jxDP51zNN8zBvdKqTs=wcrK1G6Xk at mail.gmail.com...
> On Tue, Dec 7, 2010 at 2:36 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
> > Hi,
> >
> > On Tue, Dec 7, 2010 at 2:07 PM, Matthew Dowle <mdowle at mdowle.plus.com> 
> > wrote:
> >>
> >> Does anyone have time to see if this post uses data.table correctly :
> >>
> >> http://stackoverflow.com/questions/4322219/whats-the-fastest-way-to-merge-join-data-frames-in-r
> >>
> >> The dt[, colMeans(cbind(x, y)), by="g1,g2"] bit looks wrong to me. Is
> >> that why it takes 131 seconds vs 2.73 for sqldf ? Shouldn't it be
> >> dt[,list(mean(x),mean(y)),by="g1,g2"] ?
> >>
> >> And also the y2= bit of dt1[dt2,list(x,y1,y2=dt2$y2)] looks odd.
> >
> > Don't know what's wrong with me today, but running this part of the
> > given example in "the obvious way" is causing data.table to error and
> > I'm not sure what I'm (obviously(?)) doing wrong:
> >
> > set.seed(123)
> > N <- 1e5
> > d1 <- data.frame(x=sample(N,N), y1=rnorm(N))
> > d2 <- data.frame(x=sample(N,N), y2=rnorm(N))
> >
> > d1 <- data.table(d1, key="x")
> > d2 <- data.table(d2, key="x")
> > merge(d1, d2, by="x")
> >
> > Error in x[, key, with = FALSE] : incorrect number of dimensions
> >
> > What am I missing?
> 
> It's a problem with the column name "x". I thought we got rid of the
> naming issues a while ago. The following seems to work:
> 
> set.seed(123)
> N <- 1e5
> d1 <- data.frame(xx=sample(N,N), y1=rnorm(N))
> d2 <- data.frame(xx=sample(N,N), y2=rnorm(N))
> 
> d1 <- data.table(d1, key="xx")
> d2 <- data.table(d2, key="xx")
> merge(d1, d2)
> 
> Right now, I don't have time to dig further.
> 
> - Tom 
> 
> 
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
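
On the grouped-means question from the start of the thread, here is a small
sketch of how the two j expressions differ in the shape of their result. The
table dt, its size and the names mean_x and mean_y are assumptions for
illustration, not the data from the Stack Overflow post, and the sketch does
not measure timings.

```r
library(data.table)

set.seed(1)
n <- 1000
dt <- data.table(
  g1 = sample(1:3, n, replace = TRUE),
  g2 = sample(1:3, n, replace = TRUE),
  x  = rnorm(n),
  y  = rnorm(n)
)

# colMeans() returns a length-2 vector per group, so the result is stacked:
# two rows per group in a single value column named V1.
a <- dt[, colMeans(cbind(x, y)), by = "g1,g2"]

# list(...) returns one row per group with one column per mean.
b <- dt[, list(mean_x = mean(x), mean_y = mean(y)), by = "g1,g2"]

stopifnot(
  nrow(a) == 2 * nrow(b),
  isTRUE(all.equal(a$V1[c(TRUE, FALSE)], b$mean_x)),
  isTRUE(all.equal(a$V1[c(FALSE, TRUE)], b$mean_y))
)
```

Both forms compute the same numbers. The list() form returns them as one
row per group with a column for each mean, which is usually the layout wanted
from a group-by.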
