[datatable-help] Is there a good way to do a self join in one line?

Matthew Dowle mdowle at mdowle.plus.com
Thu Aug 11 21:29:44 CEST 2011


OK, that works (and I think you need 1.6.5 for it, for anyone else who
might be trying it in 1.6.4).
Just for completeness: ave() is built on lapply() over split(), and that
lapply/split paradigm is the fundamentally inefficient part. Hopefully we
can get to the second way (:= combined with by) soon!
Matthew
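
For readers unfamiliar with the "lapply split" paradigm mentioned above, here is a minimal base R sketch on the thread's toy data (no data.table needed). It spells out what ave(x, y, FUN=sum) does:

- split x by group,
- call FUN on each piece with lapply(),
- write each result back over that group's positions.

As far as I know, this mirrors how base R's own ave() is written. The names z_ave and z_split are illustrative only.

```r
# Toy data from the original question
x <- 1:10
y <- rep(1:2, each = 5)

# ave(): each element is replaced by its group's total
z_ave <- ave(x, y, FUN = sum)

# The same result via the lapply/split paradigm:
z_split <- x
# split(x, y) gives one vector per group, lapply() sums each piece,
# and `split<-` writes each length-one total back over its group's positions
split(z_split, y) <- lapply(split(x, y), sum)

print(z_split)  # 15 15 15 15 15 40 40 40 40 40
```

Each group costs a separate R-level call to FUN, plus the copying done by split() and reassembly. That is roughly why this approach gets slow when there are many groups.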

On Thu, 2011-08-11 at 08:53 -0400, Chris Neff wrote:
> As a workaround for now, I've found
> 
> DT[,z:=ave(x, y, FUN=sum)]
> 
> to be a reasonable alternative to
> 
> DT[, z:=sum(x), by=y]
> 
> until the second way is supported, of course.
> 
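Here is a minimal check that the ave() workaround and the grouped := form produce the same column. It assumes a current data.table release; grouped := was not available in 1.6.x when this thread was written. The column names z_ave and z_by are illustrative only.

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# Workaround: ave() returns a vector as long as x (each element is its
# group's sum), so it can be assigned by reference without by=
DT[, z_ave := ave(x, y, FUN = sum)]

# Grouped := as supported in later data.table releases
DT[, z_by := sum(x), by = y]

print(DT)
```
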
> On 3 August 2011 09:17, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> >
> > There is an example in ?data.table (admittedly only one line):
> >
> >    DT[,transform(.SD,z=sum(x)),by=y]
> >
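For reference, here is a sketch of that ?data.table idiom on the question's toy data, assuming a current data.table release. transform(.SD, z = sum(x)) builds a new table for each group. The result is therefore a new table with the by column y first, rather than DT modified in place.

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# For each group of y, .SD holds that group's other columns (here just x);
# transform() returns them with z = sum(x) recycled down the group
res <- DT[, transform(.SD, z = sum(x)), by = y]

print(res)  # a new table; DT itself has no z column
```
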
> > But see point 2 on the wiki:
> >
> >    http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table
> >
> > Maybe you could compute rep(sum(x),.N) by group, then cbind the result back on.
> >
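The rep()/cbind() suggestion could look like this, assuming a current data.table release. The cbind step only lines up because DT's rows are already sorted by y, as in the question. If they were not, the grouped result would come back in group order and the z values would land on the wrong rows.

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# One value per row of DT: each group's sum repeated .N times
z <- DT[, list(z = rep(sum(x), .N)), by = y]$z

# Bind the column back on; relies on DT already being sorted by y
DT <- cbind(DT, z = z)

print(DT)
```
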
> > All of that may be quite "hard", so the future idiom for this will simply be:
> >
> >    DT[,z:=sum(x),by=y]
> >
> > := is implemented in 1.6.3, but so far only in combination with i (see
> > NEWS for 1.6.3). := isn't implemented with grouping (yet). So, I think,
> > you're stuck with cbind for the moment if speed is important, as in the
> > wiki example; otherwise, use transform() on .SD in j.
> >
> > Matthew
> >
> > "Chris Neff" <caneff at gmail.com> wrote in message
> > news:CAAuY0RV=Ot2XkVopihk6WmsyMrfcs7hrKtuXMLktiW1nrcRMdw at mail.gmail.com...
> >> Say I want to calculate an aggregate statistic and append it to the
> >> data frame all in one move. Like this:
> >>
>> DT <- data.table(x = 1:10, y = rep(1:2, each = 5))
> >>
> >> DT <- DT[, list(x, z=sum(x)), by=y]
> >>
> >> This will append the new variable z to the data frame. But what if I
> >> have a lot of columns, and I don't want to address them by name like I
> >> did there? I'd like to do something like:
> >>
> >>
> >> DT <- DT[, list(names(DT), z=sum(x)), by=y]
> >>
>> but that won't work, because names(DT) is a character vector, not the
>> parts of the list expression I want. Of course, there is the following:
> >>
> >> tmp <- DT[,list(z=sum(x)), by=y]
> >>
> >> DT <- DT[tmp]
> >>
> >> but creating a temporary variable is annoying.  This doesn't work:
> >>
> >> DT <- DT[DT[, list(z=sum(x)), by=y]]
> >>
> >> Thoughts?
> >>
> >> Chris
> >
> >
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >
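Going back to the temporary-variable version in the original question: DT[tmp] is a join, so DT needs to be keyed on the grouping column first. Here is a minimal sketch, assuming a current data.table release; recent versions also accept DT[tmp, on = "y"] without setting a key.

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# Key DT on the grouping column so DT[tmp] joins tmp's y against it
setkey(DT, y)

# One row per group, holding that group's total
tmp <- DT[, list(z = sum(x)), by = y]

# Join: each row of DT picks up its group's z, and all other columns
# come along without having to name them
DT <- DT[tmp]

print(DT)
```
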



