[datatable-help] Is there a good way to do a self join in one line?

Matthew Dowle mdowle at mdowle.plus.com
Wed Aug 3 15:17:29 CEST 2011


There is an example in ?data.table (admittedly only one line):

    DT[,transform(.SD,z=sum(x)),by=y]

But see point 2 on the wiki:

    http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

Maybe you could compute rep(sum(x),.N) by group, then cbind the result onto
DT afterwards.
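
As a rough sketch of that rep-then-cbind idea, using Chris's example DT (the
names agg and DT2 are made up for illustration, and this assumes DT is already
sorted by y so that the grouped rows line up with the original rows):

```r
library(data.table)

# Chris's example table; its rows are already ordered by y.
DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# Repeat each group's sum once per row of that group (.N rows per group).
agg <- DT[, list(z = rep(sum(x), .N)), by = y]

# Bind the new column on. This relies on agg and DT sharing the same row
# order, which holds here only because DT is sorted by y.
DT2 <- cbind(DT, z = agg$z)
```

Here z should be 15 for the y == 1 rows and 40 for the y == 2 rows.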

All of that may be quite "hard", so the future idiom for this will simply be:

    DT[,z:=sum(x),by=y]

:= is implemented in 1.6.3, but so far only in combination with i; it isn't
implemented when grouping (yet). See NEWS for 1.6.3. So (I think) you're
stuck with cbind for the moment if speed is important, as per the wiki
example; otherwise use transform on .SD in j.
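
For reference, a minimal sketch of what that future idiom looks like on
Chris's example. It assumes a data.table release in which := has been
extended to work with by, so it will not run on 1.6.3:

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# Add z by reference: each row gets the sum of x within its y group.
# Assumes := is supported together with by (not the case in 1.6.3).
DT[, z := sum(x), by = y]
```

No temporary table and no reassignment of DT is needed, since := adds the
column in place.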

Matthew

"Chris Neff" <caneff at gmail.com> wrote in message 
news:CAAuY0RV=Ot2XkVopihk6WmsyMrfcs7hrKtuXMLktiW1nrcRMdw at mail.gmail.com...
> Say I want to calculate an aggregate statistic and append it to the
> data frame all in one move. Like this:
>
> DT <- data.table(x= 1:10, y=rep(1:2,each=5))
>
> DT <- DT[, list(x, z=sum(x)), by=y]
>
> This will append the new variable z to the data frame. But what if I
> have a lot of columns, and I don't want to address them by name like I
> did there? I'd like to do something like:
>
>
> DT <- DT[, list(names(DT), z=sum(x)), by=y]
>
> but that won't work because names(DT) is a character vector not the
> parts of the list expression I want. I mean there is the following:
>
> tmp <- DT[,list(z=sum(x)), by=y]
>
> DT <- DT[tmp]
>
> but creating a temporary variable is annoying.  This doesn't work:
>
> DT <- DT[DT[, list(z=sum(x)), by=y]]
>
> Thoughts?
>
> Chris 




