[datatable-help] Is there a good way to do a self join in one line?
Matthew Dowle
mdowle at mdowle.plus.com
Wed Aug 3 15:17:29 CEST 2011
There is an example in ?data.table (admittedly one line) :
DT[,transform(.SD,z=sum(x)),by=y]
But, see point 2 on wiki :
http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table
Maybe you could rep(sum(x),.N) by group, then cbind afterwards.
All that may be quite "hard" so the future idiom for this will simply be :
DT[,z:=sum(x),by=y]
:= is implemented in 1.6.3, but so far only in combination with i; see NEWS
for 1.6.3. It isn't implemented when grouping (yet). So (I think) you're
stuck with cbind for the moment if speed is important, as per the wiki example;
otherwise use transform(.SD, ...) in j.
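
A rough sketch of the rep-then-cbind idea, using the DT from the quoted
message below. The names agg, DT2 and DT3 are made up for illustration, and
the alignment check assumes the rows of DT are already contiguous by y (true
here, since y is sorted). The last step only applies to data.table releases
newer than 1.6.3 in which := accepts by.

```r
library(data.table)

DT <- data.table(x = 1:10, y = rep(1:2, each = 5))

# Per-group sum, repeated once per row of the group (.N is the group size).
agg <- DT[, list(z = rep(sum(x), .N)), by = y]

# Grouped results come back group by group, so they only line up with DT
# row-for-row when DT's rows are already contiguous by y. Check before binding.
stopifnot(identical(agg$y, DT$y))

# Bind the new column on without naming any of the existing columns.
DT2 <- cbind(DT, z = agg$z)

# Assumption: a later release where := works with by. The same result,
# added by reference to a copy so DT itself is left untouched.
DT3 <- copy(DT)
DT3[, z := sum(x), by = y]
```

If DT is not grouped contiguously by y, sorting or keying it on y first
would be needed before the cbind step.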
Matthew
"Chris Neff" <caneff at gmail.com> wrote in message
news:CAAuY0RV=Ot2XkVopihk6WmsyMrfcs7hrKtuXMLktiW1nrcRMdw at mail.gmail.com...
> Say I want to calculate an aggregate statistic and append it to the
> data frame all in one move. Like this:
>
> DT <- data.table(x= 1:10, y=rep(1:2,each=5))
>
> DT <- DT[, list(x, z=sum(x)), by=y]
>
> This will append the new variable z to the data frame. But what if I
> have a lot of columns, and I don't want to address them by name like I
> did there? I'd like to do something like:
>
>
> DT <- DT[, list(names(DT), z=sum(x)), by=y]
>
> but that won't work because names(DT) is a character vector not the
> parts of the list expression I want. I mean there is the following:
>
> tmp <- DT[,list(z=sum(x)), by=y]
>
> DT <- DT[tmp]
>
> but creating a temporary variable is annoying. This doesn't work:
>
> DT <- DT[DT[, list(z=sum(x)), by=y]]
>
> Thoughts?
>
> Chris