[datatable-help] Should by=character(0) just perform as if no by was called?
Chris Neff
caneff at gmail.com
Wed Sep 21 14:18:08 CEST 2011
I sometimes have functions in my code that accept several possible
inputs, some of which have more key columns than others. I then
often aggregate over a subset of those key columns. However, this
leads to cases where I end up trying to aggregate over no columns
at all. An example:
Suppose I have two data.tables, one with "t" and "g" as keys and
another with just "t" as a key.
library(data.table)
DT1 <- data.table(t=rep(1:10,10), g=rep(1:5,20), x=rnorm(100),
                  y=rnorm(100), key=c("t","g"))
DT2 <- data.table(t=rep(1:10,10), x=rnorm(100), y=rnorm(100), key="t")
Now I write a function that sums over all values of "t", grouping by
whatever key columns remain:
F <- function(dt) {
  dt[, list(x=sum(x), y=sum(y)), by=setdiff(key(dt), "t")]
}
> F(DT1)
g x y
[1,] 1 -1.829979 -3.320561
[2,] 2 -4.822312 5.136586
[3,] 3 6.326729 2.298288
[4,] 4 4.226714 3.267511
[5,] 5 -3.277370 -3.474824
> F(DT2)
Error in l[[1]] : subscript out of bounds
I would like F(DT2) to work. It should be the same as calling dt[,
list(x=sum(x), y=sum(y))]. Do you see any pathological cases where that
wouldn't make sense as the default? As it is now, I have to check
every time whether setdiff(key(dt), "t") is empty, and if so make the
dt call without the by. This is messy and invites copy-paste errors.
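For reference, the check I describe might look like the sketch below. The
function name aggregate_over_t is hypothetical (it stands in for F above),
and the sketch assumes every input has columns "x" and "y" and a key that
may or may not contain columns other than "t".

```r
library(data.table)

# Hypothetical helper: sums x and y over "t", grouping by any other key columns.
aggregate_over_t <- function(dt) {
  by_cols <- setdiff(key(dt), "t")
  if (length(by_cols) == 0L) {
    # No grouping columns remain, so aggregate the whole table without by.
    dt[, list(x = sum(x), y = sum(y))]
  } else {
    # Group by the remaining key columns, passed as a character vector.
    dt[, list(x = sum(x), y = sum(y)), by = by_cols]
  }
}
```

The aggregation expression appears twice here, which is exactly the
duplication I would like to avoid.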
Thanks!
Chris