[datatable-help] Should by=character(0) just perform as if no by was called?

Chris Neff caneff at gmail.com
Wed Sep 21 14:18:08 CEST 2011


My functions often accept several possible inputs, some of which have
more key columns than others.  I then frequently aggregate over a
subset of those key columns, which sometimes means aggregating over
no columns at all.  An example:

Suppose I have two data tables, one keyed on "t" and "g" and
another keyed on just "t".

DT1 <- data.table( t=rep(1:10,10), g=rep(1:5, 20), x=rnorm(100),
y=rnorm(100), key=c("t","g"))
DT2 <- data.table( t=rep(1:10,10), x=rnorm(100), y=rnorm(100), key="t")


Now I write a function that aggregates over all values of "t":

F <- function(dt) {
 dt[, list(x=sum(x), y=sum(y)), by=setdiff(key(dt), "t")]
}


> F(DT1)
     g         x         y
[1,] 1 -1.829979 -3.320561
[2,] 2 -4.822312  5.136586
[3,] 3  6.326729  2.298288
[4,] 4  4.226714  3.267511
[5,] 5 -3.277370 -3.474824

> F(DT2)
Error in l[[1]] : subscript out of bounds


I would like F(DT2) to work.  It should give the same result as calling
dt[, list(x=sum(x), y=sum(y))].  Do you see any pathological cases where
that wouldn't make sense as the default?  As it is now, I have to check
every time whether setdiff(key(dt), "t") is empty and, if so, make the
dt call without the by.  This is messy and invites copy-paste errors.
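
For reference, the check described above can be written once and reused,
roughly like this.  It is only a sketch of the workaround, not a proposed
change to data.table; the name agg_except_t is made up for illustration.

```r
library(data.table)

# Hypothetical helper (not part of data.table): sum x and y over every
# key column except "t", falling back to an ungrouped call when no
# grouping columns remain.
agg_except_t <- function(dt) {
  by_cols <- setdiff(key(dt), "t")
  if (length(by_cols) == 0L) {
    # Nothing left to group by: aggregate the whole table
    dt[, list(x = sum(x), y = sum(y))]
  } else {
    dt[, list(x = sum(x), y = sum(y)), by = by_cols]
  }
}

# A table keyed only on "t", as in the example above
DT2 <- data.table(t = rep(1:10, 10), x = rnorm(100), y = rnorm(100), key = "t")
agg_except_t(DT2)
```

This keeps the branching in one place, but it still duplicates the j
expression, which is the part I would like to avoid.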

Thanks!
Chris

