[datatable-help] Programmatic by clauses
Johann Hibschman
jhibschman+r at gmail.com
Mon Aug 30 22:03:11 CEST 2010
"Short, Tom" <TShort at epri.com> writes:
> Johann, how about the following:
> [snip example]
That's a good example; thanks.
> Here's a data.table version:
>
>> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> + by = lapply(aggregation.spec, function (f) f(data))]
> iquarter fico.bucket balance count
> [1,] 0 25 0.5506797 1.133675
> [2,] 0 50 1.5175908 0.854553
> [3,] 0 75 0.4627294 1.171430
> [4,] 0 100 0.8354870 1.083211
> [5,] 1 25 1.7311503 1.210178
> [6,] 1 50 2.2930775 1.974759
> [7,] 1 75 1.0477066 1.973119
> [8,] 1 100 1.4351321 1.501291
I hadn't understood .SD before; that's a very good thing to know.
> I think the following should also work, but it doesn't. Note that I
> didn't update to the very latest version of data.table, and I know
> Matthew has changed some things that might already fix this.
>
>
>> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> + by = by.factors]
> Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE],
> :
> column or expression 1 of 'by' list is not internally type integer. Do
> not quote column names. Example of correct use:
> by=list(colA,month(colB),...).
It still doesn't work. Unfortunately, if I want to have a drop-in
replacement, I have to operate on the equivalent by.factors.
I tried the following:
  dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
                  data.table(by.factors))
  dt.agg <- dt.tmp[, lapply(.SD, sum), by=paste(names(by.factors),
                                                collapse=",")]
but I got:
Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by = paste(names(by), :
by must evaluate to list
I tried
  by.names <- paste(names(by.factors), collapse=",")
  dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
but I got the same error. Randomly wrapping things in eval or evalq
didn't seem to work either.
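
One way around this kind of problem is to build the whole call first and only then evaluate it, so that data.table sees an ordinary by=list(...) of bare column names. The following is a minimal sketch, not a confirmed fix: the toy table and its values are made up for illustration (the real data and by.factors are not shown in this thread), and it has not been checked against the data.table version discussed here.

```r
library(data.table)

# Hypothetical stand-in for dt.tmp; values are invented for illustration.
dt.tmp <- data.table(
  balance     = c(1, 2, 3, 4, 5, 6),
  count       = c(1, 1, 1, 1, 1, 1),
  iquarter    = c(0L, 0L, 0L, 1L, 1L, 1L),
  fico.bucket = c(25L, 25L, 50L, 25L, 50L, 50L)
)

# Grouping columns held as plain strings.
by.names <- c("iquarter", "fico.bucket")

# Turn the strings into the unevaluated call list(iquarter, fico.bucket).
by.expr <- as.call(c(as.name("list"), lapply(by.names, as.name)))

# Splice that call into the by argument, then evaluate the finished call.
agg.call <- bquote(dt.tmp[, lapply(.SD, sum), by = .(by.expr)])
dt.agg <- eval(agg.call)
```

Here bquote() replaces only the .(by.expr) placeholder; everything else, including .SD, is left untouched until eval() runs the assembled call.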
Is there any chance that we could get a "less magic" version of the
data.table extract that doesn't do anything fancy? Or maybe a
by.with=FALSE option?
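
To make the request concrete, here is a sketch of the kind of interface meant, where every argument is an ordinary value and nothing is quoted or guessed. The helper sum_by_names is hypothetical and not part of data.table. It assumes a data.table release in which by accepts a character vector of column names and the .SDcols argument is available; judging by the errors above, the version in this thread does not accept that.

```r
library(data.table)

# Hypothetical helper, not part of data.table.
# dt:          a data.table
# cols.to.sum: character vector of columns to sum
# by.names:    character vector of grouping columns
sum_by_names <- function(dt, cols.to.sum, by.names) {
  # Assumption: by takes a character vector and .SDcols restricts .SD.
  dt[, lapply(.SD, sum), by = by.names, .SDcols = cols.to.sum]
}

# Toy data with invented values, for illustration only.
dt.tmp <- data.table(
  balance     = c(1, 2, 3, 4, 5, 6),
  count       = c(1, 1, 1, 1, 1, 1),
  iquarter    = c(0L, 0L, 0L, 1L, 1L, 1L),
  fico.bucket = c(25L, 25L, 50L, 25L, 50L, 50L)
)

dt.agg2 <- sum_by_names(dt.tmp,
                        cols.to.sum = c("balance", "count"),
                        by.names    = c("iquarter", "fico.bucket"))
```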
I periodically try data.table, but I always run into this wall where I
waste a few hours trying to guess how to make extract do what I want it
to and finally give up. It's frustrating: it seems that if only
data.table tried to be less clever, it would be very useful to me.
- Johann