[datatable-help] Programmatic by clauses

Johann Hibschman jhibschman+r at gmail.com
Mon Aug 30 22:03:11 CEST 2010


"Short, Tom" <TShort at epri.com> writes:

> Johann, how about the following:
> [snip example]

That's a good example; thanks.

> Here's a data.table version:
>      
>>     data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> +          by = lapply(aggregation.spec, function (f) f(data))]
>      iquarter fico.bucket   balance    count
> [1,]        0          25 0.5506797 1.133675
> [2,]        0          50 1.5175908 0.854553
> [3,]        0          75 0.4627294 1.171430
> [4,]        0         100 0.8354870 1.083211
> [5,]        1          25 1.7311503 1.210178
> [6,]        1          50 2.2930775 1.974759
> [7,]        1          75 1.0477066 1.973119
> [8,]        1         100 1.4351321 1.501291

I hadn't understood .SD before; that's a very good thing to know.
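
For reference, here is a minimal sketch of what .SD provides. It assumes a recent data.table release; the toy table `dt`, its column names, and the use of the `.SDcols` argument are illustrative assumptions and not taken from this thread.

```r
library(data.table)

# Hypothetical toy table; column names are illustrative only.
dt <- data.table(
  grp     = c("a", "a", "b"),
  balance = c(1, 2, 3),
  count   = c(10, 20, 30)
)
cols.to.sum <- c("balance", "count")

# Within each group, .SD is a data.table holding that group's rows,
# restricted here to the columns named in .SDcols.
res <- dt[, lapply(.SD, sum), by = grp, .SDcols = cols.to.sum]
print(res)
```

The by column itself is not part of .SD, so only the selected columns are summed.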

> I think the following should also work, but it doesn't. Note that I
> didn't update to the very latest version of data.table, and I know
> Matthew has changed some things that might already fix this.
>      
>
>>     data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> +          by = by.factors]
> Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE], :
>   column or expression 1 of 'by' list is not internally type integer.
>   Do not quote column names. Example of correct use:
>   by=list(colA,month(colB),...).

It still doesn't work.  Unfortunately, if I want a drop-in
replacement, I have to operate on the equivalent of by.factors.

I tried the following:

  dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
    data.table(by.factors))
  dt.agg <- dt.tmp[, lapply(.SD, sum), by=paste(names(by.factors),
    collapse=",")]

but I got:

  Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by = paste(names(by),  : 
    by must evaluate to list

I tried

  by.names <- paste(names(by.factors), collapse=",")
  dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]

but I got the same error.  Randomly wrapping things in eval or evalq
didn't seem to work either.
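
A minimal sketch of the pattern attempted above, under the assumption of a recent data.table release in which `by` accepts a character vector of column names held in a variable. The version discussed in this thread behaved differently, as the errors show. The toy `data` and `by.factors` objects are hypothetical stand-ins for the real ones.

```r
library(data.table)

# Hypothetical stand-ins for the thread's objects.
data <- data.table(balance = c(1, 2, 3, 4), count = c(1, 1, 1, 1))
cols.to.sum <- c("balance", "count")
by.factors <- list(iquarter    = c(0, 0, 1, 1),
                   fico.bucket = c(25, 25, 25, 50))

# Bind the precomputed grouping vectors onto the columns to aggregate.
dt.tmp <- cbind(data[, cols.to.sum, with = FALSE],
                as.data.table(by.factors))

# Group by a character vector of column names stored in a variable.
by.names <- names(by.factors)
dt.agg <- dt.tmp[, lapply(.SD, sum), by = by.names]
print(dt.agg)
```

Here the grouping columns are passed as a character vector, not pasted into one comma-separated string, so no evaluation tricks are needed.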

Is there any chance that we could get a "less magic" version of the
data.table extract that doesn't do anything fancy?  Or maybe a
by.with=FALSE option?

I periodically try data.table, but I always run into this wall where I
waste a few hours trying to guess how to make extract do what I want,
and finally give up.  It's frustrating: if only data.table tried to be
less clever, it would be very useful to me.


- Johann


