[datatable-help] Programmatic by clauses
Johann Hibschman
jhibschman+r at gmail.com
Tue Aug 31 17:10:12 CEST 2010
"Short, Tom" <TShort at epri.com> writes:
> This seems to work ("data" is different than before, so the balance and
> count columns are different):
>
>> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> + by = as.list(by.factors)]
>      iquarter fico.bucket   balance     count
> [1,]        0          25 0.1427648 1.0449715
> [2,]        0          50 0.8598616 0.7946641
> [3,]        0          75 0.7799311 0.6733977
> [4,]        0         100 1.1240393 1.3415721
> [5,]        1          25 1.6179294 1.9870932
> [6,]        1          50 1.4562150 2.0651700
> [7,]        1          75 1.8457541 1.6337161
> [8,]        1         100 2.0330688 0.8113971
Using as.list works for me as well, thanks.
I had to change my summary function to return NA_real_ rather than just
plain NA, but once I did that, everything seems to work.
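
A minimal sketch of the NA_real_ point, in base R only. The function name sum.or.na is hypothetical (the real summary function is not shown in this thread), and the assumption is that the problem came from the result type differing between groups: plain NA is a logical value, so a function that returns NA for some groups and a double for others does not produce one consistent column type.

```r
# Hypothetical summary function: the sum of x, or a typed missing value
# when there is nothing to sum. NA_real_ keeps the result a double.
sum.or.na <- function(x) {
  if (length(x) == 0L) NA_real_ else sum(as.numeric(x))
}

typeof(NA)                       # "logical"
typeof(NA_real_)                 # "double"
typeof(sum.or.na(numeric(0)))    # "double"
typeof(sum.or.na(c(1.5, 2.5)))   # "double"
```

With plain NA in place of NA_real_, the empty case would return a logical while every other case returns a double.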
I'm impressed. It looks to be about 10 times faster, all
considered. The actual aggregation step is something like 40 times
faster, but I have to do some extra work to get it into a format
suitable for data.table.
I would still prefer there to be a more "plain vanilla" interface to all
this. I have no idea why using "as.list" works, and that makes me
uncomfortable.
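
For comparison, here is a sketch of a more "plain vanilla" form, assuming a reasonably recent data.table release rather than the 2010 version discussed in this thread. Later releases accept a character vector of column names in by, and .SDcols selects the columns that .SD contains. The toy values and the name by.cols are made up for illustration; the column names follow the example above, and the grouping columns are assumed to live in the table itself.

```r
library(data.table)

# Toy data: column names follow the thread, the values are invented.
data <- data.table(
  iquarter    = c(0, 0, 1, 1, 1),
  fico.bucket = c(25, 25, 25, 50, 50),
  balance     = c(1, 2, 3, 4, 5),
  count       = c(1, 1, 1, 1, 1)
)

by.cols     <- c("iquarter", "fico.bucket")  # grouping columns, as strings
cols.to.sum <- c("balance", "count")         # columns to aggregate

# .SDcols restricts .SD to the columns being summed;
# by takes the character vector of grouping column names directly.
agg <- data[, lapply(.SD, sum), by = by.cols, .SDcols = cols.to.sum]
print(agg)
```

Both the grouping and the summed columns are then ordinary character vectors that can be built programmatically.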
Regards,
Johann
>
>
>
>> -----Original Message-----
>> From: datatable-help-bounces at lists.r-forge.r-project.org
>> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
>> On Behalf Of Johann Hibschman
>> Sent: Monday, August 30, 2010 16:03
>> To: datatable-help at lists.r-forge.r-project.org
>> Subject: Re: [datatable-help] Programmatic by clauses
>>
>> "Short, Tom" <TShort at epri.com> writes:
>>
>> > Johann, how about the following:
>> > [snip example]
>>
>> That's a good example; thanks.
>>
>> > Here's a data.table version:
>> >
>> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
>> > + by = lapply(aggregation.spec, function (f) f(data))]
>> >      iquarter fico.bucket   balance    count
>> > [1,]        0          25 0.5506797 1.133675
>> > [2,]        0          50 1.5175908 0.854553
>> > [3,]        0          75 0.4627294 1.171430
>> > [4,]        0         100 0.8354870 1.083211
>> > [5,]        1          25 1.7311503 1.210178
>> > [6,]        1          50 2.2930775 1.974759
>> > [7,]        1          75 1.0477066 1.973119
>> > [8,]        1         100 1.4351321 1.501291
>>
>> I hadn't understood .SD before; that's a very good thing to know.
>>
>> > I think the following should also work, but it doesn't. Note that I
>> > didn't update to the very latest version of data.table, and I know
>> > Matthew has changed some things that might already fix this.
>> >
>> >
>> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
>> > + by = by.factors]
>> > Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE], :
>> >   column or expression 1 of 'by' list is not internally type integer.
>> >   Do not quote column names. Example of correct use:
>> >   by=list(colA,month(colB),...).
>>
>> It still doesn't work. Unfortunately, if I want to have a
>> drop-in replacement, I have to operate on the equivalent by.factors.
>>
>> I tried the following:
>>
>> dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
>>                 data.table(by.factors))
>> dt.agg <- dt.tmp[, lapply(.SD, sum),
>>                  by=paste(names(by.factors), collapse=",")]
>>
>> but I got:
>>
>> Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by =
>> paste(names(by), :
>> by must evaluate to list
>>
>> I tried
>>
>> by.names <- paste(names(by.factors), collapse=",")
>> dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
>>
>> but I got the same error. Randomly wrapping things in eval
>> or evalq didn't seem to work either.
>>
>> Is there any chance that we could get a "less magic" version
>> of the data.table extract that doesn't do anything fancy? Or
>> maybe a by.with=FALSE option?
>>
>> I periodically try data.table, but I always run into this
>> wall where I waste a few hours trying to guess how to make
>> extract do what I want it to and finally give up. It's
>> frustrating; it seems that if only data.table were trying to be
>> less clever, it would be very useful to me.
>>
>>
>> - Johann
>>
>> _______________________________________________
>> datatable-help mailing list
>> datatable-help at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>