[datatable-help] Programmatic by clauses
Short, Tom
TShort at epri.com
Mon Aug 30 22:45:44 CEST 2010
This seems to work ("data" is different than before, so the balance and
count columns are different):
> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
+ by = as.list(by.factors)]
     iquarter fico.bucket   balance     count
[1,]        0          25 0.1427648 1.0449715
[2,]        0          50 0.8598616 0.7946641
[3,]        0          75 0.7799311 0.6733977
[4,]        0         100 1.1240393 1.3415721
[5,]        1          25 1.6179294 1.9870932
[6,]        1          50 1.4562150 2.0651700
[7,]        1          75 1.8457541 1.6337161
[8,]        1         100 2.0330688 0.8113971
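
A note for archive readers: the grouping-by-name problem discussed in this thread can be sketched as below. This is a minimal sketch, assuming a data.table release newer than the one used in this thread, where `by` accepts a character vector of column names and `.SDcols` picks the columns that go into .SD. The toy data and the name by.cols are hypothetical; only cols.to.sum is taken from the messages here.

```r
library(data.table)

# Hypothetical toy data (NOT the data from this thread):
# 2 quarters x 4 FICO buckets, 2 rows per group.
set.seed(1)
data <- data.table(
  iquarter    = rep(0:1, each = 8),
  fico.bucket = rep(c(25, 50, 75, 100), times = 4),
  balance     = runif(16),
  count       = runif(16)
)

cols.to.sum <- c("balance", "count")          # columns to aggregate
by.cols     <- c("iquarter", "fico.bucket")   # grouping columns, given by name

# Assumes a data.table version in which `by` may be a character vector
# and .SDcols restricts .SD to the named columns.
agg <- data[, lapply(.SD, sum), by = by.cols, .SDcols = cols.to.sum]
print(agg)
```

Both the grouping columns and the summed columns are plain character vectors here, so they can be built programmatically without quoting tricks.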
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> On Behalf Of Johann Hibschman
> Sent: Monday, August 30, 2010 16:03
> To: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Programmatic by clauses
>
> "Short, Tom" <TShort at epri.com> writes:
>
> > Johann, how about the following:
> > [snip example]
>
> That's a good example; thanks.
>
> > Here's a data.table version:
> >
> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> > + by = lapply(aggregation.spec, function (f) f(data))]
> >      iquarter fico.bucket   balance    count
> > [1,]        0          25 0.5506797 1.133675
> > [2,]        0          50 1.5175908 0.854553
> > [3,]        0          75 0.4627294 1.171430
> > [4,]        0         100 0.8354870 1.083211
> > [5,]        1          25 1.7311503 1.210178
> > [6,]        1          50 2.2930775 1.974759
> > [7,]        1          75 1.0477066 1.973119
> > [8,]        1         100 1.4351321 1.501291
>
> I hadn't understood .SD before; that's a very good thing to know.
>
> > I think the following should also work, but it doesn't. Note that I
> > didn't update to the very latest version of data.table, and I know
> > Matthew has changed some things that might already fix this.
> >
> >
> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> > + by = by.factors]
> > Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE], :
> >   column or expression 1 of 'by' list is not internally type integer.
> >   Do not quote column names. Example of correct use:
> >   by=list(colA,month(colB),...).
>
> It still doesn't work. Unfortunately, if I want to have a
> drop-in replacement, I have to operate on the equivalent by.factors.
>
> I tried the following:
>
> dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
>                 data.table(by.factors))
> dt.agg <- dt.tmp[, lapply(.SD, sum),
>                  by=paste(names(by.factor), collapse=",")]
>
> but I got:
>
> Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by =
> paste(names(by), :
> by must evaluate to list
>
> I tried
>
> by.names <- paste(names(by.factor), collapse=",")
> dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
>
> but I got the same error. Randomly wrapping things in eval
> or evalq didn't seem to work either.
>
> Is there any chance that we could get a "less magic" version
> of the data.table extract that doesn't do anything fancy? Or
> maybe a by.with=FALSE option?
>
> I periodically try data.table, but I always run into this
> wall where I waste a few hours trying to guess how to make
> extract do what I want it to and finally give up. It's
> frustrating; it seems that if only data.table tried to be
> less clever, it would be very useful to me.
>
>
> - Johann
>