[datatable-help] Programmatic by clauses
Short, Tom
TShort at epri.com
Thu Sep 2 23:10:01 CEST 2010
Johann, I think that "as.list" works because you need something other
than a single variable. Single variables are treated differently.
Wrapping it in brackets also works:
data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
by = {by.factors}]
What it tries to do with a single variable is turn it into
list(by.factors). I think that's unintended, but we need to check with
Matthew.
Matthew, in the following lines of [.data.table, changing "list(" to
"as.list(" would fix the problem above, but if something's a vector, it
won't work.
if (mode(bysub) %in% c("name","character")) {
# name : j may be a single unquoted column name but
it must evaluate to list so this is a convenience to users
# character: for backwards compatibility with v1.2
syntax passing single character to 'by' rather than list()
bysub =
parse(text=paste("list(",bysub,")",sep=""))[[1]]
}
This seems to work, but it feels a little kludgy:
if (mode(bysub) == "character") {
# character: for backwards compatibility with v1.2
syntax passing single character to 'by' rather than list()
bysub = parse(text=paste("list(",bysub,")",
sep=""))[[1]]
}
if (mode(bysub) == "name") {
# name : j may be a single unquoted column name but
it must evaluate to list so this is a convenience to users
bysub = parse(text=paste("if (is.list(",bysub,")) ",
bysub, " else list(", bysub, ")", sep=""))[[1]]
}
- Tom
> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> On Behalf Of Johann Hibschman
> Sent: Tuesday, August 31, 2010 11:10
> To: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Programmatic by clauses
>
> "Short, Tom" <TShort at epri.com> writes:
>
> > This seems to work ("data" is different than before, so the balance
> > and count columns are different):
> >
> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> > + by = as.list(by.factors)]
> > iquarter fico.bucket balance count
> > [1,] 0 25 0.1427648 1.0449715
> > [2,] 0 50 0.8598616 0.7946641
> > [3,] 0 75 0.7799311 0.6733977
> > [4,] 0 100 1.1240393 1.3415721
> > [5,] 1 25 1.6179294 1.9870932
> > [6,] 1 50 1.4562150 2.0651700
> > [7,] 1 75 1.8457541 1.6337161
> > [8,] 1 100 2.0330688 0.8113971
>
> Using as.list works for me as well, thanks.
>
> I had to change my summary function to return NA_real_ rather
> than just plain NA, but once I did that, everything seems to work.
>
> I'm impressed. It looks to be about 10 times faster, all
> considered. The actual aggregation step is something like 40
> times faster, but I have to do some extra work to get it into
> a format suitable for data.table.
>
> I would still prefer there to be a more "plain vanilla"
> interface to all this. I have no idea why using "as.list"
> works, and that makes me uncomfortable.
>
> Regards,
> Johann
>
> >
> >
> >
> >> -----Original Message-----
> >> From: datatable-help-bounces at lists.r-forge.r-project.org
> >> [mailto:datatable-help-bounces at lists.r-forge.r-project.org]
> >> On Behalf Of Johann Hibschman
> >> Sent: Monday, August 30, 2010 16:03
> >> To: datatable-help at lists.r-forge.r-project.org
> >> Subject: Re: [datatable-help] Programmatic by clauses
> >>
> >> "Short, Tom" <TShort at epri.com> writes:
> >>
> >> > Johann, how about the following:
> >> > [snip example]
> >>
> >> That's a good example; thanks.
> >>
> >> > Here's a data.table version:
> >> >
> >> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> >> > + by = lapply(aggregation.spec, function (f) f(data))]
> >> > iquarter fico.bucket balance count
> >> > [1,] 0 25 0.5506797 1.133675
> >> > [2,] 0 50 1.5175908 0.854553
> >> > [3,] 0 75 0.4627294 1.171430
> >> > [4,] 0 100 0.8354870 1.083211
> >> > [5,] 1 25 1.7311503 1.210178
> >> > [6,] 1 50 2.2930775 1.974759
> >> > [7,] 1 75 1.0477066 1.973119
> >> > [8,] 1 100 1.4351321 1.501291
> >>
> >> I hadn't understood .SD before; that's a very good thing to know.
> >>
> >> > I think the following should also work, but it doesn't.
> Note that I
> >> > didn't update to the very latest version of data.table,
> and I know
> >> > Matthew has changed some things that might already fix this.
> >> >
> >> >
> >> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> >> > + by = by.factors]
> >> > Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with =
> >> > FALSE],
> >> > :
> >> > column or expression 1 of 'by' list is not internally
> >> type integer.
> >> > Do not quote column names. Example of correct use:
> >> > by=list(colA,month(colB),...).
> >>
> >> It still doesn't work. Unfortunately, if I want to have a drop-in
> >> replacement, I have to operate on the equivalent by.factors.
> >>
> >> I tried the following:
> >>
> >> dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
> >> data.table(by.factors))
> >> dt.agg <- dt.tmp[, lapply(.SD, sum), by=paste(names(by.factor),
> >> collapse=",")]
> >>
> >> but I got:
> >>
> >> Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by =
> >> paste(names(by), :
> >> by must evaluate to list
> >>
> >> I tried
> >>
> >> by.names <- paste(names(by.factor), collapse=",")
> >> dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
> >>
> >> but I got the same error. Randomly wrapping things in
> eval or evalq
> >> didn't seem to work either.
> >>
> >> Is there any chance that we could get a "less magic"
> version of the
> >> data.table extract that doesn't do anything fancy? Or maybe a
> >> by.with=FALSE option?
> >>
> >> I periodically try data.table, but I always run into this
> wall where
> >> I waste a few hours trying to guess how to make extract do what I
> >> want it to and finally give up. It's frustrating, it seems as if
> >> only data.table were trying to be less clever, it would be very
> >> useful to me.
> >>
> >>
> >> - Johann
> >>
> >> _______________________________________________
> >> datatable-help mailing list
> >> datatable-help at lists.r-forge.r-project.org
> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/d
> > atatable-help
> >>
>
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/d
atatable-help
>
More information about the datatable-help
mailing list