[datatable-help] Programmatic by clauses

Short, Tom TShort at epri.com
Mon Aug 30 22:45:44 CEST 2010


This seems to work ("data" is different from before, so the balance and
count columns differ):

>     data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
+          by = as.list(by.factors)]
     iquarter fico.bucket   balance     count
[1,]        0          25 0.1427648 1.0449715
[2,]        0          50 0.8598616 0.7946641
[3,]        0          75 0.7799311 0.6733977
[4,]        0         100 1.1240393 1.3415721
[5,]        1          25 1.6179294 1.9870932
[6,]        1          50 1.4562150 2.0651700
[7,]        1          75 1.8457541 1.6337161
[8,]        1         100 2.0330688 0.8113971
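A minimal, self-contained sketch of the call above. The thread never shows how "data", "cols.to.sum" and "by.factors" were built, so the random data and the by.factors data frame below are hypothetical stand-ins. It also assumes a data.table release that still accepts this form of "by". Because the data are random, the sums will not match the output printed above.

```r
library(data.table)

# Hypothetical stand-in for the thread's "data": two numeric value columns.
set.seed(1)
data <- data.table(balance = runif(40), count = runif(40))
cols.to.sum <- c("balance", "count")

# Hypothetical grouping factors kept outside the table,
# with one row for each row of data.
by.factors <- data.frame(
  iquarter    = rep(0:1, each = 20),
  fico.bucket = rep(c(25, 50, 75, 100), times = 10)
)

# Sum the selected columns within each (iquarter, fico.bucket) group.
# as.list() turns the data frame into a named list of grouping vectors.
agg <- data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
            by = as.list(by.factors)]
print(agg)
```

The grouping vectors never become columns of "data". "by" simply receives a named list of vectors with the same length as the table.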


> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of Johann Hibschman
> Sent: Monday, August 30, 2010 16:03
> To: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Programmatic by clauses
> 
> "Short, Tom" <TShort at epri.com> writes:
> 
> > Johann, how about the following:
> > [snip example]
> 
> That's a good example; thanks.
> 
> > Here's a data.table version:
> >      
> >>     data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> > +          by = lapply(aggregation.spec, function (f) f(data))]
> >      iquarter fico.bucket   balance    count
> > [1,]        0          25 0.5506797 1.133675
> > [2,]        0          50 1.5175908 0.854553
> > [3,]        0          75 0.4627294 1.171430
> > [4,]        0         100 0.8354870 1.083211
> > [5,]        1          25 1.7311503 1.210178
> > [6,]        1          50 2.2930775 1.974759
> > [7,]        1          75 1.0477066 1.973119
> > [8,]        1         100 1.4351321 1.501291
> 
> I hadn't understood .SD before; that's a very good thing to know.
> 
> > I think the following should also work, but it doesn't. Note that I 
> > didn't update to the very latest version of data.table, and I know 
> > Matthew has changed some things that might already fix this.
> >      
> >
> >>     data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> > +          by = by.factors]
> > Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE],  :
> >   column or expression 1 of 'by' list is not internally type integer.
> > Do not quote column names. Example of correct use:
> > by=list(colA,month(colB),...).
> 
> It still doesn't work.  Unfortunately, if I want to have a 
> drop-in replacement, I have to operate on the equivalent by.factors.
> 
> I tried the following:
> 
>   dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
>     data.table(by.factors))
>   dt.agg <- dt.tmp[, lapply(.SD, sum), by=paste(names(by.factors),
>     collapse=",")]
> 
> but I got:
> 
>   Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by = 
> paste(names(by),  : 
>     by must evaluate to list
> 
> I tried
> 
>   by.names <- paste(names(by.factors), collapse=",")
>   dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
> 
> but I got the same error.  Randomly wrapping things in eval 
> or evalq didn't seem to work either.
> 
> Is there any chance that we could get a "less magic" version 
> of the data.table extract that doesn't do anything fancy?  Or 
> maybe a by.with=FALSE option?
> 
> I periodically try data.table, but I always run into this 
> wall where I waste a few hours trying to guess how to make 
> extract do what I want it to and finally give up.  It's 
> frustrating: it seems that if only data.table tried to be
> less clever, it would be very useful to me.
> 
> 
> - Johann
> 
> _______________________________________________
> datatable-help mailing list
> datatable-help at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> 
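Johann asks above for a "less magic" extract, or a by.with=FALSE option. Later data.table releases accept a character vector of column names in "by", which covers much of that request. The sketch below assumes such a release, not the 2010 version discussed in this thread. As before, "data", "cols.to.sum" and "by.factors" are hypothetical stand-ins.

```r
library(data.table)

# Hypothetical stand-ins for the thread's objects.
set.seed(1)
data <- data.table(balance = runif(40), count = runif(40))
cols.to.sum <- c("balance", "count")
by.factors <- data.frame(
  iquarter    = rep(0:1, each = 20),
  fico.bucket = rep(c(25, 50, 75, 100), times = 10)
)

# Johann's approach: bind the grouping columns onto the value columns.
dt.tmp <- cbind(data[, cols.to.sum, with = FALSE], data.table(by.factors))

# Group by a plain character vector of column names held in a variable.
# No paste(..., collapse = ",") and no eval() are needed.
by.names <- names(by.factors)
dt.agg <- dt.tmp[, lapply(.SD, sum), by = by.names]
print(dt.agg)
```

.SD excludes the grouping columns, so lapply(.SD, sum) sums only balance and count.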

