[datatable-help] Expressions in "by" criteria (again)

Matthew Dowle mdowle at mdowle.plus.com
Sat Jul 3 01:58:33 CEST 2010


c.listquote looks very complicated.  Mike shouldn't need to do that at
this stage. My gut tells me there's some fundamental misunderstanding
somewhere. Maybe it's all the fruit.
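
For the narrower task of turning a character vector of column names into
the quoted list() that by=eval(...) expects, a few lines of base R may be
enough. This is only a sketch: the helper name fields_to_listquote and the
small data frame are hypothetical and not from the thread, and it covers
plain column names only, not the mixed inputs c.listquote accepts.

```r
# Hypothetical helper (not part of data.table): build the call
# list(Apples, Bananas, ...) from a character vector of column names.
fields_to_listquote <- function(fields) {
  as.call(c(list(as.symbol("list")), lapply(fields, as.symbol)))
}

my.fields <- c("Apples", "Bananas")
q <- fields_to_listquote(my.fields)
print(q)

# Check on a made-up data frame: evaluating q looks the columns up by name.
df <- data.frame(Apples = 1:2, Bananas = 3:4)
cols <- eval(q, df)

# With a data.table DT, the quoted list would then be used as in the
# quick answer quoted further down: DT[, sum(NumericField), by = eval(q)]
```

The result is the same kind of object as quote(list(Apples, Bananas)), so
it can be stored once and reused wherever a quoted list is needed.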

Is Mike's data really _sorted_ by Apples column, then by Bananas column,
then by Kiwi column, then by Pineapples column, then by Prunes
column, ...?  What's in those columns?  I can't see any data or anything
reproducible.  When we 'by' we aim for that 'by' to be in the same order
as the key.  That implies a key 20+ columns long.  That doesn't seem
right.  I've never needed a key that long.

Surely Mike needs _one_ fruit column, which will likely be the 2nd
column of a key,  then a 3rd column which is "yield" or some
measurement.

To add more fruit, you add more rows, not more columns. Like a database.
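
A minimal sketch of that layout, with made-up farm names and yield
values (none of this is Mike's data). Base R's aggregate() is used only
so the sketch runs without extra packages; the data.table form is shown
in a comment.

```r
# Hypothetical long-format data: one row per (farm, fruit) pair,
# rather than one column per fruit.
long <- data.frame(
  farm  = c("A", "A", "B", "B"),
  fruit = c("Apples", "Bananas", "Apples", "Bananas"),
  yield = c(10, 5, 20, 7),
  stringsAsFactors = FALSE
)

# Adding a fruit means adding rows; the grouping code below is unchanged.
kiwi <- data.frame(farm = c("A", "B"), fruit = "Kiwi", yield = c(3, 4),
                   stringsAsFactors = FALSE)
long <- rbind(long, kiwi)

# Total yield per fruit. With data.table the equivalent would be
# DT[, sum(yield), by = fruit] on a table holding the same rows.
totals <- aggregate(yield ~ fruit, data = long, FUN = sum)
print(totals)
```

The grouping is by a single column, so no long list of field names has to
be built or passed around.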

Matthew


On Fri, 2010-07-02 at 10:13 -0700, Harish wrote:
> Mike,
> 
> Matthew is right.  Here is a function that might help you transition from your current state to where you need to get to quickly.
> 
> I started creating a function for other purposes that might be useful to you.  I described the usage below.
> 
> ========================
> 
> # Concatenate all given arguments into a quote of a list()
> # Arguments can be any of:
> #    1) an expression that returns a valid value when evaluated in calling
> #       environment.
> #    2) a character vector which will be treated as text inside list(...)
> #    3) a quote of a list
> #    4) a list() directly given in the argument
> # Returns a quote of a list
> c.listquote <- function( ... ) {
>    
>    args <- as.list( match.call()[ -1 ] )
>    lstquote <- list( as.symbol( "list" ) );
>    for ( i in args ) {
>       # Evaluate expression in parent environment to see what it refers to
>       if ( class( i ) == "name" || ( class( i ) == "call" && i[[1]] != "list" ) ) {
>          i <- eval( substitute( i ), sys.frame( sys.parent() ) )
>       }
>       if ( class( i ) == "call" && i[[1]] == "list" ) {
>          lstquote <- c( lstquote, as.list( i )[ -1 ] )
>       }
>       else if ( class( i ) == "character" )
>       {
>          for ( chr in i ) {
>             lstquote <- c( lstquote, list( parse( text=chr )[[1]] ) )
>          }
>       }
>       else
>          stop( paste( "[", deparse( substitute( i ) ), "] Unknown class [", class( i ), "] or is not a list()", sep="" ) )
>    }
>    return( as.call( lstquote ) )
> }
> 
> ========================
> 
> IMPORTANT: If you find any bugs in this or find ways to improve it, please let me know.
> 
> The usage is as follows:
> 
> my.fields <- c("Apples","Bananas","Coconuts","Dragonfruits","Pomelos")
> q <- c.listquote( my.fields )
> DT[ , Col1, by=eval( q ) ]
> DT[ , eval( q ) ]
> 
> The advantage of the function is that you can also easily add fields through a variety of ways...
> 
> foo <- function() {
>    return( quote( list( Orange ) ) )
> }
> DT[ , eval( c.listquote( q, foo(), list( Pear ), "Peach", c( "New1", "New2=form" ) ) ) ]
> 
> 
> Hope this helps.
> 
> 
> Regards,
> Harish
> 
> 
> --- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> 
> > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > Subject: Re: [datatable-help] Expressions in "by" criteria (again)
> > To: "Mike Sandfort" <cute_moniker at yahoo.com>
> > Cc: datatable-help at lists.r-forge.r-project.org
> > Date: Friday, July 2, 2010, 9:49 AM
> > Quick answer is it needs to be this way:
> > 
> >    my.fields = quote(list(Apples,Bananas,...))
> >    DT[,sum(NumericField),by=eval(my.fields)]
> > 
> > Also some bugs were just fixed in this area, so you may need the latest 1.5
> > from r-forge for this.
> > 
> > Having said that, it's sometimes easier coding to use a flat format (i.e.
> > have a single column 'fruit') then "[,...,by=fruit]". There was another
> > thread showing examples of long to wide taking care of NAs etc.; search for
> > 'wide'.
> > 
> > HTH, thanks for the interest,
> > 
> > Matthew
> > 
> > 
> > > Hi,
> > >
> > > I suspect my question is similar to Harish's "Question #2" from 6/18.
> > > Suppose I have a data.table with many fields and have a large subset of
> > > fields I need to include in several expressions. Ordinarily, I would
> > > create (once) a vector of names of the fields in my subset:
> > > my.fields <- c("Apples","Bananas","Coconuts","Dragonfruits",...,"Pomelos")
> > >     [where the whole data frame has many more fields, including
> > > "Broccoli","Cabbages",...]
> > >
> > > Then I can re-use the my.fields vector when extracting subsets, creating
> > > plots, aggregating with ddply(), etc. The problem is that I can't figure
> > > out how to (re)use my.fields to aggregate a data.table.
> > >
> > > DT[,sum(NumericField),by=(Apples,Bananas,Coconuts,Dragonfruits,...,Pomelos)]
> > > will work.
> > > However,
> > > DT[,sum(NumericField),by=my.fields]
> > > won't work, nor will any combination of paste(), list(), eval(), quote(),
> > > deparse(), etc. applied to my.fields (at least I haven't found one yet).
> > >
> > > I know this is probably more an R-language issue, but since it's come up
> > > in my work with the (excellent) data.table package, I thought I would
> > > ask here.
> > >
> > > Thanks!
> > > Mike S.
> > >
> > >
> > >
> > > _______________________________________________
> > > datatable-help mailing list
> > > datatable-help at lists.r-forge.r-project.org
> > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > >