[datatable-help] Expressions in "by" criteria (again)

Sat Jul 3 04:22:42 CEST 2010

Mike, I am glad that you are well on your way.

Matthew, I am intrigued by your view that c.listquote() is complex. I agree that it is, but I had to come up with it to solve a slightly different problem.  Maybe you could share some of your thoughts on how else I could do similar things.

For example, I had a situation where I had to execute...

DT[ , list( A_min=min(A), A_max=max(A), B_min=min(B), B_max =max(B), blah blah, Other1, Other2 ) ]

I wanted to avoid typing in the whole list because the code would become a nightmare.  The list had to keep changing a little based on the situation.  And I could not find an easy way to concatenate items onto a quoted list the way I can onto a vector of strings using c().
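
To make the problem concrete, here is a minimal base-R sketch of what concatenating two quoted lists involves. It mirrors the step c.listquote() performs for quoted-list arguments; the names q1, q2 and concat_listquotes are made up for illustration and are not part of data.table.

```r
# Two quoted lists to be joined (illustrative contents).
q1 <- quote(list(A_min = min(A), A_max = max(A)))
q2 <- quote(list(Other1, Other2))

# Hypothetical helper: drop the leading `list` symbol from each quoted
# list, collect the remaining arguments, then rebuild one list(...) call.
concat_listquotes <- function(...) {
  args <- list()
  for (q in list(...)) {
    args <- c(args, as.list(q)[-1])
  }
  as.call(c(list(as.symbol("list")), args))
}

q <- concat_listquotes(q1, q2)

# Evaluate the combined call against some stand-in data.
res <- eval(q, list(A = c(1, 5), Other1 = "x", Other2 = "y"))
```

The result q is itself a quoted list, so it can be passed through eval() in the same way as a hand-written quote(list(...)).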

With c.listquote(), I can do something like the following:

minmax <- function( name ) {
   strMin <- paste( name, "_min=min(", name, ")", sep="" )
   strMax <- paste( name, "_max=max(", name, ")", sep="" )
   return( c( strMin, strMax ) )
}

longlist <- function() {
   c( minmax( "A" ), minmax( "B" ), minmax( "C" ) )
}

DT[ , eval( c.listquote( longlist(), list( Other1, Other2 ) ) ) ]
DT[ , eval( c.listquote( longlist(), list( Other3 ) ) ) ]

Essentially, I wanted to be able to create the query based on other arguments or parameters.  This function allows me to have that flexibility.
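
As a rough illustration of the string-based idea, here is a self-contained sketch in base R only. The data frame df and the helper build_list_quote are assumptions for the example (DT is not shown in this thread), and eval() is run against a plain data.frame rather than inside DT[ , ... ]. Parsing the whole list(...) text in one go is a choice made here so that name=value pieces are read as named arguments; parsed on its own, a piece such as A_min=min(A) is an assignment expression in R.

```r
# Hypothetical stand-in data; DT itself is not shown in this message.
df <- data.frame(A = c(3, 1, 2), B = c(10, 30, 20))

# Same string-building helper as above.
minmax <- function(name) {
  strMin <- paste(name, "_min=min(", name, ")", sep = "")
  strMax <- paste(name, "_max=max(", name, ")", sep = "")
  c(strMin, strMax)
}

# Hypothetical helper: paste the pieces into one "list(...)" string and
# parse it once, so that name=value pieces become named arguments.
build_list_quote <- function(pieces) {
  txt <- paste("list(", paste(pieces, collapse = ", "), ")", sep = "")
  parse(text = txt)[[1]]
}

# Build the query from parameters, then evaluate it with df's columns in scope.
q <- build_list_quote(c(minmax("A"), minmax("B"), "Other = A + B"))
res <- eval(q, df)
```

Changing the query then only means changing the character vector passed to build_list_quote().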

How would you recommend that I deal with a situation like this?

Regards,
Harish

--- On Fri, 7/2/10, Mike Sandfort <cute_moniker at yahoo.com> wrote:

> From: Mike Sandfort <cute_moniker at yahoo.com>
> Subject: Re: [datatable-help] Expressions in "by" criteria (again)
> To: mdowle at mdowle.plus.com, "Harish" <harishv_99 at yahoo.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Date: Friday, July 2, 2010, 6:38 PM
> Yikes. Seems like this fruit salad has gone a bit too far.
> 
> c.listquote might look complicated but, as I emailed Harish off-list (sorry),
> it's exactly what I was looking for. The point you raise is an excellent one --
> the ability to use expressions in "by" does make the tool much more flexible in
> ways I hadn't thought about. My point was only that many commonly-used R
> functions encourage the user to keep collections of fields stored in vectors
> and lists. The fact that I didn't have a tool to shoehorn those vectors and
> lists into an expression (without a bunch of repetitive typing) was why I
> emailed the list. Harish's code does exactly that vector/list -> expression
> conversion that I needed.
> 
> As far as my need for lists of variables, there are lots of reasons to keep
> them around. If you need to bulk-convert a set of character fields to factors,
> for example, it's handy to be able to say
> 
> my.factors = <Vector of Field Names>
> idx <- match(my.factors,names(df))
> df[,idx] <- lapply(df[,idx],as.factor)
> 
> One may also have sets of factors which are relevant to different kinds of
> analysis. Working with invoice records, one might have customer-related
> fields, product-related fields, business-unit related fields, etc. Depending
> on the sort of analysis one wants to perform, one might only have need to
> aggregate across a particular subset of factors. Having my.geo.factors,
> my.cust.factors, my.prod.factors, etc. reduces typing and makes the code
> easier to debug -- particularly when the number of field names becomes very
> large.
> 
> Thanks to both of you for your help in working this out. And from now on
> I'll stick to "A","B","C" for my field names when I email.
> 
> Mike
> 
> 
> 
> 
> ----- Original Message ----
> From: Matthew Dowle <mdowle at mdowle.plus.com>
> To: Harish <harishv_99 at yahoo.com>
> Cc: Mike Sandfort <cute_moniker at yahoo.com>; datatable-help at lists.r-forge.r-project.org
> Sent: Fri, July 2, 2010 7:58:33 PM
> Subject: Re: [datatable-help] Expressions in "by" criteria (again)
> 
> 
> c.listquote looks very complicated.  Mike shouldn't need to do that at
> this stage. My gut tells me there is some fundamental misunderstanding
> somewhere. Maybe it's all the fruit.
> 
> Is Mike's data really _sorted_ by Apples column, then by Bananas column,
> then by Kiwi column, then by Pineapples column, then by Prunes
> column, ...?  What's in those columns?  I can't see any data or anything
> reproducible.  When we 'by' we aim for that 'by' to be in the same order
> as the key.  That implies a key 20+ columns long.  Doesn't seem
> right.  I've never needed a key that long.
> 
> Surely Mike needs _one_ fruit column, which will likely be the 2nd
> column of a key, then a 3rd column which is "yield" or some
> measurement.
> 
> To add more fruit, you add more rows, not more columns. Like a database.
> Matthew
> 
> 
> On Fri, 2010-07-02 at 10:13 -0700, Harish wrote:
> > Mike,
> > 
> > Matthew is right.  Here is a function that might help you transition from your current state to where you need to get to quickly.
> > 
> > I started creating a function for other purposes that might be useful to you.  I described the usage below.
> > 
> > ========================
> > 
> > # Concatenate all given arguments into a quote of a list()
> > # Arguments can be any of:
> > #    1) an expression that returns a valid value when evaluated in calling
> > #       environment.
> > #    2) a character vector which will be treated as text inside list(...)
> > #    3) a quote of a list
> > #    4) a list() directly given in the argument
> > # Returns a quote of a list
> > c.listquote <- function( ... ) {
> >    
> >    args <- as.list( match.call()[ -1 ] )
> >    lstquote <- list( as.symbol( "list" ) );
> >    for ( i in args ) {
> >       # Evaluate expression in parent environment to see what it refers to
> >       if ( class( i ) == "name" || ( class( i ) == "call" && i[[1]] != "list" ) ) {
> >          i <- eval( substitute( i ), sys.frame( sys.parent() ) )
> >       }
> >       if ( class( i ) == "call" && i[[1]] == "list" ) {
> >          lstquote <- c( lstquote, as.list( i )[ -1 ] )
> >       }
> >       else if ( class( i ) == "character" )
> >       {
> >          for ( chr in i ) {
> >             lstquote <- c( lstquote, list( parse( text=chr )[[1]] ) )
> >          }
> >       }
> >       else
> >          stop( paste( "[", deparse( substitute( i ) ), "] Unknown class [", class( i ), "] or is not a list()", sep="" ) )
> >    }
> >    return( as.call( lstquote ) )
> > }
> > 
> > ========================
> > 
> > IMPORTANT: If you find any bugs in this or find ways to improve it, please let me know.
> > 
> > The usage is as follows:
> > 
> > my.fields <- c("Apples","Bananas","Coconuts","Dragonfruits","Pomelos")
> > q <- c.listquote( my.fields )
> > DT[ , Col1, by=eval( q ) ]
> > DT[ , q ]
> > 
> > The advantage of the function is that you can also easily add fields through a variety of ways...
> > 
> > foo <- function() {
> >    return( quote( list( Orange ) ) )
> > }
> > DT[ , eval( c.listquote( q, foo(), list( Pear ), "Peach", c( "New1", "New2=form" ) ) ) ]
> > 
> > 
> > Hope this helps.
> > 
> > 
> > Regards,
> > Harish
> > 
> > 
> > --- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > 
> > > From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > Subject: Re: [datatable-help] Expressions in "by" criteria (again)
> > > To: "Mike Sandfort" <cute_moniker at yahoo.com>
> > > Cc: datatable-help at lists.r-forge.r-project.org
> > > Date: Friday, July 2, 2010, 9:49 AM
> > > Quick answer is it needs to be this way:
> > > 
> > >    my.fields = quote(list(Apples,Bananas,...))
> > >    DT[,sum(NumericField),by=eval(my.fields)]
> > > 
> > > Also, some bugs were just fixed in this area, so you may need the
> > > latest 1.5 from r-forge for this.
> > > 
> > > Having said that, it's sometimes easier coding to use a flat format
> > > (i.e. have a single column 'fruit') then "[,...,by=fruit]". There was
> > > another thread showing examples of long to wide taking care of NAs
> > > etc, search for 'wide'.
> > > 
> > > HTH, thanks for the interest,
> > > 
> > > Matthew
> > > 
> > > 
> > > > Hi,
> > > >
> > > > I suspect my question is similar to Harish's "Question #2" from 6/18.
> > > > Suppose I have a data.table with many fields and have a large subset of
> > > > fields I need to include in several expressions. Ordinarily, I would
> > > > create (once) a vector of names of the fields in my subset:
> > > > my.fields <- c("Apples","Bananas","Coconuts","Dragonfruits",...,"Pomelos")
> > > >     [where the whole data frame has many more fields, including
> > > >     "Broccoli","Cabbages",...]
> > > >
> > > > Then I can re-use the my.fields vector when extracting subsets, creating
> > > > plots, aggregating with ddply(), etc. The problem is that I can't figure
> > > > out how to (re)use my.fields to aggregate a data.table.
> > > >
> > > > DT[,sum(NumericField),by=list(Apples,Bananas,Coconuts,Dragonfruits,...,Pomelos)]
> > > > will work.
> > > > However,
> > > > DT[,sum(NumericField),by=my.fields]
> > > > won't work, nor will any combination of paste(), list(), eval(), quote(),
> > > > deparse(), etc. applied to my.fields (at least I haven't found one yet).
> > > >
> > > > I know this is probably more an R-language issue, but since it's come up
> > > > in my work with the (excellent) data.table package, I thought I would
> > > > ask here.
> > > >
> > > > Thanks!
> > > > Mike S.
> > > >
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > datatable-help at lists.r-forge.r-project.org
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> > > >