[datatable-help] Top N Items [was: Environment of eval()execution for "j" appears to vary inexplicably]

Mon Jul 12 19:43:44 CEST 2010

Harish, here's a simpler approach for top():

top <- function(x, by, num=10, FUN = mean, decreasing=FALSE, otherlab="(Other)" ) {
    # Pair each value with its grouping factor.
    dt <- data.table(x = x, idx = by)
    # Rank the groups by FUN(x); V1 is the default name of the aggregate column.
    rankings <- order(dt[, FUN(x), by = idx]$V1, decreasing = decreasing)
    n <- length(levels(dt$idx))
    # Collapse every level ranked below the top num into otherlab.
    if (num < n)
        levels(dt$idx)[rankings[(num+1):n]] <- otherlab
    dt$idx
}
top( DT$C, DT$B, num = 3, decreasing=TRUE )

res <- DT[ , list( meanC=mean(C) ) ,
          by=list( topA=top( C, by=A, num=3, decreasing=TRUE ),
                   topB=top( C, by=B, num=3, decreasing=TRUE ) ) ]
res

Some notes are in order:

* It only works if the levels of by are in alphabetical order, because the rankings are applied to the levels by position. If they are not, you'll have to use match() or something similar to align them.
* The levels of the answer are not always ordered alphabetically.
* I haven't done much testing. NAs may cause problems.
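
A minimal sketch of the match()-based alignment mentioned in the first note. It is not code from this thread: top_aligned is a hypothetical name, and it uses base R's tapply() instead of a data.table query so that the group results carry their level names. It illustrates aligning by name rather than by position and has not been tuned for speed.

```r
# Hypothetical variant of top(); the name top_aligned is illustrative only.
top_aligned <- function(x, by, num = 10, FUN = mean, decreasing = FALSE,
                        otherlab = "(Other)") {
    by <- as.factor(by)
    # One summary value per level; names(stats) records the level of each value.
    stats <- tapply(x, by, FUN)
    # Level names from best to worst (empty levels give NA and sort last).
    ranked <- names(stats)[order(stats, decreasing = decreasing)]
    keep <- head(ranked, num)
    # match() aligns by level name, so the order of the levels no longer matters.
    lev <- levels(by)
    lev[is.na(match(lev, keep))] <- otherlab
    # Assigning duplicated labels merges those levels into one.
    levels(by) <- lev
    by
}

# Levels deliberately not in alphabetical order.
x  <- c(10, 20, 1, 2, 100, 5, 6)
by <- factor(c("z", "z", "a", "a", "m", "b", "b"),
             levels = c("z", "m", "b", "a"))
res <- top_aligned(x, by, num = 2, decreasing = TRUE)
as.character(res)
```

Here the group means are z = 15, m = 100, b = 5.5 and a = 1.5, so "m" and "z" are kept and the other two levels are merged into "(Other)". As with the second note, the levels of the result follow the original level order, not alphabetical order.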

This approach has the advantage that you don't have to pass DT.

- Tom

> -----Original Message-----
> From: datatable-help-bounces at lists.r-forge.r-project.org 
> [mailto:datatable-help-bounces at lists.r-forge.r-project.org] 
> On Behalf Of Harish
> Sent: Saturday, July 10, 2010 01:24
> To: mdowle at mdowle.plus.com
> Cc: datatable-help at lists.r-forge.r-project.org
> Subject: Re: [datatable-help] Top N Items [was: Environment 
> of eval()execution for "j" appears to vary inexplicably]
> 
> Thanks Matthew.
> 
> I am trying to incorporate your approach and am experimenting 
> with a few different interfaces.  Since your approach needs 
> multiple statements, I cannot maintain the top() interface as 
> it is currently defined.
> 
> I will keep you posted.  It will take me a while, I think, to 
> get to the point where I can share something with you again.
> 
> By the way, the line
>    DT[,mean(C),by=list(A,B)][order(V1,decreasing=TRUE)]
> is not doing quite what I intended.
> 
> Following what Excel's pivot does -- top is the top N items 
> for the entire data set.  So for B, if the top items are 
> c("D","E","H") based on the calculation, then for each 
> grouping of A, the B entries are displayed in the order 
> mentioned previously.  A's entries are displayed together and 
> in the order of top items for A.  So A's items in order could 
> be c("c","b","a").  So the desired result is (where original 
> row numbers from your output are maintained):
>           A    B       V1
>   [1,]    c    D 710.4716
>   [2,]    c    E 692.5095
>   [7,]    c <NA> 514.0045
>   [6,]    b    D 524.7078
>   [4,]    b    E 572.1907
>   [3,]    b    H 599.4201
>   [9,]    b <NA> 484.2036
>   [5,]    a    H 530.2149
>   [8,]    a <NA> 490.4854
>  [10,] <NA>    E 471.2445
>  [11,] <NA> <NA> 445.4304
> 
> 
> Nonetheless, I get the idea of what you are trying to do.  I 
> will play around with it before I ask any questions.
> 
> 
> Thanks,
> Harish
> 
> 
> --- On Mon, 7/5/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> 
> > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > Subject: Re: Top N Items [was: Environment of eval() execution for "j"
> > appears to vary inexplicably]
> > To: "Harish" <harishv_99 at yahoo.com>
> > Cc: datatable-help at lists.r-forge.r-project.org
> > Date: Monday, July 5, 2010, 12:11 AM
> > 
> > Thanks for this example. Interesting. This is grouping of grouping 
> > then, and making it general too. I guess speed is very important as 
> > you'd like to use 'by=' for the subgroups too rather than tapply. I 
> > see where you're coming from now.
> > 
> > This isn't generalised, but this is how I might approach it:
> > 
> > setkey(DT,A)  # keep top 3 A by mean(C)
> > DT$A[-DT[head(DT[,mean(C),by=A][order(V1,decreasing=TRUE)],3),
> >          which=TRUE, mult="all"]] = NA
> > 
> > setkey(DT,B)  # keep top 3 B by mean(C)
> > DT$B[-DT[head(DT[,mean(C),by=B][order(V1,decreasing=TRUE)],3),
> >          which=TRUE, mult="all"]] = NA
> > 
> > DT[,mean(C),by=list(A,B)][order(V1,decreasing=TRUE)]
> > 
> >          A    B       V1
> >  [1,]    c    D 710.4716
> >  [2,]    c    E 692.5095
> >  [3,]    b    H 599.4201
> >  [4,]    b    E 572.1907
> >  [5,]    a    H 530.2149
> >  [6,]    b    D 524.7078
> >  [7,]    c <NA> 514.0045
> >  [8,]    a <NA> 490.4854
> >  [9,]    b <NA> 484.2036
> > [10,] <NA>    E 471.2445
> > [11,] <NA> <NA> 445.4304
> > 
> > It uses NA rather than "(Other)" to avoid building a new factor at the
> > original DT level, saving time and memory. After you have the
> > aggregate result you can always replace the NA with '(Other)' to make
> > it pretty.
> > Notice in your top() the as.character() and the %in%, which are things
> > to avoid if possible, for speed. The code above is doing the same
> > thing really but using a fast join.
> > 
> > To not change DT, take a copy first, add columns rather than change A
> > or B, or work on a local variable inside a function.
> > 
> > A trick to get a 'not-join' is to use the 'which' argument, and
> > negative integers like that. Maybe data.table should have a new
> > 'notjoin' argument to make that built-in.
> > 
> > Admittedly, this isn't a general interface in the way yours works. It's
> > not a single query. The only way I can think of right now to not pass
> > DT into top() in yours is to use tapply inside top(). That would make
> > it less complicated, but slower.  Maybe you can combine the approaches
> > somehow. If you time either approach on large data I'd be interested
> > in the results.
> > 
> > Hope that helps a little,
> > 
> > Matthew
> > 
> > 
> > 
> > On Fri, 2010-07-02 at 20:11 -0700, Harish wrote:
> > > Matthew,
> > > 
> > > My code for top() along with a working example follows.
> > > 
> > > I had tried to think about a way to avoid passing in the DT, but I
> > > couldn't think of a way.  I need the data.table because I am creating
> > > a few queries on the entire table as you'll see.  I suppose if this
> > > feature was implemented inside the data.table package, it could be done.
> > > 
> > > I am still learning how to deal with expressions being passed in and
> > > manipulating them without having them evaluated in my function
> > > (since it has to be done by data.table).  So, if you find any
> > > fundamental flaws in my thinking that are complicating the code, please
> > > let me know.
> > > 
> > > Also, I am new to R coding conventions.  So pointers are appreciated.
> > > 
> > > top <- function( DT, v, criteria, num=10, decreasing=FALSE,
> > >                  unique=FALSE, otherlab="(Other)" ) {
> > >    qcriteria <- parse( text=paste( "list( V = ",
> > >                                    deparse( substitute( criteria ) ),
> > >                                    " )" ) )[[ 1 ]]
> > >    qlv <- bquote( list( .( substitute( v ) ) ) )
> > >    qv <- bquote( .( substitute( v ) ) )
> > > 
> > >    nDec <- ifelse( decreasing, -1, 1 )
> > >    dtTop <- DT[ , eval( qcriteria ) ,
> > >          by=eval( qlv )
> > >          ][ order( nDec * V ) ]
> > >    nItems <- min( nrow( dtTop ), num )
> > >    astrTopLevels <- as.character( dtTop[ 1:nItems, eval( qv ) ] )
> > > 
> > >    # Show the top names only
> > >    if ( unique == TRUE )
> > >       return( astrTopLevels )
> > >    # Show the list of newly categorized names for the data
> > >    else
> > >       return( factor(
> > >                 ifelse( as.character( DT[[ deparse( substitute( v ) ) ]] ) %in% astrTopLevels,
> > >                         as.character( DT[[ deparse( substitute( v ) ) ]] ),
> > >                         otherlab ),
> > >                 c( astrTopLevels, otherlab )
> > >               )
> > >             )
> > > }
> > > 
> > > # Random data being created
> > > DT <- data.table( A=sample( letters[1:4], 50, replace=TRUE,
> > >                             prob=c(.2, .4, .1, .1) ),
> > >          B=sample( LETTERS[1:8], 50, replace=TRUE,
> > >                    prob=c(.4, .3, .1, .05, .05, .05, .05, .05) ),
> > >          C=rnorm(50, mean=500, sd=150) )
> > > 
> > > # Get the Top A's and B's.
> > > DT[ , list( meanC=mean(C) ) ,
> > >     by=list( topA=top( DT, A, mean(C), 3, decreasing=TRUE ),
> > >              topB=top( DT, B, mean(C), 3, decreasing=TRUE ) ) ]
> > > 
> > > 
> > > The concept is very similar to Pivot tables in Excel.  It has a
> > > feature where it can show the Top N items.  The top N in this case
> > > is computed with the full data.
> > > 
> > > I am still thinking about a nice way to implement "Top N" items
> > > within each category.  While I know how to get it done for a
> > > one-time event, I am not sure how to do it nicely as a utility
> > > function.  A real-life example could be "I want to see top N factories
> > > in each state based on its volume of production (along with seeing the
> > > Others aggregated)."
> > > 
> > > Pivot tables in Excel have some other powerful features that I want
> > > to figure out how to do "easily" with data.tables using helper
> > > functions.
> > > 
> > > My final goal is to get something similar to pivot tables where the
> > > data can be reshaped (similar to cast() in the reshape package) as
> > > well.  Your comments will be highly helpful in pushing me along
> > > those lines.
> > > 
> > > 
> > > Regards,
> > > Harish
> > > 
> > > 
> > > 
> > > --- On Fri, 7/2/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > > 
> > > > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > > > Subject: Re: [datatable-help] Environment of eval() execution for "j"
> > > > appears to vary inexplicably
> > > > To: "Harish" <harishv_99 at yahoo.com>
> > > > Cc: datatable-help at lists.r-forge.r-project.org
> > > > Date: Friday, July 2, 2010, 4:32 PM
> > > > 
> > > > Actually, can you provide a full reproducible example of that
> > > > by=top() example with dummy data?
> > > > 
> > > > I'm thinking why not use an expression of columns directly in the by?
> > > > 
> > > >    by = list(..,cut(somecol,...),...)
> > > > 
> > > > or
> > > > 
> > > >    by = list(..,top(top_criteria,sort_order,num_items),...)
> > > > 
> > > > Why does DT have to be passed in to top, basically? What are some
> > > > examples of top_criteria?
> > > > 
> > > > Matthew
> > > > 
> > > > 
> > > > On Fri, 2010-07-02 at 17:33 +0100, mdowle at mdowle.plus.com wrote:
> > > > > Glad to hear it was that.
> > > > > Yes dput is similar to dump, great, all sorted.
> > > > > Your top() is very neat. I had to double-take for a second as it
> > > > > looks strange to have DT repeated again inside the by, but it
> > > > > reads well now. I can't think of any better way, looks like you
> > > > > nailed it.
> > > > > Will look at the crash bug.
> > > > > Matthew
> > > > > 
> > > > > 
> > > > > > Thank you Matthew.  The issue was the "class" not having
> > > > > > "data.frame".  The bug fix you had made works perfectly.
> > > > > >
> > > > > > I had read online to use dput() to recreate variables for
> > > > > > posting online.  This, I understand, is good at times because
> > > > > > there could be other aspects of the variables (other than the
> > > > > > values) that could be contributing to the problem.  (I did not
> > > > > > know about dump()).
> > > > > >
> > > > > > Having "(Other)" in the level is a by-product of how I found the
> > > > > > bug.  I was trying to create a function "top" which would keep
> > > > > > the top N labels (where top N is measured in some fashion) and
> > > > > > mark the others with "(Other)".  I had originally used the same
> > > > > > function to recreate the problem on dummy data and took a dput()
> > > > > > dump of the data from somewhere in the middle of the processing.
> > > > > > This allowed me to have "simpler" functions to paste on here to
> > > > > > reproduce the problem without much extraneous code.  That is why
> > > > > > you had the extra level in there.
> > > > > >
> > > > > > FYI, my original objective was to use my function top() to group
> > > > > > on those values; so I would be focusing on the "important"
> > > > > > values without being overwhelmed with the large amount of
> > > > > > "unimportant" data.
> > > > > >     DT[ , list( Col1, Col2),
> > > > > >         by=top( DT, top_criteria, sort_order, num_items_to_show ) ]
> > > > > > If there is a better way to do the above, please comment.
> > > > > >
> > > > > > Thanks a bunch for all your help!
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Harish
> > > > > >
> > > > > >
> > > > > > --- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > > > > >
> > > > > >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > > > >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> > > > > >> appears to vary inexplicably
> > > > > >> To: "Harish" <harishv_99 at yahoo.com>
> > > > > >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > > >> Date: Friday, July 2, 2010, 2:15 AM
> > > > > >>
> > > > > >> Note the class of A is "data.table" in the structure() rather
> > > > > >> than c("data.table","data.frame").
> > > > > >>
> > > > > >> Now whilst, even so, I don't fully understand why the error is
> > > > > >> occurring, could you try creating the data.table using
> > > > > >> data.table() to create the correct structure, rather than
> > > > > >> structure() manually, and see if that fixes it?  Since the
> > > > > >> structure has changed between 1.4.1 and now, that's where the
> > > > > >> '1.4.1 knocking around' might be coming in.
> > > > > >>
> > > > > >> I've noticed use of structure() in other threads and had assumed
> > > > > >> you were using dump(..,file="") as I think some posting
> > > > > >> guidelines say. I personally prefer to see the data.table() call
> > > > > >> to create the dummy data, as I can read it quicker.
> > > > > >>
> > > > > >> Also why do you return a factor constructed using structure()?
> > > > > >> If you want extra unused levels (e.g. '(Other)'), then factor()
> > > > > >> has a levels argument for that purpose.
> > > > > >>
> > > > > >> Anyway, might not be that, just first thing to try ..
> > > > > >>
> > > > > >>
> > > > > >> > Unfortunate.  I still get the error.  I did the following:
> > > > > >> >
> > > > > >> > 1) Deleted the "data.table" folder in win-library
> > > > > >> > 2) Started R with "--vanilla" parameter
> > > > > >> > 3) Installed the binaries from R-Forge
> > > > > >> > 4) Restarted R
> > > > > >> > 5) Ran the test case below
> > > > > >> >
> > > > > >> > Would someone else also try the following test case?  That would
> > > > > >> > help isolate whether it is my configuration that is causing
> > > > > >> > the problem.
> > > > > >> > Thanks.
> > > > > >> >
> > > > > >> > ======== Start My R Session ========
> > > > > >> >
> > > > > >> >> search()
> > > > > >> >  [1] ".GlobalEnv"         "package:data.table" "package:stats"
> > > > > >> >  [4] "package:graphics"   "package:grDevices"  "package:utils"
> > > > > >> >  [7] "package:datasets"   "package:methods"    "Autoloads"
> > > > > >> > [10] "package:base"
> > > > > >> >> loadedNamespaces()
> > > > > >> > [1] "base"       "data.table" "graphics"   "grDevices"  "methods"
> > > > > >> > [6] "stats"      "utils"
> > > > > >> >> sessionInfo()
> > > > > >> > R version 2.11.1 (2010-05-31)
> > > > > >> > i386-pc-mingw32
> > > > > >> >
> > > > > >> > locale:
> > > > > >> > [1] LC_COLLATE=English_United States.1252
> > > > > >> > [2] LC_CTYPE=English_United States.1252
> > > > > >> > [3] LC_MONETARY=English_United States.1252
> > > > > >> > [4] LC_NUMERIC=C
> > > > > >> > [5] LC_TIME=English_United States.1252
> > > > > >> >
> > > > > >> > attached base packages:
> > > > > >> > [1] stats     graphics  grDevices utils     datasets  methods   base
> > > > > >> >
> > > > > >> > other attached packages:
> > > > > >> > [1] data.table_1.5
> > > > > >> >>
> > > > > >> >
> > > > > >> > ======== End My R Session ========
> > > > > >> >
> > > > > >> >
> > > > > >> > ======== Example code ========
> > > > > >> >
> > > > > >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> > > > > >> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> > > > > >> >
> > > > > >> > foo1 <- function(DT) {
> > > > > >> >    dtRet <- DT[ ,
> > > > > >> >                 list( Count=sum( Count ) ),
> > > > > >> >                 by=list( Category=foo2( DT, a ) )
> > > > > >> >               ]
> > > > > >> >    invisible()
> > > > > >> > }
> > > > > >> >
> > > > > >> >
> > > > > >> > foo2 <- function( DT, v ) {
> > > > > >> >    q <- substitute( v )
> > > > > >> >
> > > > > >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> > > > > >> >    print( DT[ 1:2, eval( q ) ] )
> > > > > >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> > > > > >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> > > > > >> > }
> > > > > >> >
> > > > > >> > foo1( A ) # Test 1
> > > > > >> > foo2( A, a ) # Test 2
> > > > > >> > === End code ===
> > > > > >> >
> > > > > >> > I get the following output:
> > > > > >> >
> > > > > >> >> foo1( A ) # Test 1
> > > > > >> > [1] TRUE
> > > > > >> > [1] A C
> > > > > >> > Levels: A C D
> > > > > >> > [1] A C
> > > > > >> > Levels: A C D
> > > > > >> > Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> > > > > >> >> foo2( A, a ) # Test 2
> > > > > >> > [1] TRUE
> > > > > >> > [1] A C
> > > > > >> > Levels: A C D
> > > > > >> > [1] A C
> > > > > >> > Levels: A C D
> > > > > >> > [1] A C D
> > > > > >> > Levels: A C D (Other)
> > > > > >> >>
> > > > > >> >
> > > > > >> > Note the error in the middle...
> > > > > >> >
> > > > > >> >
> > > > > >> > Harish
> > > > > >> >
> > > > > >> > --- On Thu, 7/1/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > > > > >> >
> > > > > >> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > > > >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> > > > > >> >> appears to vary inexplicably
> > > > > >> >> To: "Harish" <harishv_99 at yahoo.com>
> > > > > >> >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > > >> >> Date: Thursday, July 1, 2010, 2:01 AM
> > > > > >> >>
> > > > > >> >> I've seen that before. For me it was because version 1.4.1 of
> > > > > >> >> data.table was still knocking around.  For me I looked at
> > > > > >> >> loadedNamespaces() and it listed data.table, but search() did not.
> > > > > >> >>
> > > > > >> >> The loop happens because of the particular changes that have
> > > > > >> >> happened between 1.4.1 and latest 1.5, AND having both versions
> > > > > >> >> somehow visible to R at the same time, in some way conflicting
> > > > > >> >> with each other. Or at least that's what it was for me.
> > > > > >> >>
> > > > > >> >> To be sure, start R with --vanilla, for me on ubuntu I have to
> > > > > >> >> "sudo R --vanilla" anyway because of permissions (which I like).
> > > > > >> >>
> > > > > >> >> Then install.packages(...) to cleanly install. Then restart R.
> > > > > >> >> The error should go away?
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> > Matthew,
> > > > > >> >> >
> > > > > >> >> > Thanks for the fix.  It almost works...  I tested it on Rev 101
> > > > > >> >> > binaries.
> > > > > >> >> >
> > > > > >> >> > I get an extra line of output for the test case #1 I mentioned...
> > > > > >> >> >
> > > > > >> >> >> foo1( A )  # Test 1
> > > > > >> >> > [1] TRUE
> > > > > >> >> > [1] A C
> > > > > >> >> > Levels: A C D
> > > > > >> >> > [1] A C
> > > > > >> >> > Levels: A C D
> > > > > >> >> > Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> > > > > >> >> >>
> > > > > >> >> >
> > > > > >> >> > Please note that I get an error at the end.  Though, the output
> > > > > >> >> > seems to be right.
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > Regards,
> > > > > >> >> > Harish
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > --- On Tue, 6/29/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > > > > >> >> >
> > > > > >> >> >> From: Matthew Dowle <mdowle at mdowle.plus.com>
> > > > > >> >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> > > > > >> >> >> appears to vary inexplicably
> > > > > >> >> >> To: "Harish" <harishv_99 at yahoo.com>
> > > > > >> >> >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > > >> >> >> Date: Tuesday, June 29, 2010, 1:43 PM
> > > > > >> >> >>
> > > > > >> >> >> Yes, that was reproducible, thanks.
> > > > > >> >> >>
> > > > > >> >> >> The last commit 101 fixes this one too, I think. Please confirm.
> > > > > >> >> >>
> > > > > >> >> >> A = data.table(a=c("A","C","D"), Count=c(4L,8L,1L))
> > > > > >> >> >>
> > > > > >> >> >> > foo1(A)
> > > > > >> >> >> [1] TRUE
> > > > > >> >> >> [1] A C
> > > > > >> >> >> Levels: A C D
> > > > > >> >> >> [1] A C
> > > > > >> >> >> Levels: A C D
> > > > > >> >> >>
> > > > > >> >> >> > foo2(A,a)
> > > > > >> >> >> [1] TRUE
> > > > > >> >> >> [1] A C
> > > > > >> >> >> Levels: A C D
> > > > > >> >> >> [1] A C
> > > > > >> >> >> Levels: A C D
> > > > > >> >> >> [1] A C D
> > > > > >> >> >> Levels: A C D (Other)
> > > > > >> >> >> >
> > > > > >> >> >>
> > > > > >> >> >> Matthew
> > > > > >> >> >>
> > > > > >> >> >>
> > > > > >> >> >> On Sat, 2010-06-26 at 00:28 -0700, Harish wrote:
> > > > > >> >> >> > I am running into a peculiar issue which seems to be related
> > > > > >> >> >> > to the environment in which the eval() is executed when passed
> > > > > >> >> >> > as the "j".  The environment of execution of the eval() seems
> > > > > >> >> >> > to vary depending on whether I pass in a variable (of class
> > > > > >> >> >> > "name") or an equivalent expression is typed inside the eval.
> > > > > >> >> >> >
> > > > > >> >> >> > === Example code ===
> > > > > >> >> >> >
> > > > > >> >> >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> > > > > >> >> >> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> > > > > >> >> >> >
> > > > > >> >> >> > foo1 <- function(DT) {
> > > > > >> >> >> >    dtRet <- DT[ ,
> > > > > >> >> >> >                 list( Count=sum( Count ) ),
> > > > > >> >> >> >                 by=list( Category=foo2( DT, a ) )
> > > > > >> >> >> >               ]
> > > > > >> >> >> >    invisible()
> > > > > >> >> >> > }
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> > foo2 <- function( DT, v ) {
> > > > > >> >> >> >    q <- substitute( v )
> > > > > >> >> >> >
> > > > > >> >> >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> > > > > >> >> >> >    print( DT[ 1:2, eval( q ) ] )
> > > > > >> >> >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> > > > > >> >> >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> > > > > >> >> >> > }
> > > > > >> >> >> >
> > > > > >> >> >> > foo1( A ) # Test 1
> > > > > >> >> >> > foo2( A, a ) # Test 2
> > > > > >> >> >> > === End code ===
> > > > > >> >> >> >
> > > > > >> >> >> > In Test 1, when I run foo1(), I am essentially executing
> > > > > >> >> >> >    foo2( A, a )
> > > > > >> >> >> > from within the code of the data table.
> > > > > >> >> >> >
> > > > > >> >> >> > I get:
> > > > > >> >> >> > [1] TRUE
> > > > > >> >> >> > [1] A C
> > > > > >> >> >> > Levels: A C D
> > > > > >> >> >> > [1] A C D
> > > > > >> >> >> > Levels: A C D
> > > > > >> >> >> >
> > > > > >> >> >> > Issue #1 ==> The third print in foo2() is actually returning 3
> > > > > >> >> >> > items when I am requesting only the first 2 items.  (Also, in
> > > > > >> >> >> > my more complex program, it seemed to return the data in
> > > > > >> >> >> > alphabetical order or the order of the factor levels rather
> > > > > >> >> >> > than in the order of the data in the table.  However, I am not
> > > > > >> >> >> > able to reproduce this in a simpler example.  I am hoping that
> > > > > >> >> >> > this behavior will also be rectified with any bug fixes you
> > > > > >> >> >> > make.)
> > > > > >> >> >> >
> > > > > >> >> >> > In Test 2, I run foo2() directly in .GlobalEnv, but I am
> > > > > >> >> >> > passing in the same data that foo1() would have passed it in
> > > > > >> >> >> > Test 1.
> > > > > >> >> >> >
> > > > > >> >> >> > I get:
> > > > > >> >> >> > [1] TRUE
> > > > > >> >> >> > [1] A C
> > > > > >> >> >> > Levels: A C D
> > > > > >> >> >> > Error in eval(expr, envir, enclos) : object 'a' not found
> > > > > >> >> >> >
> > > > > >> >> >> > Issue #2 ==> It looks like if I have an expression inside
> > > > > >> >> >> > eval(), it is executed in a different environment from the
> > > > > >> >> >> > prior print statement where I have an eval() with just a
> > > > > >> >> >> > single variable.  Technically, I would expect both to be
> > > > > >> >> >> > equivalent.
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> > I hope I clearly explained what my issues are.
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> > Regards,
> > > > > >> >> >> > Harish
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> >
> > > > > >> >> >> > _______________________________________________
> > > > > >> >> >> > datatable-help mailing list
> > > > > >> >> >> > datatable-help at lists.r-forge.r-project.org
> > > > > >> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help