[datatable-help] Top N Items [was: Environment of eval() execution for "j" appears to vary inexplicably]

Harish harishv_99 at yahoo.com
Sat Jul 10 07:24:12 CEST 2010


Thanks Matthew.

I am trying to incorporate your approach and am experimenting with a few different interfaces.  Since your approach needs multiple statements, I cannot maintain the top() interface as it is currently defined.

I will keep you posted.  It will take me a while, I think, to get to the point where I can share something with you again.

By the way, the line
   DT[,mean(C),by=list(A,B)][order(V1,decreasing=TRUE)]
is not doing quite what I intended.

I am following what Excel's pivot table does: the top N items are computed over the entire data set.  So if the top items for B are c("D","E","H") based on the calculation, then within each grouping of A the B entries are displayed in that order.  A's entries are kept together and displayed in the order of the top items for A, which could be, say, c("c","b","a").  So the desired result is (keeping the original row numbers from your output):
          A    B       V1
  [1,]    c    D 710.4716
  [2,]    c    E 692.5095
  [7,]    c <NA> 514.0045
  [6,]    b    D 524.7078
  [4,]    b    E 572.1907
  [3,]    b    H 599.4201
  [9,]    b <NA> 484.2036
  [5,]    a    H 530.2149
  [8,]    a <NA> 490.4854
 [10,] <NA>    E 471.2445
 [11,] <NA> <NA> 445.4304
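
For what it's worth, here is a rough sketch of how that ordering could be produced from the aggregate. It assumes the aggregate from your output is retyped by hand as `agg`, and that the top-item vectors `topA` and `topB` are given (both are hypothetical here, not computed). The idea is that match() turns each label into its rank and NA has no rank, so order() puts it last within its group.

```r
library(data.table)

# Aggregate retyped from the output quoted below (NA marks the "other" bucket)
agg <- data.table(
  A  = c("c", "c", "b", "b", "a", "b", "c", "a", "b", NA, NA),
  B  = c("D", "E", "H", "E", "H", "D", NA, NA, NA, "E", NA),
  V1 = c(710.4716, 692.5095, 599.4201, 572.1907, 530.2149, 524.7078,
         514.0045, 490.4854, 484.2036, 471.2445, 445.4304)
)

# Assumed top-item orderings (illustrative only)
topA <- c("c", "b", "a")
topB <- c("D", "E", "H")

# Rank of each label in its top list; labels outside the list (NA) sort last
ord <- order(match(agg$A, topA), match(agg$B, topB), na.last = TRUE)
res <- agg[ord]
print(res)
```

This only reorders an aggregate that already exists; it does not address how to fold the top-N computation into a single query.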


Nonetheless, I get the idea of what you are trying to do.  I will play around with it before I ask any questions.


Thanks,
Harish


--- On Mon, 7/5/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:

> From: Matthew Dowle <mdowle at mdowle.plus.com>
> Subject: Re: Top N Items [was: Environment of eval() execution for "j" appears to vary inexplicably]
> To: "Harish" <harishv_99 at yahoo.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Date: Monday, July 5, 2010, 12:11 AM
> 
> Thanks for this example. Interesting. This is grouping of grouping then,
> and making it general too. I guess speed is very important as you'd like
> to use 'by=' for the subgroups too rather than tapply. I see where
> you're coming from now.
> 
> This isn't generalised, but this is how I might approach it:
> 
> setkey(DT,A)  # keep top 3 A by mean(C)
> DT$A[-DT[
> head(DT[,mean(C),by=A][order(V1,decreasing=TRUE)],3),
> which=TRUE, mult="all"]] = NA
> 
> setkey(DT,B)  # keep top 3 B by mean(C)
> DT$B[-DT[
> head(DT[,mean(C),by=B][order(V1,decreasing=TRUE)],3),
> which=TRUE, mult="all"]] = NA
> 
> DT[,mean(C),by=list(A,B)][order(V1,decreasing=TRUE)]
> 
>          A    B       V1
>  [1,]    c    D 710.4716
>  [2,]    c    E 692.5095
>  [3,]    b    H 599.4201
>  [4,]    b    E 572.1907
>  [5,]    a    H 530.2149
>  [6,]    b    D 524.7078
>  [7,]    c <NA> 514.0045
>  [8,]    a <NA> 490.4854
>  [9,]    b <NA> 484.2036
> [10,] <NA>    E 471.2445
> [11,] <NA> <NA> 445.4304
> 
> It uses NA rather than "(Other)" to avoid building a new factor at the
> original DT level, saving time and memory. After you have the aggregate
> result you can always replace the NA with '(Other)' to make it pretty.
> Notice in your top() the as.character() and the %in% which are things to
> avoid if possible, for speed. The code above is doing the same thing
> really but using a fast join.
> 
> To not change DT, take a copy first, add columns rather than change A or
> B, or work on a local variable inside a function.
> 
> A trick to get a 'not-join' is to use the 'which' argument, and negative
> integer like that. Maybe data.table should have a new 'notjoin' argument
> to make that built-in.
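
A minimal sketch of the not-join idea, assuming a more recent data.table release than the one used in this thread, where a `!` prefix on `i` combined with `on=` expresses a not-join directly. The toy data and the names `top3`, `rest` and `DT2` are made up for illustration.

```r
library(data.table)

# Small deterministic toy data (illustrative, not the data from this thread)
DT <- data.table(A = rep(c("a", "b", "c", "d"), each = 3),
                 C = c(1, 2, 3, 10, 11, 12, 20, 21, 22, 30, 31, 32))

# Top 3 levels of A by mean(C), highest first
top3 <- head(DT[, .(V1 = mean(C)), by = A][order(-V1)], 3)

# Not-join: rows of DT whose A does NOT appear in top3
rest <- DT[!top3, on = "A"]

# Mark those rows as NA on a copy, so DT itself is left unchanged
DT2 <- copy(DT)
DT2[!top3, on = "A", A := NA_character_]
```

Taking the copy first follows the advice above about not changing DT; the `which=TRUE` plus negative index trick should give the same rows as long as every row of the lookup table matches.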
> 
> Admittedly, this isn't a general interface in the way yours works. It's
> not a single query. The only way I can think of right now, to not pass
> DT into top() in yours, is to use tapply inside top(). That would make it
> less complicated, but slower.  Maybe you can combine the approaches
> somehow. If you time either approach on large data I'd be interested in
> the results.
> 
> Hope that helps a little,
> 
> Matthew
> 
> 
> 
> On Fri, 2010-07-02 at 20:11 -0700, Harish wrote:
> > Matthew,
> > 
> > My code for top() along with a working example follows.
> > 
> > I had tried to think of a way to avoid passing in the DT, but I couldn't find one.  I need the data.table because I am running a few queries on the entire table, as you'll see.  I suppose if this feature were implemented inside the data.table package, it could be done.
> > 
> > I am still learning how to deal with expressions being passed in and manipulating them without having them evaluated in my function (since that has to be done by data.table).  So, if you find any fundamental flaws in my thinking that are complicating the code, please let me know.
> > 
> > Also, I am new to R coding conventions.  So pointers are appreciated.
> > 
> > top <- function( DT, v, criteria, num=10, decreasing=FALSE, unique=FALSE, otherlab="(Other)" ) {
> >    qcriteria <- parse( text=paste( "list( V = ",
> >             deparse( substitute( criteria ) ),
> >             " )" ) )[[ 1 ]]
> >    qlv <- bquote( list( .( substitute( v ) ) ) )
> >    qv <- bquote( .( substitute( v ) ) )
> > 
> >    nDec <- ifelse( decreasing, -1, 1 )
> >    dtTop <- DT[ , eval( qcriteria ) ,
> >          by=eval( qlv )
> >          ][ order( nDec * V ) ]
> >    nItems <- min( nrow( dtTop ), num )
> >    astrTopLevels <- as.character( dtTop[ 1:nItems, eval( qv ) ] )
> > 
> >    # Show the top names only
> >    if ( unique == TRUE )
> >       return( astrTopLevels )
> >    # Show the list of newly categorized names for the data
> >    else
> >       return( factor(
> >               ifelse( as.character( DT[[ deparse( substitute( v ) ) ]] ) %in% astrTopLevels,
> >                  as.character( DT[[ deparse( substitute( v ) ) ]] ),
> >                  otherlab ),
> >               c( astrTopLevels, otherlab )
> >               )
> >             )
> > }
> > 
> > # Random data being created
> > DT <- data.table( A=sample( letters[1:4], 50, replace=TRUE, prob=c(.2, .4, .1, .1) ),
> >          B=sample( LETTERS[1:8], 50, replace=TRUE, prob=c(.4, .3, .1, .05, .05, .05, .05, .05) ),
> >          C=rnorm(50, mean=500, sd=150) )
> > 
> > # Get the Top A's and B's.
> > DT[ , list( meanC=mean(C) ) , by=list( topA=top( DT, A, mean(C), 3, decreasing=TRUE ), topB=top( DT, B, mean(C), 3, decreasing=TRUE ) ) ]
> > 
> > 
> > The concept is very similar to pivot tables in Excel, which have a feature that can show the Top N items.  The top N in this case is computed with the full data.
> > 
> > I am still thinking about a nice way to implement "Top N" items within each category.  While I know how to get it done as a one-off, I am not sure how to do it nicely as a utility function.  A real-life example could be "I want to see the top N factories in each state based on their volume of production (along with seeing the Others aggregated)."
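
One possible sketch of the "top N within each category" case, assuming a recent data.table release (frank() is not available in the version discussed in this thread). The table `prod`, its columns, and the helper columns `rnk` and `label` are all hypothetical, made up to mirror the factories-per-state example.

```r
library(data.table)

# Hypothetical production data: factories within states
prod <- data.table(
  state   = rep(c("X", "Y"), each = 4),
  factory = c("f1", "f2", "f3", "f4", "g1", "g2", "g3", "g4"),
  volume  = c(50, 30, 20, 10, 5, 40, 25, 15)
)
n <- 2L  # how many top factories to keep per state

# Rank factories within each state by volume (1 = largest)
prod[, rnk := frank(-volume, ties.method = "first"), by = state]

# Keep the name of the top n factories, relabel the rest as "(Other)"
prod[, label := ifelse(rnk <= n, factory, "(Other)")]

# Aggregate, so the non-top factories collapse into one row per state
res <- prod[, .(volume = sum(volume)), by = .(state, label)]
print(res)
```

This is a one-off rather than the utility function asked for; wrapping it up generically would still need the expression handling discussed above.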
> > 
> > Pivot tables in Excel have some other powerful features that I want to figure out how to do "easily" with data.tables using helper functions.
> > 
> > My final goal is to get something similar to pivot tables where the data can be reshaped (similar to cast() in the reshape package) as well.  Your comments will be highly helpful in pushing me along those lines.
> > 
> > 
> > Regards,
> > Harish
> > 
> > 
> > 
> > --- On Fri, 7/2/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > 
> > > From: Matthew Dowle <mdowle at mdowle.plus.com>
> > > Subject: Re: [datatable-help] Environment of eval() execution for "j" appears to vary inexplicably
> > > To: "Harish" <harishv_99 at yahoo.com>
> > > Cc: datatable-help at lists.r-forge.r-project.org
> > > Date: Friday, July 2, 2010, 4:32 PM
> > > Actually, can you provide a full reproducible example of that by=top()
> > > example with dummy data?
> > > 
> > > I'm thinking why not use an expression of columns directly in the by?
> > > 
> > >    by = list(..,cut(somecol,...),...)
> > > 
> > > or
> > > 
> > >    by = list(..,top(top_criteria,sort_order,num_items),...)
> > > 
> > > Why does DT have to be passed in to top(), basically? What are some
> > > examples of top_criteria?
> > > 
> > > Matthew
> > > 
> > > 
> > > On Fri, 2010-07-02 at 17:33 +0100, mdowle at mdowle.plus.com wrote:
> > > > Glad to hear it was that.
> > > > Yes dput is similar to dump, great, all sorted.
> > > > Your top() is very neat. I had to double-take for a second as it looks
> > > > strange to have DT repeated again inside the by, but it reads well now. I
> > > > can't think of any better way, looks like you nailed it.
> > > > Will look at the crash bug.
> > > > Matthew
> > > > 
> > > > 
> > > > > Thank you Matthew.  The issue was the "class" not having "data.frame".
> > > > > The bug fix you had made works perfectly.
> > > > >
> > > > > I had read online to use dput() to recreate variables for posting online.
> > > > > This, I understand, is good at times because there could be other aspects
> > > > > of the variables (other than the values) that could be contributing to the
> > > > > problem.  (I did not know about dump()).
> > > > >
> > > > > Having "(Other)" in the level is a by-product of how I found the bug.  I
> > > > > was trying to create a function "top" which would keep the top N labels
> > > > > (where top N is measured in some fashion) and mark the others with
> > > > > "(Other)".  I had originally used the same function to recreate the
> > > > > problem on dummy data and took a dput() dump of the data from somewhere in
> > > > > the middle of the processing.  This allowed me to have "simpler" functions
> > > > > to paste on here to reproduce the problem without much extraneous code.
> > > > > That is why you had the extra level in there.
> > > > >
> > > > > FYI, my original objective was to use my function top() to group on those
> > > > > values; so I would be focusing on the "important" values without being
> > > > > overwhelmed with the large amount of "unimportant" data.
> > > > >     DT[ , list( Col1, Col2), by=top( DT, top_criteria, sort_order,
> > > > >         num_items_to_show ) ]
> > > > > If there is a better way to do the above, please comment.
> > > > >
> > > > > Thanks a bunch for all your help!
> > > > >
> > > > >
> > > > > Regards,
> > > > > Harish
> > > > >
> > > > >
> > > > > --- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > > > >
> > > > >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > > >> Subject: Re: [datatable-help] Environment of eval() execution for "j" appears to vary inexplicably
> > > > >> To: "Harish" <harishv_99 at yahoo.com>
> > > > >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > >> Date: Friday, July 2, 2010, 2:15 AM
> > > > >> Note the class of A is "data.table" in the structure() rather than
> > > > >> c("data.table","data.frame").
> > > > >>
> > > > >> Now whilst, even so, I don't fully understand why the error is occurring,
> > > > >> could you try creating the data.table using data.table() to create the
> > > > >> correct structure, rather than using structure() manually, and see if that
> > > > >> fixes it?  Since the structure has changed between 1.4.1 and now, that's
> > > > >> where the '1.4.1 knocking around' might be coming in.
> > > > >>
> > > > >> I've noticed use of structure() in other threads and had assumed you were
> > > > >> using dump(..,file="") as I think some posting guidelines say. I personally
> > > > >> prefer to see the data.table() call to create the dummy data, as I can
> > > > >> read it quicker.
> > > > >>
> > > > >> Also why do you return a factor constructed using structure()? If you want
> > > > >> extra unused levels (e.g. '(Other)'), then factor() has a levels argument
> > > > >> for that purpose.
> > > > >>
> > > > >> Anyway, might not be that, just first thing to try ..
> > > > >>
> > > > >>
> > > > >> > Unfortunately, I still get the error.  I did the following:
> > > > >> >
> > > > >> > 1) Deleted the "data.table" folder in win-library
> > > > >> > 2) Started R with "--vanilla" parameter
> > > > >> > 3) Installed the binaries from R-Forge
> > > > >> > 4) Restarted R
> > > > >> > 5) Ran the test case below
> > > > >> >
> > > > >> > Would someone else also try the following test case?  That would help
> > > > >> > isolate whether it is my configuration that is causing the problem.
> > > > >> > Thanks.
> > > > >> >
> > > > >> > ======== Start My R Session ========
> > > > >> >
> > > > >> >> search()
> > > > >> >  [1] ".GlobalEnv"         "package:data.table" "package:stats"
> > > > >> >  [4] "package:graphics"   "package:grDevices"  "package:utils"
> > > > >> >  [7] "package:datasets"   "package:methods"    "Autoloads"
> > > > >> > [10] "package:base"
> > > > >> >> loadedNamespaces()
> > > > >> > [1] "base"       "data.table" "graphics"   "grDevices"  "methods"
> > > > >> > [6] "stats"      "utils"
> > > > >> >> sessionInfo()
> > > > >> > R version 2.11.1 (2010-05-31)
> > > > >> > i386-pc-mingw32
> > > > >> >
> > > > >> > locale:
> > > > >> > [1] LC_COLLATE=English_United States.1252
> > > > >> > [2] LC_CTYPE=English_United States.1252
> > > > >> > [3] LC_MONETARY=English_United States.1252
> > > > >> > [4] LC_NUMERIC=C
> > > > >> > [5] LC_TIME=English_United States.1252
> > > > >> >
> > > > >> > attached base packages:
> > > > >> > [1] stats     graphics  grDevices utils     datasets  methods   base
> > > > >> >
> > > > >> > other attached packages:
> > > > >> > [1] data.table_1.5
> > > > >> >>
> > > > >> >
> > > > >> > ======== End My R Session ========
> > > > >> >
> > > > >> >
> > > > >> > ======== Example code ========
> > > > >> >
> > > > >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> > > > >> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> > > > >> >
> > > > >> > foo1 <- function(DT) {
> > > > >> >    dtRet <- DT[ ,
> > > > >> >          list( Count=sum( Count ) ),
> > > > >> >          by=list( Category=foo2( DT, a ) )
> > > > >> >          ]
> > > > >> >    invisible()
> > > > >> > }
> > > > >> >
> > > > >> >
> > > > >> > foo2 <- function( DT, v ) {
> > > > >> >    q <- substitute( v )
> > > > >> >
> > > > >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> > > > >> >    print( DT[ 1:2, eval( q ) ] )
> > > > >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> > > > >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> > > > >> > }
> > > > >> >
> > > > >> > foo1( A ) # Test 1
> > > > >> > foo2( A, a ) # Test 2
> > > > >> > === End code ===
> > > > >> >
> > > > >> > I get the following output:
> > > > >> >
> > > > >> >> foo1( A ) # Test 1
> > > > >> > [1] TRUE
> > > > >> > [1] A C
> > > > >> > Levels: A C D
> > > > >> > [1] A C
> > > > >> > Levels: A C D
> > > > >> > Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> > > > >> >> foo2( A, a ) # Test 2
> > > > >> > [1] TRUE
> > > > >> > [1] A C
> > > > >> > Levels: A C D
> > > > >> > [1] A C
> > > > >> > Levels: A C D
> > > > >> > [1] A C D
> > > > >> > Levels: A C D (Other)
> > > > >> >>
> > > > >> >
> > > > >> > Note the error in the middle...
> > > > >> >
> > > > >> >
> > > > >> > Harish
> > > > >> >
> > > > >> > --- On Thu, 7/1/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> > > > >> >
> > > > >> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> > > > >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j" appears to vary inexplicably
> > > > >> >> To: "Harish" <harishv_99 at yahoo.com>
> > > > >> >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > >> >> Date: Thursday, July 1, 2010, 2:01 AM
> > > > >> >>
> > > > >> >> I've seen that before. For me it was because version 1.4.1 of data.table
> > > > >> >> was still knocking around.  For me I looked at loadedNamespaces() and it
> > > > >> >> listed data.table, but search() did not.
> > > > >> >>
> > > > >> >> The loop happens because of the particular changes that have happened
> > > > >> >> between 1.4.1 and latest 1.5, AND having both versions somehow visible to
> > > > >> >> R at the same time, in some way conflicting with each other. Or at least
> > > > >> >> that's what it was for me.
> > > > >> >>
> > > > >> >> To be sure, start R with --vanilla; for me on ubuntu I have to "sudo R
> > > > >> >> --vanilla" anyway because of permissions (which I like).
> > > > >> >>
> > > > >> >> Then install.packages(...) to cleanly install. Then restart R. The error
> > > > >> >> should go away?
> > > > >> >>
> > > > >> >>
> > > > >> >> > Matthew,
> > > > >> >> >
> > > > >> >> > Thanks for the fix.  It almost works...  I tested it on Rev 101 binaries.
> > > > >> >> >
> > > > >> >> > I get an extra line of output for the test case #1 I mentioned...
> > > > >> >> >
> > > > >> >> >> foo1( A )  # Test 1
> > > > >> >> > [1] TRUE
> > > > >> >> > [1] A C
> > > > >> >> > Levels: A C D
> > > > >> >> > [1] A C
> > > > >> >> > Levels: A C D
> > > > >> >> > Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> > > > >> >> >>
> > > > >> >> >
> > > > >> >> > Please note that I get an error at the end.  Though, the output seems to
> > > > >> >> > be right.
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > Regards,
> > > > >> >> > Harish
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > --- On Tue, 6/29/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> > > > >> >> >
> > > > >> >> >> From: Matthew Dowle <mdowle at mdowle.plus.com>
> > > > >> >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j" appears to vary inexplicably
> > > > >> >> >> To: "Harish" <harishv_99 at yahoo.com>
> > > > >> >> >> Cc: datatable-help at lists.r-forge.r-project.org
> > > > >> >> >> Date: Tuesday, June 29, 2010, 1:43 PM
> > > > >> >> >> Yes, that was reproducible, thanks.
> > > > >> >> >>
> > > > >> >> >> The last commit 101 fixes this one too, I think. Please confirm.
> > > > >> >> >>
> > > > >> >> >> A = data.table(a=c("A","C","D"), Count=c(4L,8L,1L))
> > > > >> >> >>
> > > > >> >> >> > foo1(A)
> > > > >> >> >> [1] TRUE
> > > > >> >> >> [1] A C
> > > > >> >> >> Levels: A C D
> > > > >> >> >> [1] A C
> > > > >> >> >> Levels: A C D
> > > > >> >> >>
> > > > >> >> >> > foo2(A,a)
> > > > >> >> >> [1] TRUE
> > > > >> >> >> [1] A C
> > > > >> >> >> Levels: A C D
> > > > >> >> >> [1] A C
> > > > >> >> >> Levels: A C D
> > > > >> >> >> [1] A C D
> > > > >> >> >> Levels: A C D (Other)
> > > > >> >> >> >
> > > > >> >> >>
> > > > >> >> >> Matthew
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> On Sat, 2010-06-26 at 00:28 -0700, Harish wrote:
> > > > >> >> >> > I am running into a peculiar issue which seems to be related to the environment in which the eval() is executed when passed as the "j".  The environment of execution of the eval() seems to vary depending on whether I pass in a variable (of class "name") or an equivalent expression is typed inside the eval.
> > > > >> >> >> >
> > > > >> >> >> > === Example code ===
> > > > >> >> >> >
> > > > >> >> >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> > > > >> >> >> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> > > > >> >> >> >
> > > > >> >> >> > foo1 <- function(DT) {
> > > > >> >> >> >    dtRet <- DT[ ,
> > > > >> >> >> >          list( Count=sum( Count ) ),
> > > > >> >> >> >          by=list( Category=foo2( DT, a ) )
> > > > >> >> >> >          ]
> > > > >> >> >> >    invisible()
> > > > >> >> >> > }
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> > foo2 <- function( DT, v ) {
> > > > >> >> >> >    q <- substitute( v )
> > > > >> >> >> >
> > > > >> >> >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> > > > >> >> >> >    print( DT[ 1:2, eval( q ) ] )
> > > > >> >> >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> > > > >> >> >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> > > > >> >> >> > }
> > > > >> >> >> >
> > > > >> >> >> > foo1( A ) # Test 1
> > > > >> >> >> > foo2( A, a ) # Test 2
> > > > >> >> >> > === End code ===
> > > > >> >> >> >
> > > > >> >> >> > In Test 1, when I run foo1(), I am essentially executing
> > > > >> >> >> >    foo2( A, a ) from within the code of the data table.
> > > > >> >> >> >
> > > > >> >> >> > I get:
> > > > >> >> >> > [1] TRUE
> > > > >> >> >> > [1] A C
> > > > >> >> >> > Levels: A C D
> > > > >> >> >> > [1] A C D
> > > > >> >> >> > Levels: A C D
> > > > >> >> >> >
> > > > >> >> >> > Issue #1 ==> The third print in foo2() is actually returning 3 items when I am requesting only the first 2 items.  (Also, in my more complex program, it seemed to return the data in alphabetical order or the order of the factor levels rather than in the order of the data in the table.  However, I am not able to reproduce this in a simpler example.  I am hoping that this behavior will also be rectified with any bug fixes you make.)
> > > > >> >> >> >
> > > > >> >> >> > In Test 2, I run foo2() directly in .GlobalEnv, but I am passing in the same data that foo1() would have passed it in Test 1.
> > > > >> >> >> >
> > > > >> >> >> > I get:
> > > > >> >> >> > [1] TRUE
> > > > >> >> >> > [1] A C
> > > > >> >> >> > Levels: A C D
> > > > >> >> >> > Error in eval(expr, envir, enclos) : object 'a' not found
> > > > >> >> >> >
> > > > >> >> >> > Issue #2 ==> It looks like if I have an expression inside eval(), it is executed in a different environment from the prior print statement where I have an eval() with just a single variable.  Technically, I would expect both to be equivalent.
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> > I hope I clearly explained what my issues are.
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> > Regards,
> > > > >> >> >> > Harish
> > > > >> >> >> >
> > > > >> >> >> > _______________________________________________
> > > > >> >> >> > datatable-help mailing list
> > > > >> >> >> > datatable-help at lists.r-forge.r-project.org
> > > > >> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help


      

