[datatable-help] Environment of eval() execution for "j" appears to vary inexplicably

Matthew Dowle mdowle at mdowle.plus.com
Sat Jul 3 01:32:23 CEST 2010


Actually, can you provide a full reproducible example of that by=top()
call, with dummy data?

I'm wondering: why not use an expression of columns directly in the by?

   by = list(..,cut(somecol,...),...)

or

   by = list(..,top(top_criteria,sort_order,num_items),...)
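
For the first form, a minimal sketch with made-up data (DT, somecol and
value are placeholder names, not taken from your code, and the breaks are
arbitrary):

```r
library(data.table)

# Dummy data; column names are illustrative only.
DT <- data.table(somecol = c(1, 4, 6, 9, 12, 18),
                 value   = c(10L, 20L, 30L, 40L, 50L, 60L))

# The by expression is evaluated with DT's columns in scope, so cut()
# sees somecol directly and DT itself never has to be passed in.
res <- DT[, list(total = sum(value)),
          by = list(bucket = cut(somecol, breaks = c(0, 5, 10, 20)))]
print(res)   # totals per bucket: 30, 70, 110
```

The grouping column is just the factor that cut() returns, one value per
row of DT.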

Why does DT have to be passed in to top() at all? What are some examples
of top_criteria?
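
To make the question concrete, here is one hypothetical way top() could be
written so that it takes columns rather than DT. I'm guessing that
top_criteria is a numeric column to total per label and that the sort is
always descending (so sort_order is left out); this is only a sketch under
those assumptions, not your actual function.

```r
library(data.table)

# Hypothetical helper: keep the n labels with the largest total weight
# and lump every other label into "(Other)". It takes plain vectors,
# so it can be called on columns inside by=.
top <- function(label, weight, n) {
  label  <- as.character(label)
  totals <- vapply(split(weight, label), sum, numeric(1))
  keep   <- head(names(totals)[order(totals, decreasing = TRUE)], n)
  # factor() has a levels argument, so "(Other)" can be declared
  # without building the factor by hand with structure().
  factor(ifelse(label %in% keep, label, "(Other)"),
         levels = c(keep, "(Other)"))
}

# Dummy data, made up for illustration.
DT <- data.table(a     = c("A", "C", "D", "C", "E"),
                 Count = c(4L, 8L, 1L, 2L, 3L))

res <- DT[, list(Count = sum(Count)),
          by = list(Category = top(a, Count, 2L))]
print(res)   # groups A, C and (Other) with counts 4, 10 and 4
```

If your top_criteria is something other than a column to total, the body
of top() would change, but the call inside by= could stay the same shape.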

Matthew


On Fri, 2010-07-02 at 17:33 +0100, mdowle at mdowle.plus.com wrote:
> Glad to hear it was that.
> Yes dput is similar to dump, great, all sorted.
> Your top() is very neat. I had to double-take for a second as it looks
> strange to have DT repeated again inside the by, but it reads well now. I
> can't think of any better way, looks like you nailed it.
> Will look at the crash bug.
> Matthew
> 
> 
> > Thank you Matthew.  The issue was the "class" not having "data.frame".
> > The bug fix you had made works perfectly.
> >
> > I had read online to use dput() to recreate variables for posting online.
> > This, I understand, is good at times because there could be other aspects
> > of the variables (other than the values) that could be contributing to the
> > problem.  (I did not know about dump()).
> >
> > Having "(Other)" in the level is a by-product of how I found the bug.  I
> > was trying to create a function "top" which would keep the top N labels
> > (where top N is measured in some fashion) and mark the others with
> > "(Other)".  I had originally used the same function to recreate the
> > problem on dummy data and took a dput() dump of the data from somewhere in
> > the middle of the processing.  This allowed me to have "simpler" functions
> > to paste on here to reproduce the problem without much extraneous code.
> > That is why you had the extra level in there.
> >
> > FYI, my original objective was to use my function top() to group on those
> > values; so I would be focusing on the "important" values without being
> > overwhelmed with the large amount of "unimportant" data.
> >     DT[ , list( Col1, Col2 ),
> >         by=top( DT, top_criteria, sort_order, num_items_to_show ) ]
> > If there is a better way to do the above, please comment.
> >
> > Thanks a bunch for all your help!
> >
> >
> > Regards,
> > Harish
> >
> >
> > --- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> >
> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> >> appears to vary inexplicably
> >> To: "Harish" <harishv_99 at yahoo.com>
> >> Cc: datatable-help at lists.r-forge.r-project.org
> >> Date: Friday, July 2, 2010, 2:15 AM
> >> Note the class of A is "data.table" in the structure() rather than
> >> c("data.table","data.frame").
> >>
> >> Now whilst, even so, I don't fully understand why the error is occurring,
> >> could you try creating the data.table using data.table() to create the
> >> correct structure, rather than building it manually with structure(), and
> >> see if that fixes it?  Since the structure has changed between 1.4.1 and
> >> now, that's where the '1.4.1 knocking around' might be coming in.
> >>
> >> I've noticed use of structure() in other threads and had assumed you were
> >> using dump(..,file="") as I think some posting guidelines say. I personally
> >> prefer to see the data.table() call to create the dummy data, as I can
> >> read it quicker.
> >>
> >> Also why do you return a factor constructed using structure()? If you want
> >> extra unused levels (e.g. '(Other)'), then factor() has a levels argument
> >> for that purpose.
> >>
> >> Anyway, might not be that, just first thing to try ..
> >>
> >>
> >> > Unfortunately, I still get the error.  I did the following:
> >> >
> >> > 1) Deleted the "data.table" folder in win-library
> >> > 2) Started R with "--vanilla" parameter
> >> > 3) Installed the binaries from R-Forge
> >> > 4) Restarted R
> >> > 5) Ran the test case below
> >> >
> >> > Would someone else also try the following test case?  That would help
> >> > isolate whether it is my configuration that is causing the problem.
> >> > Thanks.
> >> >
> >> > ======== Start My R Session ========
> >> >
> >> >> search()
> >> >  [1] ".GlobalEnv"         "package:data.table" "package:stats"
> >> >  [4] "package:graphics"   "package:grDevices"  "package:utils"
> >> >  [7] "package:datasets"   "package:methods"    "Autoloads"
> >> > [10] "package:base"
> >> >> loadedNamespaces()
> >> > [1] "base"       "data.table" "graphics"   "grDevices"  "methods"
> >> > [6] "stats"      "utils"
> >> >> sessionInfo()
> >> > R version 2.11.1 (2010-05-31)
> >> > i386-pc-mingw32
> >> >
> >> > locale:
> >> > [1] LC_COLLATE=English_United States.1252
> >> > [2] LC_CTYPE=English_United States.1252
> >> > [3] LC_MONETARY=English_United States.1252
> >> > [4] LC_NUMERIC=C
> >> > [5] LC_TIME=English_United States.1252
> >> >
> >> > attached base packages:
> >> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >
> >> > other attached packages:
> >> > [1] data.table_1.5
> >> >>
> >> >
> >> > ======== End My R Session ========
> >> >
> >> >
> >> > ======== Example code ========
> >> >
> >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"),
> >> >     class = "factor"), Count = c(4L, 8L, 1L)),
> >> >     .Names = c("a", "Count"), class = "data.table")
> >> >
> >> > foo1 <- function(DT) {
> >> >    dtRet <- DT[ ,
> >> >             list( Count=sum( Count ) ),
> >> >             by=list( Category=foo2( DT, a ) )
> >> >          ]
> >> >    invisible()
> >> > }
> >> >
> >> >
> >> > foo2 <- function( DT, v ) {
> >> >    q <- substitute( v )
> >> >
> >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> >> >    print( DT[ 1:2, eval( q ) ] )
> >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"),
> >> >        class = "factor") )
> >> > }
> >> >
> >> > foo1( A ) # Test 1
> >> > foo2( A, a ) # Test 2
> >> > === End code ===
> >> >
> >> > I get the following output:
> >> >
> >> >> foo1( A ) # Test 1
> >> > [1] TRUE
> >> > [1] A C
> >> > Levels: A C D
> >> > [1] A C
> >> > Levels: A C D
> >> > Error: evaluation nested too deeply: infinite recursion /
> >> > options(expressions=)?
> >> >> foo2( A, a ) # Test 2
> >> > [1] TRUE
> >> > [1] A C
> >> > Levels: A C D
> >> > [1] A C
> >> > Levels: A C D
> >> > [1] A C D
> >> > Levels: A C D (Other)
> >> >>
> >> >
> >> > Note the error in the middle...
> >> >
> >> >
> >> > Harish
> >> >
> >> > --- On Thu, 7/1/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> >> >
> >> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> >> >> appears to vary inexplicably
> >> >> To: "Harish" <harishv_99 at yahoo.com>
> >> >> Cc: datatable-help at lists.r-forge.r-project.org
> >> >> Date: Thursday, July 1, 2010, 2:01 AM
> >> >>
> >> >> I've seen that before. For me it was because version 1.4.1 of data.table
> >> >> was still knocking around.  I looked at loadedNamespaces() and it
> >> >> listed data.table, but search() did not.
> >> >>
> >> >> The loop happens because of the particular changes between 1.4.1 and the
> >> >> latest 1.5, AND having both versions somehow visible to R at the same
> >> >> time, in some way conflicting with each other. Or at least that's what
> >> >> it was for me.
> >> >>
> >> >> To be sure, start R with --vanilla; for me on ubuntu I have to "sudo R
> >> >> --vanilla" anyway because of permissions (which I like).
> >> >>
> >> >> Then install.packages(...) to cleanly install. Then restart R. The error
> >> >> should go away?
> >> >>
> >> >>
> >> >> > Matthew,
> >> >> >
> >> >> > Thanks for the fix.  It almost works...  I tested it on Rev 101 binaries.
> >> >> >
> >> >> > I get an extra line of output for the test case #1 I mentioned...
> >> >> >
> >> >> >> foo1( A )  # Test 1
> >> >> > [1] TRUE
> >> >> > [1] A C
> >> >> > Levels: A C D
> >> >> > [1] A C
> >> >> > Levels: A C D
> >> >> > Error: evaluation nested too deeply: infinite recursion /
> >> >> > options(expressions=)?
> >> >> >>
> >> >> >
> >> >> > Please note that I get an error at the end.  Though, the output seems to
> >> >> > be right.
> >> >> >
> >> >> >
> >> >> > Regards,
> >> >> > Harish
> >> >> >
> >> >> >
> >> >> > --- On Tue, 6/29/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> >> >> >
> >> >> >> From: Matthew Dowle <mdowle at mdowle.plus.com>
> >> >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> >> >> >> appears to vary inexplicably
> >> >> >> To: "Harish" <harishv_99 at yahoo.com>
> >> >> >> Cc: datatable-help at lists.r-forge.r-project.org
> >> >> >> Date: Tuesday, June 29, 2010, 1:43 PM
> >> >> >> Yes, that was reproducible, thanks.
> >> >> >>
> >> >> >> The last commit 101 fixes this one too, I think. Please confirm.
> >> >> >>
> >> >> >> A = data.table(a=c("A","C","D"), Count=c(4L,8L,1L))
> >> >> >>
> >> >> >> > foo1(A)
> >> >> >> [1] TRUE
> >> >> >> [1] A C
> >> >> >> Levels: A C D
> >> >> >> [1] A C
> >> >> >> Levels: A C D
> >> >> >>
> >> >> >> > foo2(A,a)
> >> >> >> [1] TRUE
> >> >> >> [1] A C
> >> >> >> Levels: A C D
> >> >> >> [1] A C
> >> >> >> Levels: A C D
> >> >> >> [1] A C D
> >> >> >> Levels: A C D (Other)
> >> >> >> >
> >> >> >>
> >> >> >> Matthew
> >> >> >>
> >> >> >>
> >> >> >> On Sat, 2010-06-26 at 00:28 -0700, Harish wrote:
> >> >> >> > I am running into a peculiar issue which seems to be related to the
> >> >> >> > environment in which the eval() is executed when passed as the "j".
> >> >> >> > The environment of execution of the eval() seems to vary depending on
> >> >> >> > whether I pass in a variable (of class "name") or an equivalent
> >> >> >> > expression is typed inside the eval.
> >> >> >> >
> >> >> >> > === Example code ===
> >> >> >> >
> >> >> >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"),
> >> >> >> >     class = "factor"), Count = c(4L, 8L, 1L)),
> >> >> >> >     .Names = c("a", "Count"), class = "data.table")
> >> >> >> >
> >> >> >> > foo1 <- function(DT) {
> >> >> >> >    dtRet <- DT[ ,
> >> >> >> >             list( Count=sum( Count ) ),
> >> >> >> >             by=list( Category=foo2( DT, a ) )
> >> >> >> >          ]
> >> >> >> >    invisible()
> >> >> >> > }
> >> >> >> >
> >> >> >> >
> >> >> >> > foo2 <- function( DT, v ) {
> >> >> >> >    q <- substitute( v )
> >> >> >> >
> >> >> >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> >> >> >> >    print( DT[ 1:2, eval( q ) ] )
> >> >> >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> >> >> >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"),
> >> >> >> >        class = "factor") )
> >> >> >> > }
> >> >> >> >
> >> >> >> > foo1( A ) # Test 1
> >> >> >> > foo2( A, a ) # Test 2
> >> >> >> > === End code ===
> >> >> >> >
> >> >> >> > In Test 1, when I run foo1(), I am essentially executing
> >> >> >> >    foo2( A, a ) from within the code of the data table.
> >> >> >> >
> >> >> >> > I get:
> >> >> >> > [1] TRUE
> >> >> >> > [1] A C
> >> >> >> > Levels: A C D
> >> >> >> > [1] A C D
> >> >> >> > Levels: A C D
> >> >> >> >
> >> >> >> > Issue #1 ==> The third print in foo2() is actually returning 3 items
> >> >> >> > when I am requesting only the first 2 items.  (Also, in my more
> >> >> >> > complex program, it seemed to return the data in alphabetical order or
> >> >> >> > the order of the factor levels rather than in the order of the data in
> >> >> >> > the table.  However, I am not able to reproduce this in a simpler
> >> >> >> > example.  I am hoping that this behavior will also be rectified with
> >> >> >> > any bug fixes you make.)
> >> >> >> >
> >> >> >> > In Test 2, I run foo2() directly in .GlobalEnv, but I am passing in
> >> >> >> > the same data that foo1() would have passed it in Test 1.
> >> >> >> >
> >> >> >> > I get:
> >> >> >> > [1] TRUE
> >> >> >> > [1] A C
> >> >> >> > Levels: A C D
> >> >> >> > Error in eval(expr, envir, enclos) : object 'a' not found
> >> >> >> >
> >> >> >> > Issue #2 ==> It looks like if I have an expression inside eval(), it
> >> >> >> > is executed in a different environment from the prior print statement,
> >> >> >> > where I have an eval() with just a single variable.  Technically, I
> >> >> >> > would expect both to be equivalent.
> >> >> >> >
> >> >> >> >
> >> >> >> > I hope I clearly explained what my issues are.
> >> >> >> >
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> > Harish
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > _______________________________________________
> >> >> >> > datatable-help mailing list
> >> >> >> > datatable-help at lists.r-forge.r-project.org
> >> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >



