[datatable-help] Environment of eval() execution for "j" appears to vary inexplicably

Harish harishv_99 at yahoo.com
Fri Jul 2 17:40:25 CEST 2010


Thank you Matthew.  The issue was the "class" not having "data.frame".  The bug fix you had made works perfectly.
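
For anyone following the thread, here is a minimal base-R sketch of the class issue. It assumes nothing beyond base R (data.table is not loaded), the object names are made up for illustration, and it only shows how class inheritance behaves; it is not how data.table() builds its objects internally.

```r
# Hypothetical objects, for illustration only (base R, data.table not loaded).
cols <- list(a = factor(c("A", "C", "D")), Count = c(4L, 8L, 1L))

# Class vector as in my dput() output: "data.frame" is missing.
bad <- structure(cols, class = "data.table")

# Class vector as Matthew described it: both classes present.
good <- structure(cols, row.names = c(NA, -3L),
                  class = c("data.table", "data.frame"))

inherits(bad, "data.frame")   # FALSE
inherits(good, "data.frame")  # TRUE
```

Any code that relies on the object also being a data.frame would therefore treat the two differently, which is why building the test data with data.table() is the safer route.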

I had read online that dput() should be used to recreate variables when posting.  This, I understand, is good at times because there could be other aspects of the variables (other than the values) that could be contributing to the problem.  (I did not know about dump().)

Having "(Other)" in the levels is a by-product of how I found the bug.  I was trying to create a function "top" which would keep the top N labels (where top N is measured in some fashion) and mark the others with "(Other)".  I had originally used the same function to recreate the problem on dummy data and took a dput() dump of the data from somewhere in the middle of the processing.  This allowed me to have "simpler" functions to paste on here to reproduce the problem without much extraneous code.  That is why you had the extra level in there.

FYI, my original objective was to use my function top() to group on those values; so I would be focusing on the "important" values without being overwhelmed with the large amount of "unimportant" data.
    DT[ , list( Col1, Col2), by=top( DT, top_criteria, sort_order, num_items_to_show ) ]
If there is a better way to do the above, please comment.
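
To make the idea concrete, here is a rough base-R sketch of what I mean by keeping the top N labels. It is not my actual top() function: the name top_labels(), its arguments, and the sample data are all hypothetical, and "top" is assumed here to mean the largest total weight per label. It uses the levels argument of factor(), as Matthew suggested, so that "(Other)" is always a level.

```r
# Hypothetical helper (not the real top()): keep the n labels with the
# largest total weight and collapse every other label into "(Other)".
top_labels <- function(x, weight, n) {
  x <- as.character(x)
  totals <- tapply(weight, x, sum)                  # total weight per label
  ranked <- names(totals)[order(totals, decreasing = TRUE)]
  keep <- ranked[seq_len(min(n, length(ranked)))]   # the top n labels
  # Explicit levels, so "(Other)" exists even if nothing is collapsed.
  factor(ifelse(x %in% keep, x, "(Other)"), levels = c(keep, "(Other)"))
}

# Made-up sample data.
labels <- c("A", "C", "D", "C", "A", "E")
counts <- c(4L, 8L, 1L, 2L, 3L, 1L)
grp <- top_labels(labels, counts, 2)

# Intended use as a grouping expression (not run here; needs data.table):
#   DT[ , list( Count=sum( Count ) ), by=list( Category=top_labels( a, Count, 2 ) ) ]
```

Passing the columns themselves rather than the whole table keeps the helper free of substitute()/eval(), which is where I ran into trouble in the first place.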

Thanks a bunch for all your help!


Regards,
Harish


--- On Fri, 7/2/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:

> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> Subject: Re: [datatable-help] Environment of eval() execution for "j"   appears to vary inexplicably
> To: "Harish" <harishv_99 at yahoo.com>
> Cc: datatable-help at lists.r-forge.r-project.org
> Date: Friday, July 2, 2010, 2:15 AM
> Note the class of A is "data.table" in the structure() rather than
> c("data.table","data.frame").
> 
> Now whilst, even so, I don't fully understand why the error is occurring,
> could you try creating the data.table using data.table() to create the
> correct structure, rather than calling structure() manually, and see if
> that fixes it?  Since the structure has changed between 1.4.1 and now,
> that's where the '1.4.1 knocking around' might be coming in.
> 
> I've noticed use of structure() in other threads and had assumed you were
> using dump(..,file="") as I think some posting guidelines say. I personally
> prefer to see the data.table() call to create the dummy data, as I can
> read it quicker.
> 
> Also why do you return a factor constructed using structure()? If you want
> extra unused levels (e.g. '(Other)'), then factor() has a levels argument
> for that purpose.
> 
> Anyway, might not be that, just first thing to try ..
> 
> 
> > Unfortunate.  I still get the error.  I did the following:
> >
> > 1) Deleted the "data.table" folder in win-library
> > 2) Started R with "--vanilla" parameter
> > 3) Installed the binaries from R-Forge
> > 4) Restarted R
> > 5) Ran the test case below
> >
> > Would someone else also try the following test case?  That would help
> > isolate whether it is my configuration that is causing the problem.
> > Thanks.
> >
> > ======== Start My R Session ========
> >
> >> search()
> >  [1] ".GlobalEnv"         "package:data.table" "package:stats"
> >  [4] "package:graphics"   "package:grDevices"  "package:utils"
> >  [7] "package:datasets"   "package:methods"    "Autoloads"
> > [10] "package:base"
> >> loadedNamespaces()
> > [1] "base"       "data.table" "graphics"   "grDevices"  "methods"
> > [6] "stats"      "utils"
> >> sessionInfo()
> > R version 2.11.1 (2010-05-31)
> > i386-pc-mingw32
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] data.table_1.5
> >>
> >
> > ======== End My R Session ========
> >
> >
> > ======== Example code ========
> >
> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> >
> > foo1 <- function(DT) {
> >    dtRet <- DT[ ,
> >                 list( Count=sum( Count ) ),
> >                 by=list( Category=foo2( DT, a ) )
> >          ]
> >    invisible()
> > }
> >
> >
> > foo2 <- function( DT, v ) {
> >    q <- substitute( v )
> >
> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> >    print( DT[ 1:2, eval( q ) ] )
> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> > }
> >
> > foo1( A ) # Test 1
> > foo2( A, a ) # Test 2
> > === End code ===
> >
> > I get the following output:
> >
> >> foo1( A ) # Test 1
> > [1] TRUE
> > [1] A C
> > Levels: A C D
> > [1] A C
> > Levels: A C D
> > Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> >> foo2( A, a ) # Test 2
> > [1] TRUE
> > [1] A C
> > Levels: A C D
> > [1] A C
> > Levels: A C D
> > [1] A C D
> > Levels: A C D (Other)
> >>
> >
> > Note the error in the middle...
> >
> >
> > Harish
> >
> > --- On Thu, 7/1/10, mdowle at mdowle.plus.com <mdowle at mdowle.plus.com> wrote:
> >
> >> From: mdowle at mdowle.plus.com <mdowle at mdowle.plus.com>
> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> >> appears to vary inexplicably
> >> To: "Harish" <harishv_99 at yahoo.com>
> >> Cc: datatable-help at lists.r-forge.r-project.org
> >> Date: Thursday, July 1, 2010, 2:01 AM
> >>
> >> I've seen that before. For me it was because version 1.4.1 of data.table
> >> was still knocking around.  For me I looked at loadedNamespaces() and it
> >> listed data.table, but search() did not.
> >>
> >> The loop happens because of the particular changes that have happened
> >> between 1.4.1 and latest 1.5, AND having both versions somehow visible to
> >> R at the same time, in some way conflicting with each other. Or at least
> >> that's what it was for me.
> >>
> >> To be sure, start R with --vanilla, for me on ubuntu I have to "sudo R
> >> --vanilla" anyway because of permissions (which I like).
> >>
> >> Then install.packages(...) to cleanly install. Then restart R. The error
> >> should go away?
> >>
> >>
> >> > Matthew,
> >> >
> >> > Thanks for the fix.  It almost works...  I tested it on Rev 101 binaries.
> >> >
> >> > I get an extra line of output for the test case #1 I mentioned...
> >> >
> >> >> foo1( A )  # Test 1
> >> > [1] TRUE
> >> > [1] A C
> >> > Levels: A C D
> >> > [1] A C
> >> > Levels: A C D
> >> > Error: evaluation nested too deeply: infinite recursion /
> >> > options(expressions=)?
> >> >>
> >> >
> >> > Please note that I get an error at the end.  Though, the output seems to
> >> > be right.
> >> >
> >> >
> >> > Regards,
> >> > Harish
> >> >
> >> >
> >> > --- On Tue, 6/29/10, Matthew Dowle <mdowle at mdowle.plus.com> wrote:
> >> >
> >> >> From: Matthew Dowle <mdowle at mdowle.plus.com>
> >> >> Subject: Re: [datatable-help] Environment of eval() execution for "j"
> >> >> appears to vary inexplicably
> >> >> To: "Harish" <harishv_99 at yahoo.com>
> >> >> Cc: datatable-help at lists.r-forge.r-project.org
> >> >> Date: Tuesday, June 29, 2010, 1:43 PM
> >> >> Yes, that was reproducible, thanks.
> >> >>
> >> >> The last commit 101 fixes this one too, I think. Please confirm.
> >> >>
> >> >> A = data.table(a=c("A","C","D"), Count=c(4L,8L,1L))
> >> >>
> >> >> > foo1(A)
> >> >> [1] TRUE
> >> >> [1] A C
> >> >> Levels: A C D
> >> >> [1] A C
> >> >> Levels: A C D
> >> >>
> >> >> > foo2(A,a)
> >> >> [1] TRUE
> >> >> [1] A C
> >> >> Levels: A C D
> >> >> [1] A C
> >> >> Levels: A C D
> >> >> [1] A C D
> >> >> Levels: A C D (Other)
> >> >> >
> >> >>
> >> >> Matthew
> >> >>
> >> >>
> >> >> On Sat, 2010-06-26 at 00:28 -0700, Harish wrote:
> >> >> > I am running into a peculiar issue which seems to be related to the
> >> >> > environment in which the eval() is executed when passed as the "j".
> >> >> > The environment of execution of the eval() seems to vary depending on
> >> >> > whether I pass in a variable (of class "name") or an equivalent
> >> >> > expression is typed inside the eval.
> >> >> >
> >> >> > === Example code ===
> >> >> >
> >> >> > A <- structure(list(a = structure(1:3, .Label = c("A", "C", "D"), class = "factor"),
> >> >> >     Count = c(4L, 8L, 1L)), .Names = c("a", "Count"), class = "data.table")
> >> >> >
> >> >> > foo1 <- function(DT) {
> >> >> >    dtRet <- DT[ ,
> >> >> >                 list( Count=sum( Count ) ),
> >> >> >                 by=list( Category=foo2( DT, a ) )
> >> >> >          ]
> >> >> >    invisible()
> >> >> > }
> >> >> >
> >> >> >
> >> >> > foo2 <- function( DT, v ) {
> >> >> >    q <- substitute( v )
> >> >> >
> >> >> >    print( identical( q, substitute( v ) ) )   # TRUE as expected
> >> >> >    print( DT[ 1:2, eval( q ) ] )
> >> >> >    print( DT[ 1:2, eval( substitute( v ) ) ] )
> >> >> >    return( structure(1:3, .Label = c("A", "C", "D", "(Other)"), class = "factor") )
> >> >> > }
> >> >> >
> >> >> > foo1( A ) # Test 1
> >> >> > foo2( A, a ) # Test 2
> >> >> > === End code ===
> >> >> >
> >> >> > In Test 1, when I run foo1(), I am essentially executing
> >> >> >    foo2( A, a ) from within the code of the data table.
> >> >> >
> >> >> > I get:
> >> >> > [1] TRUE
> >> >> > [1] A C
> >> >> > Levels: A C D
> >> >> > [1] A C D
> >> >> > Levels: A C D
> >> >> >
> >> >> > Issue #1 ==> The third print in foo2() is actually returning 3 items
> >> >> > when I am requesting only the first 2 items.  (Also, in my more complex
> >> >> > program, it seemed to return the data in alphabetical order or the order
> >> >> > of the factor levels rather than in the order of the data in the table.
> >> >> > However, I am not able to reproduce this in a simpler example.  I am
> >> >> > hoping that this behavior will also be rectified with any bug fixes you
> >> >> > make.)
> >> >> >
> >> >> > In Test 2, I run foo2() directly in .GlobalEnv, but I am passing in the
> >> >> > same data that foo1() would have passed it in Test 1.
> >> >> >
> >> >> > I get:
> >> >> > [1] TRUE
> >> >> > [1] A C
> >> >> > Levels: A C D
> >> >> > Error in eval(expr, envir, enclos) : object 'a' not found
> >> >> >
> >> >> > Issue #2 ==> It looks like if I have an expression inside eval(), it is
> >> >> > executed in a different environment from the prior print statement,
> >> >> > where I have an eval() with just a single variable.  Technically, I
> >> >> > would expect both to be equivalent.
> >> >> >
> >> >> >
> >> >> > I hope I clearly explained what my issues are.
> >> >> >
> >> >> >
> >> >> > Regards,
> >> >> > Harish
> >> >> >
> >> >> >
> >> >> >
> >> >> >       
> >> >> >
> >> >> > _______________________________________________
> >> >> > datatable-help mailing list
> >> >> > datatable-help at lists.r-forge.r-project.org
> >> >> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >
> 
> 
> 


      

