[datatable-help] Incrementing number in j-expression to get group index fails (bug?)

Matthew Dowle mdowle at mdowle.plus.com
Thu May 12 00:07:25 CEST 2011


Andreas,

This is changed now so that <- works as you expected. I can see that
being quite useful actually, thanks for mentioning it.

> groupInd
[1] 0
> dt[, list(z, groupInd <- groupInd + 1), by=list(x,y)]
     x y z V2
[1,] 0 0 1  1
[2,] 0 1 2  2
[3,] 0 1 4  2
[4,] 1 0 3  3
[5,] 1 0 5  3
[6,] 1 1 6  4
> groupInd
[1] 0   # global isn't updated when using <-
> dt[, list(z, groupInd <<- groupInd + 1), by=list(x,y)]
     x y z V2
[1,] 0 0 1  1
[2,] 0 1 2  2
[3,] 0 1 4  2
[4,] 1 0 3  3
[5,] 1 0 5  3
[6,] 1 1 6  4
> groupInd
[1] 4   # but it is when <<- is used.
> 
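
If all you need is the group index and you are on a later data.table release, the special symbol .GRP may be simpler than a hand-rolled counter. A minimal sketch, assuming a data.table version that provides .GRP (it did not exist when this thread was written); the column name grp is only illustrative:

```r
library(data.table)

# Same table as in the original question.
dt <- data.table(x = c(0, 0, 1, 0, 1, 1), y = c(0, 1, 0, 1, 0, 1), z = 1:6)

# .GRP holds the index of the current group (1 for the first group,
# 2 for the second, and so on), so no external counter is needed.
res <- dt[, list(z, grp = .GRP), by = list(x, y)]
print(res)
```

Groups are numbered in order of first appearance here, which matches the V2 column in the output above.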

   o    j's environment is now consistently reused, so
        local variables may be set that persist from
        group to group; e.g., incrementing a group
        counter:
            DT[,list(z,groupInd<-groupInd+1),by=x]
        Thanks to Andreas Borg for reporting.
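
To see why reusing one environment matters, here is a rough model in plain R. It is not data.table's actual internals, just a sketch of the scoping rules involved; the names top, env, reused, fresh and the loop over 1:4 (standing in for four groups) are made up for illustration:

```r
groupInd <- 0          # counter in the calling environment, as in the thread
top <- environment()   # the environment this script runs in

j_local <- quote(groupInd <- groupInd + 1)    # local assignment
j_super <- quote(groupInd <<- groupInd + 1)   # superassignment

# One environment reused for every "group": the local copy persists.
env <- new.env(parent = top)
reused <- vapply(1:4, function(g) eval(j_local, env), numeric(1))

# A fresh environment per "group": every evaluation starts from the outer 0.
fresh <- vapply(1:4, function(g) eval(j_local, new.env(parent = top)),
                numeric(1))

global_after_local <- groupInd   # outer counter untouched by <-

# <<- skips the evaluation environment and updates the outer variable.
super <- vapply(1:4, function(g) eval(j_super, new.env(parent = top)),
                numeric(1))
global_after_super <- groupInd

print(reused)              # 1 2 3 4
print(fresh)               # 1 1 1 1
print(global_after_local)  # 0
print(super)               # 1 2 3 4
print(global_after_super)  # 4
```

With a reused environment, <- counts up while leaving the outer groupInd alone; <<- counts up and changes the outer variable as well, which is the difference shown in the two transcripts above.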

Matthew

On Wed, 2011-05-11 at 16:47 +0100, Matthew Dowle wrote:
> Hi,
> 
> With groupInd set to 0 in the global environment, I get the same.
> Internally (and you don't normally need to know this), the first group
> is run on its own first. data.table uses the result of the first group
> to infer properties of the query and make some (very often, very good)
> guesses. Then it races through the remaining groups in C.
> 
> So, just use <<-, to make it clear it's the variable in the global
> environment you want to update. So, no bug :-)
> 
> > groupInd = 0
> > dt[, list(z, groupInd <<- groupInd + 1), by=list(x,y)]
>      x y z V2
> [1,] 0 0 1  1
> [2,] 0 1 2  2
> [3,] 0 1 4  2
> [4,] 1 0 3  3
> [5,] 1 0 5  3
> [6,] 1 1 6  4
> >
> 
> Matthew
> 
> 
> > Hi all,
> >
> > I have still another issue, but I promise this will be my last posting
> > for today ;-)
> >
> > Suppose you have the following table:
> >
> >  > dt <- data.table(x=c(0,0,1,0,1,1), y=c(0,1,0,1,0,1), z=1:6)
> >  > dt
> >      x y z
> > [1,] 0 0 1
> > [2,] 0 1 2
> > [3,] 1 0 3
> > [4,] 0 1 4
> > [5,] 1 0 5
> > [6,] 1 1 6
> >
> > I want to group the table by columns x and y and get for each z (which
> > is really just a sort of primary key) the index of the group it belongs
> > to. Grouping gives:
> >
> >  > dt[,z,by=list(x,y)]
> >      x y z
> > [1,] 0 0 1
> > [2,] 0 1 2
> > [3,] 0 1 4
> > [4,] 1 0 3
> > [5,] 1 0 5
> > [6,] 1 1 6
> >
> > The information I need looks like: z=1 assigned to group 1, z=2 assigned
> > to group 2, z=4 assigned to group 2 etc.
> >
> > I tried this by incrementing a counter in the j expression, but the
> > result makes me suspect that there is a bug somewhere:
> >
> >  > dt[, list(z, groupInd <- groupInd + 1), by=list(x,y)]
> >      x y z V2
> > [1,] 0 0 1  1
> > [2,] 0 1 2  1
> > [3,] 0 1 4  1
> > [4,] 1 0 3  2
> > [5,] 1 0 5  2
> > [6,] 1 1 6  3
> >
> > Incrementing the group counter only works from the second group on. Any
> > ideas?
> >
> > Best regards,
> >
> > Andreas
> >
> > --
> > Andreas Borg
> > Medizinische Informatik
> >
> > UNIVERSITÄTSMEDIZIN
> > der Johannes Gutenberg-Universität
> > Institut für Medizinische Biometrie, Epidemiologie und Informatik
> > Obere Zahlbacher Straße 69, 55131 Mainz
> > www.imbei.uni-mainz.de
> >
> > Telefon +49 (0) 6131 175062
> > E-Mail: borg at imbei.uni-mainz.de
> >
> > _______________________________________________
> > datatable-help mailing list
> > datatable-help at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >