[datatable-help] In 1.9.2, by with a factor column does not work the same as in 1.8.10

Arunkumar Srinivasan aragorn168b at gmail.com
Sun Apr 6 11:46:07 CEST 2014


This is now fixed in commit #1256 (v1.9.3). Thanks to Christophe for filing #5437 and to Paul for following up. I'll close it now. Please write back if you find that something still isn't right.

Arun

From: Paul Johnson pauljohn32 at gmail.com
Reply: Paul Johnson pauljohn32 at gmail.com
Date: March 31, 2014 at 2:03:48 AM
To: DERVIEUX Christophe christophe.dervieux at rte-france.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject:  Re: [datatable-help] In 1.9.2, By with factor column do not work the same as in 1.8.10  

Hi
I see this problem too. I was not using data.table before 1.9, so I did not realize it ever behaved differently. In the examples I've tried, any calculation that I expect to return a factor instead seems to return an integer vector holding the factor's internal integer codes.

When I noticed this, I thought I might need more explicit casting to make the result come out as a factor. Here's my function to lag a factor, which beats the point into the ground.

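lagFactor <- function(x, N) {
    if (is.factor(x)) {
        xlev <- levels(x)
        xnum <- as.integer(x)      # the factor's internal integer codes
    } else {
        xlev <- unique(x)
        xnum <- match(x, xlev)     # codes relative to the unique values
    }
    n <- length(xnum)
    # shift the codes down by N places, padding the start with NA
    xlag <- c(rep(NA_integer_, min(N, n)), xnum[seq_len(max(n - N, 0))])
    factor(xlev[xlag], levels = xlev)
}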

dat is a data.table with many rows; I can send you a copy if you want.

Now I'll show you that the result differs inside and outside a data.table.

> xx <- lagFactor(dat$east2b, 1)
> table(xx)
xx
   Yes     No
130232 151885
> levels(xx)
[1] "Yes" "No"
> dat[ , xx := lagFactor(east2b, 1), by = c("sippid"), roll  = TRUE]
> table(dat$xx)

     1      2
114963 130095
> levels(dat$xx)
NULL
> table(xx, dat$xx)
    
xx         1      2
  Yes 114963      0
  No       0 130095


In my case, the only fix is to re-create the factor explicitly afterwards.
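A minimal sketch of that explicit re-factoring, assuming the new column holds the integer codes of east2b's levels in their original order (the cross-table above lines up 1 with "Yes" and 2 with "No"). The helper name refactor_codes is hypothetical, not part of data.table:

```r
# Hypothetical helper: rebuild a factor from integer level codes,
# given the level labels of the original factor column.
refactor_codes <- function(codes, lev) {
  # index the label vector by code; NA codes stay NA
  factor(lev[codes], levels = lev)
}

f <- refactor_codes(c(1L, 2L, NA, 1L), c("Yes", "No"))
print(f)

# Assumed usage on the thread's data (not run here):
# dat[, xx := refactor_codes(xx, levels(east2b))]
```

Replacing the whole column with := should allow the type change from integer to factor; this relies on the codes still matching levels(east2b).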

 pj


On Fri, Mar 28, 2014 at 5:29 AM, DERVIEUX Christophe <christophe.dervieux at rte-france.com> wrote:
Hi,

I recently updated the data.table package from 1.8.10 to 1.9.2 and found errors in my existing code.

See the reproducible example below:

On 1.8.10:
library(data.table)
DT <- data.table(X = factor(2006:2012), Y = rep(1:7, 2))
DT[, Z := paste(X, .N, sep = " - "), by = list(X)][]

       X Y        Z
 1: 2006 1 2006 - 2
 2: 2007 2 2007 - 2
 3: 2008 3 2008 - 2
 4: 2009 4 2009 - 2
 5: 2010 5 2010 - 2
 6: 2011 6 2011 - 2
 7: 2012 7 2012 - 2
 8: 2006 1 2006 - 2
 9: 2007 2 2007 - 2
10: 2008 3 2008 - 2
11: 2009 4 2009 - 2
12: 2010 5 2010 - 2
13: 2011 6 2011 - 2
14: 2012 7 2012 - 2

In column Z, I get the level (label) of factor column X pasted with the group count .N, as expected.

However, in 1.9.2, with the same code:
DT <- data.table(X = factor(2006:2012), Y = rep(1:7, 2))
DT[, Z := paste(X, .N, sep = " - "), by = list(X)][]

       X Y     Z
 1: 2006 1 1 - 2
 2: 2007 2 2 - 2
 3: 2008 3 3 - 2
 4: 2009 4 4 - 2
 5: 2010 5 5 - 2
 6: 2011 6 6 - 2
 7: 2012 7 7 - 2
 8: 2006 1 1 - 2
 9: 2007 2 2 - 2
10: 2008 3 3 - 2
11: 2009 4 4 - 2
12: 2010 5 5 - 2
13: 2011 6 6 - 2
14: 2012 7 7 - 2

As a result, I do not get the levels of factor column X but the integer codes associated with each level.
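The difference comes down to how paste() converts its arguments: a factor becomes its labels, while a plain integer vector of codes becomes the numbers themselves. The base-R sketch below reproduces both outputs; treating the 1.9.2 grouping column as plain codes inside j is an assumption drawn from this thread, not from the data.table sources.

```r
# paste() turns a factor into its labels, but an integer vector of
# codes into the numbers themselves.
X <- factor(2006:2012)
codes <- as.integer(X)                # assumed to be what j saw in 1.9.2
paste(X[1], 2L, sep = " - ")          # "2006 - 2"
paste(codes[1], 2L, sep = " - ")      # "1 - 2"

# Indexing the level labels gives the same result whether it receives
# the factor or its codes, because R treats a factor index as its codes.
lev <- levels(X)
identical(lev[X], lev[codes])         # TRUE
```

Under that assumption, a form that should behave the same with or without the bug is DT[, Z := paste(levels(DT$X)[X], .N, sep = " - "), by = list(X)]; it has not been checked against 1.9.2 itself, and upgrading to the fixed version is the cleaner route.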

Is this the intended behaviour? Why has it changed? Is it a bug?

I use this kind of procedure to build labels for ggplot, and none of my previous code works anymore, which is quite annoying.

Thanks

Christophe

 


_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



--
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu

