[datatable-help] In 1.9.2, By with factor column do not work the same as in 1.8.10
Arunkumar Srinivasan
aragorn168b at gmail.com
Sun Apr 6 11:46:07 CEST 2014
This is now fixed with commit #1256 from v1.9.3. Thanks Christophe for filing #5437 and Paul for following up. I'll close it now. Please write back if you find that something still isn't right.
Arun
From: Paul Johnson pauljohn32 at gmail.com
Reply: Paul Johnson pauljohn32 at gmail.com
Date: March 31, 2014 at 2:03:48 AM
To: DERVIEUX Christophe christophe.dervieux at rte-france.com
Cc: datatable-help at lists.r-forge.r-project.org datatable-help at lists.r-forge.r-project.org
Subject: Re: [datatable-help] In 1.9.2, By with factor column do not work the same as in 1.8.10
Hi
I see this problem too. I was not using data.table before 1.9, so I did not realize it ever behaved differently. In the examples I've tried, any calculation that I expect to return a factor instead returns an integer vector holding the factor's internal codes.
When I noticed this, I thought I might need more explicit casting to make the result come out as a factor. Here's my function to lag a factor, which beats the point into the ground.
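lagFactor <- function(x, N){
  # Work with integer codes plus a vector of labels
  if (is.factor(x)) {
    xlev <- levels(x)
    xnum <- as.integer(x)
  } else {
    xlev <- unique(x)
    xnum <- match(x, xlev)  # codes for non-factor input
  }
  # Shift the codes down by N, padding the start with NA (assumes 1 <= N < length(x))
  xlag <- c(rep(NA, N), head(xnum, -N))
  # Map the lagged codes back to labels, keeping the original level order
  factor(xlev[xlag], levels = xlev)
}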
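As a base-R aside, a shorter way to lag a factor is to subset the factor itself by position: subsetting a factor with `[` keeps its class and levels, and an NA index yields an NA element. This is only a sketch; the name `lag_by_index` is hypothetical, it assumes 0 <= N <= length(x), and whether it sidesteps the 1.9.2 grouping issue is not tested here.

```r
# Hypothetical helper: lag a vector by N positions via index subsetting.
# For a factor, `[` keeps the class and levels, so no re-factoring is needed.
lag_by_index <- function(x, N) {
  stopifnot(N >= 0, N <= length(x))
  # First N positions become NA; the rest point at elements 1..(length(x) - N)
  idx <- c(rep(NA_integer_, N), seq_len(length(x) - N))
  x[idx]
}

f <- factor(c("Yes", "No", "No", "Yes"), levels = c("Yes", "No"))
lagged <- lag_by_index(f, 1)
print(lagged)  # NA Yes No No, with levels Yes No
```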
dat is a data.table with many rows; I can send you a copy if you want.
Now I'll show that the result differs inside and outside a data.table.
> xx <- lagFactor(dat$east2b, 1)
> table(xx)
xx
Yes No
130232 151885
> levels(xx)
[1] "Yes" "No"
> dat[ , xx := lagFactor(east2b, 1), by = c("sippid"), roll = TRUE]
> table(dat$xx)
1 2
114963 130095
> levels(dat$xx)
NULL
> table(xx, dat$xx)
xx 1 2
Yes 114963 0
No 0 130095
In my case, the only fix is an explicit re-factoring.
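The explicit re-factoring can be sketched in base R: the integer codes that end up in dat$xx index into the levels of the source column (the cross-tabulation above pairs code 1 with "Yes" and code 2 with "No"), so mapping them back through those levels restores the labels. The helper name `refactor_codes` is hypothetical, and the commented data.table line is an untested assumption that east2b still carries its original levels.

```r
# Hypothetical helper: turn bare integer codes back into a factor,
# using the levels of the column the codes were taken from.
refactor_codes <- function(codes, lev) {
  factor(lev[codes], levels = lev)
}

# Stand-in for the integer result seen in dat$xx
codes <- c(NA, 1L, 2L, 2L, 1L)
xx_fixed <- refactor_codes(codes, c("Yes", "No"))

# Against the data.table in this thread it might look like (untested):
# dat[, xx := refactor_codes(xx, levels(east2b))]
```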
pj
On Fri, Mar 28, 2014 at 5:29 AM, DERVIEUX Christophe <christophe.dervieux at rte-france.com> wrote:
Hi,
I recently updated the data.table package from 1.8.10 to 1.9.2, and I found errors in my previous code.
See the reproducible example below:
On 1.8.10:
DT<-data.table(X=factor(2006:2012),Y=rep(1:7,2))
DT[,Z:=paste(X,.N,sep=" - "),by=list(X)][]
X Y Z
1: 2006 1 2006 - 2
2: 2007 2 2007 - 2
3: 2008 3 2008 - 2
4: 2009 4 2009 - 2
5: 2010 5 2010 - 2
6: 2011 6 2011 - 2
7: 2012 7 2012 - 2
8: 2006 1 2006 - 2
9: 2007 2 2007 - 2
10: 2008 3 2008 - 2
11: 2009 4 2009 - 2
12: 2010 5 2010 - 2
13: 2011 6 2011 - 2
14: 2012 7 2012 - 2
In column Z, I get the level of factor column X pasted with the count '.N', as expected.
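For reference, the expected 1.8.10 column can be cross-checked without data.table: `ave()` with `FUN = length` returns each row's group size, which plays the role of '.N' within by = X, and `paste()` on a factor uses its labels. This is a base-R illustration only, not how data.table computes the result.

```r
# Same data as DT: X is recycled to 14 rows, so each year appears twice
X <- factor(rep(2006:2012, times = 2))
Y <- rep(1:7, 2)

# Group size for every row, analogous to .N within by = X
n_by_X <- ave(seq_along(X), X, FUN = length)

# paste() converts a factor through its labels, giving "2006 - 2", ...
Z <- paste(X, n_by_X, sep = " - ")
print(data.frame(X, Y, Z))
```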
However, in 1.9.2, with the same code:
DT<-data.table(X=factor(2006:2012),Y=rep(1:7,2))
DT[,Z:=paste(X,.N,sep=" - "),by=list(X)][]
X Y Z
1: 2006 1 1 - 2
2: 2007 2 2 - 2
3: 2008 3 3 - 2
4: 2009 4 4 - 2
5: 2010 5 5 - 2
6: 2011 6 6 - 2
7: 2012 7 7 - 2
8: 2006 1 1 - 2
9: 2007 2 2 - 2
10: 2008 3 3 - 2
11: 2009 4 4 - 2
12: 2010 5 5 - 2
13: 2011 6 6 - 2
14: 2012 7 7 - 2
As a result, I do not get the levels of factor column X but the integer codes associated with each level.
Is this working as intended? Why has it changed? Is it a bug?
I use this kind of procedure to make labels for ggplot, and none of my previous code works anymore. It's rather annoying.
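The difference between the two outputs comes down to how R stores a factor: an integer vector of codes plus a "levels" attribute. A minimal base-R illustration (no data.table involved) of where "2006 - 2" and "1 - 2" each come from:

```r
# A factor is an integer vector of codes with a "levels" attribute
X <- factor(2006:2012)
print(typeof(X))   # "integer"
print(levels(X))   # "2006" ... "2012"

# paste() on the factor goes through its labels, as in the 1.8.10 output
with_labels <- paste(X, 2L, sep = " - ")
# Pasting the bare codes instead gives the pattern seen in the 1.9.2 output
with_codes <- paste(as.integer(X), 2L, sep = " - ")
```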
Thanks
Christophe
_______________________________________________
datatable-help mailing list
datatable-help at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
http://pj.freefaculty.org

Assoc. Director
Center for Research Methods
University of Kansas
http://quant.ku.edu