[Chaid-commits] possible bug on CHAID
Jose Marcos Ferraro
jose.ferraro at LOGITeng.com
Wed Oct 22 23:11:31 CEST 2014
Hello,
I believe I may have found a bug.
Let us consider a single explanatory variable (NOT ordered) that has four levels: three of them are very similar and one is significantly different. One would expect these three levels to be merged; however, I got different results. To make this more concrete, let the levels of the explanatory variable be "a", "b", "c" and "d" and the levels of the response be "x", "y" and "z". Let the cross-classification table be
      x    y    z
a    33   33   33
b    66   34  100
c    34   33   33
d    33   33   33
One would expect a and d to be merged first (for they are identical) and the result then to be merged with c, which has a very close distribution. That does not happen. Only a and d are merged, and the result is:
Model formula:
y ~ x
Fitted party:
[1] root
| [2] x in a, d: x (n = 198, err = 66.7%)
| [3] x in b: z (n = 200, err = 50.0%)
| [4] x in c: x (n = 100, err = 66.0%)
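The expectation stated above can be checked by hand with pairwise chi-squared tests on the table. The following is a minimal sketch in base R, not the package's own code: pair_pvalue is a hypothetical helper, and it is an assumption that a plain Pearson test on each 2 x 3 sub-table is close to what the merge step compares. Under that assumption, a-d should give a p-value of 1, a-c and c-d should be close to 1, and every pair involving b should be small.

```r
# Cross-classification table from the message (rows: levels of the
# explanatory variable, columns: levels of the response).
tab <- matrix(c(33, 33, 33,
                66, 34, 100,
                34, 33, 33,
                33, 33, 33),
              nrow = 4, byrow = TRUE,
              dimnames = list(x = c("a", "b", "c", "d"),
                              y = c("x", "y", "z")))

# Hypothetical helper: p-value of a Pearson chi-squared test on the
# 2 x 3 sub-table formed by two levels of the explanatory variable.
pair_pvalue <- function(l1, l2) {
  chisq.test(tab[c(l1, l2), ], correct = FALSE)$p.value
}

# All six unordered pairs of levels, one per column.
pairs <- combn(rownames(tab), 2)
pvals <- apply(pairs, 2, function(p) pair_pvalue(p[1], p[2]))
names(pvals) <- apply(pairs, 2, paste, collapse = "-")
print(round(pvals, 4))
```

If the merge step picks the pair with the largest p-value each time, one would expect {a, d} first and then {a, d} with c, leaving b on its own.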
I believe the problem lies in line 184 of chaid.R, which reads
logpmaxs <- logpmaxs[-min(levindx), -max(levindx)]
and should be
logpmaxs <- logpmaxs[-max(levindx), -max(levindx)]
Am I wrong in my understanding? Could this be fixed?
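To show what the two indexing expressions do differently, here is a small sketch on a made-up matrix. The matrix m, its dimnames and the value of levindx are all assumptions for illustration; the real shape of logpmaxs in chaid.R is not shown in this message, so this only demonstrates the indexing and does not prove which line is correct.

```r
# Hypothetical 4 x 4 matrix with one row and one column per level.
lev <- c("a", "b", "c", "d")
m <- matrix(seq_len(16), nrow = 4, dimnames = list(lev, lev))

# Assumed indices of the two levels that were just merged (a and d).
levindx <- c(1, 4)

# Expression as quoted from chaid.R: drops row 1 but column 4.
current <- m[-min(levindx), -max(levindx)]

# Proposed replacement: drops row 4 and column 4.
proposed <- m[-max(levindx), -max(levindx)]

print(dimnames(current))   # rows b, c, d against columns a, b, c
print(dimnames(proposed))  # rows a, b, c against columns a, b, c
```

With a square matrix like this one, the first form leaves rows and columns that refer to different sets of levels, while the second keeps them aligned.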
On a different topic: is development of this package still being pursued? Could numeric response variables be included, or will it stick to the Kass (1980) paper?
The code to run the given example follows:
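library("CHAID")  # provides chaid()
# 33 observations of each response level
padrao <- c(rep("x", 33), rep("y", 33), rep("z", 33))
# responses for levels a, d and c (c has one extra "x"), then for b
y <- c(padrao, padrao, padrao, "x", rep("x", 66), rep("y", 34), rep("z", 100))
x <- c(rep("a", 99), rep("d", 99), rep("c", 100), rep("b", 200))
x <- as.factor(x)
y <- as.factor(y)
xtabs(~ y + x)  # cross-classification table (response in rows)
df <- data.frame(y, x)
chaid(y ~ x, data = df)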
Jose Marcos Ferraro
Jose.ferraro at LOGITeng.com
tel +55 11 3474-8585
fax +55 11 3474-8501
www.LOGITeng.com