[Traminer-users] pseudo-ANOVA
Gilbert Ritschard
Gilbert.Ritschard at unige.ch
Mon Jun 21 14:10:47 CEST 2010
Dear Claire,
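The histogram shows the distribution of the F_perm's, i.e. the pseudo F
values computed from the permutations. This is the distribution of the
pseudo F under independence (i.e. when the covariate has no effect), whereas
the pseudo F reported in the table is the one computed from the observed
data. It is precisely because the observed value departs from the cluster of
F_perm's that it is improbable under independence and hence significant. In
your output, the observed pseudo F of 2.87 lies far to the right of the
F_perm's clustered around 1; the same reasoning applies to the pseudo R2
(0.61 observed against about 0.35 under permutation).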
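
To make this concrete, here is a minimal sketch in base R of how such a
permutation distribution arises. It is only an illustration on simulated toy
data, not the code dissassoc uses internally: the data, the helper names ss
and pseudo_f, and the choice of squared Euclidean distances are assumptions
of mine, and the +1 in the p-value is one common convention that may differ
from what dissassoc reports.

```r
# Toy illustration in base R (hypothetical data, not TraMineR internals).
set.seed(1)

# Assumed example: 60 units in 3 groups, with a real group effect.
group <- factor(rep(c("a", "b", "c"), each = 20))
x <- rnorm(60, mean = c(0, 1, 2)[as.integer(group)])
d <- as.matrix(dist(x))^2   # squared Euclidean distances

# Sum of squares derived from a dissimilarity matrix: sum(d) / (2 * n).
ss <- function(d) sum(d) / (2 * nrow(d))

# Pseudo F: explained mean square over residual mean square.
pseudo_f <- function(d, group) {
  n <- nrow(d)
  m <- nlevels(group)
  ss_total <- ss(d)
  # Residual SS is the sum of the within-group sums of squares.
  ss_res <- sum(sapply(levels(group), function(g) {
    idx <- group == g
    ss(d[idx, idx, drop = FALSE])
  }))
  ((ss_total - ss_res) / (m - 1)) / (ss_res / (n - m))
}

# Observed statistic, computed with the true group labels.
f_obs <- pseudo_f(d, group)

# Distribution under independence: shuffle the labels and recompute.
f_perm <- replicate(999, pseudo_f(d, sample(group)))

# Share of permuted values at least as large as the observed one.
p_value <- (sum(f_perm >= f_obs) + 1) / (length(f_perm) + 1)

# Histogram of the F_perm's, with the observed value as a dashed line.
hist(f_perm, xlim = range(c(f_perm, f_obs)),
     main = "Pseudo F under independence")
abline(v = f_obs, lty = 2)
```

On such data the dashed line for the observed pseudo F should fall well to
the right of the histogram, which is the pattern you see in your own output.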
Cheers.
Gilbert
> -----Original Message-----
> From: traminer-users-bounces at lists.r-forge.r-project.org
> [mailto:traminer-users-bounces at lists.r-forge.r-project.org] On Behalf
> Of Claire Lemercier
> Sent: Monday, June 21, 2010 12:58
> To: traminer-users at lists.r-forge.r-project.org
> Subject: Re: [Traminer-users] pseudo-ANOVA
>
> Dear Matthias,
> Many thanks for this. Just a follow-up about the pseudo-F and pseudo-R2
> histograms (I am mostly just curious, as the important thing for me at
> this stage is that the p-value is significant, and remains so if I
> restrict the analysis to households with at least 3 members).
> My results now look like this (after 5000 permutations, for all
> households):
> Pseudo ANOVA table:
>             SS   df       MSE
> Exp   5700.474  469 12.154529
> Res   3653.928  862  4.238896
> Total 9354.402 1331  7.028101
>
> Test values (p-values based on 4999 permutations):
>    PseudoF  PseudoR2 PseudoF_Pval PseudoT PseudoT_Pval
>   2.867381 0.6093894            0     Inf            0
>
> However, when I look at the histograms, the values of PseudoF cluster
> around 1 and those of PseudoR2 around 0.35. Why are they so different
> from the PseudoF and PseudoR2 given in the general results?
> All the best,
> Claire.
> > Dear Claire,
> >
> > Your questions are indeed very interesting. There are some cases where
> > the Pseudo T statistic becomes infinite, for instance when some groups
> > have 2 or fewer units, and I guess that this is your case. This means
> > that you should not interpret the T statistic. Therefore, you cannot
> > conclude whether the discrepancies (pseudo-variances) differ
> > significantly between groups.
> >
> > The p-value is significant and that is what you should look at. The
> > absolute value of the R2 may be difficult to interpret. It should be
> > compared to the R2 values obtained by random permutation (to get an
> > idea of whether 0.609 is high or low). These values are stored in the
> > object returned by dissassoc. You can easily get a histogram using:
> > da <- dissassoc(...)
> > hist(da)
> >
> > The values are stored in da$perms$t[, 1] (for the Pseudo F statistic)
> > and da$perms$t[, 2] (for the Pseudo R2).
> >
> > The list of pseudo-variances for each household can easily be
> > recovered. da$groups provides the size and pseudo-variance of each
> > factor level (i.e. each household in your case). You just need to
> > remove the last row (which gives the total n and pseudo-variance).
> > This can be done with the following code:
> > da$groups[-nrow(da$groups),]
> >
> > Regarding the high number of levels, I think that this is not a
> > problem. The permutation test gives you the probability of obtaining
> > an R2 at least as high as yours by chance. To be sure, I suggest using
> > at least 5000 permutations:
> > da <- dissassoc(..., R=5000)
> >
> > Hope this helps,
> > Matthias Studer
> >
> > On 17.06.2010 09:37, Claire Lemercier wrote:
> >
> >> Hi all,
> >> I am using the pseudo-ANOVA routine (dissassoc) of TraMineR; I think
> >> that I understand it correctly when it deals with "classical"
> >> categorical variables like sex, but I want to be sure that I make no
> >> mistake in interpreting the results in a case where the variable has
> >> hundreds of levels.
> >> We have 1332 individual sequences clustered in 470 households and we
> >> want to test if persons in the same household tend to vote similarly
> >> at similar timepoints. We produced a distance through optimal matching
> >> (with parameters giving an important weight to simultaneity) and
> >> dissassoc gives these results for households:
> >>
> >> Pseudo ANOVA table:
> >>             SS   df       MSE
> >> Exp   5700.474  469 12.154529
> >> Res   3653.928  862  4.238896
> >> Total 9354.402 1331  7.028101
> >>
> >> Test values (p-values based on 999 permutation):
> >>    PseudoF  PseudoR2 PseudoF_Pval PseudoT PseudoT_Pval
> >>   2.867381 0.6093894            0     Inf            0
> >>
> >> Is the "Inf" for PseudoT a problem? Am I right in understanding that
> >> this shows a very strong association and that, despite the high
> >> number of categories, it is highly significant? I also wanted to
> >> check for the possibility that household homogeneity is concentrated
> >> in only some of the households. Am I correct to think that the fact
> >> that, with an overall pseudo-variance of 7, a vast majority of even
> >> the largest households have an internal pseudo-variance of less than
> >> 5 points in the right direction?
> >> All the best,
> >> Claire
> >>
> >>
> >>
> >> _______________________________________________
> >> Traminer-users mailing list
> >> Traminer-users at lists.r-forge.r-project.org
> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users