[Traminer-users] pseudo-ANOVA results

Matthias Studer Matthias.Studer at unige.ch
Fri Jun 18 13:12:41 CEST 2010

Dear Claire,

Your questions are indeed very interesting. There are some cases where
the Pseudo T statistic becomes infinite, for instance when some groups
have two or fewer units, and I guess that this is your case. This means
that you should not interpret the T statistic, and therefore you cannot
conclude whether the discrepancies (pseudo-variances) differ
significantly across groups.
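
To check this guess, you can tabulate household sizes. The following is a minimal base-R sketch, assuming a hypothetical vector `household` that holds each individual's household identifier (the same grouping you pass to dissassoc). The toy values are illustrative only and are not your data.

```r
# Hypothetical household identifiers, one per individual (toy data only).
household <- c("h1", "h1", "h1", "h2", "h3", "h3", "h4", "h4", "h4", "h4")

# Number of individuals in each household.
sizes <- table(household)

# Households with two or fewer members, which can make Pseudo T infinite.
small <- sizes[sizes <= 2]
n_small <- length(small)
print(small)
```

In your case, 1332 / 470 is about 2.83, which is below 3, so at least one of the 470 households must have two or fewer members.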

The Pseudo F p-value is significant, and that is what you should look
at. The absolute value of the R2 may be difficult to interpret. It should
be compared to the R2 values obtained by random permutation (to get an
idea of whether 0.609 is high or low). These values are stored in the
object returned by dissassoc, so you can easily get a histogram of them:
da <- dissassoc(...)
hist(da$perms$t[, 2])

The permuted values are stored in da$perms$t[, 1] (for Pseudo F) and
da$perms$t[, 2] (for Pseudo R2).
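
To make the permutation idea concrete, here is a minimal base-R sketch under stated assumptions: the SS of a set of n units is taken as the sum of all pairwise dissimilarities divided by 2n, R2 = SS_between / SS_total, and the p-value is the share of permuted R2 values at least as large as the observed one. The helpers `ss_from_diss`, `pseudo_anova` and `perm_test_r2` are hypothetical and not part of TraMineR, whose implementation may differ in details (squared distances, weights, p-value convention). The last lines check that the PseudoR2 and PseudoF in your table follow from its SS and df columns.

```r
# Sum of squares of a set of units from their dissimilarity matrix D:
# sum over all ordered pairs (i, j) of d_ij, divided by 2 * n.
ss_from_diss <- function(D) {
  sum(D) / (2 * nrow(D))
}

# Pseudo-ANOVA statistics (hypothetical helper, not TraMineR code).
pseudo_anova <- function(D, g) {
  g <- as.factor(g)
  n <- length(g)
  m <- nlevels(g)
  ss_total <- ss_from_diss(D)
  # Within-group SS: sum of the SS computed inside each group.
  ss_within <- sum(sapply(levels(g), function(lv) {
    idx <- which(g == lv)
    ss_from_diss(D[idx, idx, drop = FALSE])
  }))
  ss_between <- ss_total - ss_within
  c(F = (ss_between / (m - 1)) / (ss_within / (n - m)),
    R2 = ss_between / ss_total)
}

# Permutation test: shuffle the group labels R times, recompute R2 each
# time, and report the share of permuted R2 values >= the observed one.
perm_test_r2 <- function(D, g, R = 999) {
  obs <- unname(pseudo_anova(D, g)["R2"])
  perms <- replicate(R, unname(pseudo_anova(D, sample(g))["R2"]))
  list(observed = obs, perms = perms, p_value = mean(perms >= obs))
}

# Toy example: two well-separated clusters of 10 points on a line.
set.seed(1)
x <- c(rnorm(10, mean = 0), rnorm(10, mean = 5))
D <- as.matrix(dist(x))
g <- rep(c("a", "b"), each = 10)
res <- perm_test_r2(D, g, R = 199)
print(quantile(res$perms, c(0.5, 0.95)))  # reference distribution of R2

# Cross-check against the table in your message:
# R2 = SS_Exp / SS_Total and F = (SS_Exp / df_Exp) / (SS_Res / df_Res).
r2_table <- 5700.474 / 9354.402                 # about 0.6094
f_table <- (5700.474 / 469) / (3653.928 / 862)  # about 2.867
```

With squared Euclidean distances these definitions reduce to the classical ANOVA sums of squares, so on such toy data the sketch can be compared with an ordinary linear model.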

The list of pseudo-variances for each household can easily be recovered.
da$groups gives the size and pseudo-variance of each factor level (i.e.
each household in your case). You just need to remove the last line
(which gives the total n and pseudo-variance), for example:
groups <- da$groups[-nrow(da$groups), ]
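
If you want to recompute the per-household figures yourself, here is a minimal base-R sketch, assuming the discrepancy (pseudo-variance) of a set of n units is SS / n, with SS as in the pseudo-ANOVA (sum of pairwise dissimilarities divided by 2n). `discrepancy` and `group_discrepancy` are hypothetical names, and TraMineR's own values may differ slightly (for example with weights). It also computes the kind of check you describe: the share of households, restricted here to those with at least three members, whose internal discrepancy is below the overall one.

```r
# Discrepancy (pseudo-variance) of a set of units: SS / n, i.e. the sum
# over all ordered pairs of d_ij divided by 2 * n^2.
discrepancy <- function(D) {
  sum(D) / (2 * nrow(D)^2)
}

# Size and discrepancy of each group (hypothetical helper), similar in
# spirit to da$groups without its final total row.
group_discrepancy <- function(D, g) {
  g <- as.factor(g)
  disc <- sapply(levels(g), function(lv) {
    idx <- which(g == lv)
    discrepancy(D[idx, idx, drop = FALSE])
  })
  data.frame(group = levels(g), n = as.vector(table(g)),
             discrepancy = unname(disc))
}

# Toy data: 6 individuals in 3 households, distances on a line.
x <- c(0, 1, 2, 10, 11, 20)
g <- c("a", "a", "a", "b", "b", "c")
D <- as.matrix(dist(x))

groups <- group_discrepancy(D, g)
overall <- discrepancy(D)

# Share of households with at least 3 members whose internal discrepancy
# is below the overall one (singletons trivially have discrepancy 0).
big <- groups[groups$n >= 3, ]
share_below <- mean(big$discrepancy < overall)
```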

Regarding the high number of levels, I do not think this is a problem.
The permutation test gives the probability of obtaining an R2 at least
as high as yours if individuals were allocated to households at random.
To be sure, I suggest using at least 5,000 permutations:
da <- dissassoc(..., R=5000)
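
One way to see why more permutations help: the number of permutations R limits how precisely a permutation p-value can be estimated. A minimal base-R sketch, assuming the p-value is computed as a simple proportion of permutations (TraMineR's exact convention may differ); `perm_resolution` is a hypothetical helper.

```r
# Resolution of a permutation p-value based on R permutations
# (hypothetical helper): the smallest non-zero value it can take when
# computed as a simple proportion, and its Monte Carlo standard error
# when the true p-value is p.
perm_resolution <- function(R, p = 0.05) {
  c(min_p = 1 / R, mc_se = sqrt(p * (1 - p) / R))
}

res_999 <- perm_resolution(999)
res_5000 <- perm_resolution(5000)
print(rbind(res_999, res_5000))
```

Under that convention, a reported p-value of 0 with 999 permutations only says that p is below about 1/999 (roughly 0.001); with 5,000 permutations the bound drops to 0.0002.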

Hope this helps,
Matthias Studer

On 17.06.2010 09:37, Claire Lemercier wrote:
> Hi all,
> I am using the pseudo-ANOVA routine (dissassoc) of TraMineR; I think 
> that I understand it correctly when it deals with "classical" 
> categorical variables like sex, but I want to be sure that I make no 
> mistake in interpreting the results in a case where the variable has 
> hundreds of levels.
> We have 1332 individual sequences clustered in 470 households and we 
> want to test if persons in the same household tend to vote similarly 
> at similar timepoints. We produced a distance through optimal matching 
> (with parameters giving an important weight to simultaneity) and 
> dissassoc gives these results for households:
> Pseudo ANOVA table:
>            SS   df       MSE
> Exp   5700.474  469 12.154529
> Res   3653.928  862  4.238896
> Total 9354.402 1331  7.028101
> Test values  (p-values based on 999 permutation):
>  PseudoF  PseudoR2 PseudoF_Pval PseudoT PseudoT_Pval
> 2.867381 0.6093894            0     Inf            0
> Is the "Inf" for PseudoT a problem? Am I right in understanding that 
> this shows a very strong association and that, despite the high 
> number of categories, it is highly significant? I also wanted to 
> control for the fact that household-homogeneity could be concentrated 
> in only some of the households. Am I correct to think that the 
> following points in the right direction: with an overall pseudo-variance 
> of 7, a vast majority of even the largest households have an internal 
> pseudo-variance of less than 5?
> All the best,
> Claire
> _______________________________________________
> Traminer-users mailing list
> Traminer-users at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/traminer-users 
