[Traminer-users] selecting the number of clusters

Rimantas Vosylis rvosylis at live.com
Wed Jun 10 12:30:38 CEST 2015

Dear Traminer users,


I am trying to build a typology of sequences using cluster analysis with
OM distances and Ward's algorithm.


I have trouble choosing the number of clusters. I use several empirical
indexes, but they do not help much. The Calinski and Harabasz (CH)
index peaks at the two-cluster solution and then goes down. The
average silhouette width gives similar results to the CH index.
I also ran a pseudo ANOVA to see which cluster solution explains the most
variance, but it tells me the opposite: the more clusters, the higher
the pseudo R2 gets. When I look at the various plots (e.g. seqdplot) I see
that the most meaningful solutions (I have several types of sequences) lie
somewhere between 4 and 6 clusters.
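
The contrast between the pseudo R2 and the average silhouette width can be illustrated with a minimal sketch. It is written in Python rather than R and is not my actual analysis. The toy sequences are hypothetical, Hamming distance stands in for OM, and the nested partitions are written by hand instead of being cut from a Ward tree. The pseudo R2 is computed here as one minus the within-cluster share of a distance-based sum of squares, which is an assumption about how the discrepancy is defined.

```python
from itertools import combinations

# Hypothetical toy sequences; Hamming distance stands in for OM here.
seqs = ["AAAA", "AAAB", "AABB", "BBBB", "BBBA", "CCCC", "CCCA"]
n = len(seqs)
d = [[sum(a != b for a, b in zip(s, t)) for t in seqs] for s in seqs]

# Hand-written nested partitions (an assumption, not the output of Ward).
partitions = {
    2: [0, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2, 2],
    4: [0, 0, 3, 1, 1, 2, 2],
}

def groups_of(labels):
    """Map each cluster label to the list of its member indices."""
    groups = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    return groups

def sum_of_squares(members):
    """Distance-based sum of squares: pairwise dissimilarities / group size."""
    members = list(members)
    return sum(d[i][j] for i, j in combinations(members, 2)) / len(members)

def pseudo_r2(labels):
    """Share of the total discrepancy explained by the partition."""
    within = sum(sum_of_squares(m) for m in groups_of(labels).values())
    return 1 - within / sum_of_squares(range(n))

def avg_silhouette(labels):
    """Average silhouette width; singletons get width 0 by convention."""
    groups = groups_of(labels)
    widths = []
    for i, g in enumerate(labels):
        own = [j for j in groups[g] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in m) / len(m)
                for h, m in groups.items() if h != g)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

r2 = {k: pseudo_r2(p) for k, p in partitions.items()}
asw = {k: avg_silhouette(p) for k, p in partitions.items()}
for k in partitions:
    print(k, round(r2[k], 3), round(asw[k], 3))
```

On this toy data the pseudo R2 rises with every additional split, while the average silhouette width peaks at an intermediate number of clusters, so the maximum of the pseudo R2 alone cannot point to a preferred solution.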


Could you perhaps suggest which indexes worked best for you and matched your
expectations / theoretical knowledge, and that I could use in my analysis?


Thank you in advance!





Rimantas Vosylis

PhD student, lecturer

Institute of Psychology

Faculty of Social Technologies

Mykolas Romeris University


e-mail: rimantasv at mruni.eu

e-mail2: rvosylis at live.com


