[adegenet-forum] significance of clusters
Jombart, Thibaut
t.jombart at imperial.ac.uk
Mon Jul 18 11:24:50 CEST 2011
Hello Nikki,
Ellipses used in DAPC scatterplots are drawn using ade4's function s.class, and are inertia ellipses. They are merely a graphical tool used to visualize the shape of different clouds of points.
If you assume the distribution in each group is bivariate normal, then these are two-dimensional confidence regions for the groups, and the size of the ellipses is related to the level of confidence chosen:
p = 1 - exp(-cellipse^2 / 2)
where "cellipse" is the size factor used in scatter.dapc. It defaults to 1.5, so ellipses by default would contain about 2/3 of the points. To get a 95% ellipse, use about 2.45 (2.5 gives roughly 95.6%). Etc.
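As a quick check of the arithmetic, here is a minimal sketch in Python (plain arithmetic only, not adegenet code). The helper names coverage and cellipse_for are made up for this illustration; they just evaluate the formula above and its inverse.

```python
import math

def coverage(cellipse):
    """Proportion of a bivariate normal cloud inside an ellipse of size `cellipse`,
    using p = 1 - exp(-cellipse^2 / 2) from the message above."""
    return 1.0 - math.exp(-cellipse ** 2 / 2.0)

def cellipse_for(p):
    """Inverse of the formula: ellipse size needed to contain a proportion p (0 < p < 1)."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

# Hypothetical helpers, for illustration only.
for c in (1.0, 1.5, 2.5):
    print(f"cellipse = {c}: p = {coverage(c):.3f}")
print(f"p = 0.95: cellipse = {cellipse_for(0.95):.2f}")
```

The value returned by cellipse_for would then be passed as the cellipse argument of scatter.dapc. This holds only under the bivariate normal assumption.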
However, this is only true for bivariate normal distributions, which can be pretty rare!
So I'd get back to the first answer: they are a graphical tool to assess the distribution of points by group.
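To illustrate the caveat about normality, here is a rough simulation sketch in Python (standard library only). It assumes idealized clouds with known mean 0, unit variance and no correlation, so the ellipse of size cellipse reduces to a circle of that radius; real DAPC ellipses are computed from sample means and covariances instead. The function name fraction_inside is made up for this example.

```python
import math
import random

def fraction_inside(points, cellipse):
    """Fraction of standardized, uncorrelated points within distance `cellipse` of the origin."""
    limit = cellipse ** 2
    return sum(x * x + y * y <= limit for x, y in points) / len(points)

rng = random.Random(1)       # fixed seed so the run is repeatable
n = 50_000
half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance

# Bivariate normal cloud: the formula should apply.
normal_pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
# Uniform square cloud with the same variances: the formula should not apply.
uniform_pts = [(rng.uniform(-half_width, half_width),
                rng.uniform(-half_width, half_width)) for _ in range(n)]

expected = 1.0 - math.exp(-1.5 ** 2 / 2.0)
print(f"formula:       {expected:.3f}")
print(f"normal cloud:  {fraction_inside(normal_pts, 1.5):.3f}")
print(f"uniform cloud: {fraction_inside(uniform_pts, 1.5):.3f}")
```

The normal cloud should land close to the formula's value, while the uniform cloud has the same variances but puts a noticeably smaller share of points inside the same ellipse (in theory pi * 1.5^2 / 12, about 0.59).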
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Nikki Vollmer [nlv209 at hotmail.com]
Sent: 13 July 2011 16:13
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] significance of clusters
Hello,
I thought I remembered someone asking a similar question before, but could not find it anywhere in the forum. I do apologize for any duplication... but I was trying to figure out, with DAPC, what exactly the circle around each cluster represents. Is it the mean assignment of individuals?
Also, is there any way to test whether the overlap between two clusters is significant or not? Or is it simply that significant differentiation exists because there are two clusters, and that any lack of differentiation would result in the two clusters being combined into one?
Thank you for any help!
Nikki
