# [adegenet-forum] question find.clusters and BIC values

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue May 24 09:19:16 CEST 2011

```Hi there,

I would not call Evanno et al.'s approach 'formal', but in any case our approach with find.clusters is fairly ad hoc as well. The idea, somewhat arbitrary, is to look for an elbow in the curve of BIC values. Please have a look at the DAPC paper for examples.

I developed some 'objective' criteria for automated selection of K, but none of these options is 'better' than visual inspection of the data (see ?find.clusters from the devel version).
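
# A minimal sketch of the elbow inspection, assuming 'x' is a genind object
# (hypothetical name) and that 40 retained PCs and up to 20 clusters suit the data:
library(adegenet)
grp <- find.clusters(x, n.pca = 40, max.n.clust = 20, choose.n.clust = FALSE)
# With choose.n.clust = FALSE, K is picked by the 'criterion' argument;
# the full BIC curve is still returned in grp$Kstat for visual inspection.
plot(grp$Kstat, type = "b", xlab = "K", ylab = "BIC")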

Cheers

Thibaut.
________________________________
From: Martin Llewellyn [llewellynmartin at hotmail.com]
Sent: 23 May 2011 12:02
To: Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
Subject: RE: [adegenet-forum] question find.clusters and BIC values

Is there a formal means of identifying the appropriate number of clusters from the BIC value curve if no asymptote is reached? Structure users generally use delta-K (Evanno et al. 2005) - would an equivalent statistic be appropriate?

Thanks

M

________________________________
From: t.jombart at imperial.ac.uk
To: mrjonker at gmail.com; adegenet-forum at r-forge.wu-wien.ac.at
Date: Thu, 19 May 2011 08:40:45 +0000

Dear Rudy,

the "d=..." simply indicates the mesh of the grid, which is not necessarily useful for a single graph, but may be used to compare different graphs (e.g. plane 1-2 and 3-4, to see how the distances in one plot relate to the other one). You can remove this indication using "cgrid=0" or remove the grid using "grid=FALSE".
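
# For example, assuming 'dapc1' is a dapc object (hypothetical name):
scatter(dapc1, cgrid = 0)      # drops the "d=..." indication
scatter(dapc1, grid = FALSE)   # drops the grid altogether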

About the inset, this is generated by the function add.scatter.eig. It scales so that the largest value uses most of the inset window. For instance:
plot(1:10)

About the inertia ellipses, they are graphical summaries of a cloud of points. This topic is actually being discussed in the adelist forum (forum for ade4):

They're not generally confidence ellipses. They are confidence ellipses only when the cloud of points is a sample from a bivariate normal distribution. In this case, "cellipse" (which determines the size of the ellipse) indirectly controls the alpha threshold. 1.5 (default) corresponds to about 67%; 2.5 corresponds to the magical 95%.
The exact formula is:
p = 1 − exp( − cellipse^2 /2 )
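
# A quick check of this formula in base R (a sketch; it only holds under the
# bivariate normal assumption stated above):
cellipse <- c(1.5, 2.5)
p <- 1 - exp(-cellipse^2 / 2)
round(p, 3)                  # 0.675 0.956
# Inverting the formula gives the cellipse needed for a target coverage p:
sqrt(-2 * log(1 - 0.95))     # about 2.448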

All the best

Thibaut

________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Rudy Jonker [mrjonker at gmail.com]
Sent: 18 May 2011 09:39

Dear Thibaut and group,

I have a question with respect to the graphs made with scatter.dapc. I noticed a d=n in the top right corner of the graph and was wondering what it stands for. I could not find it in the manual or in the article. I also have a question with respect to the eigenvalue inset in the graph. Are the bars scaled such that the first eigenvalue is always the full length of the graph and that the rest is then plotted relative to the first?

And then the inertia ellipse: is this a sort of 95% confidence interval on the position of the centroid of a cluster?