t.jombart at imperial.ac.uk
Mon Jan 17 19:08:48 CET 2011
Thanks for your post, which gives me a chance to illustrate why the a.score is useful.
As you suggest, the idea behind the a.score is that there is a trade-off between finding a space with a good power of discrimination using DAPC, and retaining too many dimensions so that for any random group we'd find a good discriminating space anyway. optim.a.score simply runs the analysis for a range of retained PCs and tracks the output in terms of a-score.
To make it clear to other users, let's give an example using the microbov dataset (30 microsatellites typed in 15+ cattle breeds). Let's examine the % of successful reassignment (i.e., the quality of discrimination) for different numbers of retained PCs.
First, 3 PCs:
library(adegenet); data(microbov)
summary(dapc(microbov, n.da=100, n.pca=3))$assign.per.pop
Here, you can see that some breeds are well discriminated (Zebu, Lagunaire, > 90%) while others are poorly discriminated (Bretonne Pie Noire, Limousin, etc.). There is not enough information. Let's keep more PCs:
summary(dapc(microbov, n.da=100, n.pca=300))$assign.per.pop
Almost 100% successful reassignment for all groups. Good? Nope. Actually, the retained space is so big that anything would be well discriminated. If we completely randomise the groups, this is what we obtain:
x <- microbov; pop(x) <- sample(pop(x))  # shuffle the breed labels among individuals
summary(dapc(x, n.da=100, n.pca=300))$assign.per.pop
Groups have been randomised, and yet we still get reassignment rates above 80%. To avoid this kind of trouble, optim.a.score tries to maximise the % for the actual groups while minimising the % for random groups. In this case:
dapc1 <- dapc(microbov, n.da=100, n.pca=100)
optim.a.score(dapc1)
tells us that the optimal number of PCs to retain is around 10-20.
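To make the principle concrete, here is a minimal sketch that does not use adegenet at all. It assumes simulated toy data, uses prcomp and MASS::lda as stand-ins for the PCA and discriminant steps, and computes a simplified a-score-like quantity (observed reassignment minus mean reassignment for permuted groups). The helper names reassign_rate and a_score_sketch are made up for illustration, and the actual a.score implementation may differ in its details.

```r
library(MASS)  # provides lda(); used here as a stand-in for the DA step

set.seed(1)

# Hypothetical toy data: 3 groups of 40 individuals, 40 variables,
# with real group differences on the first two variables only.
grp <- factor(rep(c("A", "B", "C"), each = 40))
X <- matrix(rnorm(length(grp) * 40), ncol = 40)
X[grp == "B", 1] <- X[grp == "B", 1] + 4
X[grp == "C", 2] <- X[grp == "C", 2] + 4

# Proportion of individuals reassigned to their own group, per group,
# after PCA (n_pca axes retained) followed by a discriminant analysis.
reassign_rate <- function(X, grp, n_pca) {
  scores <- prcomp(X)$x[, seq_len(n_pca), drop = FALSE]
  fit <- lda(scores, grouping = grp)
  pred <- predict(fit, scores)$class
  vapply(split(pred == grp, grp), mean, numeric(1))
}

# Simplified a-score-like quantity (an assumption, not adegenet's code):
# observed reassignment minus the mean reassignment for permuted groups.
a_score_sketch <- function(X, grp, n_pca, n_sim = 10) {
  obs <- reassign_rate(X, grp, n_pca)
  rand <- replicate(n_sim, reassign_rate(X, sample(grp), n_pca))
  obs - rowMeans(rand)
}

few  <- a_score_sketch(X, grp, n_pca = 2)   # few retained PCs
many <- a_score_sketch(X, grp, n_pca = 40)  # all PCs retained
print(round(rbind(few, many), 2))
```

With many retained PCs, the randomised groups are themselves reassigned fairly well, so one would expect the corrected score to drop even though the raw reassignment rate goes up.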
In your case, I would definitely go for 3 PCs. Keeping 45% of the variance is fine if the rest is random noise, so there's no problem here. If you look at the results, with only 3 PCs you get very good scores for two groups (2 and 4, with more than 95% of successful reassignment) and a decent score for a third one (3). When you add more PCs, you barely improve discrimination but increase the chances of artefactual discrimination (hence groups 2, 4 and 3 have lower a-scores).
It is natural that a-scores vary slightly from one trial to another, since they rely on random permutations of the groups (by default, 10 for each number of retained PCs). To reduce such fluctuation, try increasing the number of simulations (argument n.sim), say to 30 or more. If the fluctuations are massive, then there is probably a problem, although I could imagine this happening when the sampling of the different groups is very uneven.
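As a rough illustration of the n.sim point, the sketch below computes a-scores for the same DAPC with 10 and then 30 permutations. It assumes adegenet is installed and uses microbov with 20 retained PCs, an arbitrary choice within the 10-20 range mentioned above; the object names s10 and s30 are just placeholders.

```r
library(adegenet)
data(microbov)

# DAPC with a moderate number of retained PCs (20 is an assumed value)
dapc1 <- dapc(microbov, n.da = 100, n.pca = 20)

# a-scores based on 10 (the default) and 30 random permutations of the groups
s10 <- a.score(dapc1, n.sim = 10)
s30 <- a.score(dapc1, n.sim = 30)

# Overall mean a-score for each run; rerunning these lines shows how much
# the estimate moves from one trial to the next
print(c(n.sim10 = s10$mean, n.sim30 = s30$mean))

# Per-group a-scores averaged over the 30 permutations
print(round(s30$pop.score, 3))
```

Repeating both calls a few times should give a feel for how much of the variation you see is just permutation noise.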
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] On Behalf Of Mrinalini,Mrinalini [bsp22e at bangor.ac.uk]
Sent: 17 January 2011 17:36
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] a.score
My find.clusters analysis for species delimitation resulted in 4
clusters and I retained 10PCs (representing about 75% of cumulative
variance) and 3 eigenvalues. I used this optim.a.score command but not
sure it is right...
optim.a.score(dapc1, n.pca=1:ncol(dapc1$tab), smart=TRUE, n=10,
plot=TRUE, n.sim=10, n.da=3)
If this is right, I get the optimal number of PCs to be 3 which
represents only 45% of the variance.
Then I redid the DAPC using 3 PCs (which only represents 45% of
variance) and it gave me the following a.scores
1 2 4 3
0.2227273 0.9666667 0.9571429 0.6333333
If I used all 10 PCs, I get a.scores
1 2 4 3
0.2818182 0.7500000 0.6571429 0.5333333
I'm not sure whether to use 10 or 3 PCs as it seems to be trade-off
between total variance and a.scores. But more confusingly, each time I
redo the DAPC with a certain number of PCs (say 3) it gives me different
a.scores.
Could you kindly clarify this.
Thanks for your help,