[adegenet-forum] DAPC for non-structured populations

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Jan 28 20:32:14 CET 2013


Hello, 

Sorry about the late reply. Using DAPC to analyse a non-structured population on purpose is pretty much the same as using roller-blades for ice-skating. It may look like it is going to work, but it's not clear exactly how. 

For find.clusters, we would need a graph of BIC values. K=1 makes no sense from a clustering point of view, as the output is trivial, so no, find.clusters is not supposed to return a clustering solution for K=1.

As for the outputs of DAPC, this is not surprising. DAPC attempts to find the best discrimination for a given cluster definition. If the space is large enough (i.e. many PCs are retained relative to the number of individuals), a discriminating space will always exist in the data, and thus be found by DAPC. Cross-validation based on subsets of the data would help to identify such cases. a.score / optim.a.score can also be used to detect them.
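To illustrate the point about large spaces, here is a minimal sketch in plain Python. It is not adegenet code and not DAPC itself: it swaps in a minimum-norm least-squares linear discriminant on pure noise, and all names (`solve`, `fit`, `accuracy`, the sizes `n` and `p`) are made up for the example. It is deliberately exaggerated, with more variables than individuals, so that a perfectly separating axis is guaranteed to exist for random group labels; repeated hold-out validation then shows how little of that separation carries over to individuals left out of the fit.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit(rows, labels):
    """Minimum-norm least-squares weights: w = X^T (X X^T)^{-1} y."""
    gram = [[dot(r1, r2) for r2 in rows] for r1 in rows]
    alpha = solve(gram, labels)
    p = len(rows[0])
    return [sum(alpha[i] * rows[i][j] for i in range(len(rows))) for j in range(p)]

def accuracy(w, rows, labels):
    """Proportion of individuals falling on the side of the axis matching their label."""
    hits = sum((dot(w, r) > 0) == (y > 0) for r, y in zip(rows, labels))
    return hits / len(rows)

random.seed(1)
n, p = 30, 40  # hypothetical sizes: 30 individuals, 40 variables, no real structure
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.choice([-1.0, 1.0]) for _ in range(n)]  # arbitrary group labels

# Discrimination measured on the same individuals used to build the axis
train_acc = accuracy(fit(X, y), X, y)

# Repeated hold-out: fit on 20 individuals, assess on the 10 left out
idx = list(range(n))
cv_scores = []
for _ in range(20):
    random.shuffle(idx)
    test, train = idx[:10], idx[10:]
    w = fit([X[i] for i in train], [y[i] for i in train])
    cv_scores.append(accuracy(w, [X[i] for i in test], [y[i] for i in test]))
cv_acc = sum(cv_scores) / len(cv_scores)

print("accuracy on fitted individuals:", train_acc)
print("mean hold-out accuracy:", round(cv_acc, 2))
```

The gap between the two printed numbers is what to look for: discrimination that holds only for the individuals used to build the discriminant axis says nothing about real structure.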

If there is no clustering and you are interested in the diversity within your panmictic population, I recommend using PCA - but even there I would not expect much structure, by definition.

Cheers

Thibaut


________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Joao Faria [jfariaos at gmail.com]
Sent: 21 January 2013 18:02
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DAPC for non-structured populations

Dear Thibaut and DAPC users,

I've been exploring DAPC for the past week on a 10 microsatellite dataset with 114 individuals from 3 geographically distinct locations. I've used individual based modeling (Structure) and genetic differentiation indices (Fst, Rst, Dest) to explore the structuring of these populations (not exclusively). As expected, I found no structure at all (it's a crustacean species with a huge larval dispersion capacity). I wanted to use DAPC to confirm these results and graphically represent the absence of divergence among such populations, but it consistently fails to present a valid output. I'll try to explain briefly my line of procedure:

I've performed the analysis in two different ways…one by using find.clusters and the other by assuming the number of clusters equal to my sampled locations (which are geographically separated).

1. Using find.clusters
I've retained all PCs (110) and got the lowest BIC for K = 2, with individuals from each actual group (ori) being ~equally divided among the two inferred groups (inf) (I guess that such evidence would be enough to consider my populations undifferentiated…but let's move forward...)

I've used the 2 inferred groups to perform a DAPC and retained a number of PCs equal to 1/3 of the number of individuals (PCs = 38) (~60% cumulative variance…too much information missed). With two clusters I get one single discriminant function (one eigenvalue). If I then scatter the DAPC I obtain the density of the individuals for the single discriminant function and get perfectly differentiated clusters!!

1.a) I understand that I've lost a bit of information by selecting few PCs but still shouldn't I get enough to observe undifferentiated populations?

1.b) Is there any limitation in find.clusters that prevents one from getting K=1, so that the only K to work upon is K=2? Even if such a method splits roughly half of the ori samples to each cluster? Is such inconsistency with the original groups a sign of lack of structure?

2. Using populations as prior groups (3 clusters)
Number of PCs retained = 70; chosen so as to capture a large amount of the variation (~95%). All discriminant functions were retained (n.da = 2). As the number of retained PCs is too large (> N/3), the DAPC outcome shows overfitting of the discriminant functions and perfectly (wrongly) differentiates the three clusters. At this point, I took a look at the a-score for the previous DAPC. I got 37 as the optimal number of PCs. Nevertheless, the mean a-score was 0.07…a very low number, with the highest proportion for the optimal number of PCs of 0.11. Is this a clear sign of poor fitting? When performing the DAPC retaining the 37 PCs, I still get a perfect discrimination of clusters.

2.a) Is it possible to get a visual representation of unstructured populations using DAPC!?

I might be trying to do an impossible DAPC visualization, and skipping a lot of methodological constraints…and I do apologize for this huge text!!

Thanks in advance for your help.

Best Regards

João Faria
PhD student
University of Azores
