[adegenet-forum] find.clusters and optim.a.score

jeff 5jr29 at queensu.ca
Mon Feb 28 19:08:55 CET 2011

    I have a few questions about the find.clusters and
optim.a.score functions. I am trying to use DAPC to determine the
number of genetic clusters in my dataset and, because I have no real
prior assumptions about the number or extent of populations, I am
using the find.clusters function. Using an assignment test (in
STRUCTURE) I find two relatively strong genetic clusters, but when I
use find.clusters, the BIC scores suggest that there are 4 clusters,
essentially dividing one of the 2 clusters identified in STRUCTURE
into 3. The problem is that these 3 clusters do not map out very well
geographically. I have a few ideas about why this might be the case,
but I want to make sure I am running the analysis correctly before I
dive into this much further.

    I think my main problem is how many PCA axes (n.pca) to retain
when using the find.clusters function. Because I do not have any
prior population delineation, I do not think it makes sense to use
optim.a.score to determine this. I have tried a few different values
and they give different results, but what I ended up doing was
setting n.pca to a high value (100) to capture a large amount of the
variation (~95%), which seems to be what was done in the BMC Genetics
paper? Once I had the number of clusters (4 in this case), I assigned
individuals to the 4 groups (using n.pca = 100 again) and then used
the optim.a.score function to determine the optimal number of PCA
axes for assigning individuals to these 4 groups. I then reclassified
individuals, determined posterior membership probabilities and
produced scatter plots. Can anyone comment on whether this is a
proper way to proceed, or whether I am missing anything? Based on the
geographic distribution of these clusters, my concern is that I am
picking up genetic structure that is very weak and does not have any
biological meaning. However, using the optimal number of PCA axes
(13), the classification rate is over 90% for all 4 groups, compared
to 30-40% when I randomly shuffle the individuals, so I don't want to
discount it. I should probably also mention that I am using 17
microsatellite loci for this analysis.
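
For reference, the workflow described above can be sketched roughly as
follows. This is an illustration only, not the exact commands that were
run: the microbov example data shipped with adegenet stands in for the
real dataset, and the object names (x, grp, dapc1, opt, dapc2,
dapc_null), the seed, max.n.clust = 10 and n.sim = 10 are all
assumptions made for the example.

```r
library(adegenet)

# Assumption: microbov (bundled with adegenet) stands in for the real
# 17-locus microsatellite dataset, which is not available here.
data(microbov)
x <- microbov

set.seed(1)  # k-means uses random starts, so fix the seed for repeatability

# BIC for K = 1..10 with many PCs retained; choose.n.clust = FALSE avoids
# the interactive prompt and lets find.clusters pick K automatically.
bic <- find.clusters(x, n.pca = 100, max.n.clust = 10,
                     choose.n.clust = FALSE)$Kstat
print(bic)

# Fix K = 4 as in the description above and get the group memberships.
grp <- find.clusters(x, n.pca = 100, n.clust = 4)

# First DAPC with the same large number of PCs (at most K - 1 = 3
# discriminant functions are available for 4 groups).
dapc1 <- dapc(x, grp$grp, n.pca = 100, n.da = 3)

# a-score optimisation to choose how many PCs to retain for assignment.
opt <- optim.a.score(dapc1, n.sim = 10, plot = FALSE)
opt$best

# Refit with the suggested number of PCs, then inspect reassignment
# rates and posterior membership probabilities, and draw the scatter plot.
dapc2 <- dapc(x, grp$grp, n.pca = opt$best, n.da = 3)
summary(dapc2)$assign.per.pop
head(dapc2$posterior)
scatter(dapc2)

# Baseline for comparison: same analysis with group labels shuffled.
dapc_null <- dapc(x, sample(grp$grp), n.pca = opt$best, n.da = 3)
summary(dapc_null)$assign.per.pop
```

Comparing summary(dapc2)$assign.per.pop with the shuffled version gives
the kind of contrast mentioned above (observed versus randomised
classification rates); the actual numbers will of course depend on the
data.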

     Lastly, if I am running this analysis correctly, I want to
identify the particular loci and alleles that are driving this
structure, so I am wondering whether there is any code or examples
that I could use to produce plots similar to figure 9 in the BMC
Genetics paper?



Jeffrey R. Row
PhD Candidate
Department of Biology, Queen's University
Kingston, ON
K7L 3N6
Phone: 613-533-6000 x 75051
