[adegenet-forum] find.clusters and optim.a.score
jeff
5jr29 at queensu.ca
Mon Feb 28 19:08:55 CET 2011
Hello,
I just have a few questions regarding the find.clusters and
optim.a.score functions. Basically, I am trying to use DAPC
analysis to determine the number of genetic clusters in my dataset and,
because I have no real prior assumptions on the number or extent of
populations, I am using the find.clusters function. Using an
assignment test (in STRUCTURE) I find two relatively strong genetic
clusters, but when I use the find.clusters function, the BIC scores
suggest that there are 4 clusters, essentially dividing one of the 2
clusters identified in STRUCTURE into 3. The problem is that these 3
clusters do not really map out very well geographically. I have a few
ideas of why this might be the case, but just want to make sure I am
running the analysis correctly before I dive into this much further.
I think my main problem is how many PCA axes (n.pca) to
retain for this analysis when using the find.clusters function. Because
I do not have any prior population delineation I do not think it makes
sense to use the optim.a.score to determine this. I have tried a few
different values and they give different results, but what I ended up
doing was setting this to a high value to capture a large amount of
the variation (~95%), which seems to be what was done in the BMC
Genetics paper? Once I had the number of clusters (4 in this case), I
assigned individuals to the 4 groups (using n.pca = 100 again) and then
used the optim.a.score function to determine the optimal number of PCA
axes for assigning individuals to these 4 groups. I then reclassified
individuals, determined posterior membership probabilities and
produced scatter plots. Can anyone provide any comments/suggestions on
whether this is a proper way to proceed or if I am missing anything? Based
on the geographic distribution of these clusters, my concern is that I
am picking up some genetic structure that is very weak and does not
really have any biological meaning, but using the optimal number of
PCA axes (13) the classification rate is over 90% for all 4
groups, compared to 30-40% when I randomly shuffle the individuals, so
I don't want to discount it. I should probably also mention that I am
using 17 microsatellite loci to conduct this analysis.
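For reference, the steps described above might look roughly like the following in R. This is only a sketch under assumptions: it uses the microbov example dataset shipped with adegenet as a stand-in for the real data, fixes n.clust = 4 and n.pca = 100 simply because those are the values mentioned above, and the object names (x, grp, dapc1, opt, dapc2, dapc_rand) are made up for illustration. The numbers it produces say nothing about the dataset discussed in this post.

```r
library(adegenet)

# Stand-in data (assumption): microsatellite genotypes shipped with adegenet.
data(microbov)
x <- microbov

# Step 1: infer groups without prior populations, retaining many PCs.
# n.clust is fixed to keep the sketch non-interactive; normally one would
# first inspect the BIC curve (omit n.clust, or look at the Kstat element).
grp <- find.clusters(x, n.pca = 100, n.clust = 4)

# Step 2: first DAPC on the inferred groups, again with many PCs.
dapc1 <- dapc(x, pop = grp$grp, n.pca = 100, n.da = 3)

# Step 3: use the a-score to choose how many PCs to retain.
opt <- optim.a.score(dapc1, n.sim = 10, plot = FALSE)

# Step 4: refit with the suggested number of PCs, then inspect
# reassignment rates, membership probabilities and the scatter plot.
dapc2 <- dapc(x, pop = grp$grp, n.pca = opt$best, n.da = 3)
summary(dapc2)$assign.per.pop
head(round(dapc2$posterior, 2))
scatter(dapc2)

# Baseline: same analysis with group labels shuffled at random.
set.seed(1)
dapc_rand <- dapc(x, pop = sample(grp$grp), n.pca = opt$best, n.da = 3)
summary(dapc_rand)$assign.per.pop
```

Comparing the two assign.per.pop vectors corresponds to the real-versus-shuffled classification rates quoted above. Note that the groups come from find.clusters on the same data, so a high reassignment rate is partly expected by construction.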
Lastly, if I am running this analysis correctly, I want to try
and identify the particular loci and alleles that are driving this
structure, so I am wondering if there is any code or examples that I
could use to produce plots similar to Figure 9 in the BMC Genetics
paper?
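One possible starting point for allele contributions is adegenet's loadingplot function applied to the var.contr element of a dapc object. Whether this matches how the figure in the paper was drawn is an assumption, not something confirmed here. The sketch again uses the microbov example data as a stand-in, reuses n.clust = 4 and n.pca = 13 only because they are the values quoted above, and the 95% quantile threshold is an arbitrary illustrative choice.

```r
library(adegenet)

# Stand-in data (assumption): microsatellite genotypes shipped with adegenet.
data(microbov)
x <- microbov

# Groups and DAPC as in the workflow described above
# (4 clusters and 13 PCs are taken from the post, not tuned for microbov).
grp <- find.clusters(x, n.pca = 100, n.clust = 4)
dapc1 <- dapc(x, pop = grp$grp, n.pca = 13, n.da = 3)

# Contribution of each allele to the first discriminant function.
# Alleles above the threshold are labelled on the plot.
thr <- quantile(dapc1$var.contr[, 1], 0.95)
contrib <- loadingplot(dapc1$var.contr, axis = 1, threshold = thr)

# Names of the alleles exceeding the threshold (format: locus.allele).
contrib$var.names
```

Changing axis to 2 or 3 would show the alleles driving the other discriminant functions.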
Thanks,
Jeff
--
Jeffrey R. Row
PhD Candidate
Department of Biology, Queen's University
Kingston, ON
K7L 3N6
Phone: 613-533-6000 x 75051