<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hello,<div> I have a few questions regarding the <i>find.clusters</i> and <i>optim.a.score</i> functions. I am trying to use DAPC to determine the number of genetic clusters in my dataset, and because I have no real prior assumptions about the number or extent of populations, I am using the <i>find.clusters</i> function. Using an assignment test (in STRUCTURE) I find two relatively strong genetic clusters, but when I use <i>find.clusters</i>, the BIC scores suggest that there are 4 clusters, essentially dividing one of the 2 clusters identified in STRUCTURE into 3. The problem is that these 3 clusters do not map out very well geographically. I have a few ideas about why this might be the case, but I want to make sure I am running the analysis correctly before I dive into this much further.</div><div><br></div><div> I think my main problem is deciding how many PCA axes (n.pca) to retain when using the <i>find.clusters</i> function. Because I do not have any prior population delineation, I do not think it makes sense to use <i>optim.a.score</i> to determine this. I have tried a few different values and they give different results, so what I ended up doing was setting n.pca to a high value to capture a large amount of the variation (~95%), which seems to be what was done in the BMC Genetics paper. Once I had the number of clusters (4 in this case), I assigned individuals to the 4 groups (using n.pca = 100 again) and then used the <i>optim.a.score</i> function to determine the optimal number of PCA axes for assigning individuals to these 4 groups. I then reclassified individuals, determined posterior membership probabilities, and produced scatter plots. Can anyone comment on whether this is a proper way to proceed, or whether I am missing anything? 
Based on the geographic distribution of these clusters, my concern is that I am picking up genetic structure that is very weak and does not have any biological meaning. However, using the optimal number of PCA axes (13), the classification rate is over 90% for all 4 groups, compared to 30-40% when I randomly shuffle the individuals, so I do not want to discount it. I should also mention that I am using 17 microsatellite loci for this analysis.</div><div><br></div><div> Lastly, if I am running this analysis correctly, I would like to identify the particular loci and alleles that are driving this structure, so I am wondering whether there is any code or an example that I could use to produce plots similar to Figure 9 in the BMC Genetics paper.</div><div><br></div><div>Thanks,</div><div><br></div><div>Jeff</div>
<br><br><div> <span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>--</div><div>Jeffrey R. Row</div><div>PhD Candidate</div><div>Department of Biology, Queen's University </div><div>Kingston, ON</div><div>K7L 3N6</div><div>Phone: 613-533-6000 x 75051</div><div><br></div><div><br></div></div></span><br class="Apple-interchange-newline"> </div><br></body></html>