[adegenet-forum] Problems with find.cluster

Caitlin Collins caitiecollins at gmail.com
Thu Sep 18 14:46:16 CEST 2014


Hi Siobhan,

As a first check that is easy to run, I suspect that the number of PCs
retained may be affecting your results from find.clusters.

Have you had a look at the xvalDapc function? Like a.score, xvalDapc can
be used to manage the trade-off between discriminatory power and
over-fitting. I would be curious to see how many PCs xvalDapc recommends
retaining to best differentiate the four groups you are identifying via
other methods. If the optimal number of PCs selected by xvalDapc for the
four groups is greater than the 11 PCs you selected with a.score, this
would suggest that 11 PCs may not retain enough information for the BIC
to identify more than one cluster. In that case I would recommend
re-running find.clusters with the number of PCs suggested by xvalDapc to
see if you get different results.
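
For reference, the check described above could be sketched in R roughly as
follows. This is only an illustration under some assumptions: the nancycats
dataset shipped with adegenet stands in for your own genind object, tab()
assumes adegenet 2.0 or later, and the values of n.pca.max, n.rep and
max.n.clust are arbitrary choices that you would need to adapt to your
sample sizes.

```r
library(adegenet)

# Stand-in data so the sketch runs; replace with your own genind object.
data(nancycats)
obj <- nancycats

# Allele table with missing values replaced by the column means
# (tab() assumes adegenet >= 2.0).
x   <- tab(obj, NA.method = "mean")
grp <- pop(obj)  # prior groups, e.g. your four sampling sites

# Cross-validate DAPC over a range of numbers of retained PCs.
set.seed(1)
xval <- xvalDapc(x, grp, n.pca.max = 50, n.rep = 30, xval.plot = FALSE)

# Number of PCs with the lowest mean squared error across replicates.
n.pcs <- as.integer(xval$`Number of PCs Achieving Lowest MSE`)

# Re-run find.clusters without location priors, retaining that many PCs.
# choose.n.clust = FALSE avoids the interactive prompt.
fc <- find.clusters(obj, n.pca = n.pcs, max.n.clust = 10,
                    choose.n.clust = FALSE)

fc$Kstat  # BIC for each K, to compare with the 11-PC run
```

Comparing fc$Kstat from this run with the values obtained when retaining 11
PCs should show whether the choice of n.pca is what drives the minimum at
K=1.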

Of course, it is possible that the problem lies elsewhere, or that
according to the BIC there is simply not enough evidence for more than one
cluster, but at least it will be very easy to check this theory.  Please
let us know the results and we can then continue to search for other
solutions if necessary.

Best,
Caitlin.

On Tue, Sep 9, 2014 at 7:31 AM, Siobhan Dennison
<siobhan.dennison at mq.edu.au> wrote:

> I am working on genetic structure of a threatened species, and as such
> have rather small sample sizes. Two of my four populations are out of HWE,
> and so I am using DAPC to look at population clustering because it does not
> assume HWE.
>
> The DAPC yielded 4 clusters as I expected, using the location information,
> and retaining a very conservative 11 PCs (following a.score). However, when
> I wanted to look at clustering with no location priors on the data, things
> got a bit weird. I used the find.clusters option in adegenet, and I keep
> getting very different results to my other analyses - the lowest BIC falls
> at K=1, but the BIC values are extremely low (~420), steadily increasing
> from there (I attached the graph FYI).
>
> My Fst values based on microsatellites suggest high differentiation
> between the 4 sites. I standardised my Fst values following Meirmans 2006,
> which gave rather high Fst values (0.2-0.4). My mitochondrial Fst values
> are also high (>0.5).
>
> Using Structure with LOCprior (accounting for low sample sizes), I get K=4
> as the most likely number of clusters, and PCA also shows delineation
> between the four sample sites.
>
> Given that all of my other analyses tell the same story (that there are
> four rather differentiated sites), I'm wondering if anyone can tell me
> where I might be going wrong here?
>
> Any pointers would be greatly appreciated!!
>
> Thanks,
> Siobhan