[adegenet-forum] find.clusters producing different 'best' solutions in different runs

Pip Griffin pip.griffin at gmail.com
Wed Mar 16 05:58:23 CET 2011


Okay, I think I've answered my own question. By increasing the number of
random starts (n.start) to 1000 and the number of iterations (n.iter) to 100
(these numbers may be overkill, but the run doesn't take long to complete),
I get pretty much the same BIC curve each time, which indicates that K=9
(the previous modal value) is best. Hope this is useful for others!
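
For anyone who wants to see the effect of the number of random starts in
isolation, here is a minimal sketch in base R. It does not call find.clusters
and does not use the dataset from this thread: the data are simulated, and
plain kmeans() stands in for the clustering step, with its nstart and iter.max
arguments playing the roles of n.start and n.iter. The BIC-like formula, the
helper names (bic_curve, spread) and all the numbers are assumptions made for
illustration only.

```r
# Simulated stand-in for retained PCA scores (NOT the data from this thread):
# 300 points in 5 dimensions drawn from 6 well-separated groups.
set.seed(1)
k.true  <- 6
centres <- matrix(rnorm(k.true * 5, sd = 4), nrow = k.true)
grp     <- rep(seq_len(k.true), each = 50)
scores  <- centres[grp, ] + matrix(rnorm(300 * 5), ncol = 5)

# BIC-like curve over K = 2..max.k for a given number of random starts.
# The formula n*log(WSS/n) + k*log(n) is an assumption made for this sketch.
bic_curve <- function(x, max.k, n.start, n.iter = 100) {
  n <- nrow(x)
  sapply(2:max.k, function(k) {
    wss <- kmeans(x, centers = k, nstart = n.start,
                  iter.max = n.iter)$tot.withinss
    n * log(wss / n) + k * log(n)
  })
}

# Five repeated runs each: a single random start versus 50 random starts.
# Each column of the result is one run, each row one value of K.
few  <- replicate(5, bic_curve(scores, max.k = 10, n.start = 1))
many <- replicate(5, bic_curve(scores, max.k = 10, n.start = 50))

# Largest between-run range of the criterion at any single K
spread <- function(m) max(apply(m, 1, function(v) diff(range(v))))
spread(few)
spread(many)
```

If the curves from repeated runs differ a lot with few starts and hardly at
all with many, the run-to-run variation is most likely down to k-means
settling in different local optima rather than anything in the data.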

Pip

On Tue, Mar 15, 2011 at 2:49 PM, Pip Griffin <pip.griffin at gmail.com> wrote:

> Dear Thibaut and Adegenet users,
>
> I have a polyploid dataset coded as binary (PA datatype) containing 297
> individuals and 97 'loci' (microsatellite alleles). I've been running
> find.clusters, retaining 40 PCA axes to capture >95% of the variance.
>
> The issue is that I get different 'best' solutions for the number of
> clusters (K) in different find.clusters runs, with a modal value of 9 but
> a range of 6-12. The actual differences in BIC value are small, but even
> when I apply a cut-off (e.g. requiring the BIC to decrease by at least 2
> for a solution to count as 'better' than the previous K), the chosen
> solution still varies.
>
> This variability is even higher when I choose fewer PCA axes to retain
> (e.g. retaining 80% of the variance), as would be expected, but even when I
> use 100 PCA axes (>>95% of variance), the value varies between 'runs'.
>
> Has anyone else observed this - and do you have any advice?
>
> Thanks for your help
>
> Pip
>