[adegenet-forum] more clusters with fewer PCs retained in find.clusters

Thibaut Jombart thibautjombart at gmail.com
Mon May 13 13:36:41 CEST 2019

Hi Lu,

you have a lot of variables for few individuals. Chances are that when you
retain too many PCs, you end up keeping a lot of random noise, and k-means
struggles to find meaningful centroids for your clusters. When you keep fewer
PCs, you likely get rid of much of the noise, while the signal (group
discrimination) stays.
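To make the noise argument concrete, here is a minimal Python sketch on simulated data. It uses neither Lu's data nor adegenet's code. It assumes 40 hypothetical individuals in 3 groups and 2,000 Gaussian variables (real SNP genotypes would be 0/1/2), only 300 of which differ between groups. The sketch projects the data on the first n PCs and reports how much of the retained variance the true groups explain. It then scores K = 1..6 with k-means and a BIC of the form n*log(WSS/n) + K*log(n), which is assumed here to resemble find.clusters' BIC criterion. All function names and simulation settings are illustrative.

```python
import numpy as np


def simulate(n_per_group=(14, 13, 13), n_vars=2000, n_informative=300, seed=1):
    """Hypothetical toy data: few individuals, many variables, mostly noise."""
    rng = np.random.default_rng(seed)
    k = len(n_per_group)
    labels = np.repeat(np.arange(k), n_per_group)
    centroids = np.zeros((k, n_vars))
    # Only the first n_informative variables differ between groups.
    centroids[:, :n_informative] = rng.normal(0.0, 1.0, (k, n_informative))
    X = centroids[labels] + rng.normal(0.0, 1.0, (labels.size, n_vars))
    return X, labels


def pca_scores(X, n_pc):
    """Scores of individuals on the first n_pc principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pc] * s[:n_pc]


def between_fraction(Z, labels):
    """Share of the retained variance explained by the true groups."""
    tss = ((Z - Z.mean(axis=0)) ** 2).sum()
    wss = sum(((Z[labels == g] - Z[labels == g].mean(axis=0)) ** 2).sum()
              for g in np.unique(labels))
    return 1.0 - wss / tss


def kmeans_wss(Z, k, rng, n_start=20, n_iter=100):
    """Smallest within-cluster sum of squares over n_start Lloyd runs."""
    best = np.inf
    for _ in range(n_start):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = dist.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new = np.array([Z[assign == j].mean(axis=0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
            converged = np.allclose(new, centers)
            centers = new
            if converged:
                break
        # WSS with each individual assigned to its nearest final center.
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, dist.min(axis=1).sum())
    return best


def bic_curve(Z, max_k=6, seed=0):
    """Assumed find.clusters-style BIC: n*log(WSS/n) + K*log(n), K = 1..max_k."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    return np.array([n * np.log(kmeans_wss(Z, k, rng) / n) + k * np.log(n)
                     for k in range(1, max_k + 1)])


if __name__ == "__main__":
    X, labels = simulate()
    for n_pc in (2, 15, 39):
        Z = pca_scores(X, n_pc)
        bic = bic_curve(Z)
        print(f"{n_pc:2d} PCs: groups explain {between_fraction(Z, labels):.0%} "
              f"of retained variance; lowest BIC at K={bic.argmin() + 1}")
```

The script prints, for 2, 15 and 39 retained PCs, the share of variance carried by the true groups and the K with the lowest BIC. Exact values depend on the seed. Keeping 39 PCs is equivalent to clustering in the full variable space, because 40 centred individuals span at most 39 dimensions. In that space the group signal is diluted by noise, so the reduction in WSS gained by splitting into groups can be too small to offset the K*log(n) penalty.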


Dr Thibaut Jombart
Associate Professor in Outbreak Analytics, London School of Hygiene and
Tropical Medicine
Senior Lecturer in Genetic Analysis, Imperial College London
President of RECON: repidemicsconsortium.org
Twitter: @TeebzR

On Sun, 12 May 2019 at 18:00, Lu Maffey <lucia.maffey at gmail.com> wrote:

> Hi everyone!
> I'm performing DAPC analyses on RADseq data from mosquitoes. When I run
> find.clusters beforehand, I get larger K values (that is, more clusters)
> when I retain fewer PCs. For example, I have a data set of 40 individuals
> (with over 80,000 SNPs). When I retain 30-38 PCs, I get K=1, and when I
> choose to retain 15 PCs, I get K=3. Shouldn't it be the other way around?
> More PCs retained, smaller K, and vice versa? When I run the same data in
> Structure I get K=3, so I tend to think that I should retain fewer PCs in
> find.clusters, but since the vignette says you should retain all of them,
> I'm worried I'm missing something here. I've been searching the archive
> but only found the same question, unanswered.
> Thanks in advance!!!
> Lu
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
