[adegenet-forum] more clusters with less PCs retained in find.clusters

Mon May 13 13:36:41 CEST 2019

Hi Lu,

You have a lot of variables for few individuals. Chances are that when you
retain too many PCs, you end up keeping a lot of random noise, and K-means
struggles to find meaningful centroids for your clusters. When keeping fewer
PCs, you likely get rid of much of the noise, but the signal (group
discrimination) stays.
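
To make the point concrete, here is a minimal base-R sketch on simulated
data (not your mosquito data). Everything in it is an assumption made for
illustration: the toy genotypes, the helper name bic_by_k, and the BIC-like
score n*log(WSS/n) + K*log(n), which only approximates what find.clusters
computes internally. It runs K-means on the first 15 and then the first 38
PCs and reports which K scores best in each case.

```r
# Hypothetical toy data: 40 individuals in 3 groups, mostly uninformative SNPs.
set.seed(1)
n_ind <- 40
n_snp <- 2000
grp   <- rep(1:3, length.out = n_ind)

# Allele frequencies are shared by all groups, except at 100 "signal" SNPs.
freq <- matrix(runif(n_snp, 0.2, 0.8), nrow = 3, ncol = n_snp, byrow = TRUE)
signal <- 1:100
freq[, signal] <- runif(3 * length(signal), 0.05, 0.95)

# Diploid genotypes coded as allele counts (0, 1 or 2).
geno <- matrix(rbinom(n_ind * n_snp, size = 2, prob = freq[grp, ]),
               nrow = n_ind, ncol = n_snp)

# PCA on centred genotypes; pca$x holds the individual scores.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)

# BIC-like score for K = 1..max_k on a set of retained PC scores
# (assumed formula, lower is better).
bic_by_k <- function(scores, max_k = 8) {
  n <- nrow(scores)
  sapply(seq_len(max_k), function(k) {
    wss <- if (k == 1) {
      sum(scale(scores, scale = FALSE)^2)   # total sum of squares for K = 1
    } else {
      kmeans(scores, centers = k, nstart = 20)$tot.withinss
    }
    n * log(wss / n) + k * log(n)
  })
}

# Compare the preferred K when retaining few versus nearly all PCs.
for (n_pca in c(15, 38)) {
  bic <- bic_by_k(pca$x[, seq_len(n_pca), drop = FALSE])
  cat("PCs retained:", n_pca, " K with lowest score:", which.min(bic), "\n")
}
```

On real data the equivalent check would be to call find.clusters several
times with different n.pca values (and a fixed max.n.clust) and compare the
BIC curves. The exact K you get from the toy example depends on the
simulated noise, so treat it as a way to explore the effect rather than as
a reference result.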

Best
Thibaut

--
Dr Thibaut Jombart
Associate Professor in Outbreak Analytics, London School of Hygiene and
Tropical Medicine
Senior Lecturer in Genetic Analysis, Imperial College London
President of RECON: repidemicsconsortium.org
https://thibautjombart.netlify.com
Twitter: @TeebzR

On Sun, 12 May 2019 at 18:00, Lu Maffey <lucia.maffey at gmail.com> wrote:

> Hi everyone!
> I'm performing DAPC analyses on RADseq data from mosquitoes. When I run
> the previous find.clusters function, I get larger K values (that is, more
> clusters) when I retain fewer PCs. For example, I have a data set of 40
> individuals (with over 80,000 SNPs). When I retain 30-38 PCs, I get K=1, and
> when I choose to retain 15 PCs, I get K=3. Shouldn't it be the other way
> around? More PCs retained, smaller K, and vice versa? When I run the same
> data in Structure I get K=3, so I tend to think that I should retain fewer
> PCs in find.clusters, but as the vignette explains that you should retain
> all of them, I'm worried I'm missing something here. I've been searching the
> archive but only found the same question unanswered.
>
> Thanks in advance!!!
>
> Lu