[adegenet-forum] Kmeans and DAPC on poolSeq data

DAUPHIN Benjamin benjamin.dauphin at unine.ch
Fri Feb 2 22:01:40 CET 2018


Thanks Thibaut.
Yes, I have 7 pools (= 7 rows, or 7 individuals in the analysis), and I expect two clusters representing two already characterized lineages. I have found 4 likely clusters based on HCPC, but I want to double-check this with k-means if possible.
Best
Ben
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Thibaut Jombart [thibautjombart at gmail.com]
Sent: 02 February 2018 18:25
To: Benjamin Dauphin
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Kmeans and DAPC on poolSeq data

Hi again,

Such a plot typically indicates no clustering. Just to confirm: are we talking about 7 rows and 100,000 columns?

If so, your pools are technically your statistical individuals, and the method explores clustering solutions of 1-6 clusters for 7 individuals, which won't go far: there are not enough individuals to really detect clustering. Apologies if I misunderstood.
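
To make the point about 7 individuals concrete, here is a minimal base-R sketch of the mechanics: PCA on the pools, then k-means for K = 1 to 6. The simulated matrix, the object names and the BIC-style formula are all assumptions made for illustration; this is not adegenet's own code and not necessarily the exact criterion it computes.

```r
# Hypothetical data: 7 pools (rows) x 1,000 SNPs (columns) of allele frequencies.
set.seed(42)
n_pools <- 7
n_snps  <- 1000
freq <- matrix(runif(n_pools * n_snps), nrow = n_pools)

# PCA on centred frequencies; 7 rows give at most 6 informative axes.
pca    <- prcomp(freq, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:(n_pools - 1)]

# K-means for every K that can be explored with 7 individuals (K = 1..6).
ks  <- 1:(n_pools - 1)
wss <- sapply(ks, function(k) {
  kmeans(scores, centers = k, nstart = 10)$tot.withinss
})

# BIC-style criterion (assumed form): n * log(WSS / n) + K * log(n).
bic <- n_pools * log(wss / n_pools) + ks * log(n_pools)

print(data.frame(K = ks, WSS = round(wss, 2), BIC = round(bic, 2)))
```

With only 7 points, each curve has just 6 values, and at K = 6 all but one cluster hold a single pool, so the shape of the curve carries very little information about real structure.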

Best
Thibaut


--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: http://repidemicsconsortium.org
WHO Consultant - outbreak analysis
https://thibautjombart.netlify.com
Twitter: @TeebzR
+44(0)20 7594 3658

On 2 February 2018 at 09:07, Benjamin Dauphin <benjamin.dauphin at wsl.ch> wrote:
Hi Mark,

Thanks for your response. I've run find.clusters() with the matrix of allele frequencies as input, and then run the DAPC, still using the matrix (not a genind or genlight object), by assigning the groups generated with k-means (grp$grp). It works, but I get a strange "inverted parabolic curve" for the k-means analysis.
Is this a common pattern for pool-seq data?
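
For reference, the workflow described above could look roughly like the sketch below. It assumes the adegenet package is installed. The simulated frequency matrix (two lineages, pools 1-4 and 5-7), the object names, and the values chosen for n.pca, n.clust and n.da are hypothetical. They are set explicitly only so that find.clusters() and dapc() run without their interactive prompts, and n.clust = 2 simply reflects the two lineages expected in this thread.

```r
library(adegenet)

# Hypothetical data: 7 pools x 1,000 SNPs, two simulated lineages.
set.seed(1)
n_snps  <- 1000
lineage <- c(1, 1, 1, 1, 2, 2, 2)
base    <- runif(n_snps, 0.3, 0.7)   # shared baseline allele frequencies
shift   <- rnorm(n_snps, 0, 0.15)    # lineage-2 divergence per SNP

freq <- t(sapply(lineage, function(l) {
  p <- base + (l == 2) * shift + rnorm(n_snps, 0, 0.02)
  pmin(pmax(p, 0), 1)                # keep frequencies within [0, 1]
}))
rownames(freq) <- paste0("pool", seq_len(nrow(freq)))
colnames(freq) <- paste0("snp", seq_len(n_snps))

# K-means on PCA scores, called directly on the frequency matrix.
grp <- find.clusters(freq, n.pca = 5, n.clust = 2)

# DAPC on the same matrix, using the k-means groups as the grouping factor.
dapc1 <- dapc(freq, grp$grp, n.pca = 3, n.da = 1)

print(table(grp$grp))
print(dapc1$assign)
```

Because the DAPC groups come from k-means on the same 7 pools, the assignments mostly restate the clustering rather than test it.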

Thanks,
Ben




> On 1 Feb 2018, at 18:01, Mark Coulson <Mark.Coulson.ic at uhi.ac.uk> wrote:
>
> Hi Ben,
>
> I have used allelotype data, with a matrix of the frequency of the A allele in each group as input, to run DAPC, and it worked well. However, my groups were already defined. Could the same type of input not be used with find.clusters()?
>
> Mark
>
>
> -----Original Message-----
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [mailto:adegenet-forum-bounces at lists.r-forge.r-project.org] On Behalf Of Benjamin Dauphin
> Sent: 31 January 2018 09:18
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] Kmeans and DAPC on poolSeq data
>
> Dear all,
>
> I am new to pool sequencing data, and I simply wonder whether I can use k-means (find.clusters) and DAPC to investigate population structure from pool-seq data (allele frequencies). How does find.clusters() deal with allele frequencies?
>
> Dataset: 7 pools and 100’000 SNPs
>
> Any comment or help would be much appreciated.
> Best regards
> Ben
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
> Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk<http://www.inverness.uhi.ac.uk> Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.

