Jombart, Thibaut
t.jombart at imperial.ac.uk
Wed Feb 9 11:56:32 CET 2011
Dear Ricardo,
--- I forward your email to the "adegenet-forum" ; "adegenet-commits" only stores development messages, which are pretty useless for most users ---
I am not familiar with STRUCTURE, but other people on this ML may want to reply on this point. Since there are many estimators of Fst, I am not too surprised to see two different programs give different values, although qualitatively the results should not change.
I have no idea how the triangle plot is made in STRUCTURE. This has to be compositional data, so I suspect it is somehow based on the proportion of each individual genome falling into the three clusters. If this is the case, it is not necessarily a problem to have reasonably low Fst values. Your triangle plot says that you can find a representation based on your data in which the three populations are separated. In other words, individuals all clearly belong to a single group, but STRUCTURE won't tell you anything about the distance between these groups (and this is one of the reasons why multivariate analysis is a useful complement to STRUCTURE).
Fst is a different thing: it measures the proportion of genetic variance which is due to differences between groups. You can have a perfect discrimination of a set of populations, and yet a small Fst.
Imagine we have 100,000 SNPs for 10 populations. Say that all alleles are distributed randomly, except for 10 SNPs, each of which yields a private allele for a different population. Based on these 10 SNPs only, we can perfectly separate the populations. If you used PCA, or better, DAPC, the first PCs would combine these alleles to show a perfect separation of the populations. Now, these are only 10 SNPs out of 100,000. The Fst will therefore be very low, since it has to be some kind of average Fst over the loci, and 10 loci with a large variance between groups have little weight compared to 99,990 loci without variance between groups.
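The thought experiment above can be sketched numerically. What follows is only a minimal illustration in plain Python, not the estimator used by STRUCTURE or adegenet. It assumes equal population sizes, private alleles that are fixed in their population, background SNPs with exactly the same frequency in every population, and a simple heterozygosity-based Fst computed as a ratio of sums over loci. All function names are made up for this example.

```python
import random

def simulate_frequencies(n_pops=10, n_loci=100_000, seed=1):
    """Return freqs[locus][pop], the allele frequency in each population.

    Assumption: the first n_pops loci are diagnostic (locus k carries an
    allele fixed in population k and absent elsewhere); every other locus
    has one shared frequency, i.e. no variance between groups.
    """
    rng = random.Random(seed)
    freqs = []
    for locus in range(n_loci):
        if locus < n_pops:
            freqs.append([1.0 if pop == locus else 0.0 for pop in range(n_pops)])
        else:
            p = rng.uniform(0.1, 0.9)
            freqs.append([p] * n_pops)
    return freqs

def locus_heterozygosities(pop_freqs):
    """Return (H_S, H_T) for one biallelic locus, equal population sizes."""
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return h_s, h_t

def fst(freqs):
    """Multi-locus Fst as a ratio of sums: sum(H_T - H_S) / sum(H_T)."""
    num = den = 0.0
    for pop_freqs in freqs:
        h_s, h_t = locus_heterozygosities(pop_freqs)
        num += h_t - h_s
        den += h_t
    return num / den

def sample_diagnostic_genotype(freqs, pop, n_pops, rng):
    """Diploid allele counts (0, 1 or 2) at the diagnostic loci only."""
    return [sum(rng.random() < freqs[k][pop] for _ in range(2))
            for k in range(n_pops)]

def assign_population(genotype):
    """Assign an individual to the population whose private allele it carries."""
    for k, count in enumerate(genotype):
        if count > 0:
            return k
    return None

n_pops = 10
freqs = simulate_frequencies(n_pops=n_pops)

# Classify 20 simulated individuals per population from the 10 diagnostic SNPs.
rng = random.Random(2)
correct = total = 0
for pop in range(n_pops):
    for _ in range(20):
        genotype = sample_diagnostic_genotype(freqs, pop, n_pops, rng)
        correct += assign_population(genotype) == pop
        total += 1

fst_diagnostic = fst(freqs[:n_pops])  # the 10 private-allele SNPs only
fst_all = fst(freqs)                  # all 100,000 SNPs

print(f"correctly assigned: {correct}/{total}")
print(f"Fst, diagnostic SNPs only: {fst_diagnostic:.4f}")
print(f"Fst, all SNPs: {fst_all:.6f}")
```

Under these assumptions every individual is assigned to the right population, yet the Fst over all SNPs stays close to zero, because the 10 diagnostic loci contribute very little to the total variance. With real samples the background loci would add some sampling noise, so the numbers would differ, but the contrast should remain.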
Best regards
Thibaut
Dear Thibaut and other friends,
I ran the STRUCTURE software and afterwards used your package adegenet to compare the FST values. In both cases, I chose ONEFST=1 (STRUCTURE) and FSTonly=TRUE (adegenet).
The methodology used is probably a bit different, but the results were similar.
My doubt is this: in STRUCTURE, the triangle plot shows 3 clusters at different corners of the triangle, yet the FST value found was close to 0.15, meaning that the populations are very closely related.
I used K = 3 in STRUCTURE because I know that I have 3 very well characterized populations (3 totally distinct breeds).
[Triangle plot from STRUCTURE: cluster 3 at the top corner, clusters 1 and 2 at the bottom corners]
In my opinion, the FST values do not match the plot. Can anyone help me? Cheers, Ricardo
