[adegenet-forum] Population clustering idea
t.jombart at imperial.ac.uk
Wed May 4 11:37:59 CEST 2011
I don't think there is actually a problem with using Fst in this case. Even if the HWE assumption does not hold, it can still be used as a between-group distance measure. It is in fact very closely related to the quantity optimised by DAPC: Fst is (between-group variance)/(total variance), while DAPC optimises (between-group variance)/(within-group variance). That said, any other distance measure (e.g. those implemented in dist.genpop) can be used.
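To make the link between the two ratios concrete, here is a minimal sketch in plain Python. It is only an illustration on a single made-up variable: the group values are hypothetical, and this is neither an actual Fst estimator nor what dapc does internally. It simply partitions the total sum of squares into between- and within-group parts and shows that the two ratios are monotonic functions of each other.

```python
from statistics import fmean

def variance_partition(groups):
    """Split the total sum of squares into between- and within-group parts."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    total = sum((x - grand_mean) ** 2 for x in values)
    return between, within, total

# Hypothetical scores of individuals from three groups on one variable
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [8.0, 9.0, 10.0]]
between, within, total = variance_partition(groups)

fst_like = between / total    # (between-group variance) / (total variance)
dapc_like = between / within  # (between-group variance) / (within-group variance)

print(f"between/total  = {fst_like:.3f}")
print(f"between/within = {dapc_like:.3f}")
# Because total = between + within, between/within = F / (1 - F)
print(f"F / (1 - F)    = {fst_like / (1 - fst_like):.3f}")
```

Since total = between + within, maximising one ratio also maximises the other; they differ only in scale.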
I think one of the main interests of representing the between-group distances on the DAPC scatterplot is that in some cases, especially on lower-order axes, the coordinates might not fully display the relationships between groups. For instance, imagine a structure with 6 populations on three islands (a,b,c), (d,e), (f), assuming (f) is more distant from the other two islands. One axis might emphasize (a,b,c) vs (d,e), and (f) could fall close to the origin. Representing a minimum spanning tree based on between-population distances will remind us that (f) is fairly isolated, and prevent the naive interpretation that it is related to both (a,b,c) and (d,e).
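The island example can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the coordinates are invented, Euclidean distance stands in for a genetic distance, and the tree is built with a textbook Prim's algorithm rather than with any adegenet function. Population f sits at 0 on the first axis but far away on the third, so a plot of axis 1 alone would place it between the two other islands.

```python
import math

# Hypothetical positions of six populations in a 3-axis space.
# Axis 1 separates (a,b,c) from (d,e); f differs only along axis 3.
coords = {
    "a": (-2.0, 0.2, 0.0),
    "b": (-2.2, -0.1, 0.0),
    "c": (-1.9, -0.2, 0.0),
    "d": (2.0, 0.1, 0.0),
    "e": (2.1, -0.1, 0.0),
    "f": (0.0, 0.0, 6.0),
}

def dist(u, v):
    """Euclidean distance, used here as a stand-in for a genetic distance."""
    return math.dist(coords[u], coords[v])

def minimum_spanning_tree(names):
    """Prim's algorithm; returns a list of (from, to, distance) edges."""
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest edge joining the tree to a population outside it
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda pair: dist(pair[0], pair[1]),
        )
        edges.append((u, v, dist(u, v)))
        in_tree.add(v)
    return edges

tree = minimum_spanning_tree(sorted(coords))
for u, v, d in tree:
    print(f"{u} - {v}: {d:.2f}")
```

In this toy setting f is attached to the tree by a single edge that is much longer than all the others, which is exactly the reminder that it is isolated rather than intermediate.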
From: Vladimir Mikryukov [vmikryukov at gmail.com]
Sent: 04 May 2011 08:23
To: adegenet forum
Cc: Mac Campbell; Jombart, Thibaut
Subject: Re: [adegenet-forum] Population clustering idea
Please correct me if I'm wrong,
but I think that viewing population differentiation with Fst has many limitations as well.
Why should one switch from a more robust method (DAPC doesn't care about Hardy-Weinberg equilibrium and linkage disequilibrium, does it?) to the other (Fst) approach?
Perhaps the principal component scores already obtained could be used for that?
Or would this method overestimate the differentiation?
Using other genetic distance measures (especially those which assume a particular mutation model, e.g. IAM or SMM for microsatellites) on real data could be tricky as well.
PS. A brief summary of Fst's assumptions can be found here:
Or at least I would suggest using a bias-corrected differentiation index (Dest), as in the DEMEtics package (see reference). However, in my experience it is usually highly correlated with Fst (Mantel's r = 0.7 to 0.96).
Gerlach G., Jueterbock A., Kraemer P., Deppermann J., Harmand P. (2010) Calculations of population differentiation based on Gst and D: forget Gst but not all of statistics! Molecular Ecology 19(18): 3845-3852.
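For readers unfamiliar with the two indices, here is a rough sketch in plain Python of how Gst and Jost's D are built from the same heterozygosities. It makes several assumptions: the allele frequencies are made up, populations are weighted equally, and D is computed in its uncorrected form, so this is not the Dest estimator of DEMEtics (which applies sample-size corrections not reproduced here).

```python
def heterozygosities(freqs):
    """freqs: one allele-frequency list per population (same allele order)."""
    n_pops = len(freqs)
    # Hs: mean expected heterozygosity within populations
    hs = sum(1.0 - sum(p * p for p in pop) for pop in freqs) / n_pops
    # Ht: expected heterozygosity of the pooled population
    mean_freqs = [sum(col) / n_pops for col in zip(*freqs)]
    ht = 1.0 - sum(p * p for p in mean_freqs)
    return hs, ht

def gst(freqs):
    hs, ht = heterozygosities(freqs)
    return (ht - hs) / ht

def jost_d(freqs):
    hs, ht = heterozygosities(freqs)
    n = len(freqs)
    return (ht - hs) / (1.0 - hs) * n / (n - 1)

# Hypothetical locus with 8 alleles: the two populations share none of them
no_shared_alleles = [
    [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.25, 0.25, 0.25, 0.25],
]
print(f"Gst = {gst(no_shared_alleles):.3f}")
print(f"D   = {jost_d(no_shared_alleles):.3f}")
```

With no shared alleles D reaches 1 while Gst stays low because within-population diversity is high; on less extreme data the two can still be strongly correlated across population pairs.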
On Tue, May 3, 2011 at 10:53 PM, Mac Campbell <macampbell2 at alaska.edu> wrote:
Yes, I agree there are many limitations to viewing populations from a tree-like perspective. Initially, I was interested in quantifying how far apart the groups are on a scatter plot because it was hard to tell. I think the code Vladimir sent me does just that; at least it tells me which ones are closer to each other.
It would be cool to have a more biologically meaningful (Fst-based) approach implemented. One thing that also came to mind: if I wanted to use something like IMa2, I would need an assumption, in tree form, of how the populations are related.
On Sat, Apr 30, 2011 at 8:56 AM, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
That's a good question. Actually, I thought about implementing something along these lines for the dapc scatterplot. I agree with Russell's point that relationships between populations are not necessarily best represented by fully bifurcating trees. However, linking the populations which are the closest according to a given distance measure (e.g. Fst) does make sense. I would go for a minimum spanning tree, which is a nice way of showing which populations are the closest neighbours in terms of genetic distances. It won't be too much of a pain to code either.
I will be working on the next adegenet release over the weeks to come, so will probably give it a go soon.