[adegenet-forum] Population clustering idea

Wed May 4 15:18:21 CEST 2011

Hi all

I think one of the most important differences to consider when comparing
Fst with DAPC is the way a pop is defined. The link Vladimir sent about
the use of previously labelled pops in molecular anthropology is a really
good example of a non-genetic approach to defining pops, which as such
probably implies erroneous assumptions by default (unfortunately for me,
I have collected some experience with that). By contrast, starting from
individuals to infer the best number of clusters (can we then call them
pops, maybe?) by optimizing the allelic variance is definitely better, at
least in genetics. Once this is done, a "pops tree" obtained with Fst or
with DAPC would probably turn out quite similar (without forgetting that
the two approaches interpret the variance decomposition differently, as
explained by Thibaut, and make different assumptions, as stated by
Vladimir). But I have never actually tested it...

In general, I guess the clusters could be regarded as pops, but this
is presumably quite far from defining panmictic groups. So, biologically
speaking, I don't know what the best way to consider the clusters is. It
would be great to have a method to define panmictic groups without needing
to test an explicit biological model... but clustering is really useful.

Best regards

Valeria

2011/5/4 Jombart, Thibaut <t.jombart at imperial.ac.uk>

>  Hi there,
>
> I don't think there is actually a problem in using Fst in this case. Even
> if the HWE assumption does not hold, Fst can still be used as a measure of
> distance between groups. It is actually very closely related to the quantity
> optimised by DAPC: Fst is (between-group variance)/(total variance), while
> DAPC optimises (between-group variance)/(within-group variance). However,
> any other distance measure (e.g. those implemented in dist.genpop) can be
> used.
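
A small numeric sketch of these two ratios may help here. Taking the
description above at face value, and writing B, W and T for the between-group,
within-group and total sums of squares, T = B + W, so with F = B/T the other
ratio is B/W = F/(1 - F): one is a monotonic transformation of the other. The
Python below only checks this on made-up scores for a single variable. It is a
one-way ANOVA decomposition, not an Fst estimator on allele frequencies and
not the DAPC criterion as implemented in adegenet; the data and all names are
assumptions for illustration.

```python
# Toy data (made up): one quantitative score per individual, by group.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [8.0, 9.0, 10.0],
}


def sums_of_squares(groups):
    """Return (between, within, total) sums of squares for group -> values."""
    values = [x for xs in groups.values() for x in xs]
    grand_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for xs in groups.values():
        group_mean = sum(xs) / len(xs)
        # Between-group part: group size times squared deviation of its mean.
        between += len(xs) * (group_mean - grand_mean) ** 2
        # Within-group part: squared deviations from the group's own mean.
        within += sum((x - group_mean) ** 2 for x in xs)
    total = sum((x - grand_mean) ** 2 for x in values)
    return between, within, total


between, within, total = sums_of_squares(groups)
fst_like = between / total    # between / total, the "Fst-like" ratio
dapc_like = between / within  # between / within, the "DAPC-like" ratio

print(fst_like, dapc_like, fst_like / (1.0 - fst_like))
```

The last two printed values agree, which is the sense in which the two
quantities are closely related: ranking group structures by one ratio ranks
them the same way by the other, even though the scales differ.
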
>
> I think one of the main benefits of representing the between-group
> distances on the DAPC scatterplot is that in some cases, especially on
> lower-order axes, the coordinates might not fully display the relationships
> between groups. For instance, imagine a structure with 6 populations on
> three islands (a,b,c),(d,e),(f), where (f) is more distant from the other
> two islands. One axis might emphasize (a,b,c) vs (d,e), and (f) could fall
> close to the origin. Representing a minimum spanning tree based on
> between-population distances will remind us that (f) is fairly isolated, and
> prevent the naive interpretation that it is related to both (a,b,c) and
> (d,e).
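
The island example above can be sketched with a hand-rolled minimum spanning
tree. The pairwise distances below are invented to mimic the described
structure (small within islands, moderate between the first two islands, large
for f), and the function is a plain Prim's algorithm written for illustration
in Python. It is not adegenet code, and the names are hypothetical.

```python
def minimum_spanning_tree(labels, dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of (label_u, label_v, distance) edges.
    """
    n = len(labels)
    in_tree = {0}  # start from the first population
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge joining the tree to a population outside it.
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i][j] < best[2]):
                    best = (i, j, dist[i][j])
        i, j, d = best
        in_tree.add(j)
        edges.append((labels[i], labels[j], d))
    return edges


labels = ["a", "b", "c", "d", "e", "f"]

# Made-up between-population distances for islands (a,b,c), (d,e), (f).
pair_dist = {
    ("a", "b"): 0.02, ("a", "c"): 0.03, ("b", "c"): 0.025,
    ("d", "e"): 0.015,
    ("a", "d"): 0.10, ("a", "e"): 0.11, ("b", "d"): 0.12,
    ("b", "e"): 0.13, ("c", "d"): 0.09, ("c", "e"): 0.105,
    ("a", "f"): 0.30, ("b", "f"): 0.31, ("c", "f"): 0.29,
    ("d", "f"): 0.25, ("e", "f"): 0.26,
}

# Fill a full symmetric matrix from the pair list.
index = {name: k for k, name in enumerate(labels)}
dist = [[0.0] * len(labels) for _ in labels]
for (u, v), d in pair_dist.items():
    dist[index[u]][index[v]] = d
    dist[index[v]][index[u]] = d

mst = minimum_spanning_tree(labels, dist)
f_edges = [edge for edge in mst if "f" in edge[:2]]

for edge in mst:
    print(edge)
```

With these distances f hangs off the tree by a single edge, and that edge is
the longest one, which is the "fairly isolated" reminder a tree overlaid on
the scatterplot would give even if f sits near the origin on a given axis.
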
>
> Cheers
>
> Thibaut
>
>
>  ------------------------------
> *From:* Vladimir Mikryukov [vmikryukov at gmail.com]
> *Sent:* 04 May 2011 08:23
> *To:* adegenet forum
> *Cc:* Mac Campbell; Jombart, Thibaut
>
> *Subject:* Re: [adegenet-forum] Population clustering idea
>
>  Hello,
> Please correct me if I'm wrong,
> but I think that viewing population differentiation with Fst has many
> limitations as well.
> Why should one switch from a more robust method (DAPC doesn't care about
> Hardy-Weinberg equilibrium and linkage disequilibrium, does it?) to the
> other (Fst) approach?
> Perhaps the principal component scores obtained could be used for
> that?
> Or would this method overestimate the differentiation?
>
> Using other genetic distance measures (especially those which assume a
> particular mutation model, e.g. IAM or SMM for microsatellites) on
> real data could be tricky as well.
>
> Vladimir.
>
>
> PS. A brief summary of Fst's assumptions can be found here:
>
> https://anthrogenetics.wordpress.com/2010/10/11/problems-with-fst-based-methods-human-populations-violate-important-assumptions/
>
> Or at least I would suggest using a bias-corrected differentiation index
> (Dest), as in the DEMEtics package (see reference). However, in my
> experience it is usually highly correlated with Fst (Mantel's r = 0.7 - 0.96)
>
> Gerlach G., Jueterbock A., Kraemer P., Deppermann J., Harmand P.
> Calculations of population differentiation based on Gst and D: forget Gst
> but not all of statistics! // Molecular Ecology. 2010. V. 19. No. 18. P.
> 3845-3852.
>
>
> On Tue, May 3, 2011 at 10:53 PM, Mac Campbell <macampbell2 at alaska.edu> wrote:
>
>> Hi,
>>
>>  Yes, I agree there are many limitations to viewing populations from a
>> tree-like perspective.  Initially, I was interested in quantifying how far
>> apart the groups are on a scatter plot, because it was hard to tell.  I
>> think the code Vladimir sent me does just that; at least it tells me which
>> ones are closer to each other.
>>
>>  It would be cool to have a more biologically significant (Fst-based) way
>> implemented.  One thing that also came to mind: if I wanted to use something
>> like IMa2, I would need an assumption, in tree form, of how the populations
>> are related.
>>
>>  Mac
>>
>> On Sat, Apr 30, 2011 at 8:56 AM, Jombart, Thibaut <
>> t.jombart at imperial.ac.uk> wrote:
>>
>>>  Hello,
>>>
>>> that's a good question. Actually I thought about implementing something
>>> along these lines for the dapc scatterplot. I agree with Russell's point
>>> that relationships between populations are not necessarily best presented by
>>> fully bifurcating trees. However, linking the populations which are the
>>> closest according to a given distance measure (e.g. Fst) does make sense. I
>>> would go for a minimum spanning tree, which is a nice way of showing which
>>> are the closest neighbours in terms of genetic distances. It won't be too
>>> much of a pain to code either.
>>>
>>> I will be working on the next adegenet release over the weeks to come, so
>>> will probably give it a go soon.
>>>
>>> Cheers
>>>
>>> Thibaut
>>>
>>
>