[adegenet-forum] DAPC analysis and interpretation

Fri Jul 22 20:10:39 CEST 2011

Hi Elodie,

you seem to place a lot of trust in the population labels of your samples.
Which criteria do you use to assign plants to populations? I am really
curious to know (is it the sampling location, maybe?). I would say that the
11 SSRs are telling another story about the structure of these plants and,
consequently, about the admixed individuals (if I didn't completely
misunderstand what you mean). Although I have no idea about the factors
influencing dispersal in this species, and even less about its dispersal
behaviour, the picture of "an invasive plant along a corridor of dispersal"
suggests a fairly messy situation (more or less like a biologist seeking a
postdoc: you never know in which part of this bloody world you will end
up...). If you think the previous labels are somehow more reliable than the
SSRs, I wouldn't know how to interpret these data, apart from saying that
the markers don't carry enough information on that (leaving aside possible
selective processes, since these are microsatellites). Otherwise, you can
consider challenging the labels and trusting the data. I still think that a
spatial principal component analysis (sPCA) might help you get an overall
picture of the situation according to the SSRs. It's not a cluster
analysis, so it's not entirely straightforward to interpret in terms of
groups, but at least, if these 15 groups (or a coarser grouping of them)
are somewhere in the data, you might be able to see them in one or a few of
the components (hopefully).
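
For reference, sPCA (the spca function in adegenet) looks for components that
combine high genetic variance with strong spatial autocorrelation, measured by
Moran's I. A minimal Python sketch of Moran's I alone, assuming sites are
neighbours along a linear corridor with simple binary weights (the helper
names and the weighting scheme are illustrative assumptions, not adegenet's
defaults or code):

```python
def chain_weights(n):
    """Binary neighbour weights for n sites along a linear corridor.

    Hypothetical stepping-stone layout: site i is linked to i-1 and i+1.
    """
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(x, w):
    """Moran's I of values x (e.g. a score per site) under weights w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


w = chain_weights(4)
print(round(morans_i([1, 2, 3, 4], w), 4))    # cline along the corridor: 0.3333
print(round(morans_i([1, -1, 1, -1], w), 4))  # neighbours alternate: -1.0
```

Positive values (a cline or patches along the corridor) correspond to what
sPCA calls global structure; negative values (neighbouring sites differing)
to local structure.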

This is my opinion, but I am sure you can get more competent help on this
from someone else.

All the best

Valeria

On 22 July 2011 14:39, Elodie Blanchet <blanchet.elodie at gmail.com> wrote:

> Hi Valeria,
> thanks a lot for your help!
>
> 1) Concerning the correspondence between "actual" groups and "assign"
> clusters, I don't expect to find 15 clusters, but I do expect the majority
> of each "actual" population to be assigned to a single "assign" cluster
> (e.g. 80% of "actual" population 1 assigned to cluster A).
> I did not expect 15 clusters because I worked on an invasive plant along a
> dispersal corridor.
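
The criterion described above (most of each labelled population falling into
a single inferred cluster) amounts to cross-tabulating population labels
against inferred clusters. A minimal Python sketch with a hypothetical helper
and made-up toy labels, reporting for each population its dominant cluster
and the share of its individuals assigned to it:

```python
from collections import Counter, defaultdict


def dominant_cluster_share(actual, inferred):
    """Hypothetical helper: for each actual population, return
    (most common inferred cluster, fraction of the population in it)."""
    by_pop = defaultdict(Counter)
    for pop, clust in zip(actual, inferred):
        by_pop[pop][clust] += 1
    result = {}
    for pop, counts in by_pop.items():
        clust, k = counts.most_common(1)[0]
        result[pop] = (clust, k / sum(counts.values()))
    return result


# Made-up toy labels: two populations of 5 individuals each.
actual = ["P1"] * 5 + ["P2"] * 5
inferred = ["A", "A", "A", "A", "B", "B", "C", "B", "B", "A"]
for pop, (clust, share) in sorted(dominant_cluster_share(actual, inferred).items()):
    print(pop, clust, share)
```

A population meets the 80% criterion above if its share is at least 0.8; in
this toy example P1 (0.8 in cluster A) does and P2 (0.6 in cluster B) does
not.
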
>
> 2) Then, concerning the individuals with low probability, I agree that it
> is normal to observe individuals with low probability, but I wondered: if I
> compare this second observation with the first (see above), what can I
> deduce about the clusters revealed by the function?
>
> But running the "find.clusters" function with more iterations will
> probably give more consistent results.
>
> Thanks a lot for your help.
> All the best.
> Elodie
>
>
> On 22/07/2011 13:39, valeria montano wrote:
>
> Hi Elodie,
>
> I can try to give you a superficial opinion, which I hope will be of some
> interest to you.
>
> To obtain a consistent estimate of the number of clusters, you can try to
> increase the number of iterations (n.iter), and that should work out. I
> experienced the problem of an inconsistent number of clusters when
> retaining only a few components, but I assume you're retaining all the
> components.
>
> When you say "actual groups", I guess that you expected to see your 15
> pops divided into 15 clusters.
>
> If your pops are "actually" 15 pops, maybe your loci are not powerful
> enough to detect them. In any case, I wouldn't say that they are lying to
> you; it's merely the point of view of your 11 SSRs.
> I would say that, in general, population structure is a matter of shades
> between complete isolation and panmixia. If different sets of molecular
> data analysed for the same sample give a concordant indication of
> structure, one can probably assume that this is the best way to cluster
> the individuals and that it mirrors reality quite well.
> If you have other information that makes you almost sure that your pops
> are 15 (I don't know, maybe something like: my pops are physically divided
> into 15 valleys, or other spatial information), you could try to run an
> sPCA. If you get a significant global structure (and there is a chance,
> since you're working with nice plants and not stupid humans), you can see
> whether one of the components gives you the expected 15 pops. Considering
> the result obtained with the DAPC, it probably won't be the first
> component, but maybe the second or the third... who knows... This could be
> a test of whether there is a global structure above the 15 pops, with your
> 15 being a kind of secondary structure (sorry, I am not explaining myself
> really well). In that case, you might be quite sure that your 11 SSRs are
> giving you a good genetic point of view. Otherwise, if none of this
> happens, you can only trust your 11 SSRs and their clustering, and try to
> find a good biological explanation to convince yourself and the rest of
> the world that your number of clusters is the best for your individuals,
> or type more markers...
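
One simple way to quantify "a concordant indication of structure" between
two marker sets is the Rand index: the fraction of pairs of individuals that
both partitions treat the same way (together in both, or apart in both). It
is a stand-in chosen here for illustration, not something adegenet is
assumed to compute; a minimal Python sketch with hypothetical labels:

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of individual pairs on which two partitions agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical clusterings of the same 6 individuals from two marker sets.
ssr = [1, 1, 2, 2, 3, 3]
other = ["x", "x", "y", "y", "z", "z"]  # same partition, different names
print(rand_index(ssr, other))               # 1.0
print(rand_index(ssr, [1, 2, 1, 2, 1, 2]))  # 0.4
```

The index ignores how clusters are named, so the first comparison gives 1.0.
In practice a chance-corrected version (the adjusted Rand index) is usually
preferred, because the raw index can be fairly high even for unrelated
partitions.
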
>
> Concerning the individuals with low probability, I have to confess that
> I've never worked at the individual level, but I imagine it's perfectly
> normal to have such individuals in any cluster analysis. They might be
> hybrids, or an expression of the genetic/spatial continuity existing among
> natural pops.
>
>  I don't know what else to add...
>
>  good luck
>
>  Valeria
>
> On 21 July 2011 10:12, Elodie Blanchet <blanchet.elodie at gmail.com> wrote:
>
>>  Dear Dr. Jombart and Adegenet users,
>>
>> I have some questions about DAPC analysis.
>>
>> I worked on a tetraploid plant, with 11 SSR markers and 15 populations,
>> each sampled with 30 individuals.
>>
>> 1) When I ran the 'find.clusters' function, the elbow in the curve of BIC
>> values was not very clear, so I ran it many times. But I obtained different
>> optimal numbers of clusters, even when I increased the "max.n.clust" option.
>>
>> I understand that the choice relies on the Bayesian Information Criterion
>> (BIC), but in this case how can I choose the "best" number of clusters?
>>
>> Maybe these inconsistent results between runs are due to the sampling
>> pattern of my populations, which were located along a corridor (thus
>> suggesting a stepping-stone model of dispersal?).
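
When the BIC curve is flat, several values of k have almost the same BIC, so
small run-to-run differences (k-means typically starts from randomly chosen
centres) can change which k comes out lowest. A minimal Python sketch,
assuming each run is summarised as a mapping from k to BIC and that the
lowest BIC is taken as that run's choice (the numbers and helper names are
made up for illustration; this is not find.clusters itself):

```python
from collections import Counter


def best_k(bic_by_k):
    """Hypothetical helper: the k with the lowest BIC in one run."""
    return min(bic_by_k, key=bic_by_k.get)


def summarize_runs(runs):
    """Tally the selected k over repeated runs.

    Returns (modal k, share of runs selecting it, Counter of all choices).
    """
    counts = Counter(best_k(run) for run in runs)
    k, n = counts.most_common(1)[0]
    return k, n / len(runs), counts


# Made-up BIC curves from three runs with a flat minimum.
runs = [
    {6: 410.2, 7: 404.9, 8: 403.1, 9: 403.8},
    {6: 409.7, 7: 403.5, 8: 403.9, 9: 404.6},
    {6: 410.0, 7: 404.4, 8: 402.8, 9: 403.3},
]
k, share, counts = summarize_runs(runs)
print(k, round(share, 2), dict(counts))
```

Here k=8 wins two runs out of three, but the BIC differences between 7, 8
and 9 are tiny, so the modal k alone says little about how well supported
it is.
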
>>
>> 2) Besides, if I took into account the most frequent "k" after ten runs
>> of the 'find.clusters' function (k=8), I observed that actual groups did
>> not correspond to inferred groups. I mean that, in the best case, only
>> 17.5% of my actual groups are assigned to clusters revealed by the
>> analysis. Even though individual posterior membership probabilities were
>> higher than 75% in most cases, I did not know whether the genetic
>> structure revealed by the analysis is supported or not.
>>
>> 3) Moreover, some of the clusters revealed by the analysis consist of
>> individuals with posterior membership probabilities <60%. How should these
>> clusters be interpreted? I would tend to run the analysis again with a
>> smaller "k"...?
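
To see whether low posterior membership is spread across all clusters or
concentrated in a few, one can summarise it per cluster. A minimal Python
sketch, assuming the posterior membership probabilities are available as one
list per individual (one value per cluster) together with each individual's
group; the function name, the 0.6 threshold and the toy numbers are
illustrative:

```python
def membership_summary(posterior, groups, threshold=0.6):
    """Hypothetical helper: per group, return (number of individuals,
    fraction whose highest posterior membership is below threshold)."""
    stats = {}
    for probs, g in zip(posterior, groups):
        n, low = stats.get(g, (0, 0))
        stats[g] = (n + 1, low + (1 if max(probs) < threshold else 0))
    return {g: (n, low / n) for g, (n, low) in sorted(stats.items())}


# Made-up posteriors for 6 individuals and 3 clusters (rows sum to 1).
posterior = [
    [0.95, 0.03, 0.02],
    [0.90, 0.05, 0.05],
    [0.10, 0.55, 0.35],
    [0.20, 0.50, 0.30],
    [0.05, 0.70, 0.25],
    [0.02, 0.08, 0.90],
]
groups = [0, 0, 1, 1, 1, 2]
for g, (n, frac_low) in membership_summary(posterior, groups).items():
    print(g, n, round(frac_low, 2))
```

In this toy example cluster 1 has two of its three members below 0.6, which
would make it the first candidate to question, for instance by comparing
runs with a smaller k.
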
>>
>> Sorry for this long e-mail; I hope it is clear enough.
>>
>> Thanks in advance for your help.
>>
>> Elodie
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>>
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>
>>
>