[adegenet-forum] DAPC analysis and interpretation

Fri Jul 22 14:39:55 CEST 2011

Hi Valeria,
thanks a lot for your help!

1) Concerning the assignment of "actual" groups to "assign" clusters, I
did not expect to find 15 clusters, but rather that the majority of each
"actual" population would be assigned to the same "assign" cluster (e.g.
80% of "actual" population 1 assigned to cluster A).
I did not expect 15 clusters because I am working on an invasive plant
sampled along a dispersal corridor.
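
To make this comparison concrete, here is a minimal sketch of the check I
have in mind. It is plain Python on made-up labels, not adegenet output;
the function name majority_share and the toy data are hypothetical and
only illustrate the idea of "share of each actual population falling in
its most common inferred cluster".

```python
from collections import Counter, defaultdict


def majority_share(populations, clusters):
    """Map each sampled population to (most common cluster, share in it)."""
    counts = defaultdict(Counter)
    for pop, clu in zip(populations, clusters):
        counts[pop][clu] += 1
    result = {}
    for pop, tally in counts.items():
        cluster, n = tally.most_common(1)[0]
        result[pop] = (cluster, n / sum(tally.values()))
    return result


# Hypothetical toy data: 10 individuals from 2 sampled populations.
populations = ["pop1"] * 5 + ["pop2"] * 5
clusters = ["A", "A", "A", "A", "B", "B", "B", "B", "A", "C"]

shares = majority_share(populations, clusters)
for pop in sorted(shares):
    cluster, share = shares[pop]
    print(f"{pop}: {share:.0%} assigned to cluster {cluster}")
```

A population whose individuals are spread over several clusters gets a
low share, which is what I would expect for neighbouring populations
along a corridor.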

2) Then, concerning the individuals with low membership probability: I
agree that it is normal to observe such individuals, but I wonder what I
can deduce about the clusters revealed by the function when I combine
this second observation with the first (see above).
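
As a rough illustration of how the two observations could be combined,
the sketch below counts, for each inferred cluster, how many of its
individuals have a maximum membership probability below a threshold. It
is plain Python on invented probabilities; the function name
low_confidence_by_cluster is hypothetical, and the 0.6 threshold is only
the 60% figure from my first message, not a recommended cut-off.

```python
def low_confidence_by_cluster(posteriors, threshold=0.6):
    """Summarise weakly assigned individuals per inferred cluster.

    posteriors: one dict per individual, {cluster: membership probability}.
    Returns {cluster: (n_assigned, n_below_threshold)}.
    """
    summary = {}
    for probs in posteriors:
        # Each individual is assigned to its most probable cluster.
        cluster = max(probs, key=probs.get)
        n_assigned, n_low = summary.get(cluster, (0, 0))
        is_low = probs[cluster] < threshold
        summary[cluster] = (n_assigned + 1, n_low + int(is_low))
    return summary


# Hypothetical membership probabilities for 5 individuals and 2 clusters.
posteriors = [
    {"A": 0.95, "B": 0.05},
    {"A": 0.55, "B": 0.45},
    {"A": 0.20, "B": 0.80},
    {"A": 0.48, "B": 0.52},
    {"A": 0.41, "B": 0.59},
]

summary = low_confidence_by_cluster(posteriors)
for cluster in sorted(summary):
    n_assigned, n_low = summary[cluster]
    print(f"cluster {cluster}: {n_low} of {n_assigned} below threshold")
```

A cluster made up mostly of weakly assigned individuals would be the
kind of cluster I am unsure how to interpret.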

But running the "find.clusters" function with more iterations will
probably give more consistent results.

Thanks a lot for your help.
All the best.
Elodie

On 22/07/2011 13:39, valeria montano wrote:
> Hi Elodie,
>
> I can try to give you a superficial opinion, which I hope will be of
> some interest to you.
>
> To obtain a consistent estimate of the number of clusters, you can try
> to increase the number of iterations (n.iter), and that should work
> out. I have run into the problem of an inconsistent number of clusters
> when retaining only a few components, but I assume you're retaining
> all the components.
>
> When you say "actual groups" I guess that you expected to see your 15
> pops divided into 15 clusters.
>
> If your pops are "actually" 15 pops, maybe your loci are not powerful
> enough to detect them. In any case, I wouldn't say that they are lying
> to you; it's merely the point of view of your 11 SSR.
> I would say that, in general, population structure is a matter of
> degree between complete isolation and panmixia. If different sets of
> molecular data for the same sample give a concordant indication of
> structure, one can probably assume that this is the best way to
> cluster the individuals and that it probably mirrors reality quite
> well.
> If you have other information that makes you almost sure that you
> have 15 pops (I don't know, maybe something like: my pops are
> physically divided among 15 valleys, or other spatial information),
> you could try to run an sPCA. If you get a significant global
> structure (and there is a chance, since you're working with nice
> plants and not stupid humans), you can see if one of the components
> gives you the expected 15 pops. Considering the result obtained with
> the DAPC, it probably won't be the first component, but maybe the
> second or the third... who knows... This could be a test to see if
> there is a global structure above the 15 pops, and maybe your 15 is a
> kind of secondary structure (sorry, I am not explaining myself really
> well). In that case, you can be fairly sure that your 11 SSR are
> giving you a good genetic point of view. Otherwise, if none of what
> I've said happens, you can only trust your 11 SSR and their clustering
> and try to find a good biological explanation to convince yourself and
> the rest of the world that your number of clusters is the best for
> your individuals, or type more markers...
>
> Concerning the individuals with low probability, I have to confess
> that I've never worked at the individual level, but I imagine that
> it's perfectly normal to have such individuals in any cluster
> analysis. They might be hybrids, an expression of the genetic/spatial
> continuity existing among natural pops.
>
> I don't know what else to add...
>
> good luck
>
> Valeria
>
> On 21 July 2011 10:12, Elodie Blanchet <blanchet.elodie at gmail.com> wrote:
>
>     Dear Dr. Jombart and Adegenet users,
>
>     I have some questions about DAPC analysis.
>
>     I am working on a tetraploid plant, with 11 SSR markers and 15
>     populations sampled with 30 individuals each.
>
>     1) When I ran the ‘find.clusters’ function, the elbow in the curve
>     of BIC values was not very clear, so I ran it many times. But I
>     obtained a different optimal number of clusters each time, even
>     when I increased the “max.n.clust” option.
>
>     I understand that it is based on Bayesian computation, but in this
>     case how can I choose the “best” number of clusters?
>
>     Maybe these inconsistent results between runs are due to the
>     sampling pattern of my populations, which were located along a
>     corridor (thus suggesting a stepping-stone model of dispersal?)
>
>     2) Besides, when I took the most frequent “k” after ten runs of
>     the “find.clusters” function (k=8), I observed that the actual
>     groups did not correspond to the inferred groups. I mean that, in
>     the best case, only 17.5% of an actual group is assigned to a
>     cluster revealed by the analysis. Even though individual posterior
>     membership was higher than 75% in most cases, I do not know
>     whether the genetic structure revealed by the analysis is
>     supported or not.
>
>     3) Moreover, some of the clusters revealed by the analysis are
>     made up of individuals with a posterior membership probability
>     <60%. How should I interpret these clusters? I would tend to run
>     the analysis again and reduce “k”…?
>
>     Sorry for this long mail, I hope it is sufficiently clear.
>
>     Thanks in advance for your help.
>
>     Elodie