[adegenet-forum] Question about genetic structure in admixed populations

Valeria Montano mirainoshojo at gmail.com
Sun Sep 8 20:44:07 CEST 2013


Hi Jutta!

well, ehm...sooo,

you already know about the limitations of Structure and, in general, of
Bayesian approaches to cluster analysis.

For what it's worth, here is my opinion/suggestion in brief:

1) Structure and DAPC can give different results in several cases,
depending on the evolutionary processes at work among the specific
individuals/populations. I wish they always agreed - that would make our
lives happier. In general, choosing one method over another is a decision
you can base on your knowledge of the models assumed by the different
approaches and their limitations, and certainly on the feeling you have
about your case study given all the results you already have. Personally,
I never take a best K out of find.clusters unless the BIC shows a very
clear cut-off (i.e. the curve rises nicely after a certain K), but this is
really a personal standard.
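
To make that personal standard a bit more concrete, here is a minimal
sketch in plain Python. It is not adegenet code: the function name, the
min_rise threshold and the BIC values are all made up for illustration. It
returns a K only when the BIC curve has an interior minimum and keeps
rising after it, and returns nothing otherwise.

```python
def clear_bic_minimum(bic_by_k, min_rise=2.0):
    """Return the K at the BIC minimum, or None if the minimum is not clear-cut.

    bic_by_k : dict mapping each tested K to its BIC value.
    min_rise : hypothetical threshold (an assumption, not an adegenet default):
               how far BIC must climb between the minimum and the largest K.
    """
    ks = sorted(bic_by_k)
    best_k = min(ks, key=lambda k: bic_by_k[k])
    # BIC values from the minimum up to the largest K tested.
    tail = [bic_by_k[k] for k in ks if k >= best_k]
    # Minimum at the largest K: the curve may still be falling, so no verdict.
    if len(tail) < 2:
        return None
    keeps_rising = all(b >= a for a, b in zip(tail, tail[1:]))
    if keeps_rising and tail[-1] - tail[0] >= min_rise:
        return best_k
    return None


# Made-up BIC values, for illustration only.
v_shaped = {1: 520.0, 2: 505.0, 3: 498.0, 4: 503.0, 5: 509.0, 6: 516.0}
still_falling = {1: 520.0, 2: 512.0, 3: 507.0, 4: 504.0, 5: 502.0, 6: 501.0}

print(clear_bic_minimum(v_shaped))       # 3
print(clear_bic_minimum(still_falling))  # None
```

A rule like this is deliberately conservative: when it gives no answer,
that is a hint that the data may be better described by gradients than by
a fixed number of clusters.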

2) My understanding of continuously distributed populations (as seems to
be the case for your data) is that there is actually no best clustering one
can do. When the spatial distribution of the allele frequencies is
organized in gradients or clines, clusters are not the best tool to
describe the data. That is why a method such as BAPS is useful. GENELAND is
cool too, but there is no explicit modelling of gradients, plus the
integration of the spatial info has never been totally clear to me. I find
BAPS and TESS more straightforward. In this sense, they are good approaches
for optimizing a number of "clusters", although what you find cannot really
be called clusters (in the STRUCTURE or DAPC sense).

It took me a while to learn how to manage the sense of panic/disorientation
provoked by the absence of best clustering in some genetic datasets, but
afterwards I even developed a preference for gradients, although I admit
clusters are very useful.

Hope this is somehow useful
Best wishes

Valeria


On 8 September 2013 15:35, Jutta Geismar <Jutta.Geismar at senckenberg.de> wrote:

> Dear Valeria,
>
> thank you very much for your quick answer. I’m aware of the problems
> STRUCTURE has with analyzing genetic data from continuous populations (see
> also http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2664.2008.01606.x/pdf).
> That is one reason I don’t want to use STRUCTURE as the only cluster
> analysis. I haven’t attempted to use BAPS yet, but I gave GENELAND a
> trial to include spatial information. Besides testing for IBD with a Mantel
> test, I also modified the geographic distances using resistance values etc.
> that I inferred from an SDM. A spatial autocorrelation analysis didn’t show
> a clear pattern of spatial relation (also in different distance classes). A
> PCA shows one big cloud around the center point. Each of the first two axes
> explained about 19% of the variance.
>
> Thanks for confirming that my DAPC script is correct. I set the maximum
> number of clusters to 50 so as not to miss any structural shifts.
>
> Nonetheless, I cannot explain the contradictory results: STRUCTURE
> indicates a panmictic population (4 parallel stripes), whereas DAPC assigns
> most individuals to one specific cluster.
>
> Thanks again for your comments. I will have a look at BAPS.
>
> Best wishes,
>
> Jutta
> >>> Valeria Montano <mirainoshojo at gmail.com> 9/5/2013 10:59 >>>
>  Dear Jutta,
>
> cluster analysis can be tricky when the samples analysed are distributed
> along a gradient; if there is no clear-cut subdivision, this can lead to
> contradictory results (have a look at this paper
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.3029&rep=rep1&type=pdf).
> You may want to consider using TESS or BAPS with the admixture model
> option. These two programs let you include the geographic coordinates as
> prior information, and the admixture model is a way to model spatial
> gradients. If you tested for IBD with a Mantel test, just be careful: a
> significant Mantel test is not necessarily due to IBD, because the
> geographic-to-genetic correlation can be significant under different
> spatial/migratory schemes. I think your DAPC is ok, apart from the fact
> that there is no need to run find.clusters with the number of PCs
> indicated by optim.a.score. That procedure is used to optimize the
> discriminant space among clusters in the DAPC. To assign individuals to
> clusters you can simply retain all the variance (even though in your case
> it is almost the same, given that you have 98%). One more thing: I would
> try a maximum number of clusters around 20, more than your number of
> sampling locations. You can also give sPCA a try.
>
> Hope this helps
>
> Ciao
>
> Valeria
>
>
> On 4 September 2013 15:03, Jutta Geismar <Jutta.Geismar at senckenberg.de> wrote:
>
>> Dear Mr Jombart and DAPC users,
>>
>> I used DAPC to analyze genetic structure in a small region with 20
>> microsatellite markers. I analyzed 330 individuals (14 sampling sites) and
>> found little genetic differentiation (FST, Jost’s D), but a significant
>> isolation-by-distance pattern. A cluster analysis in STRUCTURE resulted in
>> four clusters (STRUCTURE Harvester), but all individuals had more or less
>> equal posterior probability in all four inferred clusters. Therefore I
>> assume a panmictic population structure. Since STRUCTURE is known to have
>> problems analyzing datasets under IBD, I also analyzed the data with DAPC.
>> DAPC resulted in 3 or 4 clusters (I tested up to K=7 to be sure), but in
>> both cases these were randomly distributed among all individuals without
>> any geographic context. Only 94 individuals were not assigned to one
>> cluster with more than 90% probability and therefore would be counted as
>> “admixed” (example in the DAPC tutorial). To me the results of STRUCTURE
>> and DAPC conflict with each other, but I don’t know what a panmictic
>> population would look like in DAPC. Distances between sites are small and
>> it is very likely that gene flow occurs among my sampling points, which
>> might cause problems in genetic cluster analyses. I don’t know if I made
>> a mistake in my thinking, so I want to explain my procedure briefly:
>>
>> 1. I ran dapc and chose 1/3 of the sample size as the number of PCs (as
>> suggested) and counted the DAs in the plot (100% of the variability was
>> included, 110 PCs, 13 DAs)
>>
>> 2. To reduce variability I used optim.a.score (smart = FALSE). The best
>> a-score was around 0.2 (61 PCs)
>>
>> 3. After that I wanted to estimate the number of clusters with
>> find.clusters, used the number of PCs suggested by the a-score, and
>> repeated the dapc (conserved variance was still 98%, 61 PCs, 2 DAs)
>>
>> I chose the K after which the decrease in BIC became smaller than the
>> previous one, rather than the K with the lowest BIC.
>>
>> If there are mistakes in my procedure I would appreciate some advice.
>> But even if the procedure is okay, I cannot explain the contradiction
>> between these two analyses.
>>
>> Thanks a lot in advance for your help.
>>
>> Jutta Geismar
>>
>> PhD student
>>
>> Germany
>>