[adegenet-forum] Antw: Re: Question about genetic structure in admixed populations

Sun Sep 8 15:35:16 CEST 2013

Dear Valeria,

Thank you very much for your quick answer. I’m aware of the problems
STRUCTURE has with genetic data from continuously distributed
populations (see also
http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2664.2008.01606.x/pdf).
That is one reason I don’t want to rely on STRUCTURE as the only cluster
analysis. I haven’t tried BAPS yet, but I gave GENELAND a trial in
order to include spatial information. Besides testing for IBD with a
Mantel test, I also modified the geographic distances using resistance
values etc. that I inferred from an SDM. A spatial autocorrelation
analysis didn’t show a clear pattern of spatial relationship (also not
across different distance classes). A PCA shows one big cloud around
the center point. Each of the first two axes explained about 19 % of
the variance.

Thank you for confirming that my DAPC script is correct. I set the
maximum number of clusters to 50 so as not to miss any structural
shifts. Nonetheless, I cannot explain the contradictory results:
STRUCTURE indicates a panmictic population (4 parallel stripes),
whereas DAPC assigns most individuals to one specific cluster.
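
For readers unfamiliar with the Mantel test mentioned above, here is a
minimal, library-free sketch of the permutation logic. It is not the
code used for the analysis in this thread: the function names, the toy
coordinates and the toy distances are all made up for illustration. The
`geo` matrix could equally hold resistance-based distances instead of
plain geographic ones.

```python
import random
from math import sqrt

def upper_triangle(m):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(gen, geo, n_perm=999, seed=1):
    """One-sided permutation Mantel test (illustrative sketch).

    Rows and columns of `gen` are permuted together, so the matrix
    stays a valid distance matrix under every permutation.
    """
    rng = random.Random(seed)
    n = len(gen)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_vec)
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [[gen[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= r_obs:
            hits += 1
    # The observed value counts as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: six sites along a line, with genetic distance
# growing with geographic distance plus a small deterministic perturbation.
positions = [0, 1, 2, 3, 4, 5]
n_sites = len(positions)
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.0 if i == j else 0.01 * geo[i][j] + 0.001 * ((i * j) % 3)
        for j in range(n_sites)] for i in range(n_sites)]

r_obs, p_value = mantel(gen, geo)
print("Mantel r:", round(r_obs, 3), "p:", round(p_value, 3))
```

A small p-value here only says that the two distance matrices are
correlated; it does not by itself identify which spatial process
produced the correlation.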
Thanks again for your comments. I will have a look at BAPS.
Best wishes, 
Jutta
>>> Valeria Montano <mirainoshojo at gmail.com> 9/5/2013 10:59 >>>
Dear Jutta,

cluster analysis can be tricky when the samples analysed are
distributed along a gradient and there is no clear-cut subdivision;
this can lead to contradictory results (have a look at this paper
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.192.3029&rep=rep1&type=pdf).
You may want to consider using TESS or BAPS with the admixture model
option. Both programs allow you to include the geographic coordinates
as prior information, and the admixture model is a way to model spatial
gradients. If you tested for IBD with a Mantel test, just be careful: a
significant Mantel test is not necessarily due to IBD, because the
correlation between geographic and genetic distances can be significant
under different spatial/migratory schemes.
I think your DAPC is ok, apart from the fact that there is no need to
run find.clusters with the number of PCs indicated by optim.a.score.
That procedure is used to optimize the discriminant space among
clusters in the DAPC. To assign individuals to clusters you can simply
retain all the variance (even though in your case it is almost the same,
given that you have 98%). One more thing: I would try a maximum number
of clusters of around 20, more than the number of your sampling
locations. You can also give sPCA a try.

Hope this helps

Ciao

Valeria

On 4 September 2013 15:03, Jutta Geismar <Jutta.Geismar at senckenberg.de>
wrote:

Dear Mr Jombart and DAPC users,

I used DAPC to analyze genetic structure in a small region with 20
microsatellite markers. I analyzed 330 individuals (14 sampling sites)
and found little genetic differentiation (FST, Jost’s D), but a
significant isolation by distance pattern. A cluster analysis in
STRUCTURE resulted in four clusters (STRUCTURE Harvester), but all
individuals had more or less equal posterior probability in all four
inferred clusters. Therefore I assume a panmictic population structure.
Since STRUCTURE is known to have problems with datasets under IBD, I
also analyzed the data with DAPC. DAPC resulted in 3 or 4 clusters (I
tested up to K=7 to be sure), but in both cases these were randomly
distributed among all individuals without a geographic context. Only 94
individuals were not assigned to one cluster with more than 90% and
would therefore be counted as “admixed” (following the example in the
DAPC tutorial). To me the results of STRUCTURE and DAPC conflict with
each other, but I don’t know what a panmictic population would look
like in DAPC. Distances between sites are small and it is very likely
that gene flow occurs among my sampling points, which might cause
problems in genetic cluster analyses. I don’t know whether I made a
mistake in my reasoning, which is why I want to explain my procedure
briefly:
1. I used dapc and chose 1/3 of the sample size as the number of PCs
(as suggested) and counted the DAs in the plot (100% of the variability
was included, 110 PCs, 13 DAs).
2. To reduce variability I used optim.a.score (smart FALSE). The best
a-score was around 0.2 (61 PCs).
3. After that I wanted to estimate the number of clusters with
find.clusters, used the number of PCs indicated by the a-score, and
repeated the dapc (conserved variance was still 98%, 61 PCs, 2 DAs).
I chose the k after which the decrease in BIC became smaller than the
previous decrease, rather than the k with the lowest BIC.
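
The rule for choosing k described above (take the k after which BIC
stops dropping sharply, rather than the k with the lowest BIC) can be
written down in several ways. The sketch below is one possible
formalization, not the procedure built into find.clusters: it picks the
k with the largest difference between the drop before it and the drop
after it. The BIC values are invented for illustration and are not the
values from this dataset; the function assumes evenly spaced values of
k.

```python
def elbow_k(bic):
    """Return the K at which the decrease in BIC slows down most sharply.

    `bic` maps K -> BIC. Returns None if fewer than three values of K
    are available, since an elbow needs a drop on both sides.
    """
    ks = sorted(bic)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        drop_before = bic[prev] - bic[k]   # improvement gained by reaching k
        drop_after = bic[k] - bic[nxt]     # improvement gained by going past k
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical BIC curve: steep decrease up to K=3, then nearly flat.
bic_values = {1: 520.0, 2: 500.0, 3: 470.0, 4: 465.0,
              5: 463.0, 6: 462.0, 7: 461.5}

chosen_k = elbow_k(bic_values)
lowest_bic_k = min(bic_values, key=bic_values.get)
print("elbow K:", chosen_k, "| K with lowest BIC:", lowest_bic_k)
```

With a curve like this the two criteria disagree, which is exactly the
situation where it is worth reporting which rule was used.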
If there are mistakes in my procedure I would appreciate some advice.
But even if the procedure is okay, I cannot explain the contradiction
between these two analyses.
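
To make the comparison concrete, the 90% rule used above to call an
individual “admixed” can be sketched as follows. The membership values
are invented for illustration (they are not output from STRUCTURE or
DAPC), and the function name is hypothetical. The second toy individual
shows the pattern described for STRUCTURE, with roughly equal
probability in all four clusters.

```python
def split_by_assignment(memberships, threshold=0.9):
    """Split individuals by whether one cluster exceeds the threshold.

    `memberships` is a list of per-individual membership probabilities
    (one value per cluster). Returns two lists of indices:
    (clearly assigned, "admixed").
    """
    assigned, admixed = [], []
    for i, probs in enumerate(memberships):
        if max(probs) > threshold:
            assigned.append(i)
        else:
            admixed.append(i)
    return assigned, admixed

# Hypothetical membership probabilities for four individuals, K = 4.
toy_memberships = [
    [0.97, 0.01, 0.01, 0.01],  # clearly assigned to cluster 1
    [0.25, 0.25, 0.25, 0.25],  # equal membership in every cluster
    [0.60, 0.30, 0.05, 0.05],  # intermediate, below the threshold
    [0.02, 0.95, 0.02, 0.01],  # clearly assigned to cluster 2
]

assigned, admixed = split_by_assignment(toy_memberships)
print("assigned:", assigned, "admixed:", admixed)
```

Under this rule an individual with equal membership everywhere always
lands in the “admixed” group, so the two kinds of output are only
comparable once the same threshold is applied to both.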
Thanks a lot in advance for some help.
Jutta Geismar 
PhD student
Germany

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
