[adegenet-forum] adegenet-forum Digest, Vol 61, Issue 4

Frederik Van den Broeck Frederik.VandenBroeck at bio.kuleuven.be
Mon Sep 9 13:33:43 CEST 2013

Dear Jutta,

Have you already tried individual-based distance methods (which I prefer in most cases), such as the inverse proportion of shared alleles or Euclidean distances? Have you tried a PCA? All of this can be done quickly in adegenet and will give you considerable insight into the structure of your data. Another program for studying genetic structure that I like a lot is SPAGeDi (http://ebe.ulb.ac.be/ebe/SPAGeDi.html).
I know this doesn't answer your questions, but I merely wanted to mention some alternatives to cluster analysis that could also give you insight into population structure.
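
The shared-allele distance mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not adegenet's implementation: the function name shared_allele_distance, the toy genotypes, and the choice of 1 - Ps as the "inverse" (some authors use -ln(Ps) instead) are all hypothetical.

```python
from collections import Counter


def shared_allele_distance(ind_a, ind_b):
    """Return 1 - Ps, where Ps is the proportion of shared alleles.

    Each individual is a list of loci; a locus is a tuple of allele
    sizes (two for a diploid) or None when the genotype is missing.
    Loci missing in either individual are skipped.
    """
    shared = 0
    compared = 0
    for locus_a, locus_b in zip(ind_a, ind_b):
        if locus_a is None or locus_b is None:
            continue
        # Counter intersection keeps the minimum count of each allele,
        # i.e. the number of allele copies the two genotypes share.
        shared += sum((Counter(locus_a) & Counter(locus_b)).values())
        compared += len(locus_a)
    if compared == 0:
        raise ValueError("no locus typed in both individuals")
    return 1.0 - shared / compared


# Toy data (hypothetical): three individuals, three microsatellite loci.
genotypes = {
    "ind1": [(120, 124), (201, 201), (88, 90)],
    "ind2": [(120, 124), (201, 203), (90, 92)],
    "ind3": [(122, 126), (205, 207), None],
}

# Pairwise distance matrix stored as a dict keyed by name pairs.
names = sorted(genotypes)
distances = {
    (a, b): shared_allele_distance(genotypes[a], genotypes[b])
    for a in names
    for b in names
}

for (a, b), d in distances.items():
    if a < b:
        print(a, b, round(d, 3))
```

A matrix like this can then go into an ordination or a tree, which is the kind of individual-based view I had in mind.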

Kind regards

From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of adegenet-forum-request at lists.r-forge.r-project.org [adegenet-forum-request at lists.r-forge.r-project.org]
Sent: Monday, September 09, 2013 12:00 PM
To: adegenet-forum at lists.r-forge.r-project.org
Subject: adegenet-forum Digest, Vol 61, Issue 4

Today's Topics:

   1. Re: Question about genetic structure in admixed populations
      (Valeria Montano)


Message: 1
Date: Sun, 8 Sep 2013 20:44:07 +0200
From: Valeria Montano <mirainoshojo at gmail.com>
To: Jutta Geismar <Jutta.Geismar at senckenberg.de>
Cc: "adegenet-forum at lists.r-forge.r-project.org"
        <adegenet-forum at lists.r-forge.r-project.org>
Subject: Re: [adegenet-forum] Question about genetic structure in
        admixed populations
        <CADEmh=sVic5YSnRFGA6Cb8Z=9jWhWG=SJV3QjFLGUNBuzD8j5Q at mail.gmail.com>
Content-Type: text/plain; charset="windows-1252"

Hi Jutta!

well, ehm...sooo,

you already know about the limitations of Structure and, in general, of
Bayesian approaches to cluster analysis.

For what it's worth, here is my opinion/suggestion in brief:

1) Structure and DAPC can give different results in several cases,
depending on the evolutionary processes ongoing among specific inds/pops. I
wish they always agreed - that would make our lives happier. In general,
relying on one method rather than another is a decision that can be made
based on knowledge of the models assumed by the different approaches and
their limitations, and certainly on the feeling you have about your case
study given all the results you already have. Personally, I never take a
best k out of find.clusters unless the BIC shows a very clear cut-off (i.e.
the curve rising clearly after a certain K), but this is really a
personal standard.

2) My understanding of continuously distributed populations (as seems to be
the case with your data) is that there is actually no best
clustering one can do. When the spatial distribution of the allele
frequencies is organized in gradients or clines, clusters are not the
best tool to use to describe the data. That is why a method such as BAPS is
useful. GENELAND is cool too, but there is no explicit modelling of
gradients, plus the integration of the spatial info has never been totally
clear to me. I find BAPS and TESS more straightforward. In this sense, they
are good approaches to optimize a number of "clusters" although what you
find out cannot be really called clusters (in the structure or dapc

It took me a while to learn how to manage the sense of panic/disorientation
provoked by the absence of best clustering in some genetic datasets, but
afterwards I even developed a preference for gradients, although I admit
clusters are very useful.
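
The personal standard from point 1 (only accept a best K when the BIC curve clearly rises again after it) can be written as a small check. This is a minimal Python sketch under stated assumptions: the function name clear_bic_minimum, the min_rise threshold of 2 BIC units, and the example BIC values are all hypothetical, and none of this is part of find.clusters.

```python
def clear_bic_minimum(bic_by_k, min_rise=2.0):
    """Return K at the BIC minimum if the curve rises clearly after it.

    bic_by_k maps each tested K to its BIC value. The function returns
    None when the minimum sits at the largest K tested (the curve never
    turns back up) or when the rise after the minimum stays below
    min_rise, which is an arbitrary, user-chosen threshold.
    """
    ks = sorted(bic_by_k)
    best_k = min(ks, key=lambda k: bic_by_k[k])
    later = [bic_by_k[k] for k in ks if k > best_k]
    if not later:
        return None  # still decreasing at the largest K: no cut-off
    if max(later) - bic_by_k[best_k] < min_rise:
        return None  # the curve is too flat to call this a cut-off
    return best_k


# Hypothetical BIC curves, for illustration only.
clear_curve = {1: 520.0, 2: 498.0, 3: 481.0, 4: 489.0, 5: 497.0, 6: 506.0}
flat_curve = {1: 500.0, 2: 499.6, 3: 499.4, 4: 499.3, 5: 499.2}
shallow_curve = {1: 500.0, 2: 499.5, 3: 499.8, 4: 500.1}

print(clear_bic_minimum(clear_curve))
print(clear_bic_minimum(flat_curve))
print(clear_bic_minimum(shallow_curve))
```

With data sampled along a gradient I would expect something closer to the flat or shallow curves, which is exactly the situation where I would not report a best K.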

Hope this is somehow useful
Best wishes


On 8 September 2013 15:35, Jutta Geismar <Jutta.Geismar at senckenberg.de> wrote:

>  Dear Valeria,
> thank you very much for your quick answer. I'm aware of the problems
> STRUCTURE has in analyzing genetic data from continuous populations (see also
> http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2664.2008.01606.x/pdf).
> That is one reason I don't want to use STRUCTURE as the only cluster
> analysis. I haven't attempted to use BAPS yet, but I gave GENELAND a
> try to include spatial information. Besides testing for IBD with a Mantel
> test, I also modified the geographic distances using resistance values etc.
> that I inferred from an SDM. A spatial autocorrelation analysis didn't show
> a clear pattern of spatial relation (also across different distance
> classes). A PCA shows a big cloud around the center point. Each of the
> first two axes explained about 19% of the variance.
> Thanks for confirming that my DAPC script is correct. I set the maximum
> number of clusters to 50 so as not to miss any structural shifts.
> Nonetheless, I cannot explain the contradictory results of STRUCTURE
> indicating a panmictic population (4 parallel stripes) and DAPC assigning
> most individuals to one specific cluster.
> Thanks again for your comments. I will have a look at BAPS.
> Best wishes,
> Jutta
> >>> Valeria Montano <mirainoshojo at gmail.com> 9/5/2013 10:59 >>>
>  Dear Jutta,
> cluster analysis can be tricky when the samples analysed are distributed
> along a gradient and if there is no clear-cut subdivision, this can lead to
> contradictory results (have a look at this paper
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=
> You may want to consider using TESS or BAPS with the admixture model
> option. These two programs allow the geographic coordinates to be included
> as prior information, and the admixture model is a way to model spatial
> gradients. If you tested for IBD with a Mantel test, just be careful: a
> significant Mantel test is not necessarily due to IBD, because the
> correlation between geographic and genetic distances can be significant
> under different spatial/migratory schemes. I think your
> DAPC is OK, apart from the fact that there is no need to run
> find.clusters with the number of PCs indicated by optim.a.score. That
> procedure is used to optimize the discriminant space among clusters in the
> DAPC. To assign individuals to clusters you can simply retain all the
> variance (even though in your case it is almost the same, given that you
> have 98%). One thing: I would try a maximum number of clusters around 20,
> more than your sampling locations. You can also give sPCA a try.
> Hope this helps
> Ciao
> Valeria
> On 4 September 2013 15:03, Jutta Geismar <Jutta.Geismar at senckenberg.de> wrote:
>>  Dear Mr Jombart and DAPC users,
>> I used DAPC to analyze genetic structure in a small region with 20
>> microsatellite markers. I analyzed 330 individuals (14 sampling sites) and
>> found little genetic differentiation (FST, Jost's D), but a significant
>> isolation-by-distance (IBD) pattern. A cluster analysis in STRUCTURE
>> resulted in four clusters (STRUCTURE Harvester), but all individuals had
>> more or less equal posterior probability in all four inferred clusters.
>> Therefore I assume a panmictic population structure. Since STRUCTURE is
>> known to have problems analyzing datasets under IBD, I analyzed the data
>> with DAPC. DAPC resulted in 3 or 4 clusters (I tested up to K=7 to be
>> sure), but in both cases these were randomly distributed among all
>> individuals without a geographic context. Only 94 individuals were not
>> assigned to one cluster with more than 90% probability and would therefore
>> be counted as "admixed" (example in the DAPC tutorial). To me the results
>> of STRUCTURE and DAPC conflict with each other, but I don't know what a
>> panmictic population would look like in DAPC. Distances between sites are
>> small and it is very likely that gene flow occurs among my sampling
>> points, which might cause problems in genetic cluster analyses. I don't
>> know if I made a mistake in my thinking, which is why I want to explain
>> my procedure briefly:
>> 1. I used dapc and chose 1/3 of the sample size as the number of PCs (as
>> suggested) and counted DAs in the plot (100% of the variability was
>> included, 110 PCs, 13 DAs).
>> 2. To reduce variability I used optim.a.score (smart = FALSE). The best
>> a-score was around 0.2 (61 PCs).
>> 3. After that I wanted to estimate the number of clusters with
>> find.clusters, used the number of PCs suggested by the a-score, and
>> repeated the dapc (conserved variance was still 98%, 61 PCs, 2 DAs).
>> I chose the k in the BIC values after which the decrease was smaller than
>> the previous one, not the k with the lowest BIC.
>> If there are mistakes in my procedure I would appreciate some advice.
>> But even if the procedure is okay, I cannot explain the contradiction
>> between these two analyses.
>> Thanks a lot in advance for any help.
>> Jutta Geismar
>> PhD student
>> Germany
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum

End of adegenet-forum Digest, Vol 61, Issue 4