<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Dear Thibaut and all<br>

    Resuming the earlier discussion (mails below for reference):<br>

    I want to narrow it down a little: what could be the causal
    factor(s) for this pattern? As you already mentioned, it is mostly
    seen under IBD (see your mail below), where STRUCTURE fails to find
    any clusters. Or could it result from high gene flow among
    populations, so that all of them are well mixed and show no
    signature of clusters? Both scenarios are true, at least to some
    extent, for my data.<br>

    So, to summarize, which explanation should I consider?<br>

    thanks in advance<br>

    cheers<br>

    AVIK<br>

    &nbsp;&nbsp; <br>

    <br>

    <br>

    On 7/5/2011 2:37 PM, Jombart, Thibaut wrote:

    <blockquote

      cite="mid:2CB2DA8E426F3541AB1907F98ABA65700E9D36BE@icexch-m1.ic.ac.uk"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      <style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>

      <div style="direction: ltr;font-family: Tahoma;color:

        #000000;font-size: 10pt;">Hello,

        <br>

        <br>

        actually I doubt there is ever a true K in real biological data,

        if only for the fact that there is no clear definition of

        'genetic clusters'. What we consider as "clusters" are models of

        reality, and so false by definition.

        <br>

        <br>

        Anyway. In your case I would stick to a BIC-based choice of K. The
        reason for this is that DAPC scatterplots show you only a few
        dimensions, while k-means+BIC takes much more (if not all,
        depending on how many PCs are retained) of the genetic information
        into account.<br>

        <br>

        Cheers<br>

        <br>

        Thibaut<br>

        <div style="font-family: Times New Roman; color: rgb(0, 0, 0);

          font-size: 16px;">

          <hr tabindex="-1">

          <div style="direction: ltr;" id="divRpF996999"><font

              color="#000000" face="Tahoma" size="2"><b>From:</b> AVIK

              RAY [<a class="moz-txt-link-abbreviated" href="mailto:avik.ray.kol@gmail.com">avik.ray.kol@gmail.com</a>]<br>

              <b>Sent:</b> 05 July 2011 07:33<br>

              <b>To:</b> Jombart, Thibaut<br>

              <b>Subject:</b> Re: [adegenet-forum] PCA query?<br>

            </font><br>

          </div>

          <div>Dear Thibaut<br>

            It seems quite unlikely that there is no true K!<br>

            If so, how can I account for the quite divergent clusters
            obtained in the DAPC analysis? Please refer to the attached
            images. In the image
            <u>DAPC clust 6</u>, cluster 2, cluster 3 and clusters
            4,5,1 appear to be quite divergent genetic groups, and even
            cluster 6 is well separated from 2 and 3. Similarly, in the image
            <u>DAPC cluster 8</u>, clusters 3,4,7 and 3,8 and 2,6 are
            widely divergent. (However, if you compare the two images,
            they appear very similar, except that some clusters break
            into sub-clusters, which is quite reasonable.)
            <br>

            I think that, in my case, it may be wise to choose the number
            of clusters by looking at both the BIC curve and the cluster
            diagram, taking the highly divergent clusters into account.<br>

            What do you think?<br>

            <br>

            cheers<br>

            <br>

            AVIK <br>

            <br>

            <br>

            On 6/22/2011 2:49 PM, Jombart, Thibaut wrote:

            <blockquote type="cite">

              <pre>Dear Avik, 

the BIC plot you sent resembles what we usually get under IBD models. In this case, it is not surprising that STRUCTURE identifies fewer clusters than DAPC (see the paper: STRUCTURE basically failed to identify clusters under the IBD model).

There is probably no "true k", but just a choice of a number of groups useful to summarize the data. You may want to have a look at the section "how many clusters..." in the DAPC vignette, online in "Documents" on the website.
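To make the IBD point concrete, here is a minimal Python sketch (not adegenet code, and not an analysis of the data discussed in this thread). It runs k-means for K = 1..8 on two simulated 2-D data sets, three discrete groups and a continuous cline, and computes a BIC-like score n*log(WSS/n) + K*log(n). That formula, the toy data, and the helper names kmeans_wss and bic_curve are all assumptions made for illustration.

```python
import numpy as np

def kmeans_wss(x, k, rng, n_init=10, n_iter=100):
    """Best within-cluster sum of squares over several k-means++ restarts."""
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new centre is drawn with probability
        # proportional to squared distance from the nearest chosen centre.
        centres = [x[rng.integers(len(x))]]
        while len(centres) != k:
            d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centres], axis=0)
            centres.append(x[rng.choice(len(x), p=d2 / d2.sum())])
        centres = np.array(centres)
        labels = None
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            new_labels = dist.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # assignments stable: converged
            labels = new_labels
            for j in range(k):
                if np.any(labels == j):  # keep old centre if cluster is empty
                    centres[j] = x[labels == j].mean(axis=0)
        wss = ((x - centres[labels]) ** 2).sum()
        best = min(best, wss)
    return best

def bic_curve(x, k_values, seed=0):
    """BIC-like score for each K (assumed form: n*log(WSS/n) + K*log(n))."""
    rng = np.random.default_rng(seed)
    n = len(x)
    return np.array([n * np.log(kmeans_wss(x, k, rng) / n) + k * np.log(n)
                     for k in k_values])

rng = np.random.default_rng(42)
n = 300
# (a) Three discrete groups: well-separated centres, small within-group spread.
group_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
clustered = np.repeat(group_centres, n // 3, axis=0) + rng.normal(0, 1.0, (n, 2))
# (b) IBD-like cline: individuals spread evenly along a gradient, no gaps.
position = np.linspace(0.0, 10.0, n)
cline = np.column_stack([position, np.zeros(n)]) + rng.normal(0, 0.3, (n, 2))

k_values = list(range(1, 9))
bic_clustered = bic_curve(clustered, k_values)
bic_cline = bic_curve(cline, k_values)
# Successive BIC decreases: entry i is BIC(K=i+1) minus BIC(K=i+2).
drop_clustered = -np.diff(bic_clustered)
drop_cline = -np.diff(bic_cline)
print("K        :", k_values)
print("clustered:", np.round(bic_clustered, 1))
print("cline    :", np.round(bic_cline, 1))
```

With the discrete groups the curve should show a sharp elbow at K = 3; with the cline it should keep decreasing gradually, so no single K stands out and the choice of K becomes a matter of how finely one wants to summarize the data.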

Cheers

Thibaut 

________________________________________

From: AVIK RAY [<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:avik.ray.kol@gmail.com" target="_blank">avik.ray.kol@gmail.com</a>]

Sent: 21 June 2011 19:08

To: Jombart, Thibaut; <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adegenet-forum@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum@r-forge.wu-wien.ac.at</a>

Subject: Re: [adegenet-forum] PCA query?

Dear Thibaut

Thanks for the very helpful reply; it seems DAPC is more suitable for my
dataset and for the question I'm looking at!

I did a few mock runs to see the very first results, and the BIC curve
seems to level off gradually after K=9. However, with STRUCTURE
(Bayesian) and FLOCK (maximum likelihood) the number of putative clusters
appears to be 2 or 3, so I am wondering what causes this difference. Or am I
interpreting it wrongly? Also, my dataset contains a lot of missing
data. Does that matter much? Should I remove those data and then try again?

I am attaching the BIC and retained-PC curves for reference.

Thanks

cheers

AVIK

On 6/20/2011 6:58 PM, Jombart, Thibaut wrote:

</pre>

              <blockquote type="cite">

                <pre>Hello,

Not really. As far as PCoA / MDS are concerned, they do the same as PCA, but just allow for using fancier Euclidean distances. Losing information in terms of total variance does not necessarily imply losing information in terms of group discrimination. But if you're looking for clusters, you don't necessarily need to reduce the dimensionality of the data - most clustering algorithms don't.
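The point about total variance versus group discrimination can be illustrated with a small Python sketch on simulated data (the helper names pca_scores and separation are hypothetical, not adegenet functions). One variable has a large variance but no group signal; the other has a small variance but different group means.

```python
import numpy as np

def pca_scores(x):
    """Return (scores, explained_variance_ratio) from a PCA done via SVD."""
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u * s
    var = s ** 2
    return scores, var / var.sum()

def separation(scores, labels, axis):
    """Between-group mean difference on one axis, in pooled-SD units."""
    a = scores[labels == 0, axis]
    b = scores[labels == 1, axis]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd

rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n)
# Variable 1: large variance, identical in both groups (no group signal).
noise = rng.normal(0.0, 10.0, size=2 * n)
# Variable 2: small variance, but the two groups have different means.
signal = rng.normal(0.0, 0.5, size=2 * n) + 3.0 * labels
x = np.column_stack([noise, signal])

scores, ratio = pca_scores(x)
sep_pc1 = separation(scores, labels, 0)
sep_pc2 = separation(scores, labels, 1)
print("variance explained:", np.round(ratio, 3))
print("group separation on PC1:", round(sep_pc1, 2))
print("group separation on PC2:", round(sep_pc2, 2))
```

Under these assumptions PC1 should carry nearly all of the variance yet barely separate the groups, while the low-variance PC2 separates them clearly. Dropping PC2 would lose little variance but most of the group information.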

Please have a look at the DAPC paper which is really on these topics. You may also be interested in the DAPC vignette for the next release of adegenet.

DAPC paper is here:

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://www.biomedcentral.com/1471-2156/11/94" target="_blank">http://www.biomedcentral.com/1471-2156/11/94</a>

DAPC vignette is there:

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="http://adegenet.r-forge.r-project.org/files/adegenet-dapc.pdf" target="_blank">http://adegenet.r-forge.r-project.org/files/adegenet-dapc.pdf</a>

Cheers

Thibaut

________________________________________

From: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adegenet-forum-bounces@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum-bounces@r-forge.wu-wien.ac.at</a> [<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adegenet-forum-bounces@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum-bounces@r-forge.wu-wien.ac.at</a>] on behalf of AVIK RAY [<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:avik.ray.kol@gmail.com" target="_blank">avik.ray.kol@gmail.com</a>]

Sent: 20 June 2011 13:12

To: <a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adegenet-forum@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum@r-forge.wu-wien.ac.at</a>

Subject: [adegenet-forum] PCA query?

Hi all

I have a bit of confusion about PCA in general. I ran a PCA in adegenet and it
produced a plot with multiple clusters. My data are tetraploid
microsatellite data, and I need to find potential clusters, i.e. groups of
individuals that are more similar to each other than to others based on
allele data. If I am not mistaken, PCA converts the allele information into
synthetic variables and does the clustering on those, so we tend to lose a
lot of information, since it will select most but not all alleles. In that
sense, does PCoA / multidimensional scaling, or simply a cluster analysis
(e.g. K-means or hierarchical clustering), make more sense?

Thanks in advance for reply

AVIK

_______________________________________________

adegenet-forum mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.r-project.org</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum" target="_blank">https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum</a>

</pre>

              </blockquote>

            </blockquote>

            -- <br>

            <pre class="moz-signature" cols="72">AVIK RAY

Visiting Fellow 

National Center for Biological Sciences

Tata Institute of Fundamental Research

GKVK Campus

Bellary Road

Bangalore-560065

India

Ph 91-80-23666340

Fax 91-80-2363 6662

</pre>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

AVIK RAY

Visiting Fellow 

National Center for Biological Sciences

Tata Institute of Fundamental Research

GKVK Campus

Bellary Road

Bangalore-560065

India

Ph 91-80-23666340

Fax 91-80-2363 6662

</pre>

  </body>

</html>