<style>

<!--

 /* Font Definitions */

@font-face

        {font-family:Cambria;

        panose-1:2 4 5 3 5 4 6 3 2 4;

        mso-font-charset:0;

        mso-generic-font-family:auto;

        mso-font-pitch:variable;

        mso-font-signature:3 0 0 0 1 0;}

 /* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {mso-style-parent:"";

        margin:0in;

        margin-bottom:.0001pt;

        mso-pagination:widow-orphan;

        font-size:12.0pt;

        font-family:"Times New Roman";

        mso-ascii-font-family:Cambria;

        mso-fareast-font-family:Cambria;

        mso-hansi-font-family:Cambria;

        mso-bidi-font-family:"Times New Roman";}

@page Section1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;

        mso-header-margin:.5in;

        mso-footer-margin:.5in;

        mso-paper-source:0;}

div.Section1

        {page:Section1;}

-->

</style>

<p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Hello again Dr. Jombart and Adegenet users,</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>I have a follow-up question about grouping individuals without using k-means.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>We would like to test whether our group assignment (made a priori by us) is significantly related to the positions of individuals in discriminant function (DF) space. To do this, we took the following approach:</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>1. Perform a MANOVA on the individual DF coordinates with group membership as the predictor variable. The idea is that Wilks' lambda (A) provides a measure of separation among the groups and (B) accounts for correlation among the response variables (DFs). The test code is: </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>model <- manova(dapcobject$ind.coord ~ genindobject$pop)</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>summary(model, test = "Wilks")</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">
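</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>For reference, here is the standard definition of Wilks' lambda. It compares the within-group and total sums-of-squares-and-cross-products (SSCP) matrices, so it ranges from 0 to 1, and smaller values indicate stronger separation among groups. E is the within-group (residual) SSCP matrix, H is the between-group (hypothesis) SSCP matrix, and lambda_i are the eigenvalues of E^-1 H:</font></p>

```latex
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{E} + \mathbf{H})}
        = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i},
\qquad s = \min(p,\, g - 1)
```

<p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Here p is the number of response variables (here, the DFs) and g is the number of groups.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">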

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>2. However, we are worried that the p-value obtained from the MANOVA (which was remarkably small) might be anti-conservative (i.e. have an inflated Type I error rate). DAPC has already maximized among-group variation, so it may reveal apparent structure even in random datasets.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Therefore, we devised a randomization test. We first create a null distribution by randomizing the rows/individuals in the “genind” data object. This keeps the number of individuals per group the same but randomizes which individuals fall in each group. We do this 1000 times and run both the DAPC and the MANOVA on each of the 1000 randomized datasets to obtain a null distribution of Wilks' lambda. Lastly, we compare our empirical Wilks' lambda with this null distribution to determine whether it is smaller (i.e. indicates stronger group separation) than expected by chance.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">
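</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>The randomization procedure above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not adegenet code:</font></p>
<ul>
<li>MASS::lda (a recommended package shipped with R) stands in for the discriminant step of dapc().</li>
<li>The helpers wilks_lambda(), lda_coords() and perm_wilks_test() are hypothetical names.</li>
<li>Group labels are permuted instead of rows. This is equivalent, since group sizes stay fixed while membership is shuffled.</li>
<li>Only the Wilks' lambda statistic is kept from each MANOVA, and the permutation p-value replaces the parametric one.</li>
</ul>

```r
## Sketch of a label-permutation test for group separation in DF space.
## ASSUMPTION: MASS::lda stands in for adegenet's dapc(); any function
## returning one row of discriminant coordinates per individual can be
## passed as 'coords_fun'. Helper names are hypothetical, not adegenet API.
library(MASS)

## Wilks' lambda from a one-way MANOVA of the coordinates on the groups.
wilks_lambda <- function(coords, grp) {
  coords <- as.matrix(coords)
  fit <- manova(coords ~ grp)
  summary(fit, test = "Wilks")$stats[1, "Wilks"]
}

## Stand-in discriminant step: fit LDA to X with labels grp, return LD scores.
lda_coords <- function(X, grp) {
  predict(lda(X, grouping = grp), newdata = X)$x
}

## Refit the discriminant step on every permuted labelling, so the null
## distribution includes the optimisation of group separation itself.
perm_wilks_test <- function(X, grp, n_perm = 999, coords_fun = lda_coords) {
  grp <- factor(grp)
  obs <- wilks_lambda(coords_fun(X, grp), grp)
  null_dist <- replicate(n_perm, {
    g <- sample(grp)  # same group sizes, shuffled membership
    wilks_lambda(coords_fun(X, g), g)
  })
  ## Smaller lambda = stronger separation, so count null values <= observed.
  p <- (sum(null_dist <= obs) + 1) / (n_perm + 1)
  list(observed = obs, null_dist = null_dist, p.value = p)
}

## Toy example on simulated data: group B is shifted by 3 units on V1.
set.seed(1)
grp <- factor(rep(c("A", "B", "C"), each = 20))
X <- matrix(rnorm(60 * 5), ncol = 5, dimnames = list(NULL, paste0("V", 1:5)))
X[grp == "B", "V1"] <- X[grp == "B", "V1"] + 3
res <- perm_wilks_test(X, grp, n_perm = 199)
res$p.value
```

<p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>To plug in adegenet itself, coords_fun could wrap dapc(), for example function(x, g) dapc(x, pop = g, n.pca = 20, n.da = 2)$ind.coord with x a genind object. This wrapper is an untested assumption:</font></p>
<ul>
<li>The values of n.pca and n.da are arbitrary placeholders.</li>
<li>Fixing them avoids interactive prompts and keeps the dimensionality identical across permutations.</li>
<li>The exact arguments should be checked against ?dapc for your adegenet version.</li>
</ul>
<font style="font-family:arial,helvetica,sans-serif" size="2">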

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Does this seem reasonable? Our hesitation stems from some initial results from our dataset.

When we analyze the empirical dataset with 3 defined groups, the DAPC produces 3 clear clusters with slight overlap (i.e. the 3 a priori groups separate very well in DF space). However, when we randomize the alleles and genotypes, the resulting DAPC with the same group sizes also produces 3 clear clusters, although their ellipses overlap noticeably more than in the empirical data.

So we are wondering whether any a priori group designation (in our case related to substantial habitat and phenotypic differences) will necessarily produce some level of clustering. And because DAPC also optimizes group separation in DF space, could the resulting patterns look clearer than they really are, and perhaps be somewhat spurious (at least in our case)? </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">
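</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>A toy simulation can reproduce the observation that randomized data still form visible clusters. This minimal sketch makes three assumptions:</font></p>
<ul>
<li>MASS::lda stands in for the discriminant step of DAPC.</li>
<li>The number of noise variables plays the role of the number of retained PCs (n.pca).</li>
<li>The helper wilks_on_noise() is hypothetical.</li>
</ul>
<p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>The labels are unrelated to the data, yet Wilks' lambda on the discriminant scores drops sharply as the number of variables grows relative to the number of individuals.</font></p>

```r
## Sketch: discriminant analysis on pure noise with arbitrary labels.
## ASSUMPTION: MASS::lda stands in for the discriminant step of DAPC,
## and 'n_var' plays the role of the number of retained PCs (n.pca).
library(MASS)

wilks_on_noise <- function(n_ind = 60, n_var = 5, n_grp = 3) {
  X <- matrix(rnorm(n_ind * n_var), nrow = n_ind,
              dimnames = list(NULL, paste0("PC", seq_len(n_var))))
  ## Group labels assigned without any reference to X.
  grp <- factor(rep(seq_len(n_grp), length.out = n_ind))
  scores <- predict(lda(X, grouping = grp), newdata = X)$x
  fit <- manova(scores ~ grp)
  summary(fit, test = "Wilks")$stats[1, "Wilks"]
}

set.seed(42)
few  <- wilks_on_noise(n_var = 2)   # few variables: lambda expected near 1
many <- wilks_on_noise(n_var = 40)  # many variables: noise mimics structure
c(few = few, many = many)
```

<p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>With 60 individuals, a couple of noise variables typically give a Wilks' lambda close to 1, whereas 40 variables give a much smaller value even though no real structure exists. This is the kind of overfitting that motivates refitting the DAPC on each randomized dataset, as described above.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">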

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Any insight you can provide would be greatly appreciated. Thank you in advance.</font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font> </font></p><font style="font-family:arial,helvetica,sans-serif" size="2">

</font><p style="font-family:arial,helvetica,sans-serif" class="MsoNormal"><font>Jon</font></p>

<br><br><br><div class="gmail_quote">On Thu, Feb 23, 2012 at 9:08 AM, Jombart, Thibaut <span dir="ltr"><<a href="mailto:t.jombart@imperial.ac.uk" target="_blank">t.jombart@imperial.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Hello,<br>

<br>

So, I think that in the DAPC vignette, the example based on H3N2 data (section 3.4) uses the year of sampling as the group factor in DAPC. In the same document, the microbov example (pp. 25-34) uses the cattle breeds as the group factor. The H3N2 example was also presented in the original paper.<br>

<br>

So yes, it does make sense. DAPC provides the best achievable reduced-space representation of between-group diversity, in the sense of an F statistic (between-group variance / within-group variance). It is comparable to STRUCTURE or any similar method when the same groups are used, to the extent that the methods give comparable outputs; here, the only output they share is group membership probabilities.<br>

<br>

Cheers<br>

<br>

Thibaut<br>

<br>

<br>

________________________________________<br>

From: <a href="mailto:adegenet-forum-bounces@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum-bounces@r-forge.wu-wien.ac.at</a> [<a href="mailto:adegenet-forum-bounces@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum-bounces@r-forge.wu-wien.ac.at</a>] on behalf of J. Richardson [<a href="mailto:jrichardson4@gmail.com" target="_blank">jrichardson4@gmail.com</a>]<br>

Sent: 22 February 2012 22:30<br>

To: <a href="mailto:adegenet-forum@r-forge.wu-wien.ac.at" target="_blank">adegenet-forum@r-forge.wu-wien.ac.at</a><br>

Subject: [adegenet-forum] DAPC group choice<br>

<div><div><br>

Hi Dr. Jombart and Adegenet users,<br>

<br>

I have a question about DAPC that I could not find answered in the manual, tutorials or forum archive.<br>

<br>

I am wondering what the DAPC operation is doing (i.e. how it is configuring clusters relative to each other) when you<br>

do not use the groups created in "find.clusters" (i.e. grp$grp output), but rather use the population of origin as the<br>

group designation (i.e. dataset$pop)?<br>

<br>

I ran "find.clusters" and performed the DAPC with the resulting groups. I also performed a DAPC with the groups set<br>

as the sampling sites (populations of origin), using the number of clusters derived from k-means. Interestingly, the DAPC results using the k-means<br>

groupings do not make much intuitive sense. However, the DAPC results using the sampling sites/populations of origin as the group<br>

designation make sense and correspond closely to the output from STRUCTURE using the location prior.<br>

<br>

So I am wondering whether using the sampling site/population designation as the group designation is (A) analogous to the<br>

STRUCTURE analysis using the location prior or "population flags", and (B) valid when you have good a priori information on<br>

your population delineations (e.g. a species breeding in discrete, contained habitats)?<br>

<br>

Thank you so much in advance for any insight you can provide.<br>

<br>

Jon<br>

<br>

<br>

</div></div></blockquote></div><br>