[adegenet-forum] identification of hybrids

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Nov 12 21:07:17 CET 2013


Hi there, 

by definition, no, the analysis cannot assign new individuals to a group that was not part of the 'training' set.
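
A toy sketch of why this holds, written in plain Python and not with adegenet: membership probabilities of this kind are normalised across the groups in the training set, so they always sum to 1 and leave no room for "none of the above". Everything below is an illustrative assumption (one discriminant axis, three baseline groups modelled as equal-variance Gaussians with equal priors, made-up group means); it is not how adegenet computes its values internally.

```python
import math


def gaussian_density(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, group_means, sd=1.0):
    """Posterior probability of each training group for position x.

    Assumes equal priors and a shared standard deviation (hypothetical
    simplifications). The densities are rescaled by their sum, so the
    result always adds up to 1 over the training groups.
    """
    densities = [gaussian_density(x, m, sd) for m in group_means]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical positions of three baseline groups on one discriminant axis.
baseline_means = [0.0, 4.0, 8.0]

# An individual from an unsampled 4th group, far beyond group 3.
outsider_position = 14.0
probs = membership_probabilities(outsider_position, baseline_means)

print([round(p, 3) for p in probs])
print("sum of probabilities:", round(sum(probs), 6))
```

In this toy setup the outsider is assigned to the nearest baseline group with a probability close to 1, so a high membership probability on its own does not show that an individual belongs to one of the sampled groups.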

Cheers
Thibaut 
________________________________________
From: Mark Coulson [M.Coulson at MARLAB.AC.UK]
Sent: 12 November 2013 13:01
To: sebastien.devillard at univ-lyon1.fr; Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
Subject: RE: identification of hybrids

Many thanks for the addition re: the comparison between STRUCTURE and adegenet. I am working with three distinct groups, and STRUCTURE has a hard time separating groups 2 and 3 (thereby really only identifying 2 groups). The third group is a much smaller sample (n=75) compared to the other two baselines (100s-1000s), and I suspect that is having an effect, as described in Kalinowski 2011. If one uses supplementary individuals to assign to these three groups, what would happen if some of the individuals were from a 4th distinct group that had not been sampled in the baseline? In other words, can the posterior probabilities fail to assign such an individual to any of the three represented groups (or at least assign it with poor probability), so that it can be considered excluded from these baselines?

Thanks,
Mark




-----Original Message-----
From: Sebastien Devillard [mailto:sebastien.devillard at univ-lyon1.fr]
Sent: Tue 11/12/2013 09:30
To: Jombart, Thibaut; Mark Coulson; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: identification of hybrids

hi,

just a small addition to Thibaut's answer.
From my own unpublished experience comparing and interpreting results
from STRUCTURE and DAPC when identifying hybrids of different generations
(simulated microsatellite genotypes), I recorded a clear tendency for
DAPC to give a less continuous distribution of "individual introgression"
coefficients (namely the q score in STRUCTURE and the membership
probability in DAPC). In other words, high scores for one of the parental
populations are found more often in DAPC than in STRUCTURE; hence the
population hybridization rate tends to be lower in DAPC than in
STRUCTURE (although I never ran simulations to check whether STRUCTURE
or DAPC is closer to the truth). As Thibaut underlined, STRUCTURE
includes a genetic model which is not present in DAPC, and this is likely
the origin of the difference.
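
One possible contributor to this pattern, offered as an illustration and not as a result from the simulations mentioned above: when membership probabilities come from positions on a discriminant axis, they follow a steep S-shaped curve between two group centres, whereas a mixture proportion can vary linearly. The Python sketch below assumes two parental groups modelled as equal-variance Gaussians with equal priors, made-up group means, and a hybrid whose position on the axis is proportional to its ancestry fraction. All of these are hypothetical simplifications, and none of this is adegenet or STRUCTURE code.

```python
import math


def probability_of_group_b(position, mean_a=0.0, mean_b=4.0, sd=1.0):
    """Posterior probability of parental group B at a given axis position.

    With equal priors and a shared standard deviation, the log ratio of
    the two Gaussian densities is linear in position, so the posterior is
    a logistic (S-shaped) function of where the individual falls.
    """
    log_ratio = ((position - mean_a) ** 2 - (position - mean_b) ** 2) / (2.0 * sd ** 2)
    return 1.0 / (1.0 + math.exp(-log_ratio))


# Hypothetical ancestry fractions from group B: pure A, backcross to A,
# F1, backcross to B, pure B.
ancestry_fractions = [0.0, 0.25, 0.5, 0.75, 1.0]

for fraction in ancestry_fractions:
    # Assumption: position on the axis scales linearly with ancestry.
    position = 0.0 + fraction * (4.0 - 0.0)
    prob_b = probability_of_group_b(position)
    print(f"ancestry from B = {fraction:.2f}   membership probability for B = {prob_b:.3f}")
```

Under these assumptions only the exact midpoint gets a probability of 0.5; individuals a quarter of the way towards one parent already receive probabilities close to 0 or 1, which is the kind of less continuous distribution described above.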

Hope this helps

Sébastien

On 11/11/2013 16:06, Jombart, Thibaut wrote:
> Hi again,
>
> there can be multiple explanations for the overfitting patterns you observe, some of which could well lie within the data themselves (e.g. outliers, or groups defined by few individuals). The main expectation is that there should be a number of PCs which is optimal in terms of prediction; there may be many drivers for the variance in non-optimal solutions.
>
> As for the second point, yes, this is exactly the projection of supplementary individuals described at the end of the DAPC vignette. You calibrate the DAPC with individuals from known groups, and predict the group membership of the supplementary individuals.
>
> Cheers
> Thibaut
>
>
> ________________________________________
> From: Mark Coulson [M.Coulson at MARLAB.AC.UK]
> Sent: 11 November 2013 12:47
> To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
> Cc: sebastien.devillard at univ-lyon1.fr
> Subject: RE: identification of hybrids
>
> Hi Dr. Jombart,
>
> Many thanks for your quick reply. I will try out the xvalDapc option; however, I have a question on this. I ran the example provided for this option and found that both fewer and many more components gave a higher variance in success than, say, ~50-70. Why would more components give a higher variance? I would have thought that this many components might actually overfit the data.
>
> Furthermore, I should clarify that I have three known baselines, and these will routinely be used to compare individuals of unknown origin to identify possible hybrids. Is it therefore possible to bring in the unknowns as a separate file and have them projected onto the discriminant space provided by the baseline (i.e. similar to pre-specifying the origin of some individuals to assist with clustering of unknowns in STRUCTURE)?
>
> Many thanks,
>
> Mark
>
>
>
>
>
> -----Original Message-----
> From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
> Sent: Mon 11/11/2013 10:17
> To: Mark Coulson; adegenet-forum at lists.r-forge.r-project.org
> Cc: sebastien.devillard at univ-lyon1.fr
> Subject: RE: identification of hybrids
>
> Hello,
>
> STRUCTURE uses a mixture model to partition each genotype into membership of the different populations, which is probably what one is looking for when investigating hybridization. However, this depends on STRUCTURE actually detecting the population structure in the first place, which it may fail to do, especially when the system departs from a standard island model.
>
> DAPC is usually better at finding the existing population structure, but the group membership probabilities are not derived from a population genetic model. These values are derived from the position of the genotypes on the discriminant factors. This can be practical, but is slightly less satisfying from a theoretical point of view. Still, one expects hybrids to fall between their parental groups, so it should work.
>
> The important point to be careful about is that these membership probabilities will change if the discriminant functions change (i.e. if different numbers of PCA axes are retained). I strongly recommend using cross-validation to choose the number of retained axes (see function xvalDapc). Then, if you can find a DAPC giving satisfying group prediction, the compoplot should indeed point out hybrids.
>
> Sébastien Devillard has worked on exactly these issues, but I am unsure whether the paper has been published; I'll let him comment on that.
>
> Best
> Thibaut
>
> --
> ######################################
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary's Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Mark Coulson [M.Coulson at MARLAB.AC.UK]
> Sent: 11 November 2013 09:50
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] identification of hybrids
>
> Hello,
>
> I am attempting to use adegenet in a similar fashion to how one may use STRUCTURE to identify hybrids/admixed individuals. I know the compoplot function will allow for a STRUCTURE-like bar plot, but my question is: given the differences between STRUCTURE and compoplot, can one still make the same inferences about the identification of hybrids? In STRUCTURE I have been using a q-value cut-off from known individuals to identify possible hybrids (also simulating known hybrids), so that individuals falling below the q-value for 'pure species membership' would fall into this category. Given that compoplot shows a probability rather than a membership coefficient, is this type of approach valid?
>
> Best,
>
> Mark
>
>
> ______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud service.
> For more information please visit http://www.symanteccloud.com
> ______________________________________________________________________
>


--

Sébastien Devillard, PhD, Associate Professor

UMR 5558 "Biometry and Evolutionary Biology"

43 bd du 11 novembre 1918,

69622 Villeurbanne cedex

France

Phone :+33 (0)4 72 44 81 70

Fax : +33 (0)4 72 43 13 88

sebastien.devillard at univ-lyon1.fr

http://lbbe.univ-lyon1.fr/-Devillard-Sebastien-.html

http://sebastien.devillard.perso.sfr.fr



