[adegenet-forum] identification of hybrids

Mon Nov 11 16:06:47 CET 2013

Hi again, 

There can be multiple explanations for the overfitting patterns you observe, some of which could well lie within the data themselves (e.g. outliers, or groups defined by few individuals). The main expectation is that there should be a number of PCs which is optimal in terms of prediction; many factors can drive the variance in non-optimal solutions.

As for the second point, yes, this is exactly the projection of supplementary individuals described at the end of the DAPC vignette. You calibrate the DAPC with individuals from known groups, and predict the group membership of the supplementary individuals. 
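To make the calibrate-then-predict idea concrete, here is a minimal sketch in plain Python. It is not adegenet's implementation: it assumes the individuals are already described by coordinates on a couple of discriminant axes, and it swaps in a simple Gaussian classifier with equal priors and one pooled variance shared by all groups. The names calibrate, predict_membership, baseline_scores and unknown_scores are made up for the illustration.

```python
import math

def calibrate(scores, groups):
    """Fit group centroids and a pooled within-group variance from baseline individuals."""
    by_group = {}
    for point, g in zip(scores, groups):
        by_group.setdefault(g, []).append(point)
    centroids = {
        g: [sum(col) / len(pts) for col in zip(*pts)]
        for g, pts in by_group.items()
    }
    # Assumption: one variance shared by every group and every axis.
    ss = sum(
        (x - c) ** 2
        for point, g in zip(scores, groups)
        for x, c in zip(point, centroids[g])
    )
    dof = (len(scores) - len(centroids)) * len(scores[0])
    return centroids, ss / dof

def predict_membership(model, new_scores):
    """Membership probabilities of supplementary individuals (equal priors assumed)."""
    centroids, var = model
    out = []
    for point in new_scores:
        logp = {
            g: -sum((x - c) ** 2 for x, c in zip(point, cen)) / (2 * var)
            for g, cen in centroids.items()
        }
        top = max(logp.values())  # subtract the maximum for numerical stability
        weights = {g: math.exp(v - top) for g, v in logp.items()}
        total = sum(weights.values())
        out.append({g: w / total for g, w in weights.items()})
    return out

# Three hypothetical baselines (A, B, C) on two discriminant axes.
baseline_scores = [
    (0.0, 0.1), (0.2, -0.1), (-0.2, 0.0),
    (4.0, 0.1), (4.2, -0.1), (3.8, 0.0),
    (2.0, 4.1), (2.2, 3.9), (1.8, 4.0),
]
baseline_groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

model = calibrate(baseline_scores, baseline_groups)

# Supplementary individuals: one close to A, one halfway between A and B.
unknown_scores = [(0.1, 0.0), (2.0, 0.0)]
memberships = predict_membership(model, unknown_scores)
for m in memberships:
    print({g: round(p, 3) for g, p in m.items()})
```

The baselines alone define the centroids and the variance; the unknowns are only scored afterwards, so they cannot influence the discriminant space. The second unknown, sitting halfway between A and B, ends up with its membership split between those two groups.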

Cheers
Thibaut

________________________________________
From: Mark Coulson [M.Coulson at MARLAB.AC.UK]
Sent: 11 November 2013 12:47
To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
Cc: sebastien.devillard at univ-lyon1.fr
Subject: RE: identification of hybrids

Hi Dr. Jombart,

Many thanks for your quick reply. I will try out the xvalDapc option; however, I have a question on this. I ran the example provided for this option and found that both fewer and many more components had a higher variance in success than, say, ~50-70. Why would more components have a higher variance? I would have thought that this many might actually overfit the data.

Furthermore, I should clarify that I have three known baselines, and these will routinely be used to compare individuals of unknown origin to identify possible hybrids. Is it therefore possible to bring in the unknowns as a separate file and have them projected onto the discriminant space defined by the baselines (i.e. similar to pre-specifying the origin of some individuals to assist with clustering of unknowns in STRUCTURE)?

Many thanks,

Mark

-----Original Message-----
From: Jombart, Thibaut [mailto:t.jombart at imperial.ac.uk]
Sent: Mon 11/11/2013 10:17
To: Mark Coulson; adegenet-forum at lists.r-forge.r-project.org
Cc: sebastien.devillard at univ-lyon1.fr
Subject: RE: identification of hybrids

Hello,

STRUCTURE uses a mixture model to partition each genotype into membership to the different populations, which is probably what one is looking for when investigating hybridization. However, this depends on STRUCTURE actually detecting the population structure in the first place, which it may fail to do, especially when the system departs from a standard island model.

DAPC is usually better at finding the existing population structure, but the group membership probabilities are not derived from a population genetic model. These values are derived from the position of the genotypes on the discriminant factors. This can be practical, but is slightly less satisfying from a theoretical point of view. Still, one expects hybrids to fall between their parental groups, so it should work.

The important point to be careful about is that these membership probabilities will change if the discriminant functions change (i.e. if different numbers of PCA axes are retained). I strongly recommend using cross-validation for this purpose (see function xvalDapc). Then, if you can find a DAPC giving satisfying group prediction, the compoplot should indeed point out hybrids.
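As a rough illustration of the cross-validation logic, the sketch below repeatedly holds out part of each group, refits on the rest, and records how often held-out individuals are assigned to their true group for each candidate number of retained dimensions. It is only a stand-in, not what xvalDapc does internally: a nearest-centroid classifier replaces the DAPC, raw features replace the PCA axes, and the data are simulated. All function and variable names are hypothetical.

```python
import random

def success_rate(train, test, k):
    """Share of test individuals assigned to their true group using the first k dimensions."""
    sums, counts = {}, {}
    for feats, g in train:
        acc = sums.setdefault(g, [0.0] * k)
        for i in range(k):
            acc[i] += feats[i]
        counts[g] = counts.get(g, 0) + 1
    centroids = {g: [s / counts[g] for s in acc] for g, acc in sums.items()}
    hits = 0
    for feats, g in test:
        guess = min(
            centroids,
            key=lambda c: sum((feats[i] - centroids[c][i]) ** 2 for i in range(k)),
        )
        hits += guess == g
    return hits / len(test)

def cross_validate(data, candidates, n_rep=30, training_fraction=0.9, seed=1):
    """Mean prediction success per candidate k over repeated stratified hold-outs."""
    rng = random.Random(seed)
    by_group = {}
    for item in data:
        by_group.setdefault(item[1], []).append(item)
    results = {k: [] for k in candidates}
    for _ in range(n_rep):
        train, test = [], []
        for members in by_group.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Keep at least one individual of each group on both sides of the split.
            cut = max(1, min(len(shuffled) - 1, round(training_fraction * len(shuffled))))
            train += shuffled[:cut]
            test += shuffled[cut:]
        for k in candidates:
            results[k].append(success_rate(train, test, k))
    return {k: sum(v) / len(v) for k, v in results.items()}

# Simulated data: only the first dimension separates the groups, the rest is noise.
gen = random.Random(0)
data = []
for group, shift in (("A", 0.0), ("B", 3.0)):
    for _ in range(20):
        feats = [gen.gauss(shift, 1.0)] + [gen.gauss(0.0, 3.0) for _ in range(9)]
        data.append((feats, group))

candidates = [1, 2, 5, 10]
mean_success = cross_validate(data, candidates)
best = max(mean_success, key=mean_success.get)
print(mean_success, "-> retain", best)
```

The point of the exercise is that the number of dimensions is chosen by how well held-out individuals are predicted, not by how well the training individuals are separated, which is what guards against overfitting.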

Sébastien Devillard has worked on exactly these issues, but I am unsure whether the paper has been published. I'll let him comment on that.

Best
Thibaut

--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
St Mary's Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Mark Coulson [M.Coulson at MARLAB.AC.UK]
Sent: 11 November 2013 09:50
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] identification of hybrids

Hello,

I am attempting to use adegenet in a similar fashion to how one might use STRUCTURE to identify hybrids/admixed individuals. I know the compoplot function allows for a STRUCTURE-like bar plot, but my question is: given the differences between STRUCTURE and compoplot, can one still make the same inferences about the identification of hybrids? In STRUCTURE I have been using a q-value cut-off derived from known individuals to identify possible hybrids (also simulating known hybrids), so that individuals falling below the q-value for 'pure species membership' would fall into this category. Given that compoplot shows a probability rather than a membership coefficient, is this type of approach valid?
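For concreteness, the cut-off approach described above could be sketched as follows. This is an illustration only: the membership values are invented, and taking the cut-off as a low quantile of the highest membership value among known pure individuals is an assumption, one of several reasonable rules. The function names are hypothetical.

```python
def purity_cutoff(known_pure, quantile=0.05):
    """Low quantile of the highest membership value among known pure individuals."""
    top = sorted(max(row) for row in known_pure)
    index = int(quantile * (len(top) - 1))
    return top[index]

def flag_hybrids(unknown, cutoff):
    """True where an individual's highest membership value falls below the cut-off."""
    return [max(row) < cutoff for row in unknown]

# Invented membership values (one row per individual, one column per group).
known_pure = [
    [0.99, 0.01, 0.00],
    [0.97, 0.02, 0.01],
    [0.02, 0.95, 0.03],
    [0.00, 0.01, 0.99],
    [0.05, 0.03, 0.92],
]
unknown = [
    [0.98, 0.01, 0.01],
    [0.55, 0.44, 0.01],
    [0.10, 0.30, 0.60],
]

cutoff = purity_cutoff(known_pure)
flags = flag_hybrids(unknown, cutoff)
print(cutoff, flags)
```

The same two steps apply whether the rows are STRUCTURE q-values or the membership probabilities shown by compoplot; whether the resulting flags can be interpreted in the same way is exactly the question.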

Best,

Mark
