[adegenet-forum] Detecting Genetically Unique Individuals in a Well Mixed Population

Thibaut Jombart t.jombart at imperial.ac.uk
Wed May 1 09:45:02 CEST 2013


True, but reconstructing a tree is still possible. One can use nj on the squared Euclidean distances on allelic profiles (@tab) and still assess outliers that way.
May be worth a try.
Cheers
Thibaut
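
As a rough illustration of the distance step mentioned above (not adegenet code, and written in Python only to keep it self-contained), the sketch below computes squared Euclidean distances between allelic profiles. The individual names and the toy allele-count table are made up; in adegenet the profiles would come from the @tab slot, and the resulting matrix is the kind of input a tree-building routine such as nj would take.

```python
# Hypothetical toy data: one allelic profile per individual, i.e. one
# number per allele (the kind of table stored in @tab). Values are invented.
profiles = {
    "ind1": [2, 0, 1, 1],
    "ind2": [2, 0, 1, 1],
    "ind3": [1, 1, 2, 0],
    "ind4": [0, 2, 0, 2],
}


def squared_euclidean(p, q):
    """Sum of squared differences between two allelic profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distance_matrix(profiles):
    """Nested dict: dist[a][b] is the squared Euclidean distance a to b."""
    names = list(profiles)
    return {
        a: {b: squared_euclidean(profiles[a], profiles[b]) for b in names}
        for a in names
    }


dist = distance_matrix(profiles)
for name, row in dist.items():
    print(name, row)
```

Identical profiles (ind1 and ind2 here) get a distance of zero, while an individual carrying alleles that the others mostly lack (ind4 here) gets the largest values, which is what would show up as a long branch on a tree.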

Valeria Montano <mirainoshojo at gmail.com> wrote:

>ok, just realised you have only microsat, phylogeny wouldn't work much :P
>
>On 1 May 2013 00:16, Valeria Montano <mirainoshojo at gmail.com> wrote:
>
>> Hi Nate,
>>
>> I think Thibaut's answer is already more than appropriate and actually
>> points out the main question among your questions. As I understand, your
>> population is not really easy to deal with since you have this high genetic
>> homogeneity which does not leave much room to imagination (a bit
>> frustrating I believe). Focusing on outliers can be an option, but it
>> really depends on your scientific aim. If I were you, I would try a
>> statistic estimating individual genetic distances (for instance the mean
>> number of pairwise differences using dist.gene in the ape package), calculate
>> the mean distance of every individual from all the others, and then set
>> a threshold to define 'outliers'. Does that make sense? A wee bit arbitrary
>> maybe...moreover, in this case you would have 'outliers' compared to the
>> general population, and I am not sure it would help...
>>
>> On the other hand, to understand whether outliers are immigrants from
>> distant pops, you could build a network or use any phylogenetic
>> reconstruction and see if outliers appear to be long but derived branches
>> within their geographic neighbours or if they are more basal. This is the
>> only tool that comes to my mind.
>>
>> Anyway good luck with it, flat populations are upsetting.
>>
>> with the occasion, happy Labor day everybody! (or happy transition from
>> Spring to Summer - just in case you follow the Celtic tradition)
>>
>> Valeria
>>
>>
>> On 30 April 2013 12:14, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
>>
>>> Dear Nate,
>>>
>>> the problem here is that it is not clear what is meant by 'outliers'. If
>>> we're talking about a few migrants from another population, then they
>>> should fall in a small cluster of their own (e.g. using find.clusters). If
>>> the definition is spatial, then 'outliers' may be individuals that are
>>> genetically distinct from their neighbours (without having to be migrants
>>> from another population). Or, 'outliers' can be individuals with
>>> rare/original alleles (without having to be any of the above). Or
>>> 'outliers' can be whatever does not fall within the inertia ellipse, and in
>>> this case you will always have 'outliers' with the default parameters of
>>> s.class.
>>>
>>> All of these definitions of 'outliers' would require different techniques
>>> to pin them down. I would really avoid anything based on the distance from
>>> the centroid. This implies that the cloud of points of the population is
>>> well represented in only 2D and, more importantly, is spherical, which is
>>> very unlikely. Detection based on inertia ellipses (not interia - inertia
>>> is the squared length of a vector, which in PCA is the variance of the
>>> corresponding scores) is bound to fail too. There the assumption is that the
>>> cloud of points of the population is bivariate normal, which again is
>>> unlikely. But if it is the case, the default inertia ellipse in s.class
>>> contains about 2/3 of the points. It would be far-fetched to call the
>>> remaining third 'outliers'. One can change this parameter, but again, that
>>> means arbitrarily deciding on a fixed number of outliers.
>>>
>>> But again, the problem here as I understand it is not technical (for now)
>>> - what is meant by 'outliers' needs to be clarified first.
>>>
>>> All the best
>>>
>>> Thibaut
>>>
>>> ________________________________________
>>> From: adegenet-forum-bounces at lists.r-forge.r-project.org [
>>> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Nathan
>>> Truelove [nathan.truelove at manchester.ac.uk]
>>> Sent: 23 April 2013 13:46
>>> To: adegenet-forum at lists.r-forge.r-project.org
>>> Subject: [adegenet-forum] Detecting Genetically Unique Individuals in a
>>> Well Mixed Population
>>>
>>> Dear Thibaut and Adegenet Users,
>>>
>>> I would like to begin by thanking Thibaut and everyone else who created
>>> Adegenet, it has to be the most useful data analysis tool that I have used
>>> for my PhD research.
>>>
>>> I am a PhD student working on the population genetics of Caribbean spiny
>>> lobster using 16 microsatellite markers. The species has a huge potential
>>> for migration since it can spend up to a year floating/swimming in ocean
>>> currents before settling in shallow coastal habitat. Adults can also
>>> migrate 10s to 100s of km. It's no big surprise that I am finding very
>>> little differentiation in PCA, PCoA, and DAPC analyses. The trend that
>>> comes out in all these analyses is that ~80% of individuals from all
>>> sampling sites fall within the interia ellipse (s.class) or the contour
>>> polygon (s.chull). Several of the individuals outside the interia ellipse
>>> (or polygons) are located quite far away from the "core" of individuals
>>> within the ellipse. These outlier individuals are not associated with any
>>> particular site, however on the spatial level, there appear to be more
>>> outliers in southern sites than in northern sites. I've been trying a
>>> variety of techniques to try and figure out the ecological
>>> importance of these outlier individuals. For example, a recent paper by
>>> Elphie et al. entitled "Detecting immigrants in a highly genetically
>>> homogeneous spiny lobster population (Palinurus elephas) in the northwest
>>> Mediterranean Sea" explores a similar issue in a different species of
>>> lobster. In this paper the authors use non-metric multidimensional scaling
>>> to separate out the genetic distances of their individuals in multivariate
>>> space. They then classified all individuals within a 50% radius of the
>>> barycentre as the "reference population" and all individuals outside the
>>> 50% radius as an "assignment population". They then used Geneclass2 to run
>>> assignment tests and any individuals that had a p-value < 0.05 are
>>> considered "genetically different". The authors argue that the most likely
>>> explanation for the genetic differences is that the genetically unique
>>> individuals detected in Geneclass are migrants from populations that have
>>> genetically diverged. I imagine there are several
>>> other ecological or selective processes that could also lead to
>>> genetically unique individuals, so calling them migrants is up for debate.
>>>
>>> For my data I ran a similar analysis in Adegenet using the functions
>>> s.class and s.chull along with dudi.pca to select the reference and
>>> assignment populations for Geneclass2. I compared these results to a similar
>>> analysis using non-metric multidimensional scaling in the Vegan package.
>>> The Adegenet PCA analyses contained about twice as many individuals in the
>>> reference population than the nMDS technique, yet the overall trend of
>>> Geneclass finding more unique individuals in the south than the north was
>>> consistent among all techniques. Also, most of the distant outliers in PCA
>>> analysis in Adegenet were also significantly different in the Geneclass
>>> analysis.
>>>
>>> It would be excellent to get your opinions on this technique and discuss
>>> potential options for improving it:
>>>
>>> 1) Would it be possible to get additional information using Adegenet on
>>> how different the outliers in PCA are from the "core" of individuals inside
>>> the inertia ellipse? It would be nice to run the entire analysis in
>>> Adegenet and not have to use Geneclass2 at all.
>>>
>>> 2) Is there a simple way to identify each individual within an inertia
>>> ellipse? I have been using the function identify to select the individuals
>>> that are located within the ellipse, yet it is rather clunky since you have
>>> to click on every point.
>>>
>>> 3) Any additional advice concerning how to detect genetic outliers in
>>> homogeneous populations using Adegenet would be greatly appreciated.
>>>
>>> Thank you very much for your time.
>>>
>>> Best Wishes,
>>>
>>> Nate
>>>
>>>
>>>
>>> _______________________________________________
>>> adegenet-forum mailing list
>>> adegenet-forum at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>>
>>
>>
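
The mean-distance idea suggested in the thread can be sketched as follows. This is a minimal illustration in plain Python rather than R, under several assumptions: the genotypes are invented, the distance is simply the number of loci at which two genotypes differ (in the spirit of what dist.gene offers in ape, but not a reimplementation of it), and the cutoff of mean + 1.5 standard deviations is arbitrary, as the thread itself points out.

```python
from statistics import mean, stdev

# Hypothetical microsatellite genotypes: one "allele/allele" string per locus.
genotypes = {
    "ind1": ["120/124", "200/204", "88/90", "150/150"],
    "ind2": ["120/124", "200/204", "88/90", "150/152"],
    "ind3": ["120/124", "200/204", "88/88", "150/150"],
    "ind4": ["120/122", "200/204", "88/90", "150/150"],
    "ind5": ["120/124", "200/204", "88/90", "150/150"],
    "ind6": ["118/126", "196/208", "84/92", "146/154"],
}


def n_diff_loci(g, h):
    """Number of loci at which two multilocus genotypes differ."""
    return sum(a != b for a, b in zip(g, h))


def mean_distance_to_others(genotypes):
    """Mean distance of each individual from every other individual."""
    names = list(genotypes)
    return {
        a: mean(n_diff_loci(genotypes[a], genotypes[b]) for b in names if b != a)
        for a in names
    }


def flag_outliers(mean_dist, n_sd=1.5):
    """Names whose mean distance exceeds mean + n_sd * SD (arbitrary cutoff)."""
    values = list(mean_dist.values())
    cutoff = mean(values) + n_sd * stdev(values)
    return sorted(name for name, d in mean_dist.items() if d > cutoff)


mean_dist = mean_distance_to_others(genotypes)
print(mean_dist)
print(flag_outliers(mean_dist))
```

With these toy genotypes only ind6, which differs from everyone at every locus, passes the cutoff. Raising n_sd to 2 flags nobody, which shows how strongly the set of 'outliers' depends on the chosen threshold.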