[adegenet-forum] DAPC in mixed ploidy species

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Oct 1 13:03:20 CEST 2012


Dear Jukka, 

Thanks for forwarding this to the mailing list.

In principle, DAPC can be applied to data with varying ploidy because the analysis is performed on a table of allele frequencies, provided that the genotypes are actually known. In your case, there are at least two issues:
1) How to code uncertain genotypes in the triploid case?
2) Is the uncertainty for triploid data an issue?

For 1), coding triploid heterozygotes (AAB and ABB) as normal diploid heterozygotes (AB) is a problem, because it assigns both alleles a frequency of 0.5 when the true values are 2/3 and 1/3. However, if you assume random association of alleles, then amongst triploid heterozygotes (A?B) there is a proportion p(A) of AAB and a proportion p(B) of ABB, where p(A) and p(B) are the allele frequencies, which can be estimated from all diploid data and triploid homozygotes. This is not perfect, as ideally you would use the actual allele frequencies of the triploid population, but it is a start. Whether or not to include diploid data when estimating p(A) and p(B) depends on how conservative you want to be. Using them will not add between-group differences to the data, while leaving them out will make group definition more clear-cut and groups more homogeneous.
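
As a rough illustration of the estimation step, here is a minimal Python sketch that counts alleles over the unambiguous genotypes only. It assumes genotypes are stored as strings such as "AA", "AB" or "BBB", with ambiguous triploid heterozygotes written "A?B"; the function name, the string coding and the sample data are all hypothetical and not part of adegenet.

```python
from collections import Counter

def estimate_allele_freqs(genotypes, include_diploids=True):
    """Estimate (p(A), p(B)) by allele counting over unambiguous genotypes.

    Ambiguous triploid heterozygotes ("A?B") are skipped because their
    allele dosage is unknown. With include_diploids=False, only triploid
    homozygotes contribute.
    """
    counts = Counter()
    for g in genotypes:
        if "?" in g:
            continue  # AAB or ABB: dosage unknown, so not counted
        if len(g) == 2 and not include_diploids:
            continue  # optionally leave the diploid individuals out
        counts.update(g)  # adds one count per allele character
    total = counts["A"] + counts["B"]
    if total == 0:
        raise ValueError("no unambiguous genotypes to estimate from")
    return counts["A"] / total, counts["B"] / total

# Made-up sample: four diploids, two triploid homozygotes, one ambiguous triploid.
sample = ["AA", "AA", "AB", "BB", "AAA", "BBB", "A?B"]
p_a, p_b = estimate_allele_freqs(sample)
```

The `include_diploids` switch corresponds to the choice discussed above between pooling the diploid data and relying on the triploid homozygotes alone.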

The pain is that you will have to code the table of allele frequencies manually. Assuming you have 5 genotypes AA, AB, A?B (actually AAB), A?B (actually ABB), and BBB, the table for this locus would look like:
genotype     A      B
AA           1      0
AB           0.5    0.5
A?B (AAB)    p(A)   p(B)
A?B (ABB)    p(A)   p(B)
BBB          0      1
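
A minimal Python sketch of how such a table could be filled in for one biallelic locus, using the coding in the table above. It assumes genotypes are given as strings ("AA", "AB", "BBB") with ambiguous triploid heterozygotes written "A?B", and that p(A) has already been estimated; all names are hypothetical. The second function is not from this message: it is an alternative I am assuming for comparison, giving the expected frequencies when a fraction p(A) of A?B individuals are AAB (2/3 A) and the rest are ABB (1/3 A).

```python
def locus_rows(genotypes, p_a):
    """Return one (freq_A, freq_B) row per individual for a biallelic locus."""
    p_b = 1.0 - p_a
    rows = []
    for g in genotypes:
        if "?" in g:
            # Ambiguous triploid heterozygote: coded p(A), p(B) as in the table.
            rows.append((p_a, p_b))
        else:
            # Known genotype: frequency of A is its dosage divided by ploidy.
            freq_a = g.count("A") / len(g)
            rows.append((freq_a, 1.0 - freq_a))
    return rows

def expected_dosage_row(p_a):
    """Alternative coding (an assumption, not from the message above):
    mixture expectation over AAB (2/3 A) and ABB (1/3 A)."""
    freq_a = p_a * (2 / 3) + (1.0 - p_a) * (1 / 3)
    return freq_a, 1.0 - freq_a

# The five genotypes from the example, with an illustrative p(A) of 0.6.
rows = locus_rows(["AA", "AB", "A?B", "A?B", "BBB"], p_a=0.6)
```

Rows built this way for every locus can be bound column-wise into the individuals-by-alleles table that is then passed to the analysis.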

For 2), the fact that triploid data will be noisier won't be much of an issue in DAPC, as the method minimizes variance within groups. This would be different in, e.g., PCA or MDS.

Hope this helps.

Cheers

Thibaut.

--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jokela, Jukka [Jukka.Jokela at eawag.ch]
Sent: 01 October 2012 11:23
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] DAPC in mixed ploidy species

Thanks for developing DAPC; it is a really nice analysis tool.
I am working on mixed populations of triploid (asexual) and diploid (sexual) individuals with SNP markers that have two alleles per locus. I can score heterozygotes (AB) in both diploids and triploids, but I cannot separate AAB and ABB heterozygotes. At present I am analysing the data with DAPC as if all individuals were diploid, but I do know the ploidy of each individual and can use that as a prior substructure within populations if I want.

You write in your BMC Genetics paper (2010) about the versatility of DAPC with respect to ploidy. I understand this to mean that one can use data sets of different ploidy (I have done that; it looks good and makes sense), but it is a bit unclear to me what would happen if one has a mixed sample of alternative ploidy in the same dataset, coded as diploid as I describe above. In my study species (Potamopyrgus antipodarum), triploids are supposedly derived from local sexual diploids, and this is one reason why we would like to ask whether DAPC supports that prediction. I have now looked at the results, but I am worried that using mixed samples as above is not justified and that I am introducing a bias that I cannot control. What would be your opinion?
Thanks a lot if you have time to comment on this.

cheers, jukka
jukka jokela / ETH-Zürich

