[adegenet-forum] DAPC on allele frequency matrix

Mark Coulson Mark.Coulson.ic at uhi.ac.uk
Fri May 19 12:27:02 CEST 2017


Hello,

I'm using DAPC to try to discriminate between two groups. However, the data are not individual genotypes but the result of genotyping pools of samples, with 20 pools in each of the two groups (40 pools in total). So I am providing the analysis with the frequency of the A allele in each pool at each SNP (all SNPs are dimorphic). There are ~600,000 SNPs in the dataset.

I ran the xvalDapc function and it identified 20 PCs as the optimum. However, when I run DAPC retaining those 20 PCs, I get the following warning:

Warning message:
In dapc.data.frame(as.data.frame(x), ...) :
  number of retained PCs of PCA may be too large (> N /3)
results may be unstable

What does this warning mean for my discrimination between the two groups, which looks pretty good? In other analyses, such as ranking SNPs by FST and outlier analyses, the separation is also reasonably good, but not as clear as with DAPC.
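
For reference, the threshold named in the warning can be checked by hand. The snippet below is only a sketch of that arithmetic, assuming N in the warning text refers to the number of rows passed to DAPC (here, the 40 pools); the function name is hypothetical and is not part of adegenet.

```python
def exceeds_n_over_3(n_samples, n_pcs):
    """Return True when the retained PCs exceed N / 3, the limit quoted in the warning."""
    return n_pcs > n_samples / 3


n_pools = 20 * 2  # 20 pools in each of the two groups
n_pcs = 20        # number of PCs suggested by cross-validation

print(f"N / 3 = {n_pools / 3:.2f}")
print("retained PCs exceed N / 3:", exceeds_n_over_3(n_pools, n_pcs))
```

Under that assumption, 20 retained PCs sit above the N / 3 limit for 40 pools, which is consistent with the warning being raised.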

Therefore I am not sure whether (1) DAPC is genuinely doing a better job at separating the groups, or (2) DAPC is still over-fitting the data given the large number of variables, so that I am simply finding a solution that may not be real.
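
The over-fitting concern in (2) can be illustrated with a toy simulation. This is a minimal sketch and not DAPC itself: it uses a simple mean-difference discriminant on pure random noise, with 2,000 variables standing in for the ~600,000 SNPs, and all function names and parameters are hypothetical. It compares how well arbitrary group labels are recovered on the data used for fitting against leave-one-out predictions.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def classify(sample, mean_a, mean_b):
    """Project onto the difference of group means, split at the midpoint."""
    score = sum((a - b) * (x - (a + b) / 2)
                for x, a, b in zip(sample, mean_a, mean_b))
    return "A" if score > 0 else "B"


def accuracies(n_per_group=20, n_vars=2000, seed=1):
    rng = random.Random(seed)
    # Pure noise: the group labels carry no real signal.
    data = [[rng.gauss(0, 1) for _ in range(n_vars)]
            for _ in range(2 * n_per_group)]
    labels = ["A"] * n_per_group + ["B"] * n_per_group

    def fit(idx):
        mean_a = mean_vector([data[i] for i in idx if labels[i] == "A"])
        mean_b = mean_vector([data[i] for i in idx if labels[i] == "B"])
        return mean_a, mean_b

    all_idx = list(range(len(data)))

    # Accuracy on the same samples that were used to fit the discriminant.
    mean_a, mean_b = fit(all_idx)
    train = sum(classify(data[i], mean_a, mean_b) == labels[i]
                for i in all_idx) / len(data)

    # Leave-one-out: refit without sample i, then predict sample i.
    hits = 0
    for i in all_idx:
        mean_a, mean_b = fit([j for j in all_idx if j != i])
        hits += classify(data[i], mean_a, mean_b) == labels[i]
    loo = hits / len(data)
    return train, loo


train_acc, loo_acc = accuracies()
print(f"training accuracy: {train_acc:.2f}, leave-one-out accuracy: {loo_acc:.2f}")
```

With far more variables than samples, the fitted discriminant can separate even meaningless labels on the data it was fitted to, while held-out predictions stay near chance. The same logic suggests judging a DAPC result on pools that were left out of the fit, or against permuted group labels, rather than on the apparent separation alone.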

Any thoughts would be helpful.


Mark
Inverness College UHI, a partner in the University of the Highlands and Islands www.inverness.uhi.ac.uk Board of Management of Inverness College (known as Inverness College UHI), Scottish Charity No SC021197.