[adegenet-forum] dapc on allele frequencies
Mark Coulson
Mark.Coulson.ic at uhi.ac.uk
Wed Jul 12 17:58:21 CEST 2017
Hi Thibaut,
I have been using it just as a normal matrix (i.e. not a genlight object), and it is pretty fast on a decent CPU. However, I still have an outstanding question on the DAPC itself. My earlier post was as follows:
I'm using DAPC to try to discriminate between two groups. However, the data are not individual genotypes, but rather the result of genotyping pools of samples. There are 20 pools in each of the two groups, so I am providing the analysis with the frequency of the A allele (all SNPs are dimorphic) for each pool. There are ~600,000 SNPs in the dataset. I ran the xvalDapc function and it identified 20 PCs as the optimum. However, when I run the DAPC with 20 PCs, I get the following warning:
Warning message:
In dapc.data.frame(as.data.frame(x), ...) :
number of retained PCs of PCA may be too large (> N /3)
results may be unstable
What does this mean in terms of my discrimination, which is pretty good between the two groups? In other analyses, such as ranking SNPs according to FST, outlier analyses, etc., the separation is pretty good but not as clear as with DAPC overall.
Therefore I am not sure whether (1) DAPC is genuinely doing a better job at separating the groups, or (2) DAPC is still over-fitting the data given the large number of variables, so that I am simply finding a solution that may not be real.
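One way to probe the over-fitting question is a label-permutation check: if pools with shuffled group labels are separated about as cleanly as the real groups, the apparent discrimination is not informative. The sketch below is only an illustration under stated assumptions. It uses simulated noise frequencies rather than the real data, and prcomp plus MASS::lda as a stand-in for the PCA and discriminant steps of dapc. The names freq, resub_accuracy and null_acc are made up for the example.

```r
library(MASS)  # lda(); used here as a stand-in for the discriminant step of dapc

set.seed(1)
n_pools <- 40
n_snps  <- 2000                    # far fewer than the real ~600,000, for speed

# ASSUMPTION: pure-noise allele frequencies, i.e. no real group difference
freq  <- matrix(runif(n_pools * n_snps), nrow = n_pools)
group <- factor(rep(c("A", "B"), each = 20))

# Keep 20 principal components, as in the analysis that triggered the warning
pcs <- prcomp(freq, center = TRUE, scale. = FALSE)$x[, 1:20]

# Proportion of pools reassigned to their own group by a discriminant
# analysis fitted on those same pools (no held-out data)
resub_accuracy <- function(scores, grp) {
  fit <- lda(scores, grouping = grp)
  mean(predict(fit, scores)$class == grp)
}

observed <- resub_accuracy(pcs, group)

# Null distribution: same PCs, group labels shuffled
null_acc <- replicate(200, resub_accuracy(pcs, sample(group)))

observed
quantile(null_acc, c(0.05, 0.5, 0.95))
```

With the real data, freq would be replaced by the pools-by-SNPs matrix and the observed rate compared with the quantiles of null_acc. With 20 PCs and only 40 pools, the shuffled-label rates may well sit far above 50%, which is the kind of instability the warning points at.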
Also, I have a question on the xvalDapc function.
When I run the following
xval1 <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", center=TRUE, scale=FALSE, xval.plot=TRUE)
I get results back at 5, 10, 15, 20, 25, 30, 35
However, when I run (on the same dataset)
xval1a <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", training.set=0.7, center=TRUE, scale=FALSE, xval.plot=TRUE)
I get results back for 13 different numbers of PCA axes, in increments of roughly 2.
I would also like to specify the increments myself, so I tried something like the following:
xval2 <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", training.set=0.7, center=TRUE, scale=FALSE, n.pca=seq(5, by=5,to=40),xval.plot=TRUE)
but I don't get these exact increments.
So what determines the scale of the x-axis?
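A possible explanation, offered as an assumption to verify against the source of the installed adegenet version rather than as documented behaviour: if the default grid of PC numbers is generated with base R's pretty() over 1:n.pca.max, after n.pca.max has been capped by the size of the training set, then changing training.set changes the cap and therefore the step. The helper candidate_grid below is hypothetical and only mimics that logic in base R.

```r
# Hypothetical helper mimicking how a default grid of PC numbers could be built.
# ASSUMPTION: the cap (training-set size minus 1) and the use of pretty()
# are guesses to check against the xvalDapc source, not documented behaviour.
candidate_grid <- function(n_samples, training_set, n_pca_max, n_pca = NULL) {
  n_training <- round(n_samples * training_set)
  cap <- min(n_pca_max, n_training - 1)
  # With no user-supplied values, ask pretty() for about 10 round numbers
  if (is.null(n_pca)) n_pca <- pretty(seq_len(cap), n = 10)
  # Drop zero and anything above the cap
  n_pca[n_pca > 0 & n_pca <= cap]
}

candidate_grid(40, 0.9, 40)                           # default training.set
candidate_grid(40, 0.7, 40)                           # training.set = 0.7
candidate_grid(40, 0.7, 40, n_pca = seq(5, 40, by = 5))  # user-supplied values
```

Under these assumptions the first call gives 5, 10, ..., 35 and the second gives the 13 values 2, 4, ..., 26, which matches the two runs described above. The third call shows how user-supplied values above the cap would be silently dropped, which is one way the requested increments might not all appear.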
Any thoughts would be helpful
Dr Mark Coulson
Researcher – Rivers and Lochs Institute
T: 01463 273576 / 279477
Normal working days: Tues-Friday
1 Inverness Campus
Inverness
IV2 5NA
www.inverness.uhi.ac.uk
From: adegenet-forum-bounces at lists.r-forge.r-project.org [mailto:adegenet-forum-bounces at lists.r-forge.r-project.org] On Behalf Of Thibaut Jombart
Sent: 12 July 2017 15:22
To: Mark Coulson <coulsonmw at gmail.com>
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] dapc on allele frequencies
Hi Mark,
In principle you could use genlight, setting the ploidy of each pool to (the number of individuals in the pool) * (the ploidy of one individual). It should still be quite efficient in terms of memory savings, and run decently fast for a small number of pools (<100).
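A minimal sketch of that suggestion, under stated assumptions: the frequencies are simulated, every pool is assumed to hold 25 diploid individuals, and the names freq, n_ind, pool_ploidy and counts are made up for the example. Frequencies are converted to allele counts out of each pool's total ploidy, since genlight stores integer allele counts.

```r
# Sketch only: simulated pooled frequencies; pool sizes are hypothetical.
set.seed(1)
n_pools <- 6
n_snps  <- 50
freq <- matrix(runif(n_pools * n_snps), nrow = n_pools)  # pools in rows, SNPs in columns

n_ind       <- rep(25L, n_pools)   # ASSUMPTION: 25 diploid individuals per pool
pool_ploidy <- 2L * n_ind          # (number of individuals) * ploidy

# Convert each frequency to an allele count out of the pool's ploidy.
# pool_ploidy recycles down the columns, so row i is scaled by pool_ploidy[i].
counts <- round(freq * pool_ploidy)
storage.mode(counts) <- "integer"

# Build the genlight object only if adegenet is installed
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  x <- new("genlight", gen = counts, ploidy = pool_ploidy)
  print(x)
}
```

Rounding to counts loses a little precision relative to the raw frequencies, so it is worth checking that results agree with an analysis run on the frequency matrix directly.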
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658
On 17 May 2017 at 16:48, Mark Coulson <coulsonmw at gmail.com> wrote:
Hi
I have allele frequency data for pools of individuals (no individual genotype data) for >500,000 SNPs. I know I can do a dapc on allele frequencies directly but given this many SNPs should I be using a ‘genlight’ object or is this only for individual genotypes?
Thanks,
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum