[adegenet-forum] dapc on allele frequencies
Thibaut Jombart
thibautjombart at gmail.com
Wed Jul 12 19:03:37 CEST 2017
Hi Mark,
sorry about the delay in replying.
> I'm using DAPC to try to discriminate between two groups. However, the data are not individual genotypes, but rather the result of genotyping pools of samples. There are 20 individual pools in each of the two groups. So basically I am providing the analysis with a frequency of the A allele (all dimorphic SNPs) for each pool. There are ~600,000 SNPs in the dataset. I ran the xvalDapc function and it identified 20 PCs as the optimum. However, when I run the DAPC with 20 PCs, I get the following warning:
>
> Warning message:
> In dapc.data.frame(as.data.frame(x), ...) :
>   number of retained PCs of PCA may be too large (> N /3)
>   results may be unstable
>
> What does this mean in terms of my discrimination, which is pretty good among the two groups? In other analyses such as ranking SNPs according to FST, outlier analyses, etc. the separation is pretty good but not as clear as with DAPC overall.
You can safely ignore the warning. I think it's been removed from the
devel version, and will be gone in the next CRAN release.
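
For reference, the arithmetic behind the warning can be sketched in base R. This only reproduces the "> N /3" comparison quoted in the message itself; the variable names are illustrative and are not adegenet objects.

```r
# Illustrative values taken from the question: 2 groups x 20 pools, 20 retained PCs.
n_pools <- 2 * 20          # N, the number of observations (pools)
n_pca   <- 20              # number of PCs retained for the DAPC

threshold <- n_pools / 3   # the "N / 3" rule of thumb named in the warning
exceeds   <- n_pca > threshold

cat("N/3 =", round(threshold, 2), "; retained PCs =", n_pca,
    "; exceeds threshold:", exceeds, "\n")
```

With 40 pools the threshold is about 13.3, so retaining 20 PCs is enough to trigger the message.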
> Therefore I am not sure whether (1) DAPC is genuinely doing a better job at separating the groups or (2) there is still over-fitting of the data with DAPC given the large number of variables, and I am simply finding a solution which may not be real.
I would just examine the cross-validation results for this. Are
the predictions significantly better than the random expectation
(dashed horizontal lines)?
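
As a rough illustration of what "better than random" means here, a chance baseline can be approximated in base R by shuffling the group labels. This is a minimal sketch under stated assumptions: the permutation scheme is one simple way to get a chance interval and is not necessarily how xvalDapc derives its lines, and `observed_success` is a hypothetical value, not a result from these data.

```r
set.seed(1)

# Two groups of 20 pools each, as described in the question.
group <- factor(rep(c("A", "B"), each = 20))

# Chance baseline: shuffle the labels and record the mean per-group
# proportion of pools that happen to land in their true group.
chance <- replicate(1000, {
  shuffled <- sample(group)
  mean(tapply(shuffled == group, group, mean))
})

# 2.5%, 50% and 97.5% quantiles of the chance distribution.
chance_interval <- unname(quantile(chance, c(0.025, 0.5, 0.975)))

# Hypothetical cross-validated success rate (placeholder, not real output).
observed_success <- 0.9
beats_chance <- observed_success > chance_interval[3]

print(chance_interval)
print(beats_chance)
```

If the cross-validated success sits well above the upper bound of such an interval, over-fitting is a less likely explanation for the separation.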
> Also, I have a question on the xvalDapc function.
> When I run the following
>
> xval1 <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", center=TRUE, scale=FALSE, xval.plot=TRUE)
> I get results back at 5, 10, 15, 20, 25, 30, 35
> However, when I run (on the same dataset)
> xval1a <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", training.set=0.7, center=TRUE, scale=FALSE, xval.plot=TRUE)
> I get results back at 13 different PCA axes levels, roughly by increments of 2
> Also, I am looking to specify the increments so tried something like the following:
>
> xval2 <- xvalDapc(FD_t, group, n.pca.max=40, result="groupMean", training.set=0.7, center=TRUE, scale=FALSE, n.pca=seq(5, by=5,to=40),xval.plot=TRUE)
> but I don't get these exact increments.
> So what determines the scale of the x-axis?
Can you try with the current devel version? I suspect it might have
been a bug which has been fixed since the last release.
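
Independently of any bug, one thing worth checking is whether the requested grid is feasible for the training set. The sketch below uses base R only and relies on a general fact: a centred PCA on n observations has at most n - 1 informative axes. Whether xvalDapc trims the n.pca grid in exactly this way is an assumption, not something confirmed here.

```r
# Values taken from the calls in the question.
n_pools           <- 40
training_fraction <- 0.7
requested         <- seq(5, by = 5, to = 40)   # the grid passed as n.pca

# Size of the training set and the rank bound of a centred PCA on it.
n_training <- round(training_fraction * n_pools)
max_rank   <- n_training - 1

# Grid values that could be retained at all from the training set.
feasible <- requested[requested <= max_rank]

cat("training pools:", n_training, "; max PCs:", max_rank, "\n")
cat("feasible n.pca values:", feasible, "\n")
```

With training.set=0.7, values of 30 and above cannot be retained from 28 training pools, so at least part of the requested grid could not be evaluated in any version.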
Best
Thibaut