[adegenet-forum] Request of DAPC analysis

Das, Roma (ICRISAT-IN) r.das at cgiar.org
Wed Aug 28 06:20:59 CEST 2019


Hello Everyone,
I am running DAPC with the adegenet package for cluster analysis. However, I am not sure whether I am selecting n.pca and n.clust correctly based on cross-validation.

I am following the steps below:


1.       I am using a genind object

2.       I ran find.clusters() as grp <- find.clusters() and chose n.pca and n.clust interactively. Based on the plots, I selected n.pca=200 and n.clust=21

3.       Next, I used xvalDapc() to get an idea of the number of PCs to retain:
xval <- xvalDapc(tab(fdat, NA.method = "mean"), grp$grp, n.pca.max = 300, n.rep = 30)

4.       Based on the number of PCs achieving the highest mean success and the lowest MSE, I selected n.pca=50

5.       Next, I tried to narrow the search for the number of PCs with n.pca = 30:60:
xval_optimum <- xvalDapc(tab(fdat, NA.method = "mean"), grp$grp, n.pca = 30:60, n.rep = 100, parallel = "multicore", ncpus = 6L)

6.       Finally, I selected n.pca=30 based on the number of PCs achieving the highest mean success and the lowest MSE in xval_optimum
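
For reference, steps 1 to 6 can be sketched as a single script. This is only an illustration under stated assumptions: the nancycats data set bundled with adegenet stands in for fdat, find.clusters() is called non-interactively, and the n.pca, n.clust, n.pca.max and n.rep values are scaled down to suit that small example (the values in the steps above are 200, 21, 300 and 30/100).

```r
library(adegenet)

# Assumption: nancycats (shipped with adegenet) stands in for the real genind object.
data(nancycats)
fdat <- nancycats

# Step 2: non-interactive form of find.clusters().
# Illustrative values only; the post used n.pca = 200 and n.clust = 21.
set.seed(1)
grp <- find.clusters(fdat, n.pca = 50, n.clust = 5)

# Allele table with missing values replaced by the column mean, as in the post.
x <- tab(fdat, NA.method = "mean")

# Step 3: broad cross-validation over the number of retained PCs.
xval <- xvalDapc(x, grp$grp, n.pca.max = 50, n.rep = 10, xval.plot = FALSE)

# Step 4: the two summaries used to pick n.pca.
xval[["Number of PCs Achieving Highest Mean Success"]]
xval[["Number of PCs Achieving Lowest MSE"]]

# Step 5: narrower search over an explicit range (the post used 30:60).
xval_optimum <- xvalDapc(x, grp$grp, n.pca = 20:30, n.rep = 10, xval.plot = FALSE)

# Step 6: read off the same two summaries from the narrowed run.
xval_optimum[["Number of PCs Achieving Highest Mean Success"]]
xval_optimum[["Number of PCs Achieving Lowest MSE"]]
```

The parallel = "multicore" and ncpus arguments from step 5 are left out here to keep the sketch portable.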

My questions are:

7.       From cross-validation, the optimum number of PCs appears to be 30. Should I re-run find.clusters() with n.pca=30 and select n.clust interactively from the plot?

8.       Should I then re-run dapc() with n.pca=30 and the n.clust chosen in step 7? Please advise.
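
A minimal sketch of the re-run described in questions 7 and 8, to make the proposal concrete. Whether this re-run is the right procedure is exactly what is being asked, so it should not be read as a recommendation. Assumptions: nancycats stands in for fdat, find.clusters() is called non-interactively, and n_clust = 5 is a hypothetical placeholder for the value that would be read off the plot.

```r
library(adegenet)

# Assumption: nancycats (shipped with adegenet) stands in for the real genind object.
data(nancycats)
fdat <- nancycats

n_pca_opt <- 30  # number of PCs suggested by cross-validation in the post
n_clust   <- 5   # hypothetical; in practice chosen from the find.clusters() plot

# Question 7: re-run the clustering with the cross-validated number of PCs.
set.seed(1)
grp2 <- find.clusters(fdat, n.pca = n_pca_opt, n.clust = n_clust)

# Question 8: fit the DAPC with the same number of PCs and the new groups.
# n.da is set to the maximum possible number of discriminant functions (groups - 1).
dapc_final <- dapc(fdat, pop = grp2$grp, n.pca = n_pca_opt, n.da = n_clust - 1)

# Overall proportion of individuals reassigned to their original cluster.
summary(dapc_final)$assign.prop
```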

Thanks,
Roma