[adegenet-forum] Large data set problem for DAPC
Jianan Wang
jnwang at ksu.edu
Tue Dec 12 18:41:00 CET 2017
Dear Thibaut,
I’m running DAPC on a genomic data set of 400,000 SNPs and about 2,000 individuals.
I converted my VCF data set to genlight format and used multiple cores.
The main command is:
dapc1 <- dapc(GBSgenlight, combine_race, n.rep = 3, n.pca=10, parallel = "multicore", ncpus = 4)
When I run a subset of about 1,000 individuals with predefined subpopulation names, DAPC runs very well. However, when I run the full data set (2,000 individuals), the DAPC job on our high-performance computing (HPC) cluster always quits on its own after a long period of computation (or after appearing frozen), and no result is returned.
I would appreciate any suggestions or solutions for this problem.
In addition, can DAPC handle a whole-genome resequencing SNP data set? What is the maximum data set size DAPC can cope with?
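For a rough sense of scale, here is a back-of-envelope sketch. It assumes the genotypes get expanded into a dense individuals x SNPs matrix of R doubles (8 bytes per value), which is an assumption and not a measurement of what DAPC actually allocates. The helper name `dense_matrix_gb` is hypothetical.

```r
# Hypothetical helper: approximate size (in GB, 1e9 bytes) of a dense
# individuals x SNPs matrix stored as R doubles (8 bytes per value).
# Assumption: the data are fully expanded; genlight's compact storage is far smaller.
dense_matrix_gb <- function(n_ind, n_snp, bytes_per_value = 8) {
  n_ind * n_snp * bytes_per_value / 1e9
}

dense_matrix_gb(1000, 4e5)  # subset that runs: 3.2
dense_matrix_gb(2000, 4e5)  # full data set:    6.4
```

Under this assumption the estimate scales linearly with the number of individuals, so the full data set needs roughly twice the memory of the subset for any step that expands the data.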
Thanks in advance.
Jianan