[adegenet-forum] Large data set problem for DAPC

Tue Dec 12 18:41:00 CET 2017

Dear Thibaut,

I’m running DAPC on a genomic data set of 400,000 SNPs and about 2,000 individuals.

I converted my VCF data set to the genlight format and am using multiple cores.

The main command is:

dapc1 <- dapc(GBSgenlight, combine_race, n.rep = 3, n.pca=10, parallel = "multicore", ncpus = 4)

If I run a subset of about 1,000 individuals with pre-defined subpopulation names, DAPC runs very well. However, when I run the full data set (2,000 individuals), the job on the high-performance cluster always quits on its own and returns no result, after either computing for a long time or appearing to freeze.

I would appreciate any suggestions or solutions for this problem.

In addition, can DAPC handle a whole-genome resequencing SNP data set? What is the largest data set DAPC can cope with?
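
For a rough sense of scale, here is a back-of-envelope sketch in base R. It assumes every genotype is held as one 8-byte double in a dense matrix. That is an illustration only: genlight objects store genotypes much more compactly, and the figure says nothing about what DAPC itself allocates. The helper name dense_gib is hypothetical, and the subset is assumed to keep all 400,000 SNPs.

```r
# Back-of-envelope size of a dense genotype matrix (base R only).
# Assumption: one 8-byte double per genotype; genlight is more compact than this.
# dense_gib is a hypothetical helper, not part of adegenet.
dense_gib <- function(n_ind, n_snp, bytes_per_value = 8) {
  n_ind * n_snp * bytes_per_value / 1024^3
}

n_snp <- 400000

# Subset that runs fine (assumed to keep all SNPs) vs. the full data set.
subset_gib <- dense_gib(1000, n_snp)
full_gib   <- dense_gib(2000, n_snp)

cat(sprintf("Subset (1000 individuals): %.2f GiB\n", subset_gib))
cat(sprintf("Full   (2000 individuals): %.2f GiB\n", full_gib))
```

Under these assumptions the dense size grows linearly with the number of individuals, so the full data set would need twice what the subset needs.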

Thanks in advance.

Jianan