[adegenet-forum] Order of SNPs biasing DAPC results?

Mon Jun 27 14:40:25 CEST 2016

Hi!

I have been running pairwise DAPCs on a data set containing more than 62,000 SNPs in 4 populations.

Because of a problem with my PLINK input, my SNPs were initially in a “wrong” order. When I noticed the error, I repeated the DAPC analysis with a correctly ordered input file, primarily to get the correct SNP IDs for the allele loadings. In doing so, I noticed differences between the two analyses:
The individual coordinates (dapc$ind.scores) and the DAPC eigenvalues (dapc$eig) differed, and the allele loadings (dapc$var.contr) were also slightly different. The output of find.clusters was the same for both input files.

My code:
> data1 <- read.PLINK("file.raw", map.file = NULL, quiet = FALSE, chunkSize = 1000, parallel = FALSE)
> clust1 <- find.clusters(data1, stat="BIC", choose.n.clust=TRUE, max.n.clust=10, n.iter=1e5, n.start=10,  pca.select="percVar", perc.pca=100, glPca=NULL, parallel=FALSE)
> dapc1 <- dapc(data1, pop = clust1$grp, pca.select = "percVar", perc.pca = 100, parallel=FALSE)
> loadings1 <- dapc1$var.contr
> ld1 <- loadings1[order(loadings1[,1], decreasing = TRUE),]
> write.table(ld1, file = "PW1_loadings_rc.txt")

DAPC Output 1 (unsorted SNPs): 
$eig (eigenvalues): 8.495e+32
$ind.scores
    LD1
Cluster1    -8.413869e+15    #(same value for all 6 individuals in this cluster)
Cluster2    8.413869e+15    #(same value for all 6 individuals in this cluster)
$var.contr (in decreasing order)
[line 72]    "60732" 0.000171552889525985
[line 73]    "2993" 0.000154678359525634
[line 74]    "976" 0.000152483049214648

DAPC Output 2 (correctly sorted SNPs):
$eig (eigenvalues): 4.459e+33
$ind.scores
    LD1
Cluster1    -1.927727e+16    #(same value for all 6 individuals in this cluster)
Cluster2    1.927727e+16    #(same value for all 6 individuals in this cluster)
$var.contr (in decreasing order) 
[line 72]    "60819" 0.000171552889525985
[line 73]    "42995" 0.000154678359525636
[line 74]    "697" 0.000152483049214647
(Of course the SNP IDs don’t match between the two runs, but the values on each line of the output should correspond.)
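
The size of the discrepancy in the loadings can be quantified with a tolerance-based comparison instead of reading the digits by eye. The snippet below is only a sketch built from the three values printed above; v1 and v2 are illustrative names, not objects from the original session. all.equal() compares numeric vectors up to a small relative tolerance, whereas identical() requires an exact match.

```r
# Top three loadings from each run, copied from the output above
v1 <- c(0.000171552889525985, 0.000154678359525634, 0.000152483049214648)  # unsorted SNPs
v2 <- c(0.000171552889525985, 0.000154678359525636, 0.000152483049214647)  # sorted SNPs

# Largest relative difference between corresponding loadings
rel_diff <- abs(v1 - v2) / abs(v1)
print(max(rel_diff))

# Equal within all.equal()'s default tolerance?
print(isTRUE(all.equal(v1, v2)))

# Exactly the same numbers?
print(identical(v1, v2))
```

For these three values the relative differences are many orders of magnitude below the default tolerance of all.equal(), so the two runs agree on them in every practical sense even though they are not bit-for-bit identical.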

I used exactly the same code, and the only difference between the analyses was the order of the SNPs. I realize that the difference in allele loadings (my main interest) is marginal, but I was surprised to find any differences at all. Is that normal? Shouldn’t these results be independent of the SNP order? Could it be due to rounding errors?
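
On the rounding question, it may help to recall that floating-point addition is not associative, so adding the same numbers in a different order can change the last digits of the result. The following is a minimal base R illustration of that effect only; it is not adegenet code and makes no claim about what dapc does internally.

```r
# The same three numbers, accumulated in two different orders
x <- c(0.1, 0.2, 0.3)

s_forward <- Reduce(`+`, x)       # (0.1 + 0.2) + 0.3
s_reverse <- Reduce(`+`, rev(x))  # (0.3 + 0.2) + 0.1

# Show both sums with 17 significant digits
print(format(c(s_forward, s_reverse), digits = 17))

# Exact comparison versus tolerance-based comparison
print(s_forward == s_reverse)
print(isTRUE(all.equal(s_forward, s_reverse)))
```

Reordering the SNP columns changes the order in which per-SNP terms are accumulated, so differences in the last few digits of the loadings would be consistent with this kind of effect. Whether it also accounts for the much larger differences in the eigenvalues and individual coordinates is a separate question.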

Thank you for your time!
Cheers,
Maike
