[adegenet-forum] Order of SNPs biasing DAPC results?
Thibaut Jombart
thibautjombart at gmail.com
Mon Jul 4 15:34:01 CEST 2016
Dear Maike,
No, the ordering of alleles should not change anything. See, for instance,
this check using sim2pop, where the columns of the second input are shuffled:
> library(adegenet)
> data(sim2pop)
/// adegenet 2.0.1 is loaded ////////////
> overview: '?adegenet'
> tutorials/doc/questions: 'adegenetWeb()'
> bug reports/feature requests: adegenetIssues()
> dapc1 <- dapc(tab(sim2pop), grp=pop(sim2pop), n.pca=10, n.da=1)
> dapc2 <- dapc(tab(sim2pop)[,sample(1:ncol(tab(sim2pop)))],
grp=pop(sim2pop), n.pca=10, n.da=1)
> dapc1$eig
[1] 484.1916
> dapc2$eig
[1] 484.1916
> sum(dapc1$ind.coord-dapc2$ind.coord)
[1] -1.110223e-16 # effectively zero (floating-point rounding)
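
A sum of differences can cancel out even when individual values differ, so
the largest absolute difference is a stricter check. As a minimal sketch in
base R, the same idea can be tried on simulated allele counts, with prcomp
standing in for the PCA step. Both the simulated matrix and the use of prcomp
are assumptions made for illustration; this is not the dapc code itself.

```r
# Base R sketch (no adegenet needed); the data are simulated, not real SNPs.
set.seed(1)
n_ind <- 20
n_snp <- 200

# Hypothetical genotype matrix: allele counts 0/1/2, individuals in rows
x <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), nrow = n_ind)

# Random reordering of the SNP columns
perm <- sample(ncol(x))

pca1 <- prcomp(x)
pca2 <- prcomp(x[, perm])

# Eigenvalues should agree up to floating-point rounding
ev_equal <- isTRUE(all.equal(pca1$sdev^2, pca2$sdev^2))

# Scores on the first axis should agree up to sign and rounding;
# absolute values are compared because the sign of an axis is arbitrary
max_diff <- max(abs(abs(pca1$x[, 1]) - abs(pca2$x[, 1])))

# Loadings are the same values, attached to the reordered columns:
# row j of pca2$rotation corresponds to column perm[j] of x
load_equal <- isTRUE(all.equal(abs(pca1$rotation[perm, 1]),
                               abs(pca2$rotation[, 1])))

print(ev_equal)
print(max_diff)
print(load_equal)
```

all.equal() compares within a numerical tolerance, which is usually what is
wanted here; identical() would also flag differences in the last digits.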
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology
Imperial College London
https://sites.google.com/site/thibautjombart/
https://github.com/thibautjombart
Twitter: @TeebzR <https://twitter.com/TeebzR>
On 27 June 2016 at 13:40, <herrmann-m at uni-landau.de> wrote:
> Hi!
>
> I was performing pairwise DAPCs on a data set containing more than 62,000
> SNPs in 4 populations.
>
> Because of a problem with my PLINK input, my SNPs were initially in a
> “wrong” order. When I noticed the error, I repeated the DAPC analysis with
> a correctly ordered input file, primarily to get correct SNP-IDs for the
> allele loadings. Thereby I noticed differences between the two analyses:
> Individual coordinates (dapc$ind.scores) and DAPC eigenvalues (dapc$eig)
> differed and allele loadings (dapc$var.contr) were also slightly different.
> The output of find.clusters was not different between the input files.
>
> My code:
> > data1 <- read.PLINK("file.raw", map.file = NULL, quiet = FALSE,
> chunkSize = 1000, parallel = FALSE)
> > clust1 <- find.clusters(data1, stat="BIC", choose.n.clust=TRUE,
> max.n.clust=10, n.iter=1e5, n.start=10, pca.select="percVar",
> perc.pca=100, glPca=NULL, parallel=FALSE)
> > dapc1 <- dapc(data1, pop = clust1$grp, pca.select = "percVar", perc.pca
> = 100, parallel=FALSE)
> > loadings1 <- dapc1$var.contr
> > ld1 <- loadings1[order(loadings1[,1], decreasing = TRUE),]
> > write.table(ld1, file = "PW1_loadings_rc.txt")
>
> DAPC Output 1 (unsorted SNPs):
> $eig (eigenvalues): 8.495e+32
> $ind.scores
> LD1
> Cluster1 -8.413869e+15 #(same value for all 6 individuals in this
> cluster)
> Cluster2 8.413869e+15 #(same value for all 6 individuals in this
> cluster)
> $var.contr (in decreasing order)
> [line 72] "60732" 0.000171552889525985
> [line 73] "2993" 0.000154678359525634
> [line 74] "976" 0.000152483049214648
>
> DAPC Output 2 (correctly sorted SNPs):
> $eig (eigenvalues): 4.459e+33
> $ind.scores
> LD1
> Cluster1 -1.927727e+16 #(same value for all 6 individuals in this
> cluster)
> Cluster2 1.927727e+16 #(same value for all 6 individuals in this
> cluster)
> $var.contr (in decreasing order)
> [line 72] "60819" 0.000171552889525985
> [line 73] "42995" 0.000154678359525636
> [line 74] "697" 0.000152483049214647
> (-> of course the SNP-IDs don’t match, but the values in each line of the
> output should correspond )
>
> I used exactly the same code and the only difference between the analyses
> was the sorting of SNPs. I realize that the difference in allele loadings
> (my main interest) is marginal but I was surprised to find differences in
> the first place. Is that normal? Shouldn’t these results be independent
> from the sorting of SNPs? Could it be because of rounding errors?
>
>
> Thank you for your time!
> Cheers,
> Maike