[adegenet-forum] no arch in cross validation results

Nikki Vollmer nlv209 at hotmail.com
Tue Dec 10 18:59:24 CET 2019


Hi!
I have a data set of 19 microsatellite loci and about 1000 individuals. Results from STRUCTURE suggest 4 populations, with sample sizes ranging from 130 to 468 (pairwise Fst ranges from 0.02 to 0.06). I wanted to run DAPC on the same data to see what happens.

When I run find.clusters, the BIC graph is jagged and inconsistent between re-runs. The lowest BIC typically falls somewhere between 5 and 10 clusters, and the differences in BIC among those values are sometimes very small and sometimes not.

Regardless of the number of clusters I choose to continue with, when I run cross-validation (n.pca.max = 200, training.set = 0.9, n.rep = 50), the number of PCs with the lowest RMSE is always very close to n.pca.max (usually 180), so I am not getting a nice arch in my graph. Furthermore, the mean successful assignment rate keeps increasing with the number of PCA axes retained, typically reaching the high 80s or even 90s around 180 PCs.

This seems fishy to me, but I am not sure why. I have never before, either prior to the xval implementation or since with other data sets, kept anywhere near the maximum number of PCs, either because I didn't want to over-fit the data or, more recently, because of the xval recommendation. I guess that is why I am wary of doing this now. But am I wrong to distrust the xval results, or is this perhaps an indication that my data isn't powerful enough?
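To see why the two curves behave together, here is a minimal sketch of the summary step of a repeated cross-validation, written in Python rather than R. It assumes (not confirmed in this thread) that the RMSE reported by xvalDapc is the root mean squared deviation of each replicate's held-out success rate from perfect assignment (1.0). It does not fit PCA or DA and does not perform the stratified training/validation split; the function names xval_summary and best_npca are hypothetical and are not part of adegenet.

```python
import math
from statistics import mean


def xval_summary(success_by_npca):
    """Summarise repeated cross-validation results (hypothetical helper).

    success_by_npca maps a number of retained PCs to a list of per-replicate
    proportions of correctly assigned held-out individuals (one per rep).
    Returns (mean_success, rmse) dicts keyed by the number of PCs, where RMSE
    is measured against perfect assignment (success = 1.0). This RMSE
    definition is an assumption, not adegenet's source code.
    """
    mean_success = {}
    rmse = {}
    for n_pca, rates in success_by_npca.items():
        mean_success[n_pca] = mean(rates)
        # Root mean squared deviation of each replicate from 100% success.
        rmse[n_pca] = math.sqrt(mean((1.0 - r) ** 2 for r in rates))
    return mean_success, rmse


def best_npca(rmse):
    """Number of PCs with the lowest RMSE (ties go to the fewest PCs)."""
    return min(rmse, key=lambda n: (rmse[n], n))


# Toy success rates for 3 replicates, invented for illustration only.
toy = {20: [0.55, 0.60, 0.58], 100: [0.75, 0.78, 0.74], 180: [0.88, 0.90, 0.86]}
mean_success, rmse = xval_summary(toy)
print(best_npca(rmse))  # -> 180
```

Under this assumption, RMSE drops whenever mean success rises and the spread between replicates stays modest, so a success curve that keeps climbing up to n.pca.max will also put the RMSE minimum at (or near) the largest number of PCs tested rather than producing an arch.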

Not surprisingly, the number of clusters and which individuals are assigned to each cluster differ noticeably depending on how many PCs I retain for the DAPC analysis.

Any help/insight is greatly appreciated, thank you!
Nikki

