[adegenet-forum] no arch in cross validation results (repost)

Thibaut Jombart thibautjombart at gmail.com
Wed Jan 15 12:50:11 CET 2020


Dear Nikki

Sorry for the delayed reply. In principle, one may end up having to retain
all PCs if the number of alleles is small compared to the number of
individuals, especially if alleles are not independent (linkage
disequilibrium, LD). For the BIC graph, make sure find.clusters is run with
many starting points (e.g. n.start = 100), which should make the k-means
results, and hence the BIC curve, more reproducible between runs.
Otherwise, have you tried running snapclust on your data to see what the
clusters look like?
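
A minimal sketch of both suggestions, under stated assumptions: the microbov
microsatellite data set shipped with adegenet stands in for the real data,
and n.pca = 100, max.n.clust = 15 and the seed are illustrative choices only.
k = 4 for snapclust is taken from the STRUCTURE result quoted below, not a
recommendation.

```r
library(adegenet)

# Stand-in data (assumption): microbov, a microsatellite genind object
# shipped with adegenet. Replace with your own genind object.
data(microbov)
x <- microbov

set.seed(42)  # illustrative only

# K-means for K = 1..15 with 100 random starts per K.
# choose.n.clust = FALSE skips the interactive prompt, so the BIC values
# from two independent runs can be compared directly.
run1 <- find.clusters(x, n.pca = 100, max.n.clust = 15,
                      n.start = 100, choose.n.clust = FALSE)
run2 <- find.clusters(x, n.pca = 100, max.n.clust = 15,
                      n.start = 100, choose.n.clust = FALSE)

# BIC per K from both runs, side by side; with enough starting points
# the two columns should be close to each other.
print(round(cbind(run1 = run1$Kstat, run2 = run2$Kstat), 1))

# Overlay the two BIC curves.
plot(run1$Kstat, type = "b", xlab = "Number of clusters (K)", ylab = "BIC")
lines(run2$Kstat, type = "b", lty = 2)

# snapclust (maximum-likelihood clustering) for a fixed K.
sc <- snapclust(x, k = 4)
print(table(sc$group))  # cluster sizes
```

If the two BIC columns still differ noticeably, increasing n.start further is
the first thing to try.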

Best
Thibaut

--
Dr Thibaut Jombart
Associate Professor in Outbreak Analytics, London School of Hygiene and
Tropical Medicine
Senior Lecturer in Genetic Analysis, Imperial College London
President of RECON: repidemicsconsortium.org
https://thibautjombart.netlify.com
Twitter: @TeebzR


On Tue, 14 Jan 2020 at 14:28, Nikki Vollmer <nlv209 at hotmail.com> wrote:

> Hi all!
> This is a repost of an earlier inquiry from late last year. Still having the same issue and looking for help...
> I have a data set of 19 microsatellite loci for about 1000 individuals. Results from STRUCTURE suggest 4 populations with sample sizes ranging from 130-468 (pairwise Fst range from 0.02-0.06). I wanted to run DAPC on the same data to see what happens.
>
> When I run find.clusters, the BIC graph is jagged and inconsistent between re-runs (the lowest BIC typically falls somewhere between 5 and 10 clusters, and the differences in BIC between them are sometimes very small and sometimes not).
>
> Regardless of the number of clusters I choose to continue with, when I do cross-validation (n.pca.max=200, training.set=0.9, n.rep = 50) the number of PCs with the lowest RMSE is always very near my n.pca.max (usually 180, so I am not getting a nice arch in my graph). Furthermore, the mean successful assignment rate increases with the number of PCA axes retained, typically reaching the high 80s or even 90s (percent) around 180 PCs. This seems fishy to me, but I am not sure why. I have never before, either prior to the xval implementation or using xval with other data sets, kept anywhere near the maximum number of PCs, either because I didn't want to over-fit the data or, more recently, because of the xval recommendation. I guess that is why I am wary of doing this now. But am I wrong to distrust the xval results, or is this perhaps an indication that my data isn't powerful enough?
>
> Not surprisingly, there is a noticeable difference in cluster number and in the individuals assigned to each cluster depending on how many PCs I retain for the DAPC analysis.
>
> Any help/insight is greatly appreciated, thank you!
> Nikki