[adegenet-forum] When cross-validating DAPC (using web server), is it best to use 'group' or 'overall'?
thibautjombart at gmail.com
Wed Jul 12 16:29:52 CEST 2017
Do your groups have very different sizes? If so, this would explain the
discrepancy. When optimizing cross-validation on each group, you basically
make sure that every group is predicted as well as possible, whatever its
size. When using overall classification, every individual counts equally,
so the largest group really is what gets optimized.
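
To make the difference concrete, here is a small sketch in Python. It is not
adegenet code: the function names, the two sites and their sample sizes are
all made up for illustration. It only shows how the two success measures are
computed and why they can favour different models when group sizes are
unequal. In R, xvalDapc exposes the same choice through its 'result' argument
("groupMean" or "overall").

```python
from collections import defaultdict


def overall_success(true_groups, predicted_groups):
    """Proportion of all individuals assigned to their true group."""
    hits = sum(t == p for t, p in zip(true_groups, predicted_groups))
    return hits / len(true_groups)


def group_mean_success(true_groups, predicted_groups):
    """Mean of the per-group proportions of correct assignment."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for t, p in zip(true_groups, predicted_groups):
        sizes[t] += 1
        hits[t] += int(t == p)
    return sum(hits[g] / sizes[g] for g in sizes) / len(sizes)


# Hypothetical data: site A has 90 samples, site B has 10.
truth = ["A"] * 90 + ["B"] * 10

# Model 1 assigns every individual to the large site.
all_to_A = ["A"] * 100

# Model 2 gets 70% of each site right (63 of 90 for A, 7 of 10 for B).
balanced = ["A"] * 63 + ["B"] * 27 + ["B"] * 7 + ["A"] * 3

for name, pred in [("all_to_A", all_to_A), ("balanced", balanced)]:
    print(name, overall_success(truth, pred), group_mean_success(truth, pred))
```

With these made-up numbers, the overall criterion prefers the model that
ignores the small site (0.9 against 0.7), while the group-mean criterion
prefers the balanced model (0.7 against 0.5). Choosing the number of PCs by
one criterion or the other can therefore lead to different scatterplots.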
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
+44(0)20 7594 3658
On 29 June 2017 at 22:20, Stephanie Coster <stephanie.coster at gmail.com> wrote:
> Thanks in advance!
> I have a dataset of genotypes from microsatellite loci and I'm looking to
> analyze population structure. Program STRUCTURE shows essentially no
> clusters, and I'd like to use DAPC to get another perspective.
> I've run the 'find.clusters' code and the BIC suggests K=2, but the
> assignments are unreliable (equal assignments across all sites to both
> clusters). I am interpreting this to mean that K=1 and all samples likely
> form a single cluster.
> Now, I'd like to use my a priori site groupings to draw a scatterplot and
> am using the DAPC web server to cross-validate and suggest the number of
> PCs to retain. I get notably different scatterplots depending on whether I
> choose 'group' or 'overall' to assess. The sites have more spatial
> differentiation when using 'group', and essentially all overlap when using
> 'overall'. I understand that success is calculated by my groups or overall
> depending on the choice, but what does this mean in application? Can
> someone help explain why these plots differ and which is better to use?
> Many thanks!