[adegenet-forum] When cross-validating DAPC (using web server), is it best to use 'group' or 'overall'?
thibautjombart at gmail.com
Wed Jul 12 17:54:26 CEST 2017
Good, so then the best choice sounds like the group optimization (rather
than the overall optimization).
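
A minimal sketch of why the two criteria can disagree when group sizes are very unequal, as discussed in the quoted reply further down. The counts, site names, and function names are hypothetical and chosen only for illustration. This is plain Python arithmetic, not the code the DAPC web server runs.

```python
# Hypothetical cross-validation outcome, for illustration only
# (not taken from the data discussed in this thread).
# Each entry: group name -> (correctly assigned, total held-out individuals)
results = {
    "site_A": (90, 100),  # large group, predicted well
    "site_B": (2, 10),    # small group, predicted poorly
    "site_C": (3, 10),    # small group, predicted poorly
}


def overall_success(results):
    """Proportion of all individuals assigned correctly, pooled over groups."""
    correct = sum(c for c, _ in results.values())
    total = sum(n for _, n in results.values())
    return correct / total


def group_mean_success(results):
    """Mean of the per-group proportions, so every group counts equally."""
    rates = [c / n for c, n in results.values()]
    return sum(rates) / len(rates)


print(round(overall_success(results), 3))     # 0.792, dominated by site_A
print(round(group_mean_success(results), 3))  # 0.467, small sites weigh in fully
```

With these made-up numbers the pooled score looks good even though two of the three sites are predicted poorly, which is why the number of PCs that maximizes one score need not maximize the other. In adegenet itself, I believe the same choice is exposed through the `result` argument of `xvalDapc` ("groupMean" or "overall"), but please check the function's help page.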
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
+44(0)20 7594 3658
On 12 July 2017 at 16:02, Stephanie Coster <stephanie.coster at gmail.com> wrote:
> Yes, they do have different sizes. Thanks, that helps explain it.
> On Wed, Jul 12, 2017 at 10:29 AM, Thibaut Jombart <
> thibautjombart at gmail.com> wrote:
>> Hi Stephanie,
>> do your groups have very different sizes? If so this would explain the
>> discrepancy. When optimizing cross validation on each group, you basically
>> make sure that every group is predicted as well as possible. When using
>> overall classification, the largest group really is what gets optimized.
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>> On 29 June 2017 at 22:20, Stephanie Coster <stephanie.coster at gmail.com> wrote:
>>> Thanks in advance!
>>> I have a dataset of genotypes from microsatellite loci and I'm looking
>>> to analyze population structure. Program STRUCTURE shows essentially no
>>> clusters, and I'd like to use DAPC to get another perspective.
>>> I've run the 'find.clusters' code and the BIC suggests K=2, but the
>>> assignments are unreliable (equal assignments across all sites to both
>>> clusters). I am interpreting this to mean that K=1 and all samples likely
>>> form a single cluster.
>>> Now, I'd like to use my a priori site groupings to draw a scatterplot and
>>> am using the DAPC web server to cross-validate and suggest the number of
>>> PCs to retain. I get notably different scatterplots depending on whether I
>>> choose 'group' or 'overall' to assess. The sites have more spatial
>>> differentiation when using 'group', and essentially all overlap when using
>>> 'overall'. I understand that success is calculated by my groups or overall
>>> depending on the choice, but what does this mean in application? Can
>>> someone help explain why these plots differ and which is better to use?
>>> Many thanks!