[adegenet-forum] When cross-validating DAPC (using web server), is it best to use 'group' or 'overall'?

Thibaut Jombart thibautjombart at gmail.com
Wed Jul 12 17:54:26 CEST 2017


Good, so then the best choice sounds like the group optimization (rather
than the overall optimization).
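
To make the point about unequal group sizes (quoted below) concrete, here is a
minimal sketch in plain Python. It is not adegenet code, and the site labels,
sample sizes and predictions are made up for illustration. It scores the same
cross-validation predictions in two ways: the overall proportion of correct
assignments, and the mean of the per-group proportions. In adegenet itself, I
believe this choice corresponds to the `result` argument of xvalDapc
("groupMean" or "overall"), but please check the documentation.

```python
# Hypothetical illustration (not adegenet code): two ways of scoring
# the same set of cross-validation predictions.
from collections import defaultdict


def overall_success(true_groups, predicted_groups):
    """Proportion of all individuals assigned to their true group."""
    hits = sum(t == p for t, p in zip(true_groups, predicted_groups))
    return hits / len(true_groups)


def group_mean_success(true_groups, predicted_groups):
    """Mean of the per-group success rates; every group counts equally."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(true_groups, predicted_groups):
        totals[t] += 1
        hits[t] += (t == p)
    rates = [hits[g] / totals[g] for g in totals]
    return sum(rates) / len(rates)


# Made-up data: one large site (A, n = 90) and one small site (B, n = 10).
true_groups = ["A"] * 90 + ["B"] * 10

# Model 1 assigns every individual to the large group.
pred_all_a = ["A"] * 100
# Model 2 gets 70 of the 90 A's and 8 of the 10 B's right.
pred_balanced = ["A"] * 70 + ["B"] * 20 + ["B"] * 8 + ["A"] * 2

for name, pred in [("all-to-A", pred_all_a), ("balanced", pred_balanced)]:
    print(name,
          round(overall_success(true_groups, pred), 3),
          round(group_mean_success(true_groups, pred), 3))
```

With these made-up numbers, the overall criterion prefers the model that puts
everyone in the large group (0.90 against 0.78), whereas the group-mean
criterion prefers the model that also predicts the small group well (about
0.79 against 0.50). The two criteria can therefore select different numbers of
PCs, and so give different scatterplots.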

Best

Thibaut


--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658

On 12 July 2017 at 16:02, Stephanie Coster <stephanie.coster at gmail.com>
wrote:

> Yes, they do have different sizes. Thanks, that helps explain it.
>
> Stephanie
>
> On Wed, Jul 12, 2017 at 10:29 AM, Thibaut Jombart <
> thibautjombart at gmail.com> wrote:
>
>> Hi Stephanie,
>>
>> Do your groups have very different sizes? If so, this would explain the
>> discrepancy. When optimizing cross-validation on each group, you basically
>> make sure that every group is predicted as well as possible. When using
>> overall classification, the largest group is really what gets optimized,
>> since it contributes most of the individuals being classified.
>>
>> Best
>> Thibaut
>>
>>
>> --
>> Dr Thibaut Jombart
>> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
>> London
>> Head of RECON: repidemicsconsortium.org
>> WHO Consultant - outbreak analysis
>> sites.google.com/site/thibautjombart/
>> Twitter: @TeebzR
>> +44(0)20 7594 3658
>>
>> On 29 June 2017 at 22:20, Stephanie Coster <stephanie.coster at gmail.com>
>> wrote:
>>
>>> Thanks in advance!
>>>
>>> I have a dataset of genotypes from microsatellite loci and I'm looking
>>> to analyze population structure. Program STRUCTURE shows essentially no
>>> clusters, and I'd like to use DAPC to get another perspective.
>>>
>>> I've run the 'find.clusters' code and the BIC suggests K=2, but the
>>> assignments are unreliable (equal assignments across all sites to both
>>> clusters). I am interpreting this to mean that K=1 and all samples likely
>>> form a single cluster.
>>>
>>> Now, I'd like to use my a priori site groupings to draw a scatterplot and
>>> am using the DAPC web server to cross-validate and suggest the number of
>>> PCs to retain. I get notably different scatterplots depending on whether I
>>> choose 'group' or 'overall' to assess. The sites have more spatial
>>> differentiation when using 'group', and essentially all overlap when using
>>> 'overall'. I understand that success is calculated by my groups or overall
>>> depending on the choice, but what does this mean in application? Can
>>> someone help explain why these plots differ and which is better to use?
>>>
>>> Many thanks!
>>>
>>> Stephanie
>>>
>>>
>>>
>>> _______________________________________________
>>> adegenet-forum mailing list
>>> adegenet-forum at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>>
>>
>>
>