[adegenet-forum] DaPC vs. BAPS results question
Thibaut Jombart
thibautjombart at gmail.com
Tue Dec 6 18:06:35 CET 2016
Not really. In a situation like this, as in most cases, there is no true K;
some clustering solutions are simply more efficient caricatures of the data
than others.
In this case, K = 2, 3, ..., 10 are all equivalently good caricatures.
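A rough way to see how flat the BIC curve is: the sketch below (plain Python
rather than R, and not part of adegenet) takes the $Kstat values quoted
further down in this thread and lists every K whose BIC lies within a few
units of the minimum. The 4-unit tolerance is an arbitrary assumption chosen
for illustration, not a rule used by find.clusters; a different cut-off gives
a different range.

```python
# BIC values for K = 1..40, copied from the $Kstat output quoted in this thread.
bic_values = [
    1494.756, 1481.467, 1473.864, 1472.002, 1470.633, 1472.970, 1470.754, 1472.011,
    1471.813, 1473.632, 1473.924, 1476.759, 1476.699, 1475.433, 1479.546, 1481.119,
    1481.292, 1485.865, 1488.130, 1488.356, 1493.552, 1494.979, 1501.182, 1499.258,
    1500.146, 1504.113, 1511.598, 1511.550, 1513.889, 1516.275, 1522.144, 1524.733,
    1528.089, 1530.409, 1535.778, 1538.049, 1541.269, 1546.197, 1547.656, 1552.127,
]

# Map each K (starting at 1) to its BIC.
bic = {k: value for k, value in enumerate(bic_values, start=1)}

# K with the lowest BIC; this is what $stat reports.
best_k = min(bic, key=bic.get)

# Hypothetical tolerance (an assumption, not an adegenet setting):
# treat any K within this many BIC units of the minimum as "about as good".
TOLERANCE = 4.0
near_best = [k for k, value in bic.items() if value - bic[best_k] <= TOLERANCE]

print(f"lowest BIC at K={best_k} ({bic[best_k]:.3f})")
for k in near_best:
    print(f"K={k:2d}  BIC={bic[k]:.3f}  delta={bic[k] - bic[best_k]:.3f}")
```

With this particular tolerance the candidate set is K=3 to K=11, and K=7 sits
only about 0.12 above K=5, so the BIC on its own gives no strong reason to
single out one K.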
Cheers
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>
On 6 December 2016 at 16:59, Felipe Hernández <fhernandeu at uc.cl> wrote:
> Ok, thanks! So just paying attention to the lowest k-means value doesn't
> tell me the most likely number of clusters in the end? Ultimately, may
> K=5 be considered the most probable number of genetic clusters in
> my dataset, or should I consider other factors too? I will try your
> suggestions and see what I get. Thanks!
>
> Best,
>
> 2016-12-06 11:13 GMT-05:00 Thibaut Jombart <thibautjombart at gmail.com>:
>
>> Hello,
>>
>> the results will be a bit more stable if you increase the number of
>> starting points for the k-means (see arg. n.start).
>>
>> It should not really impact the outcome though: here, any K from 2 to 12
>> is an equally good solution, at least as judged by the BIC.
>>
>> Cheers
>> Thibaut
>>
>>
>> On 6 December 2016 at 15:17, Felipe Hernández <fhernandeu at uc.cl> wrote:
>>
>>> Thanks Thibaut,
>>>
>>> Here you have the image and values for each estimated K. Any advice is
>>> more than welcome, thanks!
>>>
>>> Best,
>>> Felipe
>>>
>>> > grp
>>> $Kstat
>>>      K=1      K=2      K=3      K=4      K=5      K=6      K=7      K=8
>>> 1494.756 1481.467 1473.864 1472.002 1470.633 1472.970 1470.754 1472.011
>>>      K=9     K=10     K=11     K=12     K=13     K=14     K=15     K=16
>>> 1471.813 1473.632 1473.924 1476.759 1476.699 1475.433 1479.546 1481.119
>>>     K=17     K=18     K=19     K=20     K=21     K=22     K=23     K=24
>>> 1481.292 1485.865 1488.130 1488.356 1493.552 1494.979 1501.182 1499.258
>>>     K=25     K=26     K=27     K=28     K=29     K=30     K=31     K=32
>>> 1500.146 1504.113 1511.598 1511.550 1513.889 1516.275 1522.144 1524.733
>>>     K=33     K=34     K=35     K=36     K=37     K=38     K=39     K=40
>>> 1528.089 1530.409 1535.778 1538.049 1541.269 1546.197 1547.656 1552.127
>>>
>>> $stat
>>>      K=5
>>> 1470.633
>>>
>>>
>>>
>>> 2016-12-05 10:10 GMT-05:00 Thibaut Jombart <thibautjombart at gmail.com>:
>>>
>>>> Dear Felipe,
>>>>
>>>> this is always a hard question, as different methods essentially do...
>>>> different things. The K-means in find.clusters optimizes the variance
>>>> between groups, while BAPS maximizes a likelihood function under a
>>>> given population genetics model. So it may be the case that you have
>>>> ~17 demes roughly at HWE, but that only 4-5 groups are optimum in
>>>> terms of clearly delineated groups. And this is assuming both methods
>>>> are 'right'. They may be prone to all sorts of biases. Namely, largely
>>>> different group variances for the K-means, and deviations from the
>>>> original model in BAPS.
>>>>
>>>> Feel free to post the image (or a link to it) of the BIC plot from
>>>> find.clusters if you want my two cents on which values of K to look
>>>> at.
>>>>
>>>> Best
>>>> Thibaut
>>>>
>>>>
>>>> On 5 December 2016 at 14:29, Felipe Hernández <fhernandeu at uc.cl> wrote:
>>>> > Good morning,
>>>> >
>>>> > I wonder if you could guide me with this question (which is surely
>>>> > pretty basic). After running a DAPC analysis using adegenet, I usually
>>>> > get K between 4 and 5 for my dataset (480 hogs, 59 microsatellites, 39
>>>> > sampling sites). The maximum number of clusters tried was 40.
>>>> > Afterwards, I tried to estimate the number of clusters (spatial
>>>> > clustering of individuals) using another software (BAPS 6.0), but I got
>>>> > a much higher estimated number of clusters (K=17) after testing
>>>> > different maximum values of K (i.e., K=5 through K=20). Any clue about
>>>> > the reason for this? Could it be related to the maximum number of
>>>> > clusters tested? Or to linkage disequilibrium between some loci? Sorry
>>>> > if the question is really basic, but I would appreciate any advice.
>>>> >
>>>> > Regards,
>>>> > Felipe
>>>> >
>>>> > --
>>>> > Felipe Hernández
>>>> > Médico Veterinario (DVM), MSc.
>>>> > PhD. Candidate
>>>> > Interdisciplinary Ecology Program
>>>> > School of Natural Resources and Environment
>>>> > Wildlife Ecology and Conservation Department
>>>> > University of Florida
>>>> >