[adegenet-forum] snapclust when HW is not expected

Fri Feb 16 23:56:06 CET 2018

Thank you for a very thoughtful response! I think a summary is that we can
bend the rules, as long as we try not to break things. I think that was also
a message expressed by Pritchard's group: they had a paper where they used
STRUCTURE on Helicobacter pylori. One issue, though, is that many in the
biological community do not understand the methods well enough to know if
and when they may have gone too far. I appreciate your recommendations, but
for many of these projects we have reason to expect mixed mating modes
without knowing how much of any particular mode to expect. In fact, the
research goal is frequently to infer the mating mode, or perhaps which groups
of samples are outcrossing and which are not. I suspect that might be a lot
to ask for.

I appreciate your insights! And I find it encouraging that you would like
to see more work on this. Perhaps we'll get to that one day?
Brian

On Fri, Feb 16, 2018 at 9:24 AM, Thibaut Jombart <thibautjombart at gmail.com>
wrote:

> Hi Brian
>
> thanks for reposting your question here. I am assuming that by 'DAPC' you
> mean the K-means clustering presented in the DAPC paper, not the factorial
> method itself. It is an interesting topic, and there are many possible
> answers. I'll try to mention a few.
>
> snapclust uses HW to compute the likelihood, like most other model-based
> (likelihood, Bayesian) clustering methods I know of. Similarly, it assumes
> independence of loci, so that: (global log-likelihood) = sum(log-likelihood
> of every locus)
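
A minimal Python sketch of the two assumptions just described: each diploid genotype gets its Hardy-Weinberg probability (p^2 for a homozygote, 2pq for a heterozygote), and the multilocus log-likelihood is the sum of per-locus log-likelihoods, which is only valid if loci are independent. It assumes allele frequencies for each candidate group are already known; the function names and data layout are hypothetical, and this is not snapclust's own code (snapclust is implemented in R within adegenet and also estimates the group memberships, which this sketch does not attempt).

```python
import math

def hw_genotype_prob(genotype, freqs):
    """Probability of a diploid genotype under Hardy-Weinberg proportions.

    genotype: tuple of two allele labels, e.g. ("A", "B")
    freqs: dict mapping allele label -> frequency in the group (hypothetical layout)
    """
    a, b = genotype
    if a == b:
        return freqs[a] ** 2             # homozygote: p^2
    return 2 * freqs[a] * freqs[b]       # heterozygote: 2pq

def multilocus_loglik(genotypes, group_freqs):
    """Global log-likelihood = sum of per-locus log-likelihoods.

    Summing is only justified if loci are independent (no linkage disequilibrium).
    """
    return sum(math.log(hw_genotype_prob(g, f))
               for g, f in zip(genotypes, group_freqs))

# Usage: one individual typed at two loci, scored against two candidate groups.
individual = [("A", "A"), ("A", "B")]
groups = {
    "group1": [{"A": 0.9, "B": 0.1}, {"A": 0.5, "B": 0.5}],
    "group2": [{"A": 0.2, "B": 0.8}, {"A": 0.5, "B": 0.5}],
}
scores = {name: multilocus_loglik(individual, freqs) for name, freqs in groups.items()}
best = max(scores, key=scores.get)  # group giving the highest log-likelihood
```

If either assumption fails, each term in that sum is still computed, just from the wrong model, which is why the result becomes an approximation rather than an outright error.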
>
> Deviation from HW and linkage between loci will have the same kind of
> effect: the computed likelihood will be an approximation of the true,
> unknown likelihood. How good is the approximation in a particular case? I
> don't think we know in general, but I'd like to see such a study
> published. And then, the next question is: how does it change the
> clustering solution? Again, more work would be interesting on this topic.
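
To make the "approximation" point concrete for selfing, here is a small Python sketch that assumes genotypes follow the classic inbreeding model with coefficient F, a simplified stand-in for partial selfing (F = 0 recovers HW). The sample counts and the value F = 0.5 are made up for illustration, and nothing here is part of snapclust; it only compares what an HW-based likelihood computes with what the likelihood would be if that inbreeding model were true.

```python
import math

def genotype_prob(genotype, freqs, F=0.0):
    """Diploid genotype probability under the inbreeding model.

    F = 0 gives Hardy-Weinberg proportions; F > 0 inflates homozygotes
    at the expense of heterozygotes (e.g. through partial selfing).
    """
    a, b = genotype
    if a == b:
        p = freqs[a]
        return p ** 2 + F * p * (1 - p)
    return 2 * freqs[a] * freqs[b] * (1 - F)

def sample_loglik(sample, freqs, F=0.0):
    """Log-likelihood of a sample of genotypes at a single locus."""
    return sum(math.log(genotype_prob(g, freqs, F)) for g in sample)

# Hypothetical sample with a heterozygote deficit: 4 AA, 4 BB, 2 AB.
sample = [("A", "A")] * 4 + [("B", "B")] * 4 + [("A", "B")] * 2
freqs = {"A": 0.5, "B": 0.5}  # observed allele frequencies in this sample

ll_hw = sample_loglik(sample, freqs, F=0.0)      # what an HW-based method computes
ll_inbred = sample_loglik(sample, freqs, F=0.5)  # likelihood if F = 0.5 were the truth
```

For this sample the HW value is lower than the F = 0.5 value. How much gaps like this actually move a clustering solution is the open question raised above.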
>
> I suspect attitudes will vary, pretty much depending on whether one
> decides to be purist or pragmatic. As an anecdote, while developing various
> Bayesian or ML methods, I have several times realised that the likelihood
> was 'wrong' (coding error); sometimes one full component of the likelihood
> had been left out entirely, and the reason I had not flagged it before was
> that the results were still okay. Similarly, a linear regression may still
> give sensible results despite non-normally distributed residuals. k-means
> clustering is often used without checking that groups have similar
> within-group variances. And ML phylogenies from full alignments are
> commonplace, even though the likelihood also assumes independence of loci;
> see Joe Felsenstein's cheeky comment on that in his pruning algorithm paper.
>
> In short: it could be a problem, but we (at least, I) don't know what
> impact it will have. I know, disappointing. My 2 cents would be:
> - fairly evenly distributed LD: snapclust should be fine
> - a bit of clonality mixed up with some recombination / sexual
> reproduction: should be worth looking at
> - full clonality: work on haplotype frequencies / MLST type of markers
> (see apex package), and then snapclust will be fine
> - never rely on a single method if you can avoid it; I like using
> hierarchical clustering, with further exploration using factorial methods
> (PCA, DAPC), as a complement
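
For the full-clonality case, one simple way to move from per-locus allele frequencies to multilocus haplotype-style frequencies is to collapse each individual's genotypes into a single multilocus genotype (MLG) label and count those labels. The Python sketch below illustrates that idea under an assumed data layout; the function name is hypothetical, and it is not the API of the apex package (which is an R package).

```python
from collections import Counter

def mlg_frequencies(individuals):
    """Relative frequency of each distinct multilocus genotype (MLG).

    individuals: list of individuals, each a list of per-locus diploid
    genotypes such as ("A", "B"). Alleles are sorted within each locus so
    that ("B", "A") and ("A", "B") produce the same label.
    """
    labels = [tuple(tuple(sorted(g)) for g in ind) for ind in individuals]
    counts = Counter(labels)
    n = len(labels)
    return {mlg: count / n for mlg, count in counts.items()}

# Usage: two clone-mates and one distinct individual, typed at two loci.
individuals = [
    [("A", "A"), ("A", "B")],
    [("A", "A"), ("B", "A")],  # same MLG as the first, alleles listed in another order
    [("B", "B"), ("A", "A")],
]
mlg_freqs = mlg_frequencies(individuals)
```

Each MLG then behaves like one allele at a single "super-locus", so no independence between the loci inside it is assumed.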
>
> Please feel free to comment / discuss, everyone. I might put this in a
> podcast, time allowing.
>
> Best
> Thibaut
>
>
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> https://thibautjombart.netlify.com
> Twitter: @TeebzR
> +44(0)20 7594 3658
>
> On 16 February 2018 at 15:57, brian knaus <briank.lists at gmail.com> wrote:
>
>> Hi and congrats on your snapclust paper! I was thinking of trying the
>> method on a couple of projects I'm working on. However, I work with fungi
>> and fungus-like plant pathogens that exhibit a mixture of reproductive
>> modes (e.g., selfing, clonality, mitotic reproduction). This means that we
>> do not necessarily expect Hardy-Weinberg assumptions to be met. Your manual
>> seems to state quite early on that HW is important. I would guess
>> that linkage disequilibrium (non-independence of loci) may be an issue
>> as well. So this raises my question: in systems where HW cannot be assumed
>> and where there may be linkage disequilibrium, would I be better off using
>> DAPC than snapclust?
>>
>> Thanks!
>> Brian
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>
>
>