<div dir="ltr">Hi Brian<div><br></div><div>thanks for reposting your question here. I am assuming that by 'DAPC' you mean the K-means clustering presented in the DAPC paper, not the factorial method itself. It is an interesting topic, and there are many possible answers. I'll try to mention a few.</div><div><br></div><div>snapclust uses HW to compute the likelihood, like most other model-based (likelihood, Bayesian) clustering methods I know of. Similarly, it assumes independence of loci, so that: (global log-likelihood) = sum(log-likelihood of every locus)</div><div><br></div><div>Deviation from HW and linkage between loci will have the same kind of effect: the computed likelihood will be an approximation of the true, unknown likelihood. How good is the approximation in a particular case? I don't think we know, in general, but I'd like to see such a study published. And then, the next question is: how does it change the clustering solution? Again, more work on this topic would be interesting. </div><div><br></div><div>I suspect attitudes will vary, pretty much depending on whether one decides to be purist or pragmatic. As an anecdote, while developing various Bayesian or ML methods, it happened several times that I realised the likelihood was 'wrong' (coding error), sometimes with one full component of the likelihood entirely left out, and the reason I had not flagged it before was that results were still okay. Similarly, a linear regression may still give sensible results despite non-normally distributed residuals. k-means clustering is often used without checking that groups have similar within-group variances. And ML phylogenies from full alignments are commonplace, while the likelihood also assumes independence of loci - see Joe Felsenstein's cheeky comment on that in his pruning algorithm paper. </div><div><br></div><div>In short: it could be a problem, but we (at least, I) don't know what impact it'll have. I know, disappointing. 
My 2 cents would be:</div><div>- fairly evenly distributed LD: snapclust should be fine</div><div>- a bit of clonality mixed up with some recombination / sexual reproduction: should be worth looking at</div><div>- full clonality: work on haplotype frequencies / MLST type of markers (see apex package), and then snapclust will be fine</div><div>- never rely on a single method if you can avoid it; I like using a hierarchical clustering and further exploration using factorial methods (PCA, DAPC) as a complement</div><div><br></div><div>Please feel free to comment / discuss, everyone. I might put this in a podcast, time allowing.</div><div><br></div><div>Best</div><div>Thibaut</div><div><br></div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><br>--<br>Dr Thibaut Jombart<br>Lecturer, Department of Infectious Disease Epidemiology, Imperial College London<br>Head of RECON: <a href="http://repidemicsconsortium.org" target="_blank">repidemicsconsortium.org</a><br>WHO Consultant - outbreak analysis</div><div><a href="https://thibautjombart.netlify.com" target="_blank">https://thibautjombart.netlify.com</a><br>Twitter: @TeebzR<br>+44(0)20 7594 3658</div></div></div></div>
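<div>PS: to make the likelihood described above concrete (HW within each locus, independence across loci), here is a minimal Python sketch. It is not snapclust's actual code: the function names, the two clusters and their allele frequencies are all made up for illustration, and it only covers diploid genotypes with non-zero allele frequencies.</div>

```python
import math

def hw_genotype_prob(a1, a2, freqs):
    """Probability of the unordered diploid genotype (a1, a2) under HW."""
    p, q = freqs[a1], freqs[a2]
    # Homozygote: p^2; heterozygote: 2pq.
    return p * p if a1 == a2 else 2 * p * q

def log_likelihood(genotype, cluster_freqs):
    """Global log-likelihood = sum of per-locus log-likelihoods.

    Summing over loci is exactly where independence of loci is assumed.
    """
    return sum(
        math.log(hw_genotype_prob(a1, a2, freqs))
        for (a1, a2), freqs in zip(genotype, cluster_freqs)
    )

# Hypothetical allele frequencies for two clusters at two loci.
cluster_a = [{"A": 0.9, "a": 0.1}, {"B": 0.8, "b": 0.2}]
cluster_b = [{"A": 0.2, "a": 0.8}, {"B": 0.3, "b": 0.7}]

# One individual: homozygous A/A at locus 1, heterozygous B/b at locus 2.
individual = [("A", "A"), ("B", "b")]

ll_a = log_likelihood(individual, cluster_a)
ll_b = log_likelihood(individual, cluster_b)

# The individual would be assigned to the cluster with the higher value.
best = "a" if ll_a > ll_b else "b"
```

<div>With selfing or clonality, the p^2 / 2pq terms no longer describe genotype frequencies well, and with LD the per-locus terms are no longer independent, so the sum is only an approximation of the true log-likelihood.</div>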
<br><div class="gmail_quote">On 16 February 2018 at 15:57, brian knaus <span dir="ltr"><<a href="mailto:briank.lists@gmail.com" target="_blank">briank.lists@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi and congrats on your snapclust paper! I was thinking of trying the method on a couple of projects I'm working on. However, I work with fungi and fungus-like plant pathogens that exhibit a mixture of reproductive modes (e.g., selfing, clonality, mitotic reproduction). This means that we do not necessarily expect Hardy-Weinberg assumptions to be met. Your manual seems to come out pretty early stating that HW is important. I would guess that linkage disequilibrium (non-independence of loci) may be an issue also. So this raises my question: in systems where HW may not be assumed and where there may be linkage disequilibrium would I be better off using DAPC than snapclust?<br><br></div>Thanks!<span class="HOEnZb"><font color="#888888"><br></font></span></div><span class="HOEnZb"><font color="#888888">Brian<br></font></span></div>
<br>______________________________<wbr>_________________<br>
adegenet-forum mailing list<br>
<a href="mailto:adegenet-forum@lists.r-forge.r-project.org">adegenet-forum@lists.r-forge.<wbr>r-project.org</a><br>
<a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum" rel="noreferrer" target="_blank">https://lists.r-forge.r-<wbr>project.org/cgi-bin/mailman/<wbr>listinfo/adegenet-forum</a><br></blockquote></div><br></div>