<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>
</head>
<body ocsi="0" fpstyle="1" bgcolor="#FFFFFF">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Hi there,
<br>
<br>
There are quite a few questions here, and I may miss one or two. <br>
<br>
In a nutshell:<br>
<br>
- k-means sometimes finds clusters where STRUCTURE does not (see original paper), so this is not necessarily a sign that find.clusters is wrong. In your case, for the microsatellite data, it looks as though any clusters that exist are not linked to the geographical
locations. It is hard to say more without seeing the outputs or the data.<br>
<br>
- The graph of your second analysis (SNPs) shows no structure. k=1 is not nonsensical; it simply suggests that there are no clusters in your data.<br>
<br>
- xvalDapc has not been implemented (yet) for genlight objects; to convert data into a suitable format try as.matrix(...).<br>
<br>
- Cross-validation is to be preferred to the a-score<br>
<br>
- MDS is not a clustering method<br>
<br>
- MDS optimizes overall diversity so may fail to detect group structure<br>
<br>
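A rough sketch of the as.matrix route for xvalDapc, in case it helps. The toy data, the object names (geno, snps, grp, X), the column-mean imputation of missing values and the parameter values are all assumptions made for illustration; please adapt them to your own data rather than taking them as the recommended settings.<br>
<br>

```r
library(adegenet)

set.seed(1)

# Toy data (illustrative only): 40 individuals x 200 biallelic SNPs,
# coded as 0/1/2 allele counts, with a few missing genotypes.
geno <- matrix(sample(0:2, 40 * 200, replace = TRUE), nrow = 40)
geno[sample(length(geno), 50)] <- NA
snps <- new("genlight", geno, parallel = FALSE)

# Hypothetical prior groups, e.g. sampling locations.
grp <- factor(rep(c("A", "B"), each = 20))

# Convert the genlight object to a plain individuals x loci matrix.
X <- as.matrix(snps)

# Assumption: replace missing genotypes by the column (locus) mean,
# since the PCA step cannot handle NAs.
for (j in seq_len(ncol(X))) {
  miss <- is.na(X[, j])
  if (any(miss)) X[miss, j] <- mean(X[, j], na.rm = TRUE)
}

# Cross-validation on the matrix rather than on the genlight object.
xval <- xvalDapc(X, grp, n.pca.max = 20, training.set = 0.9,
                 result = "groupMean", n.rep = 10, xval.plot = FALSE)

# Inspect which elements the returned list contains.
names(xval)
```

<br>
The random toy data have no real group structure, so the assignment success it reports should not be read as meaningful; the point is only the conversion and the call.<br>
<br>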
Cheers<br>
Thibaut<br>
<div><br>
<div style="font-family:Tahoma; font-size:13px">
<div class="BodyFragment"><font size="2"><span style="font-size:10pt">
<div style="font-family: Times New Roman; color: #000000; font-size: 16px">
<hr tabindex="-1">
<div style="direction: ltr;" id="divRpF844449"><font color="#000000" face="Tahoma" size="2"><b>From:</b> adegenet-forum-bounces@lists.r-forge.r-project.org [adegenet-forum-bounces@lists.r-forge.r-project.org] on behalf of Peri Bolton [peri.bolton@students.mq.edu.au]<br>
<b>Sent:</b> 30 October 2015 11:40<br>
<b>To:</b> adegenet-forum@lists.r-forge.r-project.org<br>
<b>Subject:</b> [adegenet-forum] Very different number of clusters in different datasets.<br>
</font><br>
</div>
<div>Dear adegenet developers and users,<br>
<br>
I have a dataset with 50 individuals across 5 sampling locations in a microsatellite dataset, and roughly equivalent numbers of individuals in a SNP dataset with 3839 loci.<br>
I have just been interested in finding whether there is any population structure in my species. However, when I run the different datasets I get different answers, and some of them look strange.
<br>
<br>
microsatellite dataset. <br>
Fst, a Mantel test for IBD and STRUCTURE all find zero evidence of structure...<br>
<br>
find.clusters says k=4 or 5<br>
Then I run optim.a.score and xvalDapc to find the best number of PCs to retain for a DAPC, and I get nice groups in the final answer, with apparently good assignment power back to the original groups.
<br>
However, my alpha scores for that DAPC run are as follows<br>
1 2 3 4 <br>
0.4905714 0.5570149 0.7075510 0.5962500 <br>
<br>
Further, when I visualise this as a compoplot there is no evidence that these clusters represent any kind of geographic structure in the data, as the groups are just randomly dispersed among my individuals.
<br>
<br>
I have read in forum threads that if there is enough space in the data, find.clusters will find an optimal clustering solution, whether or not it is biologically realistic. I have also read that find.clusters shouldn't find an optimal solution for k=1 because
it is meant to be a nonsensical solution for a cluster. Indeed this makes sense, because when you use sampling locality as a prior in DAPC it all comes out as one big cluster.<br>
<br>
HOWEVER, when I run my SNP dataset things get really strange. <br>
<br>
I ran essentially all the same procedures and I've come up against a number of hurdles:<br>
<br>
1. I can't get the xvalDapc to work on a genlight object. I keep getting an error:
<br>
<br>
Error in as.data.frame.default(x[[i]], optional = TRUE) : <br>
cannot coerce class "structure("SNPbin", package = "adegenet")" to a data.frame<br>
In addition: Warning message:<br>
In min(dim(x)) : no non-missing arguments to min; returning Inf<br>
<br>
Obviously this is because genlight doesn't store the genetic data in the same way as genind objects do. Is there a workaround for using this function?<br>
<br>
So far I have got xvalDapc to work on my genind objects, but I do get a bunch of warning messages such as "49: In if (result == "overall") { ... :<br>
the condition has length > 1 and only the first element will be used". It does seem to produce an output, at least.<br>
<br>
2. When I run find.clusters my cumulative variance plot is nearly linear, as is my BIC vs. K plot, with the optimal solution being the supposedly nonsensical k=1 (see the attached pdf of the output). Is there something weird with my data? Or is that the genuine
signal coming through? When I use other methods such as fastSTRUCTURE and MDS I don't get any indication of structure either. HOWEVER, I don't know how to reconcile the two clustering solutions from the two nuclear data sources.<br>
<br>
3. When I run an a.score analysis the result is basically a flat line, and although it finds an "optimal" number of PCs to retain, it doesn't seem very reliable to me (see also attached)<br>
<br>
<br>
<br>
I am aware that there are a few problems there, but hopefully the itemisation and the context of my questions will help any good-hearted people out there who are willing to help.
<br>
<br>
Sincerely,<br>
<br>
Peri<br>
<br>
<div class="moz-signature">-- <br>
<b>Peri Bolton</b> <br>
PhD Candidate, <a href="http://bio.mq.edu.au/avianbehaviouralecology/" target="_blank">
Griffith Lab </a><br>
Department of Biological Sciences <br>
Macquarie University, NSW 2109, Australia <br>
</div>
</div>
</div>
</span></font></div>
</div>
</div>
</div>
</body>
</html>