<html dir="ltr">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>

</head>

<body ocsi="0" fpstyle="1" bgcolor="#FFFFFF">

<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;">Hi there,

<br>

<br>

There are a bunch of questions here, and I may miss one or two. <br>

<br>

In a nutshell:<br>

<br>

- It happens that k-means finds clusters where STRUCTURE fails (see original paper); this is not necessarily a sign that find.clusters is wrong. In your case, for the microsat data, it looks as though any clusters that exist are not linked to the geographical locations; it is hard to say more without seeing the outputs or the data.<br>

<br>

- The graph of your second analysis (SNPs) shows no structure. k=1 is not nonsensical, it is just a suggestion that there are no clusters in your data.<br>

<br>

- xvalDapc has not been implemented (yet) for genlight objects; to convert data into a suitable format try as.matrix(...).<br>

<br>

- Cross-validation is to be preferred to the a-score.<br>

<br>

- MDS is not a clustering method<br>

<br>

- MDS optimizes overall diversity so may fail to detect group structure<br>
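<br>
To illustrate the k=1 point above, here is a minimal base R sketch, not the actual find.clusters code. It assumes a simplified criterion of the form n*log(WSS/n) + k*log(n) and uses simulated data with no group structure; the dimensions and the range of k are arbitrary choices for illustration.<br>
<pre>
```r
set.seed(42)

# Simulated data with no group structure (an assumption for illustration):
# n individuals described by p independent variables
n <- 50
p <- 1000
x <- matrix(rnorm(n * p), nrow = n)

ks <- 1:6

# Within-cluster sum of squares for each k; for k = 1 this is simply
# the total sum of squares around the overall centroid
wss <- sapply(ks, function(k) {
  if (k == 1) {
    sum(scale(x, center = TRUE, scale = FALSE)^2)
  } else {
    kmeans(x, centers = k, nstart = 10)$tot.withinss
  }
})

# Simplified BIC-like criterion (assumed form, see the text above)
bic <- n * log(wss / n) + ks * log(n)

print(data.frame(k = ks, BIC = round(bic, 1)))
```
</pre>
If the criterion is lowest at k=1, or just keeps rising with k, the usual reading is that the data give no support for splitting into groups.<br>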

<br>
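For the xvalDapc point, the conversion could look roughly like the sketch below. The genlight object here is simulated as a stand-in for real data (group labels, allele frequencies and sizes are made up), and the xvalDapc settings are arbitrary. Real SNP data with missing genotypes would probably need the NAs replaced (for example by locus means) before this step.<br>
<pre>
```r
library(adegenet)

set.seed(1)

# Simulated stand-in for a real SNP dataset (all values are made up):
# 50 diploid individuals in two groups with different allele frequencies
n.ind <- 50
n.snp <- 200
grp <- factor(rep(c("A", "B"), each = n.ind / 2))
freq <- ifelse(grp == "A", 0.3, 0.6)
counts <- matrix(rbinom(n.ind * n.snp, size = 2, prob = freq), nrow = n.ind)
gl <- new("genlight", gen = counts, parallel = FALSE)

# Convert the genlight object to an individuals x SNPs matrix of allele counts
mat <- as.matrix(gl)

# Cross-validation on the matrix; 'result' is named explicitly rather than
# relying on the default, and the plot is switched off
xval <- xvalDapc(mat, grp, n.pca.max = 20, n.rep = 10,
                 result = "groupMean", xval.plot = FALSE)

# List the components of the returned object
names(xval)
```
</pre>
<br>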

Cheers<br>

Thibaut<br>

<div><br>

<div style="font-family:Tahoma; font-size:13px">

<div class="BodyFragment"><font size="2"><span style="font-size:10pt">

<div style="font-family: Times New Roman; color: #000000; font-size: 16px">

<hr tabindex="-1">

<div style="direction: ltr;" id="divRpF844449"><font color="#000000" face="Tahoma" size="2"><b>From:</b> adegenet-forum-bounces@lists.r-forge.r-project.org [adegenet-forum-bounces@lists.r-forge.r-project.org] on behalf of Peri Bolton [peri.bolton@students.mq.edu.au]<br>

<b>Sent:</b> 30 October 2015 11:40<br>

<b>To:</b> adegenet-forum@lists.r-forge.r-project.org<br>

<b>Subject:</b> [adegenet-forum] Very different number of clusters in different datasets.<br>

</font><br>

</div>

<div>Dear adegenet developers and users,<br>

<br>

I have a dataset with 50 individuals across 5 sampling locations in a microsatellite dataset, and roughly equivalent numbers of individuals in a SNP dataset with 3839 loci.<br>

I have just been interested in finding whether there is any population structure in my species. However, when I run the different datasets I get different answers, and some of them look strange.

<br>

<br>

microsatellite dataset. <br>

Fst, a Mantel test for IBD and STRUCTURE all find zero evidence of structure...<br>

<br>

find.clusters says k=4 or 5<br>

Then I run optim.a.score and xvalDapc to find the best number of PCs to retain for a dapc, and I get nice groups in the final answer, with apparently good assignment power back to the original groups.

<br>

However, my a-scores for that dapc run are as follows:<br>
<pre>
        1         2         3         4
0.4905714 0.5570149 0.7075510 0.5962500
</pre>

<br>

Further, when I visualise this as a compoplot there is no evidence that these structures actually represent any kind of geographic structure in the data, as the groups are just randomly dispersed through my individuals.

<br>

<br>

I have read in forum topics that if there is enough space in the data it will find an optimal clustering solution, whether or not that solution is biologically realistic. I have also read that find.clusters shouldn't find an optimal solution at k=1 because that is meant to be a nonsensical solution for a clustering. Indeed this makes sense, because when you use sampling locality as a prior in dapc it all comes out as one big cluster.<br>

<br>

HOWEVER, when I run my SNP dataset things get really strange. <br>

<br>

I ran essentially all the same procedures and I've come up against a number of hurdles:<br>

<br>

1. I can't get the xvalDapc to work on a genlight object. I keep getting an error:

<br>

<br>

Error in as.data.frame.default(x[[i]], optional = TRUE) : <br>

cannot coerce class "structure("SNPbin", package = "adegenet")" to a data.frame<br>

In addition: Warning message:<br>

In min(dim(x)) : no non-missing arguments to min; returning Inf<br>

<br>

Obviously this is because genlight doesn't store the genetic data in the same way as genind objects do. Is there a workaround for using this function?<br>

<br>

So far I have got xvalDapc to work on my genind objects, but I do get a bunch of warning messages such as "49: In if (result == "overall") { ... : the condition has length > 1 and only the first element will be used"; it seems to spit out an output at least.<br>

<br>

2. When I run find.clusters, my cumulative variance plot is nearly linear, as is my BIC vs. k plot, with the optimal solution being the supposedly nonsensical k=1 (see the attached pdf of the output). Is there something weird with my data? Or is that the genuine signal coming through? When I use other clustering methods such as fastSTRUCTURE and MDS I don't get any indication of structure either. HOWEVER, I don't know how to reconcile the two clustering solutions from the two nuclear data sources.<br>

<br>

3. When I run an a.score analysis it is basically a flat line, and although it finds an "optimal" pca retention it doesn't seem very reliable to me (see also attached)<br>

<br>

<br>

<br>

So I am aware that there are a few problems there, but hopefully the itemisation and the context of my questions will help any good-hearted people out there who are willing to help.

<br>

<br>

Sincerely,<br>

<br>

Peri<br>

<br>

<div class="moz-signature">-- <br>

<b>Peri Bolton</b> <br>

PhD Candidate, <a href="http://bio.mq.edu.au/avianbehaviouralecology/" target="_blank">

Griffith Lab </a><br>

Department of Biological Sciences <br>

Macquarie University, NSW 2109, Australia <br>

</div>

</div>

</div>

</span></font></div>

</div>

</div>

</div>

</body>

</html>