[adegenet-forum] Very different number of clusters in different datasets.

Peri Bolton peri.bolton at students.mq.edu.au
Fri Oct 30 12:40:55 CET 2015


Dear adegenet developers and users,

I have a microsatellite dataset of 50 individuals from 5 sampling 
locations, and a SNP dataset of 3839 loci with roughly the same number 
of individuals.
I simply want to know whether there is any population structure in my 
species. However, the two datasets give different answers, and some of 
them look strange.

Microsatellite dataset:
Fst, a Mantel test for IBD, and STRUCTURE all find zero evidence of 
structure...

find.clusters says k=4 or 5.
I then run optim.a.score and xvalDapc to find the best number of PCs to 
retain for a dapc, and I get nice groups in the final answer, with 
apparently good assignment power back to the original groups.
However, the a-scores (alpha scores) for that dapc run are as follows:
         1         2         3         4
0.4905714 0.5570149 0.7075510 0.5962500

Further, when I visualise this as a compoplot there is no evidence that 
these structures actually represent any kind of geographic structure in 
the data, as the groups are just randomly dispersed through my individuals.
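
The microsatellite workflow described above can be sketched roughly as 
follows. This is only an illustration under stated assumptions: it uses 
adegenet's bundled nancycats data as a stand-in for the real dataset, and 
the values of n.pca, n.clust and n.da are arbitrary choices made so the 
sketch runs without interactive prompts, not the settings used in this 
post.

```r
library(adegenet)

data(nancycats)   # bundled microsatellite genind object (stand-in data)
x <- nancycats

# 1. De novo clustering; k is fixed here only to avoid the interactive prompt.
fc <- find.clusters(x, n.pca = 50, n.clust = 4)

# 2. DAPC on the inferred clusters with deliberately many PCs, then use the
#    a-score to suggest how many PCs to retain.
dapc_full <- dapc(x, pop = fc$grp, n.pca = 50, n.da = 3)
opt <- optim.a.score(dapc_full, plot = FALSE)

# 3. Refit with the suggested number of PCs.
dapc_fit <- dapc(x, pop = fc$grp, n.pca = opt$best, n.da = 3)

# 4. Per-cluster a-scores (one value per inferred cluster).
ascores <- a.score(dapc_fit)$pop.score
print(ascores)

# 5. Membership probabilities per individual, plus a cross-tabulation of
#    sampling location against inferred cluster.
compoplot(dapc_fit)
print(table(location = pop(x), cluster = dapc_fit$grp))
```

If the inferred clusters reflected geography, the cross-tabulation would 
show each sampling location falling mostly into one cluster rather than 
being spread evenly across all of them.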

I have read in forum threads that, if there is enough space in the data, 
find.clusters will return an optimal clustering solution whether or not 
it is biologically realistic. I have also read that find.clusters 
shouldn't return k=1 as the optimal solution, because a single cluster is 
meant to be a nonsensical clustering solution. Indeed this makes sense, 
because when you use sampling locality as a prior in dapc it all comes 
out as one big cluster.
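
Using sampling locality as the prior grouping in dapc can be sketched as 
below. The bundled nancycats data and the n.pca and n.da values are 
assumptions made for illustration only.

```r
library(adegenet)

data(nancycats)   # stand-in data; pop(x) holds the sampling locations
x <- nancycats

# DAPC with sampling location as the prior group, instead of the groups
# inferred by find.clusters.
dapc_loc <- dapc(x, pop = pop(x), n.pca = 20, n.da = 2)

# Overall proportion of individuals reassigned to their own location.
assign_prop <- summary(dapc_loc)$assign.prop
print(assign_prop)

# Scatterplot of the first two discriminant functions.
scatter(dapc_loc)
```

Heavily overlapping groups in the scatterplot, together with a low 
reassignment proportion, would be consistent with the locations behaving 
as one cluster.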

HOWEVER, when I run my SNP dataset things get really strange.

I ran essentially all the same procedures and I've come up against a 
number of hurdles:

1. I can't get the xvalDapc to work on a genlight object. I keep getting 
an error:

Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "structure("SNPbin", package = "adegenet")" to a 
data.frame
In addition: Warning message:
In min(dim(x)) : no non-missing arguments to min; returning Inf

Presumably this is because genlight objects don't store the genetic data 
in the same way as genind objects do. Is there a workaround for using 
this function?

So far I have got xvalDapc to work on my genind objects, but I do get a 
bunch of warning messages such as:
"49: In if (result == "overall") { ... :
   the condition has length > 1 and only the first element will be used"
It does seem to produce an output at least...

2. When I run find.clusters my cumulative variance plot is nearly 
linear, as is my BIC vs. K plot, with the optimal solution being the 
supposedly nonsensical k=1 (see the attached pdf of the output). Is 
there something weird with my data? Or is that the genuine signal 
coming through? When I use other clustering methods such as 
fastSTRUCTURE and MDS I don't get any indication of structure either. 
HOWEVER, I don't know how to reconcile the two clustering solutions from 
the two nuclear data sources.

3. When I run an a.score analysis the curve is basically a flat line, 
and although it finds an "optimal" PCA retention, it doesn't seem very 
reliable to me (see also attached).
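
Items 1 and 2 can be illustrated with the following sketch. It rests on 
several assumptions: the data are simulated and unstructured, the locality 
labels are hypothetical, and the genlight object is converted to a plain 
matrix with mean imputation of missing genotypes before calling xvalDapc, 
which is one possible workaround rather than a documented fix. Passing 
result = "groupMean" explicitly is also an assumption about what triggers 
the "condition has length > 1" warning, namely the argument being left at 
its two-element default in some adegenet versions.

```r
library(adegenet)

set.seed(1)
# Simulated stand-in: 50 individuals x 200 biallelic SNPs coded as allele
# counts 0/1/2, with about 5% missing genotypes and no real structure.
geno <- matrix(sample(c(0:2, NA), 50 * 200, replace = TRUE,
                      prob = c(0.50, 0.30, 0.15, 0.05)), nrow = 50)
gl <- new("genlight", geno)
site <- factor(rep(paste0("site", 1:5), each = 10))  # hypothetical localities

# Item 2: BIC for k = 1..10 without the interactive prompts.
fc <- find.clusters(gl, n.pca = 20, max.n.clust = 10, choose.n.clust = FALSE)
print(fc$Kstat)   # one BIC value per k; compare k = 1 with the rest

# Item 1: pull the allele counts out as an ordinary matrix and replace each
# missing genotype with the mean of its locus.
mat <- as.matrix(gl)
na_idx <- which(is.na(mat), arr.ind = TRUE)
mat[na_idx] <- colMeans(mat, na.rm = TRUE)[na_idx[, "col"]]

# Cross-validation on the matrix, with sampling locality as the grouping.
xval <- xvalDapc(mat, site, n.pca.max = 20, n.rep = 10,
                 result = "groupMean", xval.plot = FALSE)
print(xval$DAPC$n.pca)   # number of PCs retained in the final DAPC
```

Mean imputation is a simplification; with many missing genotypes it can 
flatten whatever signal exists, so the result should be read with care.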



I am aware that there are a few problems here, but hopefully the 
itemisation and the context of my questions will help any kind-hearted 
people out there who are willing to assist.

Sincerely,

Peri

-- 
*Peri Bolton*
PhD Candidate, Griffith Lab <http://bio.mq.edu.au/avianbehaviouralecology/>
Department of Biological Sciences
Macquarie University, NSW 2109, Australia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: findclusters.pdf
Type: application/pdf
Size: 7623 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151030/0243b459/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ascoreoptimisation.jpeg
Type: image/jpeg
Size: 65733 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151030/0243b459/attachment-0001.jpeg>

