[adegenet-forum] compoplot, STRUCTURE, and the analysis of a hybrid zone
Jombart, Thibaut
t.jombart at imperial.ac.uk
Tue Feb 12 16:12:40 CET 2013
Hi Stefano,
thanks for reposting on the forum. It gives me the chance to clarify an important point.
For the first point, there is no linear relationship between the 'stability' of DAPC results and the number of PCs retained in the PCA step. 'xxx' PCs can represent 2% of the variance in one analysis and 60% in another. If the two data tables have fairly comparable dimensions, it would be best to retain roughly the same proportion of variance. If their dimensions are very different, then the same number of PCs makes sense.
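To make the point about PCs and variance concrete, here is a minimal sketch in plain Python (not adegenet code). The two eigenvalue spectra are made-up numbers, and the helper names `cumulative_variance` and `n_pcs_for` are hypothetical; the only thing it shows is that the same number of PCs can correspond to very different proportions of variance.

```python
def cumulative_variance(eigenvalues):
    """Cumulative proportion of total variance carried by the first 1..n PCs."""
    total = sum(eigenvalues)
    running, out = 0.0, []
    for ev in eigenvalues:
        running += ev
        out.append(running / total)
    return out


def n_pcs_for(eigenvalues, target):
    """Smallest number of leading PCs whose cumulative proportion reaches `target`."""
    for i, prop in enumerate(cumulative_variance(eigenvalues), start=1):
        if prop >= target - 1e-12:  # small tolerance for floating-point rounding
            return i
    return len(eigenvalues)


# Hypothetical PCA eigenvalues (illustrative only), sorted in decreasing order.
steep = [50.0, 20.0, 10.0, 5.0, 5.0, 4.0, 3.0, 2.0, 0.5, 0.5]  # a few dominant axes
flat = [1.0] * 100                                             # no dominant axis

k = 3
prop_steep = cumulative_variance(steep)[k - 1]
prop_flat = cumulative_variance(flat)[k - 1]
print(f"{k} PCs: {prop_steep:.0%} of variance (steep) vs {prop_flat:.0%} (flat)")
print("PCs needed for 80%:", n_pcs_for(steep, 0.80), "(steep) vs",
      n_pcs_for(flat, 0.80), "(flat)")
```

With real data, the eigenvalues would come from the PCA step of each analysis, and the comparison would be made on those instead of the invented spectra above.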
STRUCTURE and similar approaches rely on a model which partitions genotypes into groups. It is basically a mixture distribution problem, with a multinomial distribution for each locus and group. So the 'admixture' coefficient has a straightforward biological interpretation.
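As a rough sketch of what this mixture looks like under an admixture model (the notation is mine and only meant as an illustration), the probability that one allele copy of an individual is of a given type is a weighted sum over groups:

```latex
\Pr\!\left(x_{il}^{(a)} = j \mid Q, P\right) = \sum_{k=1}^{K} q_{ik}\, p_{klj},
\qquad \sum_{k=1}^{K} q_{ik} = 1
```

Here $x_{il}^{(a)}$ is allele copy $a$ of individual $i$ at locus $l$, $p_{klj}$ is the frequency of allele $j$ at locus $l$ in group $k$, and $q_{ik}$ is the admixture coefficient, i.e. the proportion of the genome of individual $i$ drawn from group $k$.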
In DAPC, the assignment of individuals to groups using the discriminant functions is based on a geometric criterion. In other words, "tell me where you are in the discriminant space, I will tell you the probability that you belong to groups xxx, yyy and zzz". This is of course dependent on the discriminant space. The more dimensions retained in the PCA step, the easier it is to find a space providing perfect discrimination. The obtained group membership probabilities can reflect admixture, but they do not represent the proportion of the genome assigned to a given group. In your case, if you use a smaller space, you may start seeing less clear-cut group definitions. optim.a.score may help in selecting the number of PCs.
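The geometric nature of these probabilities can be illustrated with a toy example in plain Python. This is not what adegenet computes internally: it assumes a single discriminant axis, two Gaussian groups with a common standard deviation and equal priors, and the function names are hypothetical. It takes an individual sitting 40% of the way from one group centroid to the other and shows how its membership probability changes as the groups get better separated.

```python
import math


def posterior_group1(x, mean1, mean2, sd=1.0):
    """Posterior probability of group 1 at position x on one discriminant axis,
    assuming two Gaussian groups with a common sd and equal priors."""
    log_d1 = -((x - mean1) ** 2) / (2 * sd ** 2)
    log_d2 = -((x - mean2) ** 2) / (2 * sd ** 2)
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_d1, log_d2)
    w1, w2 = math.exp(log_d1 - m), math.exp(log_d2 - m)
    return w1 / (w1 + w2)


def intermediate_posterior(separation, fraction=0.4):
    """Posterior of group 1 for an individual lying `fraction` of the way
    from centroid 1 to centroid 2 (centroids `separation` sd units apart)."""
    mean1, mean2 = -separation / 2, separation / 2
    x = mean1 + fraction * separation
    return posterior_group1(x, mean1, mean2)


for separation in (2, 5, 20):
    p = intermediate_posterior(separation)
    print(f"centroids {separation} sd apart: P(group 1) = {p:.3f}")
```

The relative position of the individual is the same in all three cases, yet the probability moves from an ambiguous value towards 1 as the separation grows. That is the sense in which the probabilities depend on the discriminant space and are not a proportion of the genome.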
Cheers
Thibaut.
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Stefano Montanari [stefanomontanari at gmail.com]
Sent: 11 February 2013 21:58
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] compoplot, STRUCTURE, and the analysis of a hybrid zone
Dear Dr. Jombart,
I hope this email finds you well. We have exchanged thoughts before, and I wish to thank you for having gotten back to me in the past.
I have been going through your latest vignette about dapc in adegenet (Nov 2012). I have used dapc on a butterflyfish hybrid zone in the past (Montanari et al 2012, Ecology and Evolution), and now I am going through a second dataset, and would like to compare the 2. Hence, I have a couple of questions for you:
- am I correct in thinking that I want the same level of stability between the 2 analyses if I am to compare the results? (eg, in both have retained PCs = N/3)
- in your tutorial you mention that the dapc$posterior values used to construct compoplot are not the same as STRUCTURE admixture coefficients. Could you point me in a direction that would allow me to understand how they differ? I have run the data through STRUCTURE and the hybrids show up nicely as 50/50 clustered with parent 1 and 2 (k=2). adegenet also reckons that k=2 should be the best, but the compoplot shows no membership misassignment (even if the # of PCs is conservative). Do you have any suggestions as to why?
Hoping to have been clear enough and not to have bored you senseless, I look forward to hearing back from you.
Best regards,
Stef
--------------------------
Stefano R. Montanari
PhD Candidate
James Cook University
School of Marine and Tropical Biology
ATSIP (Building 145 James Cook Drive)
4811 Townsville QLD
stefanomontanari at gmail.com
Work: +61 7 4781 5441
Mob: +61 404 736 509