[adegenet-forum] compoplot, STRUCTURE, and the analysis of a hybrid zone
Jombart, Thibaut
t.jombart at imperial.ac.uk
Thu Feb 14 10:37:38 CET 2013
Hello,
Why 'not possible since PC >=2'? You can choose to retain only one PC if you wish.
This suggests that the first PC of the second analysis already contains all the between-group discrimination.
Cheers
Thibaut
________________________________________
From: Stefano Montanari [stefanomontanari at gmail.com]
Sent: 13 February 2013 01:36
To: Jombart, Thibaut
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] compoplot, STRUCTURE, and the analysis of a hybrid zone
Hi Thibaut,
thank you for your prompt reply, it was very clear. Just a quick question about optim.a.score: I had used it before, and this morning I tried again just to make sure I remembered the results correctly. For one dataset (N=109, 12 loci) it finds that 17 PCs is the best; for the other (N=83, 20 loci), retaining only 1 PC (not possible since PC=>2) gives the highest a score. This worries me. Do you think these data should not be used for DAPC?
Cheers
Stef
--------------------------
Stefano R. Montanari
PhD Candidate
James Cook University
School of Marine and Tropical Biology
ATSIP (Building 145 James Cook Drive)
4811 Townsville QLD
stefanomontanari at gmail.com
Work: +61 7 4781 5441
Mob: +61 404 736 509
On 13 February 2013 01:12, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
Hi Stefano,
thanks for reposting on the forum. It gives me the chance to clarify an important point.
For the first point, there is not a linear relationship between 'stability' of DAPC results and the number of PCs retained in the PCA step. 'xxx' PCs can represent 2% of the variance in one analysis and 60% in another. If the two data tables have fairly comparable dimensions, it would be best to retain roughly the same proportion of variance. If their dimensions are very different, then the same number of PCs makes sense.
STRUCTURE or similar approaches have a model which partitions genotypes into groups. It is basically a mixture distribution problem with a multinomial distribution for each locus and group. So the 'admixture' coefficient has a straightforward biological interpretation.
In DAPC, assignment of individuals to groups using the discriminant functions is based on a geometric criterion. In other words, "tell me where you are in the discriminant space, I will tell you the probability that you belong to groups xxx, yyy and zzz". This is of course dependent on the discriminant space. The more dimensions retained in the PCA step, the easier it is to find a space providing perfect discrimination. The obtained group membership probabilities can reflect admixture, but they do not represent the proportion of the genome assigned to a given group. In your case, if you use a smaller space, you may start seeing less clear-cut group definitions. optim.a.score may help in selecting the number of PCs.
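To make the point about PCs versus variance concrete, here is a minimal sketch in plain Python (not adegenet code). The two eigenvalue lists are invented for illustration and do not come from either dataset, and the helper names cumulative_variance and pcs_for_target are hypothetical. It only shows that the same number of PCs can cover very different proportions of variance.

```python
# Illustrative only: these eigenvalues are made up, not taken from any real PCA.

def cumulative_variance(eigenvalues):
    """Cumulative proportion of variance explained by the leading PCs."""
    total = sum(eigenvalues)
    running = 0.0
    proportions = []
    for ev in eigenvalues:
        running += ev
        proportions.append(running / total)
    return proportions


def pcs_for_target(eigenvalues, target):
    """Smallest number of leading PCs whose cumulative variance reaches `target`."""
    for k, prop in enumerate(cumulative_variance(eigenvalues), start=1):
        if prop >= target:
            return k
    return len(eigenvalues)


# Hypothetical spectra: one dominated by a few PCs, one nearly flat.
steep = [50, 20, 10, 5, 5, 4, 3, 2, 1]
flat = [12, 12, 11, 11, 11, 11, 11, 11, 10]

# Same number of PCs (3), very different share of the variance.
print(cumulative_variance(steep)[2], cumulative_variance(flat)[2])

# Same target share (75%), very different number of PCs.
print(pcs_for_target(steep, 0.75), pcs_for_target(flat, 0.75))
```

In this toy case, three PCs cover most of the variance for the steep spectrum but only about a third of it for the flat one.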
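The difference between a geometric membership probability and an ancestry proportion can be sketched with a toy model in plain Python (again, not adegenet code and not how dapc computes its posteriors internally). The assumptions are loud ones: a single discriminant axis, two groups modelled as unit-variance Gaussians centred at +d and -d with equal priors, and an individual whose position is a linear blend of the two centroids according to a hypothetical ancestry fraction q. The function names are made up for this example.

```python
import math


def posterior_group_a(x, d):
    """Posterior probability of group A at position x on one discriminant axis.

    Toy assumption: groups A and B are unit-variance Gaussians centred at
    +d and -d with equal priors, so the log-odds of A versus B is 2 * d * x.
    """
    return 1.0 / (1.0 + math.exp(-2.0 * d * x))


def position_from_ancestry(q, d):
    """Toy assumption: an individual with ancestry fraction q from group A
    sits at the matching linear blend of the two group centroids."""
    return q * d + (1.0 - q) * (-d)


q = 0.6  # hypothetical individual: 60% ancestry from A, 40% from B

# d stands in for how strongly the retained space separates the groups.
for d in (1.0, 3.0):
    x = position_from_ancestry(q, d)
    print(d, round(posterior_group_a(x, d), 3))
```

Under these assumptions the membership probability for the same individual moves away from 0.6 and towards 1 as the separation d grows, which is the sense in which the probabilities depend on the discriminant space rather than measuring a proportion of the genome.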
Cheers
Thibaut.
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Stefano Montanari [stefanomontanari at gmail.com]
Sent: 11 February 2013 21:58
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] compoplot, STRUCTURE, and the analysis of a hybrid zone
Dear Dr. Jombart,
I hope this email finds you well. We have exchanged thoughts before, and I wish to thank you for having gotten back to me in the past.
I have been going through your latest vignette about dapc in adegenet (Nov 2012). I have used dapc on a butterflyfish hybrid zone in the past (Montanari et al 2012, Ecology and Evolution), and now I am going through a second dataset, and would like to compare the two. Hence, I have a couple of questions for you:
- am I correct in thinking that I want the same level of stability between the two analyses if I am to compare the results? (e.g., in both I have retained PCs = N/3)
- in your tutorial you mention that the dapc$posterior values used to construct compoplot are not the same as STRUCTURE admixture coefficients. Could you point me in a direction that would allow me to understand how they differ? I have run the data through STRUCTURE and the hybrids show up nicely as 50/50 clustered with parent 1 and 2 (k=2). adegenet also reckons that k=2 should be the best, but the compoplot shows no membership misassignment (even if the # of PCs is conservative). Do you have any suggestions as to why?
Hoping to have been clear enough and not to have bored you senseless, I look forward to hearing back from you.
Best regards,
Stef
--------------------------
Stefano R. Montanari
PhD Candidate
James Cook University
School of Marine and Tropical Biology
ATSIP (Building 145 James Cook Drive)
4811 Townsville QLD
stefanomontanari at gmail.com
Work: +61 7 4781 5441
Mob: +61 404 736 509