[adegenet-forum] Interpretation of DAPC results
Jombart, Thibaut
t.jombart at imperial.ac.uk
Wed Oct 14 13:02:18 CEST 2015
Hi Kirsty,
Going through all your analyses to check that they are correct asks for more time than most people will be ready to contribute on any given forum, especially since the outcome will benefit a single person rather than the community. You are more likely to get help by asking short, specific questions or by tackling general issues.
DAPC is quite extensively documented, so following the steps of the online tutorial should be relatively safe. If you have specific questions, you are more than welcome to post them on this forum.
After going quickly over the graphs: it looks like 3 clusters might be a good summary of your data, and most of your data fits in a fairly small space (adding PCs past the 4th one doesn't add much explained variance).
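As a rough illustration of what "explained variance" means here, the cumulative proportion of variance can be computed from the PCA eigenvalues. The sketch below uses base R's prcomp on the built-in iris data as a stand-in (an assumption, since the original data are not in this message); for a dapc object the PCA eigenvalues are stored in $pca.eig, as listed in the printed output further down.

```r
# Stand-in data (assumption): numeric columns of the built-in iris data set
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)

# Eigenvalues of the PCA are the squared standard deviations of the PCs
eig <- pca$sdev^2

# Cumulative proportion of variance explained by the first k PCs
cum_var <- cumsum(eig) / sum(eig)
print(round(cum_var, 3))

# Additional variance contributed by each successive PC
print(round(eig / sum(eig), 3))
```

When the cumulative curve flattens out, retaining further PCs adds little information, which is the sense of the remark about PCs past the 4th one.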
Cheers
Thibaut
==============================
Dr Thibaut Jombart
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
Norfolk Place, London W2 1PG, UK
Tel. : 0044 (0)20 7594 3658
http://sites.google.com/site/thibautjombart/
http://sites.google.com/site/therepiproject/
http://adegenet.r-forge.r-project.org/
Twitter: @thibautjombart
________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Kirsty Medcalf [kirsty.m.medcalf at gmail.com]
Sent: 11 October 2015 07:25
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Interpretation of DAPC results
Dear Dr Jombart and the adegenet forum,
I have used the function find.clusters and run a DAPC analysis to create a scatterplot. I have also run a cross-validation with a 70 % training set over 30 repeats using the function xvalDapc. If possible, I would like to ask for advice on the interpretation of my results; I would be deeply appreciative and would hold you in the highest regard.
The code can be found on my Stack Overflow page and the output figures are attached.
http://stackoverflow.com/questions/32704902/discriminant-analysis-of-principal-components-and-how-to-graphically-show-the-di
I ran a DAPC analysis as well as a basic PCA and an LDA. The PCA output describes two PCs to explain the model's variance, and the LDA found one discriminant function to describe the variance of the data. My analysis is a general multivariate one rather than an exploration of genetic clusters. The reproducible data can be found in the link above. Can anyone explain why the DAPC found more structure in the data, identifying 3 clusters? More specifically, would anyone mind reading my Stack Overflow page to see whether I have followed the correct steps to create an accurate model?
My cross-validation code (below) appears to show that the probability of assigning the correct PCs is higher than random chance. Would I be right in saying that the accuracy rate of selecting the right number of PCs is 61 % and that the optimal model would retain only one PC? My goal is to completely understand each step of this analysis.
Thank you so much for your patience if you have read this whole post. If anyone is able to provide advice on the interpretation of these results, thank you in advance.
Best wishes
Kirsty
My cross validation code and results are:
library(adegenet)
# Cross-validation: 30 replicates, 70 % of individuals used for training in each
xval <- xvalDapc(x, grp1$grp, n.pca.max = 2, training.set = 0.7,
                 result = "groupMean", center = TRUE, scale = FALSE,
                 n.pca = NULL, n.rep = 30, xval.plot = TRUE)
$`Cross-Validation Results`
n.pca success
1 1 0.5833333
2 1 0.6000000
3 1 0.6000000
4 1 0.6666667
5 1 0.5833333
6 1 0.6666667
7 1 0.6000000
8 1 0.5833333
9 1 0.6666667
10 1 0.5833333
11 1 0.6000000
12 1 0.5833333
13 1 0.5833333
14 1 0.6666667
15 1 0.6000000
16 1 0.6666667
17 1 0.5833333
18 1 0.6666667
19 1 0.5833333
20 1 0.6000000
21 1 0.6666667
22 1 0.6666667
23 1 0.5833333
24 1 0.4666667
25 1 0.6666667
26 1 0.6666667
27 1 0.5166667
28 1 0.6666667
29 1 0.6000000
30 1 0.5166667
$`Median and Confidence Interval for Random Chance`
2.5% 50% 97.5%
0.2360938 0.3270833 0.4355208
$`Mean Successful Assignment by Number of PCs of PCA`
1
0.6094444
$`Number of PCs Achieving Highest Mean Success`
[1] "1"
$`Root Mean Squared Error by Number of PCs of PCA`
1
0.3939708
$`Number of PCs Achieving Lowest MSE`
[1] "1"
$DAPC
#################################################
# Discriminant Analysis of Principal Components #
#################################################
class: dapc
$call: dapc.data.frame(x = x, grp = grp, n.pca = n.pca, n.da = n.da)
$n.pca: 1 first PCs of PCA used
$n.da: 1 discriminant functions saved
$var (proportion of conserved variance): 0.605
$eig (eigenvalues): 54.9

  vector    length content
1 $eig      1      eigenvalues
2 $grp      80     prior group assignment
3 $prior    3      prior group probabilities
4 $assign   80     posterior group assignment
5 $pca.cent 12     centring vector of PCA
6 $pca.norm 12     scaling vector of PCA
7 $pca.eig  12     eigenvalues of PCA

  data.frame    nrow ncol content
1 $tab          80   1    retained PCs of PCA
2 $means        3    1    group means
3 $loadings     1    1    loadings of variables
4 $ind.coord    80   1    coordinates of individuals (principal components)
5 $grp.coord    3    1    coordinates of groups
6 $posterior    80   3    posterior membership probabilities
7 $pca.loadings 12   1    PCA loadings of original variables
8 $var.contr    12   1    contribution of original variables
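
The summary figures in the cross-validation output can be checked by hand from the 30 per-replicate success values. The sketch below copies those values from the output above and recomputes the mean success; the RMSE formula used (root mean squared distance of each success value from 1) is an assumption, not taken from the xvalDapc documentation, although it comes out close to the reported figure. On this reading, 0.61 would be the mean proportion of held-out individuals assigned to their correct group with one PC retained, rather than an accuracy rate for choosing the number of PCs.

```r
# Per-replicate success proportions, copied from the output above (n.pca = 1)
success <- c(0.5833333, 0.6000000, 0.6000000, 0.6666667, 0.5833333,
             0.6666667, 0.6000000, 0.5833333, 0.6666667, 0.5833333,
             0.6000000, 0.5833333, 0.5833333, 0.6666667, 0.6000000,
             0.6666667, 0.5833333, 0.6666667, 0.5833333, 0.6000000,
             0.6666667, 0.6666667, 0.5833333, 0.4666667, 0.6666667,
             0.6666667, 0.5166667, 0.6666667, 0.6000000, 0.5166667)

# Mean successful assignment across replicates (reported as 0.6094444)
mean_success <- mean(success)

# Assumed RMSE definition: distance of each success value from perfect (1)
rmse <- sqrt(mean((1 - success)^2))

# Upper bound of the random-chance interval (97.5% quantile in the output)
chance_upper <- 0.4355208

# TRUE if every replicate did better than that upper bound
all_above_chance <- all(success > chance_upper)

print(c(mean_success = mean_success, rmse = rmse))
print(all_above_chance)
```

Comparing each replicate with the 97.5% bound is only an informal check; it says the assignment is better than chance, not that the model is good in absolute terms.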
Kirsty Medcalf