[adegenet-forum] Interpretation of DAPC results

Kirsty Medcalf kirsty.m.medcalf at gmail.com
Sun Oct 11 08:25:14 CEST 2015


Dear Jombart and the adegenet forum,

I have used the function find.clusters and run a DAPC analysis to create a
scatterplot. I also ran a cross-validation with the function xvalDapc, using
a 70 % training set over 30 repeats. If possible, I would like to ask for
advice on the interpretation of my results. I would be very grateful for any
help.

The code can be found on my Stack Overflow page, and the output figures are
attached.

http://stackoverflow.com/questions/32704902/discriminant-analysis-of-principal-components-and-how-to-graphically-show-the-di

I ran a DAPC analysis as well as a basic PCA and an LDA. The PCA used two
PCs to explain the variance in the data, and the LDA found one discriminant
function to describe it. My analysis is a general multivariate one rather
than an exploration of genetic clusters. The reproducible data can be found
at the link above. Can anyone explain why the DAPC found more structure in
the data, identifying 3 clusters? More specifically, would anyone mind
reading my Stack Overflow page to check whether I have followed the correct
steps to build an accurate model?
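
For readers comparing the three analyses, here is a minimal sketch of the
sequence that find.clusters and dapc are documented to follow (PCA, then
k-means on the retained PCs, then a discriminant analysis on those PCs). It
is not the code from the Stack Overflow page: it uses base R and MASS instead
of adegenet, the iris data as a stand-in, and an arbitrary choice of 3
clusters, so every object name here is an assumption.

```r
library(MASS)  # for lda()

x   <- iris[, 1:4]                                # stand-in data, not the poster's
pca <- prcomp(x, center = TRUE, scale. = FALSE)   # step 1: PCA

set.seed(1)                                       # arbitrary seed for k-means
km  <- kmeans(pca$x[, 1:2], centers = 3, nstart = 25)  # step 2: clusters on PCs
grp <- factor(km$cluster)

# step 3: discriminant analysis on the retained PCs
fit2 <- lda(pca$x[, 1:2, drop = FALSE], grouping = grp)  # 2 PCs retained
fit1 <- lda(pca$x[, 1,   drop = FALSE], grouping = grp)  # 1 PC retained

# The number of discriminant functions is at most min(n.pca, k - 1),
# so a single retained PC can give only one function, whatever k is.
ncol(fit2$scaling)
ncol(fit1$scaling)
```

In this sketch the number of clusters comes from the k-means step, while the
number of discriminant functions is capped by the number of retained PCs, so
3 clusters and 1 discriminant function can coexist.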

My code for the cross-validation (below) reports the assignment success for
each number of retained PCs and compares it with random chance. Would I be
right in saying that the accuracy rate of selecting the right number of PCs
is 61 %, and that the optimal model would retain only one PC? My goal is to
completely understand each step of this analysis.
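
To make concrete what a "success" value from this kind of cross-validation
measures, here is a hand-rolled sketch of the general procedure: hold out
30 % of the individuals, fit PCA and a discriminant analysis on the other
70 %, and record the proportion of held-out individuals assigned to their
own group. It is an approximation written with base R and MASS, not the
internals of xvalDapc. The iris data, the stratified sampling, the seed and
all object names are assumptions.

```r
library(MASS)  # for lda()

set.seed(42)                       # arbitrary seed, for reproducibility only
x     <- as.matrix(iris[, 1:4])    # stand-in data
grp   <- iris$Species              # stand-in groups
n.rep <- 30                        # number of repeats
n.pca <- 1                         # number of PCs retained

success <- replicate(n.rep, {
  # draw a 70 % training sample within each group
  train <- unlist(lapply(split(seq_along(grp), grp),
                         function(i) sample(i, round(0.7 * length(i)))))
  # PCA on the training set only, then project the held-out individuals
  pca   <- prcomp(x[train, ], center = TRUE, scale. = FALSE)
  tr.pc <- pca$x[, seq_len(n.pca), drop = FALSE]
  te.pc <- predict(pca, x[-train, ])[, seq_len(n.pca), drop = FALSE]
  # discriminant analysis on training PCs, prediction for held-out PCs
  fit   <- lda(tr.pc, grouping = grp[train])
  pred  <- predict(fit, te.pc)$class
  mean(pred == grp[-train])        # proportion correctly assigned to group
})

mean(success)                      # mean successful assignment over repeats
```

Under this reading, each success value is the share of held-out individuals
placed in the correct group for a given number of PCs, rather than a
probability attached to the PCs themselves.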


Thank you so much for your patience if you have read this whole post. If
anyone is able to advise on the interpretation of these results, thank you
in advance.

Best wishes

Kirsty

My cross validation code and results are:

xval <- xvalDapc(x, grp1$grp, n.pca.max = 2, training.set = 0.7,
                 result = "groupMean", center = TRUE, scale = FALSE,
                 n.pca = NULL, n.rep = 30, xval.plot = TRUE)

$`Cross-Validation Results`
   n.pca   success
1      1 0.5833333
2      1 0.6000000
3      1 0.6000000
4      1 0.6666667
5      1 0.5833333
6      1 0.6666667
7      1 0.6000000
8      1 0.5833333
9      1 0.6666667
10     1 0.5833333
11     1 0.6000000
12     1 0.5833333
13     1 0.5833333
14     1 0.6666667
15     1 0.6000000
16     1 0.6666667
17     1 0.5833333
18     1 0.6666667
19     1 0.5833333
20     1 0.6000000
21     1 0.6666667
22     1 0.6666667
23     1 0.5833333
24     1 0.4666667
25     1 0.6666667
26     1 0.6666667
27     1 0.5166667
28     1 0.6666667
29     1 0.6000000
30     1 0.5166667

$`Median and Confidence Interval for Random Chance`
     2.5%       50%     97.5%
0.2360938 0.3270833 0.4355208

$`Mean Successful Assignment by Number of PCs of PCA`
        1
0.6094444

$`Number of PCs Achieving Highest Mean Success`
[1] "1"

$`Root Mean Squared Error by Number of PCs of PCA`
        1
0.3939708

$`Number of PCs Achieving Lowest MSE`
[1] "1"

$DAPC
#################################################
# Discriminant Analysis of Principal Components #
#################################################
class: dapc
$call: dapc.data.frame(x = x, grp = grp, n.pca = n.pca, n.da = n.da)

$n.pca: 1 first PCs of PCA used
$n.da: 1 discriminant functions saved
$var (proportion of conserved variance): 0.605

$eig (eigenvalues): 54.9

  vector    length content
1 $eig      1      eigenvalues
2 $grp      80     prior group assignment
3 $prior    3      prior group probabilities
4 $assign   80     posterior group assignment
5 $pca.cent 12     centring vector of PCA
6 $pca.norm 12     scaling vector of PCA
7 $pca.eig  12     eigenvalues of PCA

  data.frame    nrow ncol content
1 $tab          80   1    retained PCs of PCA
2 $means        3    1    group means
3 $loadings     1    1    loadings of variables
4 $ind.coord    80   1    coordinates of individuals (principal components)
5 $grp.coord    3    1    coordinates of groups
6 $posterior    80   3    posterior membership probabilities
7 $pca.loadings 12   1    PCA loadings of original variables
8 $var.contr    12   1    contribution of original variables
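
As a quick arithmetic check on the summary values printed above, the
following base R sketch recomputes them from the 30 success values. The
numbers are copied from the output; the object names are mine, and the RMSE
formula (error taken as 1 minus success) is an assumption that happens to
reproduce the reported figure.

```r
# success values copied from the Cross-Validation Results table
success <- c(0.5833333, 0.6000000, 0.6000000, 0.6666667, 0.5833333,
             0.6666667, 0.6000000, 0.5833333, 0.6666667, 0.5833333,
             0.6000000, 0.5833333, 0.5833333, 0.6666667, 0.6000000,
             0.6666667, 0.5833333, 0.6666667, 0.5833333, 0.6000000,
             0.6666667, 0.6666667, 0.5833333, 0.4666667, 0.6666667,
             0.6666667, 0.5166667, 0.6666667, 0.6000000, 0.5166667)

# random-chance quantiles copied from the output
chance <- c("2.5%" = 0.2360938, "50%" = 0.3270833, "97.5%" = 0.4355208)

mean.success <- mean(success)               # reported as 0.6094444
rmse <- sqrt(mean((1 - success)^2))         # reported as 0.3939708

# does every repeat beat the upper bound of the random-chance interval?
above.chance <- all(success > chance["97.5%"])

round(mean.success, 7)
round(rmse, 7)
above.chance
```

Note that the n.pca column contains only the value 1, so this run compared
no alternative numbers of PCs, and 1 is reported as best by default.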



Kirsty Medcalf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PCA2.jpeg
Type: image/jpeg
Size: 63078 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151010/4f9461a6/attachment-0003.jpeg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biplot.jpeg
Type: image/jpeg
Size: 98022 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151010/4f9461a6/attachment-0004.jpeg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAPC.jpeg
Type: image/jpeg
Size: 195092 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151010/4f9461a6/attachment-0005.jpeg>

