[adegenet-forum] a.score versus cross validation and number of discriminant functions to retain

Jombart, Thibaut t.jombart at imperial.ac.uk
Wed Oct 21 12:30:21 CEST 2015


Hi,

in bulk:
#1 yes, by default colors are taken from a palette with one color per group

#2 if you already have clusters defined, this graph may not be very useful; it just compares the previous cluster definition to the K-means one

#3 ?scatter.dapc -> argument 'col', which you are using already

#4 there are at most K-1 discriminant functions (K being the number of groups), so '300' will just retain K-1

#5 if in doubt, use cross-validation (xvalDapc) - it is more advanced and easier to interpret; in your case your data are very well separated in just a few dimensions; 10 PCs should do the trick
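
To make #4 concrete, here is a minimal base-R sketch of the cap, assuming K = 9 groups as in the find.clusters result quoted below. The variable names are made up for illustration and this is not the internal dapc code; dapc may retain fewer functions still, for instance when fewer PCs are kept.

```r
# Hypothetical values for illustration: 9 clusters, 300 functions requested
K <- 9
n_da_requested <- 300

# At most K - 1 discriminant functions exist, so the request is capped
n_da_kept <- min(n_da_requested, K - 1)
print(n_da_kept)  # 8
```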


As for your extra question on eigenvalues, they are stored in the $eig element of the DAPC object. Please do read the tutorial, as this is described there.
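
For instance, a rough sketch of turning eigenvalues into percentages. The numbers below are placeholders rather than values from your analysis; with a real object you would use dapc1$eig in place of the made-up vector.

```r
# Placeholder eigenvalues standing in for dapc1$eig (not real results)
eig <- c(600, 250, 100, 50)

# Relative share of each discriminant function's eigenvalue, in percent
pct <- 100 * eig / sum(eig)

# Shares for the first two functions, i.e. the two axes of a default scatter plot
print(round(pct[1:2], 1))
```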

Cheers
Thibaut

________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Ella Bowles [ebowles at ucalgary.ca]
Sent: 20 October 2015 19:45
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] a.score versus cross validation and number of discriminant functions to retain

PS: Also, which function do I use to get numeric values for the percentage of variation that is explained by the two principal components that are reflected on the scatter plot?

with thanks

On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <ebowles at ucalgary.ca> wrote:
Hello,

I think I have worked my way through a DAPC analysis, and it's pretty neat. I have five questions though. By way of background, I am using a SNP dataset with 11 putative populations (clusters), containing 4099 SNPs. I've converted a structure file to a genind object, and am using that.

1) Am I correct in understanding that the number of clusters you find should inform the number of colours that you list for your DAPC plot?

2) I'm not quite sure how to interpret the following. How do I know if the fit is good?

[Inline image 1]

3 and 4) Is there a function that I can use to correlate the colours with my original populations? I do have this information in the datafile that I fed in. And, does 300 sound reasonable for the number of discriminant functions to retain?
> dapc1 <- dapc(data_full, NumClust$grp)
Choose the number PCs to retain (>=1): 40
Choose the number discriminant functions to retain (>=1): 300
#making colours for 9 clusters, since optimal k was 9 with the data containing zeros
myCol <- c("red", "orange", "yellow", "green", "blue", "purple", "violet", "grey", "brown")
scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4, cex=1, clab=0, leg=TRUE, txt.leg=paste("Cluster", 1:9))
[Inline image 2]

5) I don't really understand the difference between the optim.a.score and the cross-validation analyses. Both seem to be determining the best number of PCs to retain. However, they give very different results. Am I misunderstanding what they are?
#for "data_full" dataset
dapc2 <- dapc(data_full, n.da=300, n.pca=50)

temp <- optim.a.score(dapc2)

#graph shows that highest alpha seems to be 8
[Inline image 3]
#cross-validation for number of PCs to retain - can only do using data_full (this is called “mat” here), couldn’t get it to work using data with zeros
mat <- scaleGen(data, NA.method="mean")
grp <- pop(data)


xval <- xvalDapc(mat, grp, n.pca.max = 100, training.set = 0.9, result = "groupMean", center = TRUE, scale = FALSE, n.pca = NULL, n.rep = 30, xval.plot = TRUE)

xval[2:6]

#results
$`Median and Confidence Interval for Random Chance`
      2.5%        50%      97.5%
0.05659207 0.09212947 0.14164194

$`Mean Successful Assignment by Number of PCs of PCA`
       10        20        30        40        50        60        70        80        90
0.8409091 0.8348485 0.8439394 0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758

$`Number of PCs Achieving Highest Mean Success`
[1] "40"

$`Root Mean Squared Error by Number of PCs of PCA`
       10        20        30        40        50        60        70        80        90
0.1702777 0.1770200 0.1649359 0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707

$`Number of PCs Achieving Lowest MSE`
[1] "40"
[Inline image 4]
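
As a side note on reading these numbers, the two "Number of PCs" entries can be recovered from the printed vectors with which.max and which.min. A small sketch using the values shown in this post; the object names are made up here, and this is not claimed to be how xvalDapc computes them internally.

```r
# Numbers of retained PCs tested in the cross-validation output above
n_pcs <- seq(10, 90, by = 10)

# Values copied from the printed xvalDapc results in this post
mean_success <- c(0.8409091, 0.8348485, 0.8439394, 0.8530303, 0.8136364,
                  0.8227273, 0.8000000, 0.8075758, 0.8075758)
rmse <- c(0.1702777, 0.1770200, 0.1649359, 0.1607061, 0.2007218,
          0.1864929, 0.2138458, 0.2051338, 0.2074707)

# Number of PCs with the highest mean assignment success
best_by_success <- n_pcs[which.max(mean_success)]
# Number of PCs with the lowest root mean squared error
best_by_rmse <- n_pcs[which.min(rmse)]

print(c(best_by_success, best_by_rmse))  # 40 40
```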

Thank you very much for your time, and sincerely,
Ella Bowles

--
Ella Bowles
PhD Candidate
Biological Sciences
University of Calgary

e-mail: ebowles at ucalgary.ca, bowlese at gmail.com
website: http://ellabowlesphd.wordpress.com/
