<html dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>
</head>
<body ocsi="0" fpstyle="1">
<div style="direction: ltr;font-family: Tahoma;color: #000000;font-size: 10pt;"><br>
Hi, <br>
<br>
in bulk:<br>
#1 yes, by default colors are taken from a palette with one color per group<br>
<br>
#2 if you already have clusters defined, this graph may not be very useful; it just compares the previous cluster definition to the one found by K-means<br>
<br>
#3 see ?scatter.dapc: the argument 'col', which you are using already, sets the group colours<br>
<br>
#4 with K groups there are at most K-1 discriminant functions, so asking for '300' will just retain K-1<br>
<br>
#5 if in doubt, use cross-validation (xvalDapc): it is more advanced and easier to interpret. In your case the data are very well separated in just a few dimensions, so 10 PCs should do the trick<br>
<br>
<br>
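For #2 and #3, a minimal sketch of how an existing grouping can be compared with K-means clusters and matched to colours. This is an illustration only: the built-in iris data and base R kmeans() stand in for the SNP data and the clustering step, and the object names known, inferred, tab and point_col are made up for the example. With the objects from the question, the analogous cross-tabulation would presumably be table(pop(data), NumClust$grp).<br>
<pre>
```r
# Stand-in data: iris has 3 known groups (Species), playing the role of the
# original populations. This is NOT the SNP dataset discussed in the thread.
known <- iris$Species

# K-means with K = 3, i.e. clusters inferred without using 'known'
set.seed(1)  # k-means starts from random centres
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
inferred <- factor(km$cluster)

# Rows: original groups; columns: inferred clusters.
# A table with one dominant cell per row means the two groupings agree.
tab <- table(known, inferred)
print(tab)

# One colour per inferred cluster; indexing by a factor uses its integer codes
myCol <- c("red", "orange", "blue")
point_col <- myCol[inferred]
```
</pre>
Reading the table row by row shows which cluster, and therefore which colour, most individuals of each original group fall into.<br>
<br>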
<div>
<div style="font-family:Tahoma; font-size:13px">
<div class="BodyFragment"><font size="2"><span style="font-size:10pt">
<div class="PlainText">As for your extra question on eigenvalues, they are stored in the $eig element of the dapc object. Please do read the tutorial, where this is described.<br>
<br>
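As a rough sketch of the arithmetic, eigenvalues can be turned into percentages as below. The example is an assumption-laden stand-in: it uses MASS::lda on the built-in iris data rather than a dapc object, and the names fit, eig, n_da and pct are made up. With a dapc object the same calculation would be applied to its $eig element, e.g. 100 * dapc1$eig / sum(dapc1$eig); as far as I understand, those eigenvalues belong to the discriminant functions shown on the scatter plot, not to the PCA step.<br>
<pre>
```r
library(MASS)  # recommended package, installed with standard R distributions

# Stand-in data: iris has K = 3 groups, so at most K - 1 = 2 discriminant functions
fit <- lda(Species ~ ., data = iris)

# Squared singular values of the discriminant analysis act as eigenvalues
eig <- fit$svd^2
n_da <- length(eig)  # number of discriminant functions actually available

# Percentage of the between-group discrimination carried by each function
pct <- 100 * eig / sum(eig)
print(round(pct, 1))
```
</pre>
The length of the eigenvalue vector also illustrates point #4: however many functions are requested, only K-1 exist.<br>
<br>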
Cheers<br>
Thibaut<br>
<br>
</div>
</span></font></div>
</div>
</div>
<div style="font-family: Times New Roman; color: #000000; font-size: 16px">
<hr tabindex="-1">
<div style="direction: ltr;" id="divRpF156376"><font color="#000000" face="Tahoma" size="2"><b>From:</b> adegenet-forum-bounces@lists.r-forge.r-project.org [adegenet-forum-bounces@lists.r-forge.r-project.org] on behalf of Ella Bowles [ebowles@ucalgary.ca]<br>
<b>Sent:</b> 20 October 2015 19:45<br>
<b>To:</b> adegenet-forum@lists.r-forge.r-project.org<br>
<b>Subject:</b> Re: [adegenet-forum] a.score versus cross validation and number of discriminant functions to retain<br>
</font><br>
</div>
<div></div>
<div>
<div dir="ltr">
<div class="gmail_default" style="font-size:large; color:#0000ff">PS: Also, which function do I use to get numeric values for the percentage of variation that is explained by the two principal components that are shown on the scatter plot?</div>
<div class="gmail_default" style="font-size:large; color:#0000ff"><br>
</div>
<div class="gmail_default" style="font-size:large; color:#0000ff">with thanks</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <span dir="ltr">
<<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex; border-left:1px #ccc solid; padding-left:1ex">
<div dir="ltr">
<div class="gmail_default">
<p class="MsoNormal" style="color:rgb(0,0,255); font-size:large; margin-bottom:0.0001pt">
<span style="font-size:10pt; font-family:'Times New Roman',serif">Hello,</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255); font-size:large; margin-bottom:0.0001pt">
<span style="font-family:'Times New Roman',serif; font-size:10pt"><br>
</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255); font-size:large; margin-bottom:0.0001pt">
<span style="font-family:'Times New Roman',serif; font-size:10pt">I think I have worked my way through a DAPC analysis, and it's pretty neat. I have five questions though. </span><span style="font-family:'Times New Roman',serif; font-size:10pt">By way of background,
I am using a SNP dataset with 11 putative populations (clusters), containing 4099 SNPs. I've converted a structure file to a genind object, and am using that.</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255); font-size:large; margin-bottom:0.0001pt">
<br>
</p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">1) Am I correct in understanding that the number of clusters you find should inform the number of colours that you list
for your DAPC plot?</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(0,0,255)"><br>
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(0,0,255)">2) I'm not quite sure how to interpret the following. How do I know if the fit is good?</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(0,0,255)"> </span></p>
<p class="MsoNormal" style="color:rgb(0,0,255); font-size:large; margin-bottom:0.0001pt">
<img src="cid:ii_150868359321bbb7" alt="Inline image 1" height="280" width="280"><br>
</p>
</div>
<div><br>
</div>
<div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">3 and 4) Is there a function that I can use to correlate the colours with my original populations? I do have this information in the datafile that I fed in. And, does 300 sound reasonable
for the number of discriminant functions to retain?</div>
<div class="gmail_default">
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">> dapc1 <- dapc(data_full, NumClust$grp)</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number PCs to retain (>=1): 40</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number discriminant functions to retain (>=1): 300</span></font></p>
</div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">#making colours for 9 clusters, since optimal k was 9 with the data containing zeros</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">myCol <- c("red", "orange", "yellow", "green", "blue", "purple", "violet", "grey", "brown")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4, cex=1, clab=0, leg=TRUE, txt.leg=paste("Cluster",
1:9))</span></p>
</div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><img src="cid:ii_1508685356bc662a" alt="Inline image 2" height="280" width="280"></div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"></div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">5) I don't really understand the difference between the optim.a.score and the cross-validation analyses. Both seem to be determining the best number of PCs to retain. However, they
give very different results. Am I misunderstanding what they are?</div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">#for "data_full" dataset</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">dapc2 <- dapc(data_full, n.da=300, n.pca=50)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">temp <- optim.a.score(dapc2)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">#graph shows that highest alpha seems to be 8</span></p>
</div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><img src="cid:ii_15086881f2cd8715" alt="Inline image 3" height="280" width="280"></div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(34,34,34)">#cross-validation for number of PCs to retain –can only do using data_full (this is called “mat” here),
couldn’t get it to work using data with zeros</span></div>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">mat <- scaleGen(data, NA.method="mean")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">grp <- pop(data)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">xval <- xvalDapc(mat, grp, n.pca.max = 100, training.set = 0.9, result = "groupMean", center = TRUE, scale = FALSE, n.pca = NULL, n.rep = 30,
xval.plot = TRUE)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">xval[2:6]</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(34,34,34)">#results</span></div>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">$`Median and Confidence Interval for Random Chance`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> 2.5% 50% 97.5%
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">0.05659207 0.09212947 0.14164194
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">$`Mean Successful Assignment by Number of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> 10 20 30 40 50 60 70 80 90
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">0.8409091 0.8348485 0.8439394 0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">$`Number of PCs Achieving Highest Mean Success`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">[1] "40"</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">$`Root Mean Squared Error by Number of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> 10 20 30 40 50 60 70 80 90
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">0.1702777 0.1770200 0.1649359 0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt; font-family:'Times New Roman',serif">$`Number of PCs Achieving Lowest MSE`</span></p>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif; font-size:10pt; color:rgb(34,34,34)">[1] "40"</span></div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)"><img src="cid:ii_15086894cc94945b" alt="Inline image 4" height="291" width="343"></div>
<br>
</div>
<div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">Thank you very much for your time, and sincerely,</div>
<div class="gmail_default" style="font-size:large; color:rgb(0,0,255)">Ella Bowles</div>
<span class="HOEnZb"><font color="#888888"><br>
</font></span></div>
<span class="HOEnZb"><font color="#888888">-- <br>
<div>
<div dir="ltr">
<div>Ella Bowles<br>
PhD Candidate </div>
<div>Biological Sciences</div>
<div>University of Calgary<br>
<br>
e-mail: <a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>,
<a href="mailto:bowlese@gmail.com" target="_blank">bowlese@gmail.com</a></div>
<div>website: <a href="http://ellabowlesphd.wordpress.com/" rel="nofollow me" style="color:rgb(59,89,152); font-family:'lucida grande',tahoma,verdana,arial,sans-serif; font-size:11.2px; line-height:17px" target="_blank">http://ellabowlesphd.wordpress.com/</a></div>
</div>
</div>
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
</div>
</body>
</html>