<div dir="ltr"><div class="gmail_default" style="font-size:large;color:#0000ff">P.S. Also, which function should I use to get numeric values for the percentage of variation explained by the two principal components shown on the scatter plot?</div><div class="gmail_default" style="font-size:large;color:#0000ff"><br></div><div class="gmail_default" style="font-size:large;color:#0000ff">With thanks</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <span dir="ltr"><<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default"><p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">Hello,</span></p><p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-family:'Times New Roman',serif;font-size:10pt"><br></span></p><p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-family:'Times New Roman',serif;font-size:10pt">I think I have worked my way through a DAPC analysis, and it's pretty neat. I have five questions, though. </span><span style="font-family:'Times New Roman',serif;font-size:10pt">By way of background, I am using a SNP dataset with 11 putative populations (clusters), containing 4099 SNPs. 
I've converted a STRUCTURE file to a genind object and am using that.</span></p><p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><br></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">1) Am I correct in understanding that the number of clusters found should determine the number of colours listed for the DAPC plot?</span></font></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)"><br></span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)">2) I'm not quite sure how to interpret the following plot. How do I know if the fit is good?</span></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)"> </span></p>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><img src="cid:ii_150868359321bbb7" alt="Inline image 1" width="280" height="280"><br></p></div><div><br></div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)">3 and 4) Is there a function that I can use to correlate the colours with my original populations? I do have this information in the data file that I fed in. Also, does 300 sound reasonable for the number of discriminant functions to retain?</div><div class="gmail_default"><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">> dapc1 <- dapc(data_full, NumClust$grp)</span></font></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number PCs to retain (>=1): 40</span></font></p><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number discriminant functions to retain (>=1): 300</span></font></p></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">#making colours for 9 clusters,
since optimal k was 9 with the data containing zeros</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">myCol <- c("red",
"orange", "yellow", "green", "blue",
"purple", "violet", "grey", "brown")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">scatter(dapc1, scree.da=FALSE,
bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4, cex=1,
clab=0, leg=TRUE, txt.leg=paste("Cluster", 1:9))</span></p></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_1508685356bc662a" alt="Inline image 2" width="280" height="280"></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)">5) I don't really understand the difference between the optim.a.score and cross-validation analyses. Both seem to determine the best number of PCs to retain, but they give very different results. Am I misunderstanding what they do?</div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">#for "data_full" dataset</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">dapc2 <- dapc(data_full,
n.da=300, n.pca=50)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">temp <- optim.a.score(dapc2)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">#graph shows that the highest a-score
seems to be at 8 retained PCs</span></p></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_15086881f2cd8715" alt="Inline image 3" width="280" height="280"></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">#cross-validation for number of PCs
to retain – could only do this using data_full (called “mat” here); couldn’t
get it to work using the data with zeros</span></div>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">mat <- scaleGen(data,
NA.method="mean")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">grp <- pop(data)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">xval <- xvalDapc(mat, grp,
n.pca.max = 100, training.set = 0.9, result = "groupMean", center =
TRUE, scale = FALSE, n.pca = NULL, n.rep = 30, xval.plot = TRUE)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">xval[2:6]</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">#results</span></div><p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Median and Confidence Interval for Random
Chance`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">
2.5% 50% 97.5% </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.05659207 0.09212947 0.14164194 </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Mean Successful Assignment by
Number of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> 10
20 30 40 50 60 70 80 90 </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.8409091 0.8348485 0.8439394
0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758 </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Number of PCs Achieving Highest
Mean Success`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">[1] "40"</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Root Mean Squared Error by Number
of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> 10
20 30 40 50 60 70 80 90 </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.1702777 0.1770200 0.1649359
0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707 </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt;background-image:initial;background-repeat:initial"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Number of PCs Achieving Lowest
MSE`</span></p>
<div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">[1] "40"</span></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_15086894cc94945b" alt="Inline image 4" width="343" height="291"></div><br></div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)">Thank you very much for your time, and sincerely,</div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)">Ella Bowles</div><span class="HOEnZb"><font color="#888888"><br></font></span></div><span class="HOEnZb"><font color="#888888">-- <br><div><div dir="ltr"><div>Ella Bowles<br>PhD Candidate </div><div>Biological Sciences</div>
<div>University of Calgary<br><br>e-mail: <a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>, <a href="mailto:bowlese@gmail.com" target="_blank">bowlese@gmail.com</a></div><div>website: <a href="http://ellabowlesphd.wordpress.com/" rel="nofollow me" style="color:rgb(59,89,152);font-family:'lucida grande',tahoma,verdana,arial,sans-serif;font-size:11.2px;line-height:17px" target="_blank">http://<span style="display:inline-block"></span>ellabowlesphd.wordpre<span style="display:inline-block"></span>ss.com/</a></div></div></div>
</font></span></div>
</blockquote></div>
</div>