<div dir="ltr"><div class="gmail_default" style="font-size:large;color:#0000ff">PS As a different way of looking at the DAPC plot, would it be possible to plot the individuals according to cluster, as they are in my plot, but colour-code them by population (as assigned in the input file)? </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 21, 2015 at 10:52 AM, Ella Bowles <span dir="ltr"><<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="color:rgb(0,0,255)"><font size="1">ps I just tried to run find.clusters while retaining only 10 PCs, and got a very strange result: it picks a very large number of groups as best. This seems to be resolving too much variation, so it seems best to stick with the 40 PCs suggested by xval.</font></div><div class="gmail_default" style="color:rgb(0,0,255)"><font size="1"><br></font></div><div class="gmail_default" style="color:rgb(0,0,255)"><div class="gmail_default"><font size="1"> NumClust <- find.clusters(data_full, max.n.clust=100)</font></div><div class="gmail_default"><font size="1">Choose the number PCs to retain (>=1): 10</font></div><div class="gmail_default"><font size="1">Choose the number of clusters (>=2: 25</font></div><div class="gmail_default"><font size="1">> head(NumClust$Kstat, 30)</font></div><div class="gmail_default"><font size="1"> K=1 K=2 K=3 K=4 K=5 K=6 K=7 K=8 K=9 K=10 K=11 K=12 </font></div><div class="gmail_default"><font size="1">864.7344 810.3223 729.0304 669.8737 619.2427 573.9809 544.6057 481.2244 473.3314 468.2758 434.5868 429.4302 </font></div><div class="gmail_default"><font size="1"> K=13 K=14 K=15 K=16 K=17 K=18 K=19 K=20 K=21 K=22 K=23 K=24 </font></div><div class="gmail_default"><font size="1">424.6336 423.1422 413.2484 414.3202 410.0086 407.0822 411.1878 408.5134 418.5212 413.8698 411.2578 401.9535 
</font></div><div class="gmail_default"><font size="1"> K=25 K=26 K=27 K=28 K=29 K=30 </font></div><div class="gmail_default"><font size="1">413.3403 405.8296 417.6782 403.9047 407.2553 406.8078 </font></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 21, 2015 at 10:36 AM, Ella Bowles <span dir="ltr"><<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)">Many thanks for this. Couple quick questions in follow-up.</div><div class="gmail_extra"><br><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt"><br>
<br>
#2 if you have clusters defined already this graph may not be very useful; it just compares previous cluster definition to Kmean's<br></div></div></blockquote><div><br></div></span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">>>I have populations identified using the "pop" option. But I don't have clusters identified per se. If this is the case, does my plot look okay?</div> </div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt"> </span></div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_150868359321bbb7" alt="Inline image 1" width="280" height="280"></div><br></div><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt">
<br>
#3 ?scatter.dapc -> argument 'col', which you are using already<br></div></div></blockquote></span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">>>I should have been more clear here. I don't know which population is being represented by which colour, and would ideally like to know this so that I can see how they are being grouped. Is there a function that I can use to ask for this information? Do the numbers that NumClust$grp give me represent the clusters that the individuals are being assigned to? If this is the case, then this question is answered. </div></div><span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline"><br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt">
#4 there are K-1 discriminant functions, so '300' will just retain K-1<br>
<br></div></div></blockquote></span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">>>is 300 a good number though? I just don't know how to know if I'm making a good choice.</div></div><span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline"></div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt">
#5 if in doubt, use Xval - more advanced and easier to interpret; in your case your data are very well separated in just a few dimensions; 10 PCs should do the trick<br></div></div></blockquote><div><br></div></span><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">>>So I should use 10 even though xval says 40?</div></div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline"><br></div></div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">Thank you again,</div></div><div><div class="gmail_default" style="font-size:large;color:rgb(0,0,255);display:inline">Ella</div> </div><div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div style="direction:ltr;font-family:Tahoma;color:rgb(0,0,0);font-size:10pt"><div><div style="font-family:Tahoma;font-size:13px"><div><font size="2"><span style="font-size:10pt"><div>
</div>
</span></font></div>
</div>
</div>
<div style="font-family:'Times New Roman';color:rgb(0,0,0);font-size:16px">
<hr>
<div style="direction:ltr"><font color="#000000" face="Tahoma" size="2"><b>From:</b> <a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a> [<a href="mailto:adegenet-forum-bounces@lists.r-forge.r-project.org" target="_blank">adegenet-forum-bounces@lists.r-forge.r-project.org</a>] on behalf of Ella Bowles [<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>]<br>
<b>Sent:</b> 20 October 2015 19:45<br>
<b>To:</b> <a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.r-project.org</a><br>
<b>Subject:</b> Re: [adegenet-forum] a.score versus cross validation and number of discriminant functions to retain<br>
</font><br>
</div><div><div>
<div></div>
<div>
<div dir="ltr">
<div style="font-size:large;color:rgb(0,0,255)">ps Also, which function do I use to get numeric values for the percentage of variation explained by the two principal components shown on the scatter plot?</div>
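<div>
<p>On the percentages: the two axes of a DAPC scatter plot are discriminant functions rather than principal components, so one way to get numbers is to rescale the eigenvalues stored in the dapc object. The following is only a minimal sketch. It assumes the <code>microbov</code> example data shipped with adegenet in place of data_full, and the name <code>dapc_ex</code> and the values given to n.pca and n.da are illustrative choices, not recommendations.</p>
<pre>
library(adegenet)

# Example data shipped with adegenet (stands in for data_full).
data(microbov)

# Non-interactive DAPC on the predefined populations; 20 PCs and
# 2 discriminant functions are arbitrary values for illustration.
dapc_ex &lt;- dapc(microbov, pop(microbov), n.pca = 20, n.da = 2)

# Relative share of each discriminant function's eigenvalue, in percent.
pct_da &lt;- round(100 * dapc_ex$eig / sum(dapc_ex$eig), 1)
pct_da[1:2]   # the two axes drawn by scatter()

# Proportion of the total variance conserved by the retained PCs.
dapc_ex$var
</pre>
<p>The first quantity describes how the discrimination between groups is spread over the axes; the second describes how much of the original variation survived the PCA step.</p>
</div>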
<div style="font-size:large;color:rgb(0,0,255)"><br>
</div>
<div style="font-size:large;color:rgb(0,0,255)">with thanks</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <span dir="ltr">
<<a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">
<div>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt">
<span style="font-size:10pt;font-family:'Times New Roman',serif">Hello,</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt">
<span style="font-family:'Times New Roman',serif;font-size:10pt"><br>
</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt">
<span style="font-size:10pt;font-family:'Times New Roman',serif">I think I have worked my way through a DAPC analysis, and it's pretty neat. I have five questions though. </span><span style="font-family:'Times New Roman',serif;font-size:10pt">By way of background,
I am using a dataset of 4099 SNPs with 11 putative populations (clusters). I've converted a STRUCTURE file to a genind object, and am using that.</span></p>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt">
<br>
</p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">1) Am I correct in understanding that the number of clusters you find should inform the number of colours that you list
for your DAPC plot?</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)"><br>
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)">2) I'm not quite sure how to interpret the following. How do I know if the fit is good?</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(0,0,255)"> </span></p>
<p class="MsoNormal" style="color:rgb(0,0,255);font-size:large;margin-bottom:0.0001pt">
<img src="cid:ii_150868359321bbb7" alt="Inline image 1" height="280" width="280"><br>
</p>
</div>
<div><br>
</div>
<div>
<div style="font-size:large;color:rgb(0,0,255)">3 and 4) Is there a function that I can use to match the colours to my original populations? I do have this information in the data file that I fed in. Also, does 300 sound reasonable
for the number of discriminant functions to retain?</div>
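<div>
<p>For matching clusters to the original populations: the <code>$grp</code> element returned by find.clusters is a factor giving the cluster assigned to each individual, so it can be cross-tabulated against pop(). A minimal sketch, assuming the <code>microbov</code> example data in place of data_full; the names <code>clust_ex</code> and <code>tab_ex</code> and the choice of 40 PCs and 9 clusters are illustrative only.</p>
<pre>
library(adegenet)
data(microbov)   # example data standing in for data_full

set.seed(1)      # k-means uses random starts
clust_ex &lt;- find.clusters(microbov, n.pca = 40, n.clust = 9)

# Rows: populations from the input file; columns: inferred clusters.
tab_ex &lt;- table(pop(microbov), clust_ex$grp)
tab_ex

# For each population, the cluster holding most of its individuals.
apply(tab_ex, 1, which.max)
</pre>
<p>Reading along a row shows how one population is split across clusters; reading down a column shows which populations a cluster (and hence a colour) is made of.</p>
</div>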
<div>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">> dapc1 <- dapc(data_full, NumClust$grp)</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number PCs to retain (>=1): 40</span></font></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><font color="#0000ff" face="Times New Roman, serif"><span style="font-size:13.3333px">Choose the number discriminant functions to retain (>=1): 300</span></font></p>
</div>
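<div>
<p>On the value 300: with K groups there are at most K-1 discriminant functions, so a larger request is simply capped. A small sketch of how to check what was actually kept, assuming the <code>microbov</code> example data and its predefined populations in place of data_full and NumClust$grp; <code>grp_ex</code> and <code>dapc_ex</code> are illustrative names.</p>
<pre>
library(adegenet)
data(microbov)

grp_ex  &lt;- pop(microbov)   # predefined groups, standing in for NumClust$grp
dapc_ex &lt;- dapc(microbov, grp_ex, n.pca = 40, n.da = 300)

nlevels(grp_ex) - 1        # upper bound: K - 1
dapc_ex$n.da               # number of discriminant functions actually retained
ncol(dapc_ex$ind.coord)    # one column of coordinates per retained function
</pre>
</div>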
<div style="font-size:large;color:rgb(0,0,255)">
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">#making colours for 9 clusters, since optimal k was 9 with the data containing zeros</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">myCol <- c("red", "orange", "yellow", "green", "blue", "purple", "violet", "grey", "brown")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0, col=myCol, solid=.4, cex=1, clab=0, leg=TRUE, txt.leg=paste("Cluster",
1:9))</span></p>
</div>
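<div>
<p>To see the same DAPC coordinates coloured by the populations from the input file instead of by cluster, one option is to plot the individual coordinates directly with base graphics. This is only a sketch under assumptions: it uses the <code>microbov</code> example data in place of data_full, and <code>clust_ex</code>, <code>dapc_ex</code>, <code>pops</code> and <code>pop_col</code> are illustrative names.</p>
<pre>
library(adegenet)
data(microbov)

set.seed(1)   # k-means uses random starts
clust_ex &lt;- find.clusters(microbov, n.pca = 40, n.clust = 9)
dapc_ex  &lt;- dapc(microbov, clust_ex$grp, n.pca = 40, n.da = 2)

# One colour per population in the input file, looked up per individual.
pops    &lt;- pop(microbov)
pop_col &lt;- rainbow(nlevels(pops))

# Positions come from the cluster-based DAPC; colours come from pop().
plot(dapc_ex$ind.coord[, 1:2], col = pop_col[as.integer(pops)], pch = 20,
     xlab = "Discriminant function 1", ylab = "Discriminant function 2")
legend("topright", legend = levels(pops), col = pop_col, pch = 20, cex = 0.6)
</pre>
</div>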
<div style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_1508685356bc662a" alt="Inline image 2" height="280" width="280"></div>
<div style="font-size:large;color:rgb(0,0,255)"></div>
<div style="font-size:large;color:rgb(0,0,255)">5) I don't really understand the difference between the optim.a.score and cross-validation analyses. Both seem to determine the best number of PCs to retain, yet they
give very different results. Am I misunderstanding what they do?</div>
<div style="font-size:large;color:rgb(0,0,255)">
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">#for "data_full" dataset</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">dapc2 <- dapc(data_full, n.da=300, n.pca=50)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">temp <- optim.a.score(dapc2)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">#graph shows that highest alpha seems to be 8</span></p>
</div>
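<div>
<p>The a-score compares the reassignment success for the real groups with that obtained for randomly permuted groups, so it penalises overfitting. The suggested number of PCs can also be read from the returned object rather than from the graph. A minimal sketch, assuming the <code>microbov</code> example data in place of data_full; <code>dapc_big</code> and <code>ascore_ex</code> are illustrative names, and n.sim is kept small only to keep the run short.</p>
<pre>
library(adegenet)
data(microbov)

# Deliberately generous number of PCs, in the spirit of the dapc2 call above.
dapc_big &lt;- dapc(microbov, pop(microbov), n.pca = 50, n.da = 14)

set.seed(1)   # the a-score relies on random permutations of the groups
ascore_ex &lt;- optim.a.score(dapc_big, n.sim = 5, plot = FALSE)

ascore_ex$best   # number of PCs suggested by the a-score
</pre>
</div>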
<div style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_15086881f2cd8715" alt="Inline image 3" height="280" width="280"></div>
<div style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">#cross-validation for number of PCs to retain –can only do using data_full (this is called “mat” here),
couldn’t get it to work using data with zeros</span></div>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">mat <- scaleGen(data, NA.method="mean")</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">grp <- pop(data)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">xval <- xvalDapc(mat, grp, n.pca.max = 100, training.set = 0.9, result = "groupMean", center = TRUE, scale = FALSE, n.pca = NULL, n.rep = 30,
xval.plot = TRUE)</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">xval[2:6]</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<div style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">#results</span></div>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">Confidence Interval for Random Chance`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> 2.5% 50% 97.5%
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.05659207 0.09212947 0.14164194
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Mean Successful Assignment by Number of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> 10 20 30 40 50 60 70 80 90
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.8409091 0.8348485 0.8439394 0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Number of PCs Achieving Highest Mean Success`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">[1] "40"</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Root Mean Squared Error by Number of PCs of PCA`</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> 10 20 30 40 50 60 70 80 90
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">0.1702777 0.1770200 0.1649359 0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707
</span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif"> </span></p>
<p class="MsoNormal" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:'Times New Roman',serif">$`Number of PCs Achieving Lowest MSE`</span></p>
<div style="font-size:large;color:rgb(0,0,255)"><span style="font-family:'Times New Roman',serif;font-size:10pt;color:rgb(34,34,34)">[1] "40"</span></div>
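<div>
<p>Cross-validation, by contrast, measures how well individuals left out of the analysis are assigned to their group. The elements printed above can be extracted by name, which avoids relying on positions such as xval[2:6]. A minimal sketch, assuming the <code>microbov</code> example data in place of data; <code>mat_ex</code>, <code>grp_ex</code> and <code>xval_ex</code> are illustrative names, and the short list of n.pca values and the small n.rep are only there to keep the run quick.</p>
<pre>
library(adegenet)
data(microbov)

# Replace missing values by the allele mean, as in the call above.
mat_ex &lt;- scaleGen(microbov, NA.method = "mean")
grp_ex &lt;- pop(microbov)

set.seed(1)   # training and validation sets are drawn at random
xval_ex &lt;- xvalDapc(mat_ex, grp_ex, n.pca = c(10, 20, 40), n.rep = 5,
                    training.set = 0.9, xval.plot = FALSE)

# Element names as they appear in the printed results.
xval_ex$`Number of PCs Achieving Lowest MSE`
xval_ex$`Mean Successful Assignment by Number of PCs of PCA`
</pre>
<p>When the mean success is nearly flat across several values, as between 10 and 40 PCs in the results above, the smaller number is usually a defensible choice.</p>
</div>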
<div style="font-size:large;color:rgb(0,0,255)"><img src="cid:ii_15086894cc94945b" alt="Inline image 4" height="291" width="343"></div>
<br>
</div>
<div>
<div style="font-size:large;color:rgb(0,0,255)">Thank you very much for your time, and sincerely,</div>
<div style="font-size:large;color:rgb(0,0,255)">Ella Bowles</div>
<span><font color="#888888"><br>
</font></span></div>
<span><font color="#888888">-- <br>
<div>
<div dir="ltr">
<div>Ella Bowles<br>
PhD Candidate </div>
<div>Biological Sciences</div>
<div>University of Calgary<br>
<br>
e-mail: <a href="mailto:ebowles@ucalgary.ca" target="_blank">ebowles@ucalgary.ca</a>,
<a href="mailto:bowlese@gmail.com" target="_blank">bowlese@gmail.com</a></div>
<div>website: <a href="http://ellabowlesphd.wordpress.com/" rel="nofollow me" style="color:rgb(59,89,152);font-family:'lucida grande',tahoma,verdana,arial,sans-serif;font-size:11.2px;line-height:17px" target="_blank">http://<span style="display:inline-block"></span>ellabowlesphd.wordpre<span style="display:inline-block"></span>ss.com/</a></div>
</div>
</div>
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div></div></div>
</div>
</div>
</blockquote></div></div></div><div><div><br><br clear="all"><div><br></div>
</div></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>
</div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>
</div>