[adegenet-forum] a.score versus cross validation and number of discriminant functions to retain
Ella Bowles
ebowles at ucalgary.ca
Tue Oct 20 20:45:38 CEST 2015
P.S. Also, which function do I use to get numeric values for the percentage
of variation explained by the two principal components shown on the
scatter plot?
with thanks
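Note that the two axes of the default DAPC scatter plot are the first two discriminant functions, not principal components. A minimal sketch, assuming the fitted object stores the discriminant eigenvalues in `$eig` (as adegenet `dapc` objects do): each function's share is its eigenvalue as a percentage of the sum of all DA eigenvalues. The helper `da_variance_pct` is hypothetical, and `dapc1$var` is assumed to hold the proportion of total variance kept by the retained PCs.

```r
# Each discriminant function's eigenvalue as a percentage of the sum of all
# DA eigenvalues. `eig` is assumed to be the vector of DA eigenvalues,
# e.g. dapc1$eig from an adegenet dapc object.
da_variance_pct <- function(eig) {
  stopifnot(is.numeric(eig), length(eig) >= 1, all(eig >= 0), sum(eig) > 0)
  100 * eig / sum(eig)
}

# Usage (not run; requires adegenet and a fitted dapc1):
# round(da_variance_pct(dapc1$eig)[1:2], 1)  # axes 1 and 2 of scatter(dapc1)
# dapc1$var  # assumed: proportion of total variance kept by the retained PCs
```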
On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <ebowles at ucalgary.ca> wrote:
> Hello,
>
>
> I think I have worked my way through a DAPC analysis, and it's pretty
> neat. I have five questions, though. By way of background, I am using a
> SNP dataset of 4099 SNPs with 11 putative populations (clusters).
> I've converted a STRUCTURE file to a genind object and am using that.
>
>
> 1) Am I correct in understanding that the number of clusters you find
> should inform the number of colours that you list for your DAPC plot?
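A minimal sketch for question 1, assuming the colours are matched to the clusters in the order of the cluster factor's levels: deriving the palette from the cluster factor itself keeps its length equal to the number of groups. `cluster_palette` is a hypothetical helper; `NumClust$grp` is the cluster factor from the session below.

```r
# One colour per inferred cluster, so the palette always matches the number
# of groups. `grp` is assumed to be the cluster factor, e.g. NumClust$grp.
cluster_palette <- function(grp) {
  grp <- as.factor(grp)
  cols <- grDevices::rainbow(nlevels(grp))
  names(cols) <- levels(grp)  # label each colour with its cluster
  cols
}

# Usage (not run): scatter(dapc1, col = cluster_palette(NumClust$grp), ...)
```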
>
>
> 2) I'm not quite sure how to interpret the following. How do I know if the
> fit is good?
>
> [image: Inline image 1]
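The image above was scrubbed from the archive, so what it showed is unknown. One common check of fit is the proportion of individuals reassigned to their own group. A minimal sketch, assuming the prior and posterior groups are available as `dapc1$grp` and `dapc1$assign`; `reassignment_success` is hypothetical (adegenet's `summary(dapc1)` is assumed to report similar per-group proportions).

```r
# Proportion of individuals reassigned to their own group, overall and per
# group. `prior` and `assigned` are assumed to be factors such as dapc1$grp
# and dapc1$assign from an adegenet dapc object.
reassignment_success <- function(prior, assigned) {
  prior <- as.factor(prior)
  assigned <- factor(assigned, levels = levels(prior))
  hit <- !is.na(assigned) & prior == assigned  # unknown labels count as misses
  list(overall = mean(hit), per_group = tapply(hit, prior, mean))
}

# Usage (not run): reassignment_success(dapc1$grp, dapc1$assign)
```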
>
> 3 and 4) Is there a function I can use to relate the colours (inferred
> clusters) to my original populations? I do have this information in the
> data file that I fed in. And does 300 sound reasonable for the number of
> discriminant functions to retain?
>
> > dapc1 <- dapc(data_full, NumClust$grp)
>
> Choose the number PCs to retain (>=1): 40
>
> Choose the number discriminant functions to retain (>=1): 300
>
> # making colours for 9 clusters, since the optimal K was 9 with the data
> # containing zeros
>
> myCol <- c("red", "orange", "yellow", "green", "blue", "purple", "violet",
> "grey", "brown")
>
> scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0,
> col=myCol, solid=.4, cex=1, clab=0, leg=TRUE, txt.leg=paste("Cluster", 1:9))
> [image: Inline image 2]
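A minimal base-R sketch for questions 3 and 4. Cross-tabulating the original populations against the inferred clusters shows which cluster (colour) each population mostly falls into. For K groups a discriminant analysis has at most K - 1 discriminant functions (and no more than the number of retained PCs), so with 9 clusters at most 8 can exist whatever value is typed at the prompt. Both helpers are hypothetical; `table.value` is assumed to be available once adegenet is loaded.

```r
# Relate inferred clusters to original populations, and bound the number of
# discriminant functions. `original` and `inferred` are assumed to be labels
# such as pop(data_full) and NumClust$grp.
compare_groups <- function(original, inferred) {
  # rows = original populations, columns = inferred clusters
  table(Original = as.factor(original), Inferred = as.factor(inferred))
}

max_discriminant_functions <- function(grp, n.pca) {
  # LDA with K groups on p retained PCs has at most min(K - 1, p) axes
  min(nlevels(as.factor(grp)) - 1, n.pca)
}

# Usage (not run; requires adegenet and the objects from the session above):
# tab <- compare_groups(pop(data_full), NumClust$grp)
# table.value(tab)                              # graphical version of tab
# max_discriminant_functions(NumClust$grp, 40)  # 8 for 9 clusters
```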
>
> 5) I don't really understand the difference between the optim.a.score and
> the cross-validation analyses. Both seem to determine the best number of
> PCs to retain, yet they give very different results. Am I
> misunderstanding what they do?
>
> #for "data_full" dataset
>
> dapc2 <- dapc(data_full, n.da=300, n.pca=50)
>
> temp <- optim.a.score(dapc2)
>
> # the graph suggests the optimal number of PCs (highest a-score) is 8
> [image: Inline image 3]
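As described in adegenet's documentation for `optim.a.score`, the a-score is the proportion of successful reassignment with the true groups minus the proportion obtained after randomly permuting the group labels. As far as we understand, it is computed on the same individuals used to build the model, whereas `xvalDapc` fits on a training subset and scores held-out individuals, so the two criteria need not agree. A minimal sketch of the a-score arithmetic only; `a_score` is a hypothetical helper, not adegenet's code.

```r
# a-score: reassignment success with the true groups minus the mean success
# obtained after randomly permuting the group labels (values in [0, 1]).
a_score <- function(success_true, success_permuted) {
  vals <- c(success_true, success_permuted)
  stopifnot(length(success_true) == 1, all(vals >= 0 & vals <= 1))
  success_true - mean(success_permuted)
}
```

Because adding PCs tends to raise the success on the fitting data even when groups are arbitrary, the permutation term is what penalises over-fitting.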
> # cross-validation for the number of PCs to retain: I could only run this
> # on data_full (called "mat" here); I couldn't get it to work on the data
> # with zeros
>
> mat <- scaleGen(data, NA.method="mean")
>
> grp <- pop(data)
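A base-R illustration of the idea behind `scaleGen(..., NA.method = "mean")` as used above: each missing allele value is replaced by its column mean, then columns are centred. This is a hypothetical helper showing the arithmetic, not adegenet's implementation, and it leaves scaling aside.

```r
# Mean imputation followed by column centring.
impute_and_center <- function(m) {
  m <- as.matrix(m)
  col_means <- colMeans(m, na.rm = TRUE)
  na_idx <- which(is.na(m), arr.ind = TRUE)
  m[na_idx] <- col_means[na_idx[, "col"]]  # fill each NA with its column mean
  sweep(m, 2, col_means, "-")              # centre every column on zero
}
```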
>
> xval <- xvalDapc(mat, grp, n.pca.max = 100, training.set = 0.9, result =
> "groupMean", center = TRUE, scale = FALSE, n.pca = NULL, n.rep = 30,
> xval.plot = TRUE)
>
> xval[2:6]
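With `training.set = 0.9`, roughly 90% of individuals are used to fit the DAPC and the rest are predicted, and `n.rep = 30` repeats this. A minimal sketch of one such split, under the assumption that sampling is stratified by group (not necessarily how `xvalDapc` samples); `stratified_split` is hypothetical.

```r
# Split individuals into training and validation sets within each group,
# keeping about `training.set` of every group for fitting.
stratified_split <- function(grp, training.set = 0.9) {
  grp <- as.factor(grp)
  by_group <- split(seq_along(grp), grp, drop = TRUE)
  train <- unlist(lapply(by_group, function(i) {
    n_keep <- max(1, round(training.set * length(i)))
    i[sample.int(length(i), n_keep)]  # safe even when a group has one member
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(grp), train))
}

# Usage (not run): set.seed(1); s <- stratified_split(pop(data), 0.9)
```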
>
>
> #results
>
> Confidence Interval for Random Chance
>       2.5%        50%      97.5%
> 0.05659207 0.09212947 0.14164194
>
> $`Mean Successful Assignment by Number of PCs of PCA`
>        10        20        30        40        50        60        70        80        90
> 0.8409091 0.8348485 0.8439394 0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758
>
> $`Number of PCs Achieving Highest Mean Success`
> [1] "40"
>
> $`Root Mean Squared Error by Number of PCs of PCA`
>        10        20        30        40        50        60        70        80        90
> 0.1702777 0.1770200 0.1649359 0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707
>
> $`Number of PCs Achieving Lowest MSE`
> [1] "40"
> [image: Inline image 4]
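A sketch of reading the printed cross-validation results programmatically. The numbers are copied from the output above; the code picks the number of PCs with the highest mean success and the lowest RMSE, and checks each mean success against the 97.5% bound for random chance.

```r
# Cross-validation results copied from the xvalDapc output above.
success <- c(`10` = 0.8409091, `20` = 0.8348485, `30` = 0.8439394,
             `40` = 0.8530303, `50` = 0.8136364, `60` = 0.8227273,
             `70` = 0.8000000, `80` = 0.8075758, `90` = 0.8075758)
rmse <- c(`10` = 0.1702777, `20` = 0.1770200, `30` = 0.1649359,
          `40` = 0.1607061, `50` = 0.2007218, `60` = 0.1864929,
          `70` = 0.2138458, `80` = 0.2051338, `90` = 0.2074707)
random_chance_upper <- 0.14164194  # 97.5% bound for random assignment

best_by_success <- names(which.max(success))  # number of PCs, as a string
best_by_rmse <- names(which.min(rmse))
beats_chance <- success > random_chance_upper  # one logical per PC setting
```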
>
> Thank you very much for your time, and sincerely,
> Ella Bowles
>
> --
> Ella Bowles
> PhD Candidate
> Biological Sciences
> University of Calgary
>
> e-mail: ebowles at ucalgary.ca, bowlese at gmail.com
> website: http://ellabowlesphd.wordpress.com/
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 12190 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151020/395c323a/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 46303 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151020/395c323a/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14492 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151020/395c323a/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 20171 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20151020/395c323a/attachment-0007.png>