[adegenet-forum] a.score versus cross validation and number of discriminant functions to retain

Wed Oct 21 18:36:54 CEST 2015

Many thanks for this. A couple of quick follow-up questions.

>
> #2 if you have clusters defined already this graph may not be very useful;
> it just compares the previous cluster definition to the K-means one
>

>>I have populations identified using the "pop" option. But I don't have
clusters identified per se. If this is the case, does my plot look okay?

[image: Inline image 1]

> #3 ?scatter.dapc -> argument 'col', which you are using already
>
>>I should have been clearer here. I don't know which population is
represented by which colour, and would ideally like to know this so
that I can see how they are being grouped. Is there a function that I can
use to ask for this information? Do the numbers that NumClust$grp gives me
represent the clusters that the individuals are being assigned to? If this
is the case, then this question is answered.
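
One way to check how the original populations map onto the inferred clusters (and therefore onto the colours) is a simple cross-tabulation. This is only a minimal sketch: the two short vectors are made-up stand-ins so that it runs on its own, and in the real analysis they would be pop(data_full) and NumClust$grp, assumed to list the individuals in the same order.

```r
# Hypothetical stand-ins: in the real analysis these would be
# pop(data_full) and NumClust$grp (same individuals, same order).
orig_pop <- factor(c("A", "A", "A", "B", "B", "C", "C", "C"))
clust    <- factor(c(1, 1, 2, 2, 2, 3, 3, 3))

# Rows are the original populations, columns are the inferred clusters.
tab <- table(population = orig_pop, cluster = clust)
print(tab)

# Assumption: colours passed to scatter() are matched to the levels of the
# grouping factor in order, so the i-th colour goes with the i-th level.
myCol <- c("red", "orange", "yellow")
colour_key <- setNames(myCol, levels(clust))
print(colour_key)
```

With adegenet loaded, the same table can be drawn with table.value(table(pop(data_full), NumClust$grp)), which is the approach shown in the DAPC tutorial.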

> #4 there are K-1 discriminant functions, so '300' will just retain K-1
>

>>Is 300 a good number though? I just don't know how to tell whether I'm
making a good choice.
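
The K-1 point above can be worked through with the numbers from this thread. The sketch below is plain arithmetic, not a call into adegenet. It assumes that the number of discriminant functions actually kept is capped by both K-1 and the number of retained PCs, and the variable names are made up for illustration.

```r
# Values taken from the thread: 9 clusters, 40 PCs retained,
# 300 discriminant functions requested.
K              <- 9
n_pca          <- 40
n_da_requested <- 300

# Assumption: the number kept cannot exceed K - 1 or the number of PCs.
n_da_effective <- min(n_da_requested, K - 1, n_pca)
print(n_da_effective)
```

So asking for 300 should amount to "keep them all". After fitting, dapc1$n.da should report how many were actually retained, and a barplot of dapc1$eig shows how quickly the discriminant functions lose importance.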

> #5 if in doubt, use Xval - more advanced and easier to interpret; in your
> case your data are very well separated in just a few dimensions; 10 PCs
> should do the trick
>

>>So I should use 10 even though xval says 40?
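
To put the 10-versus-40 question in numbers, the cross-validation results quoted further down can be compared directly. This is a minimal sketch in base R using only the values from that output; the variable names are made up for illustration.

```r
# Cross-validation results copied from the xvalDapc output quoted below.
n_pcs <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
mean_success <- c(0.8409091, 0.8348485, 0.8439394, 0.8530303, 0.8136364,
                  0.8227273, 0.8000000, 0.8075758, 0.8075758)
rmse <- c(0.1702777, 0.1770200, 0.1649359, 0.1607061, 0.2007218,
          0.1864929, 0.2138458, 0.2051338, 0.2074707)

# Number of PCs with the highest mean assignment success.
best <- n_pcs[which.max(mean_success)]

# How much is gained by going from 10 to 40 PCs?
gain_40_vs_10 <- mean_success[n_pcs == 40] - mean_success[n_pcs == 10]
rmse_gap      <- rmse[n_pcs == 10] - rmse[n_pcs == 40]

print(best)
print(round(gain_40_vs_10, 4))
print(round(rmse_gap, 4))
```

The gain from 10 to 40 PCs is roughly one percentage point of assignment success, which may be within the replicate-to-replicate noise of a run with n.rep = 30.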

Thank you again,
Ella

> ------------------------------
> *From:* adegenet-forum-bounces at lists.r-forge.r-project.org [
> adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Ella
> Bowles [ebowles at ucalgary.ca]
> *Sent:* 20 October 2015 19:45
> *To:* adegenet-forum at lists.r-forge.r-project.org
> *Subject:* Re: [adegenet-forum] a.score versus cross validation and
> number of discriminant functions to retain
>
> PS: Also, which function do I use to get numeric values for the percentage
> of variation that is explained by the two principal components that are
> shown on the scatter plot?
>
> with thanks
>
> On Tue, Oct 20, 2015 at 12:40 PM, Ella Bowles <ebowles at ucalgary.ca> wrote:
>
>> Hello,
>>
>>
>> I think I have worked my way through a DAPC analysis, and it's pretty
>> neat. I have five questions though. By way of background, I am using a
>> dataset of 4099 SNPs with 11 putative populations (clusters).
>> I've converted a STRUCTURE file to a genind object, and am using that.
>>
>>
>> 1) Am I correct in understanding that the number of clusters you find
>> should inform the number of colours that you list for your DAPC plot?
>>
>>
>> 2) I'm not quite sure how to interpret the following. How do I know if
>> the fit is good?
>>
>>
>>
>> [image: Inline image 1]
>>
>> 3 and 4) Is there a function that I can use to match the colours
>> to my original populations? I do have this information in the data file
>> that I fed in. And does 300 sound reasonable for the number of
>> discriminant functions to retain?
>>
>> > dapc1 <- dapc(data_full, NumClust$grp)
>>
>> Choose the number PCs to retain (>=1): 40
>>
>> Choose the number discriminant functions to retain (>=1): 300
>>
>> #making colours for 9 clusters, since optimal k was 9 with the data
>> containing zeros
>>
>> myCol <- c("red", "orange", "yellow", "green", "blue", "purple",
>> "violet", "grey", "brown")
>>
>> scatter(dapc1, scree.da=FALSE, bg="white", pch=20, cell=0, cstar=0,
>> col=myCol, solid=.4, cex=1, clab=0, leg=TRUE, txt.leg=paste("Cluster", 1:9))
>> [image: Inline image 2]
>> 
>> 5) I don't really understand the difference between optim.a.score and
>> the cross-validation analysis (xvalDapc). Both seem to determine the best
>> number of PCs to retain, yet they give very different results. Am I
>> misunderstanding what they are?
>>
>> #for "data_full" dataset
>>
>> dapc2 <- dapc(data_full, n.da=300, n.pca=50)
>>
>>
>>
>> temp <- optim.a.score(dapc2)
>>
>>
>>
>> #graph shows that highest alpha seems to be 8
>> [image: Inline image 3]
>> #cross-validation for number of PCs to retain - could only do this using
>> data_full (this is called “mat” here); couldn’t get it to work using data
>> with zeros
>>
>> mat <- scaleGen(data, NA.method="mean")
>>
>> grp <- pop(data)
>>
>>
>>
>>
>>
>> xval <- xvalDapc(mat, grp, n.pca.max = 100, training.set = 0.9, result =
>> "groupMean", center = TRUE, scale = FALSE, n.pca = NULL, n.rep = 30,
>> xval.plot = TRUE)
>>
>>
>>
>> xval[2:6]
>>
>>
>> #results
>>
>> $`Confidence Interval for Random Chance`
>>
>>       2.5%        50%      97.5%
>>
>> 0.05659207 0.09212947 0.14164194
>>
>>
>>
>> $`Mean Successful Assignment by Number of PCs of PCA`
>>
>>        10        20        30        40        50        60        70        80        90
>> 0.8409091 0.8348485 0.8439394 0.8530303 0.8136364 0.8227273 0.8000000 0.8075758 0.8075758
>>
>>
>>
>> $`Number of PCs Achieving Highest Mean Success`
>>
>> [1] "40"
>>
>>
>>
>> $`Root Mean Squared Error by Number of PCs of PCA`
>>
>>        10        20        30        40        50        60        70        80        90
>> 0.1702777 0.1770200 0.1649359 0.1607061 0.2007218 0.1864929 0.2138458 0.2051338 0.2074707
>>
>>
>>
>> $`Number of PCs Achieving Lowest MSE`
>> [1] "40"
>> [image: Inline image 4]
>>
>> Thank you very much for your time, and sincerely,
>> Ella Bowles
>>
>> --
>> Ella Bowles
>> PhD Candidate
>> Biological Sciences
>> University of Calgary
>>
>> e-mail: ebowles at ucalgary.ca, bowlese at gmail.com
>> website: http://ellabowlesphd.wordpress.com/
>>
>

-- 
Ella Bowles
PhD Candidate
Biological Sciences
University of Calgary

e-mail: ebowles at ucalgary.ca, bowlese at gmail.com
website: http://ellabowlesphd.wordpress.com/