<div dir="ltr"><p class="MsoNormal">Good question. <br>

<br>

Essentially, these are two different approaches to the same
problem: finding the optimal number of PCs to retain in DAPC. The
short answer is: <b>Use xvalDapc instead of optim.a.score.</b></p><p class="MsoNormal">optim.a.score was our first approach, and xvalDapc is our
new and improved approach. xvalDapc is easier
to interpret and is likely to give better results.</p>

<p class="MsoNormal"> </p>

<p class="MsoNormal">------</p>

<p class="MsoNormal">If you’re curious about the two approaches more generally, here is a
brief description of each, along with the way I think about them: <br><br></p>

<p class="MsoNormal">Both methods validate the model through repeated measurements:
they assess how the number of PCs retained affects the ability of the
model to predict the correct group membership of the individuals in the
dataset. <br><br></p>

<p class="MsoNormal">In cross-validation with xvalDapc, DAPC is performed (with
increasing numbers of PCs) on a “training set” (typically 90% of the dataset),
and the individuals left out of the analysis are then projected onto the
discriminant axes constructed by DAPC. We measure how accurately we can place
this left-out 10% of individuals (the “validation set”) in the multidimensional
space, in which their position corresponds to their group membership. With too
few PCs retained, we fail to assign the validation individuals to the correct
groups because we simply do not have enough information. With too many PCs
retained, we also begin to misassign these individuals, because
all we are doing now is over-describing each individual in
the training set instead of painting a general picture of just those features
that relate to their group structure. This over-description merely adds “noise”
that drowns out the group-defining “signal” that we had been attempting to
summarise. We repeat the cross-validation procedure at each level of PC
retention, each time with different training and validation sets, and look for
the number of PCs that gives the best signal-to-noise ratio: the Goldilocks
point between weak discrimination and unstable results. <br><br></p>
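<p class="MsoNormal">To make the logic of the procedure concrete, here is a minimal sketch in plain Python. It is not adegenet's code and does not run DAPC. The names <code>stratified_split</code>, <code>cross_validate_n_pcs</code> and <code>assignment_success</code> are hypothetical, and <code>toy_success</code> is a made-up stand-in for "fit DAPC on the training set, then predict the validation set". Splitting group by group and choosing the highest mean success are simplifying assumptions of the sketch.</p>

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_split(groups, training_frac, rng):
    """Split individual indices into training and validation sets, group by group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    training, validation = [], []
    for members in by_group.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Always leave at least one member of each group for validation
        # (a group with a single member ends up entirely in validation).
        n_train = min(len(shuffled) - 1, max(1, round(training_frac * len(shuffled))))
        training.extend(shuffled[:n_train])
        validation.extend(shuffled[n_train:])
    return training, validation


def cross_validate_n_pcs(groups, n_pcs_candidates, assignment_success,
                         training_frac=0.9, n_rep=30, seed=0):
    """Return (best number of PCs, mean assignment success per candidate)."""
    rng = random.Random(seed)
    mean_success = {}
    for n_pcs in n_pcs_candidates:
        scores = []
        for _ in range(n_rep):
            # A fresh training/validation split for every replicate.
            training, validation = stratified_split(groups, training_frac, rng)
            scores.append(assignment_success(n_pcs, training, validation))
        mean_success[n_pcs] = mean(scores)
    best = max(mean_success, key=mean_success.get)
    return best, mean_success


def toy_success(n_pcs, training, validation):
    # Made-up curve, NOT a real analysis: success peaks at 20 PCs
    # and falls away when too few or too many PCs are retained.
    return 1.0 - abs(n_pcs - 20) / 40


groups = ["A"] * 20 + ["B"] * 20
best, curve = cross_validate_n_pcs(groups, [5, 10, 20, 40], toy_success, n_rep=5)
print(best, curve)
```

<p class="MsoNormal">In a real analysis, <code>assignment_success</code> would be replaced by the proportion of validation individuals that DAPC assigns to their correct group; in practice xvalDapc does all of this for you.</p>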

<p class="MsoNormal">When using the a-score to achieve this aim, we also repeatedly
perform DAPC with different numbers of retained PCs; but, in contrast to
xvalDapc, we keep all individuals in the analysis. Instead, with optim.a.score,
at each level of PC retention, we measure reassignment success to the real
populations of interest, and also measure that “success” to fake, randomized
populations. If there is any real group structure to be identified in the
dataset, the optimal level of PC retention will be the one at which our ability
to assign individuals to their real groups exceeds by the greatest margin our
ability to assign individuals to the false groupings. This margin is calculated as Pt – Pr,
i.e. the probability of reassignment to the True cluster minus the probability
of reassignment to the Random cluster. With
too few PCs, the probability of successful reassignment will be low for both
the true clusters and the random ones. On the other hand, with too many PCs,
you have so much information retained that you could paint effectively any
picture of groupings in the data, so reassignment success to the false clusters
will begin to approach that to the true clusters and the a-score will decline.
Once again, this leaves a Goldilocks point in the middle of the arc, indicating the
optimal number of PCs to retain. <br><br></p>
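<p class="MsoNormal">As a small worked illustration of the Pt – Pr calculation, here is a sketch in plain Python. It is not the optim.a.score implementation: the function names are hypothetical, the probabilities are made up purely for illustration, and the sketch simply takes the largest a-score among the levels tried.</p>

```python
def a_score(p_true, p_random):
    """a-score: reassignment probability to the true cluster minus that to a random one."""
    return p_true - p_random


def optimal_n_pcs(reassignment):
    """reassignment maps number of PCs to (Pt, Pr); return (best number of PCs, a-scores)."""
    scores = {n_pcs: a_score(pt, pr) for n_pcs, (pt, pr) in reassignment.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical values, NOT results from any dataset.
hypothetical = {
    5: (0.50, 0.25),     # too few PCs: both probabilities are low
    20: (0.875, 0.375),  # intermediate: the largest margin
    80: (1.0, 0.875),    # too many PCs: random clusters fit almost as well
}

best, scores = optimal_n_pcs(hypothetical)
print(best, scores[best])
```

<p class="MsoNormal">With these made-up numbers the a-score rises and then falls as PCs are added, which is the arc described above.</p>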

<p class="MsoNormal">Cross-validation and optim.a.score should not
give completely contradictory results, but they may not always agree. If the results differ, we would always recommend
that you use the results of xvalDapc over those of optim.a.score, so you may as well
not worry about optim.a.score in the first place. </p>

<p class="MsoNormal"> </p>

<p class="MsoNormal">Hope that helps! </p>

<p class="MsoNormal">Best, <br>

Caitlin. </p></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 20, 2014 at 10:35 PM, Judy (Duffie), Caroline <span dir="ltr">&lt;<a href="mailto:JudyC@si.edu" target="_blank">JudyC@si.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div style="word-wrap:break-word">

Dear Dr. Collins,

<div><br>

</div>

<div>I was wondering if you could help me understand the difference between using a.score.opt. vs. xvalDapc. It seems that both methods are used to determine the number of PCs to retain in the DAPC. Why and when would you use one method vs. the other?</div>

<div><br>

</div>

<div>Thanks for any clarification you can offer. I’ve been through the papers and the tutorials, but am still trying to wrap my mind around these procedures.</div>

<div><br>

</div>

<div>Caroline</div>

<div><br>

<div><span style="border-collapse:separate;color:rgb(0,0,0);font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">

<div>Caroline D. Judy, PhD Candidate &amp; Peter Buck Fellow</div>

<div>National Museum of Natural History</div>

<div>Smithsonian Institution</div>

<div><a href="mailto:judyc@si.edu" target="_blank">judyc@si.edu</a></div>

<div><br>

</div>

</span><br>

</div>

<br>

</div>

</div>

</blockquote></div><br></div>