<div dir="ltr"><p class="MsoNormal">Good question. <br>
<br>
Essentially these are just two different approaches to the same
problem of trying to find the optimal number of PCs to retain in DAPC. The
short answer is: <b>Use xvalDapc instead of optim.a.score.</b></p><p class="MsoNormal">optim.a.score was our first approach, and xvalDapc is our
newer, improved approach: xvalDapc is easier
to interpret and is likely to give better results.</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">------</p>
<p class="MsoNormal">If you’re just generally curious about the two approaches, I
can offer a brief description and an explanation of the way I think about them,
at least: <br><br></p>
<p class="MsoNormal">Both methods rely on repeated measurements to perform model
validation relating to the impact of the number of PCs on the ability of the
model to predict the correct group membership of all individuals in the
dataset. <br><br></p>
<p class="MsoNormal"> In cross-validation with xvalDapc, DAPC is performed (with
increasing numbers of PCs) on a “training set” (typically 90% of the dataset)
and then we project the individuals left out of the analysis onto the
discriminant axes constructed by DAPC. We measure how accurately we can place
this left-out 10% of individuals in the multidimensional space (in which their
position corresponds to their group membership). With too few PCs retained, we
fail to correctly assign the validation set of individuals to the correct
groups because we simply do not have enough information. With too many PCs
retained, we also begin to fail to correctly assign these individuals, because
essentially now all we are doing is over-describing each of the individuals in
the training set instead of painting a general picture of just those features
that relate to their group structure. This over-description merely adds “noise”
that drowns out the group-defining “signal” that we had been attempting to
summarise. We perform the cross-validation procedure repeatedly (each time varying
the number of PCs retained) with different training and validation sets until
we find the right signal-to-noise ratio, the Goldilocks point between weak
discrimination and unstable results. <br><br></p>
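<p class="MsoNormal">To make that concrete, here is a minimal sketch of what this looks like in R with adegenet. The nancycats example dataset and the argument values are purely illustrative; please check ?xvalDapc in your installed version for the exact arguments and the names of the returned elements: </p>
<pre>
library(adegenet)

## Example microsatellite dataset bundled with adegenet, used purely for illustration
data(nancycats)
x <- tab(nancycats, NA.method = "mean")   # allele table with missing data replaced by means

## Cross-validation: 90% of individuals form the training set, the remaining 10%
## are projected onto the discriminant axes, repeated over a range of retained PCs.
set.seed(1)
xval <- xvalDapc(x, pop(nancycats),
                 training.set = 0.9,   # proportion of individuals used to build each DAPC
                 n.rep = 30,           # replicates per number of retained PCs
                 xval.plot = TRUE)     # plot assignment success against number of PCs

## Suggested number of PCs (lowest mean squared error across replicates)
xval$`Number of PCs Achieving Lowest MSE`
</pre>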
<p class="MsoNormal"> When using the a.score to achieve this aim, we repeatedly
perform DAPC with different numbers of retained PCs; but, by contrast to
xvalDapc, we keep all individuals in the analysis. Instead, with optim.a.score,
at each level of PC retention, we measure reassignment success to the real
populations of interest, and also measure that “success” to fake randomized
populations. If there is any real group structure to be identified in the
dataset, the optimal level of PC retention will be the one at which our ability
to assign individuals to their real groups exceeds by the greatest margin our
ability to assign individuals to the false groupings, calculated as Pt – Pr,
ie. probability of reassignment to the True cluster vs. the Random cluster. With
too few PCs, the probability of successful reassignment will be low for both
the true clusters and the random ones. On the other hand, with too many PCs,
you have so much information retained that you could paint effectively any
picture of groupings in the data, so reassignment success to the false clusters
will begin to approach that to the true clusters and the a-score will decline,
once again leaving a Goldilocks point in the middle of the arc indicating the
optimal number of PCs to retain. <br><br></p>
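<p class="MsoNormal">For comparison, the equivalent sketch for the a-score route looks something like this (again illustrative only; the element names are as I remember them from the adegenet documentation, so please check ?optim.a.score in your version): </p>
<pre>
library(adegenet)
data(nancycats)

## Run DAPC first, deliberately retaining a generous number of PCs;
## optim.a.score then re-evaluates a range of smaller retention values.
dapc1 <- dapc(nancycats, n.pca = 40, n.da = 10)

## For each candidate number of PCs, the a-score contrasts reassignment success
## to the true groups with reassignment success to randomised groups (Pt - Pr).
ascore <- optim.a.score(dapc1)
ascore$best    # suggested number of PCs to retain
</pre>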
<p class="MsoNormal"> The results of cross-validation and optim.a.score should not
give completely contradictory results, but they may not always give the same result. If results differed, we would always recommend
that you use the results of xvalDapc over optim.a.score, hence you may as well
just not worry about optim.a.score in the first place. </p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">Hope that helps! </p>
<p class="MsoNormal">Best, <br>
Caitlin. </p></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 20, 2014 at 10:35 PM, Judy (Duffie), Caroline <span dir="ltr"><<a href="mailto:JudyC@si.edu" target="_blank">JudyC@si.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
Dear Dr. Collins,
<div><br>
</div>
<div>I was wondering if you could help me understand the difference between using a.score.opt. vs. xvalDapc. It seems that both methods are used to determine the number of PCs to retain in the DAPC. Why and when would you use one method vs. the other?</div>
<div><br>
</div>
<div>Thanks for any clarification you can offer. I’ve been through the papers and the tutorials, but am still trying to wrap my mind around these procedures.</div>
<div><br>
</div>
<div>Caroline</div>
<div><br>
<div><span style="border-collapse:separate;color:rgb(0,0,0);font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">
<div>Caroline D. Judy, PhD Candidate & Peter Buck Fellow</div>
<div>National Museum of Natural History</div>
<div>Smithsonian Institution</div>
<div><a href="mailto:judyc@si.edu" target="_blank">judyc@si.edu</a></div>
<div><br>
</div>
</span><br>
</div>
<br>
</div>
</div>
</blockquote></div><br></div>