[adegenet-forum] adegenet - a.score.opt vs. xvalDAPC

Caitlin Collins caitiecollins at gmail.com
Fri Sep 26 18:27:48 CEST 2014


Good question.

Essentially, these are two different approaches to the same problem:
finding the optimal number of PCs to retain in DAPC. The short answer is:
*use xvalDapc instead of optim.a.score.*

optim.a.score was our first approach; xvalDapc is our newer, improved
approach. xvalDapc is easier to interpret and is likely to give better
results.



------

If you’re curious about the two approaches more generally, I can offer a
brief description and an explanation of the way I think about them:

Both methods rely on repeated measurements to validate the model,
assessing how the number of retained PCs affects the model’s ability to
predict the correct group membership of the individuals in the dataset.

          In cross-validation with xvalDapc, DAPC is performed (with
increasing numbers of PCs) on a “training set” (typically 90% of the
dataset), and the individuals left out of the analysis are then projected
onto the discriminant axes constructed by DAPC. We measure how accurately
we can place this left-out 10% of individuals in the multidimensional
space (in which their position corresponds to their group membership).
With too few PCs retained, we fail to assign the validation set of
individuals to the correct groups because we simply do not have enough
information. With too many PCs retained, we also begin to fail to assign
these individuals correctly, because essentially all we are doing now is
over-describing each individual in the training set instead of painting a
general picture of just those features that relate to group structure.
This over-description merely adds “noise” that drowns out the
group-defining “signal” we had been attempting to summarise. We perform
the cross-validation procedure repeatedly (each time varying the number of
PCs retained) with different training and validation sets until we find
the right signal-to-noise ratio: the Goldilocks point between weak
discrimination and unstable results.
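
          For concreteness, here is a minimal sketch of that workflow in
R, using the nancycats example dataset shipped with adegenet. (The
NA-replacement step via tab() assumes a recent version of the package,
and the argument values are illustrative defaults, not recommendations
for your data.)

    library(adegenet)
    data(nancycats)  # example genind object shipped with adegenet

    ## xvalDapc cannot handle missing data, so replace NAs with
    ## mean allele frequencies first
    mat <- tab(nancycats, NA.method = "mean")
    grp <- pop(nancycats)

    ## 90% training / 10% validation, repeated 30 times per number of PCs
    set.seed(1)
    xval <- xvalDapc(mat, grp, training.set = 0.9, result = "groupMean",
                     n.rep = 30, xval.plot = TRUE)

    ## the number of PCs minimising root mean squared error
    xval$`Number of PCs Achieving Lowest MSE`

The plot produced along the way shows assignment success against the
number of PCs retained, so the Goldilocks point is also easy to see by
eye.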

          When using the a-score to achieve this aim, we also repeatedly
perform DAPC with different numbers of retained PCs but, in contrast to
xvalDapc, we keep all individuals in the analysis. Instead, with
optim.a.score, at each level of PC retention we measure reassignment
success to the real populations of interest, and also measure that
“success” for fake, randomly permuted populations. If there is any real
group structure to be identified in the dataset, the optimal level of PC
retention will be the one at which our ability to assign individuals to
their real groups exceeds, by the greatest margin, our ability to assign
individuals to the false groupings. This margin is the a-score,
calculated as a = Pt – Pr, i.e. the probability of reassignment to the
true clusters minus the probability of reassignment to the random
clusters. With too few PCs, the probability of successful reassignment
will be low for both the true clusters and the random ones. On the other
hand, with too many PCs, so much information is retained that you could
paint effectively any picture of groupings in the data, so reassignment
success to the false clusters will begin to approach that to the true
clusters and the a-score will decline, once again leaving a Goldilocks
point in the middle of the arc indicating the optimal number of PCs to
retain.
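
          For comparison, here is the analogous sketch for the a-score
approach (again using nancycats; n.pca = 100 is an arbitrary, deliberately
generous starting value, and optim.a.score then searches below it):

    ## run DAPC retaining a generous number of PCs, then let
    ## optim.a.score look for the retention level maximising Pt - Pr
    dapc1  <- dapc(nancycats, n.pca = 100, n.da = 3)
    ascore <- optim.a.score(dapc1)
    ascore$best  # optimal number of PCs according to the a-score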

          Cross-validation and optim.a.score should not give completely
contradictory results, but they may not always agree. Where they differ,
we would always recommend the xvalDapc result over the optim.a.score
result, hence you may as well not worry about optim.a.score in the first
place.



Hope that helps!

Best,
Caitlin.

On Sat, Sep 20, 2014 at 10:35 PM, Judy (Duffie), Caroline <JudyC at si.edu>
wrote:

>  Dear Dr. Collins,
>
>  I was wondering if you could help me understand the difference between
> using a.score.opt. vs. xvalDapc. It seems that both methods are used to
> determine the number of PCs to retain in the DAPC. Why and when would you
> use one method vs. the other?
>
>  Thanks for any clarification you can offer. I’ve been through the papers
> and the tutorials, but am still trying to wrap my mind around these
> procedures.
>
>  Caroline
>
>  Caroline D. Judy, PhD Candidate & Peter Buck Fellow
> National Museum of Natural History
> Smithsonian Institution
> judyc at si.edu
>
>
>
>