[adegenet-forum] Fwd: DAPC

Fri Jul 31 03:57:08 CEST 2015

---------- Forwarded message ----------
From: Caitlin Collins <caitiecollins at gmail.com>
Date: Thu, Jul 30, 2015 at 3:04 PM
Subject: Re: [adegenet-forum] DAPC
To: Carly Graham <graham9c at gmail.com>

Hi,

I'm glad you asked.

First things first, you are correct in interpreting from the xvalDapc
results that you should use 100 PCs when running DAPC. (Though, I should
say that you don't actually need to *run* DAPC after running xvalDapc, as
the output of xvalDapc contains a dapc object that has been made by running
DAPC with the optimal number of PCs as indicated by the lowest RMSE).
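
As a rough sketch of what I mean (this uses the nancycats example data that
ships with adegenet, not your whitefish data, and the n.pca.max and n.rep
values are arbitrary choices made only to keep the run short), the dapc
object can be pulled straight out of the xvalDapc output:

```r
library(adegenet)

# Example data shipped with adegenet (an assumption for illustration only)
data(nancycats)
x   <- tab(nancycats, NA.method = "mean")  # allele table, NAs replaced by the mean
grp <- pop(nancycats)                      # prior group membership

set.seed(1)  # cross-validation resamples at random
xval <- xvalDapc(x, grp, n.pca.max = 60, training.set = 0.9,
                 n.rep = 5, xval.plot = FALSE)

# Number of PCs with the lowest error, as reported by xvalDapc
xval$`Number of PCs Achieving Lowest MSE`

# The dapc object already fitted on all the data with that number of PCs
best_dapc <- xval$DAPC
best_dapc$n.pca
summary(best_dapc)$assign.prop
```

With your own data you would presumably want more replicates than this, since
n.rep = 5 is only there to make the example quick.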

Now, I think I know where your confusion might lie:

When you perform cross-validation with xvalDapc, the way it works is that
DAPC gets run with varying numbers of PCs retained, *with some proportion
of the data left out* (the argument "training.set", by default 0.9, sets
the proportion that is kept for training, so 10% is left out). So, what
"Mean Successful Assignment" is telling you is that, after xvalDapc ran
DAPC with 100 PCs using only 90% of the data, it was able to correctly
place the left-out 10% of the data in the right group only 20.13889% of
the time. The point of doing this is to identify the number of PCs that,
when kept, allows us to generate a DAPC with the most informative and
generalisable results.

By contrast, the part of the output from summary(dapc) to which I think you
are referring (presumably you mean "assign.prop"?) is not actually a
*probability*: it is a *proportion*. Moreover, it is the proportion of
individuals that were successfully assigned to the correct group *when the
DAPC was run with all of the data*. This helps to explain why it is usually
much higher than the "Mean Successful Assignment" of xvalDapc, even with
the same number of PCs. While cross-validation was trying to *predict* the
likely group membership of unseen individuals whose data was not used to
build the model, DAPC is reporting the percent successful assignment of
individuals to their groups based on a model built with data from these
individuals and all others in the dataset.
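
To make the contrast concrete, here is a toy sketch. It is not DAPC itself:
it calls lda() from the MASS package directly (DAPC is a PCA followed by a
discriminant analysis), and the data are simulated pure noise with group
sizes and variable counts that I have made up. The point is only that
scoring a model on the individuals used to build it gives a more flattering
number than scoring it on held-out individuals:

```r
library(MASS)  # provides lda()

set.seed(42)
n_groups <- 8; n_per_group <- 26; n_vars <- 60   # arbitrary sizes
grp <- factor(rep(seq_len(n_groups), each = n_per_group))
# Pure noise: the variables carry no real group signal
x <- matrix(rnorm(length(grp) * n_vars), ncol = n_vars)

# (a) Fit and score on the same individuals (what assign.prop measures)
fit_all <- lda(x, grouping = grp)
resub <- mean(predict(fit_all, x)$class == grp)

# (b) Fit on 90%, score the held-out 10% (one cross-validation replicate)
train <- sample(length(grp), size = round(0.9 * length(grp)))
fit_train <- lda(x[train, ], grouping = grp[train])
held_out <- mean(predict(fit_train, x[-train, ])$class == grp[-train])

c(same_data = resub, held_out = held_out)
```

xvalDapc repeats step (b) many times for each number of PCs and averages
the results, whereas summary(dapc) reports the analogue of step (a).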

Now that this is hopefully becoming a little clearer, I should say (in
case you have noticed this, or will notice it, and become confused) that
the method used by DAPC to assign individuals to groups (leading to the
assignment *proportion* reported by summary(dapc)) is a probabilistic
one. Essentially, DAPC makes a probabilistic assessment based on the model.
DAPC first asks: "based on the coordinate system I have generated, and
given the data from this individual, where should I place this individual
in multivariate space?". Then, for some individuals, it is incredibly clear
that they have ended up in a corner of multivariate space that is defined
and occupied by the members of a given group. For others, however, who may
be placed in the space at the edge of a group or between two groups, their
"most likely true group" is less clear. DAPC calculates a probability that
each individual belongs in each group (storing these in the "$posterior"
slot of the dapc object), and assigns individuals to the group for which
their posterior probability is highest. Note that the "assign.prop" element
of the output of summary(dapc) does not actually contain any information on
these probabilities directly; it just tells you what proportion of the
ultimate assignments were correct.
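
A small worked example may help here. The posterior matrix below is entirely
hypothetical (4 individuals, 3 groups, values invented by hand); it only
mimics the kind of matrix stored in "$posterior" and shows how the
highest-posterior rule turns probabilities into assignments, and those into
a single proportion:

```r
# Hypothetical posterior probabilities (each row sums to 1)
posterior <- matrix(c(0.90, 0.05, 0.05,
                      0.40, 0.35, 0.25,
                      0.10, 0.80, 0.10,
                      0.30, 0.30, 0.40),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(paste0("ind", 1:4), c("A", "B", "C")))
# Hypothetical prior (true) groups of the same 4 individuals
true_grp <- factor(c("A", "B", "B", "C"), levels = c("A", "B", "C"))

# Assign each individual to the group with the highest posterior
assigned <- factor(colnames(posterior)[apply(posterior, 1, which.max)],
                   levels = levels(true_grp))

# Proportion of correct assignments (the analogue of assign.prop)
assign_prop <- mean(assigned == true_grp)
assign_prop  # 0.75: ind2 is assigned to A although its prior group is B
```

Notice that ind1 (posterior 0.90) and ind4 (posterior 0.40) both count as
one correct assignment, which is why the proportion says nothing about how
confident each assignment was. For a real dapc object d, I would expect
mean(as.character(d$assign) == as.character(d$grp)) to reproduce the
"assign.prop" value.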

Taking this all into consideration, if you were actually hoping to report
an "assignment *probability*", the answer is not as straightforward as
choosing either "assign.prop" or "Mean Successful Assignment". "Mean
Successful Assignment" might tell you more about the ability of your model
to make predictions about the group memberships of individuals that are not
in your sample. *But*, if your sample happened to be a perfectly
*representative* sample, this would be unfair. Clearly, given all the
data in the sample,
DAPC was able to build a model that accurately placed individuals in the
correct group 87.01923% of the time. Furthermore, *this* is the DAPC
model/output you are actually talking about (i.e. the one built with all the
data, not just 90% of it), so this figure is more indicative of the success
of your model. Altogether, if you still wanted to make a statement about
the ability of your model to make predictions about data that was not in
your sample, the truth would presumably be somewhere in between those two
numbers. But when talking about the DAPC outcome that you actually achieved
with your dataset, the true successful assignment attained was 87.01923%
(just don't call it a probability!).

Hope that helps.

Cheers,
Caitlin.

On Wed, Jul 29, 2015 at 10:51 PM, Carly Graham <graham9c at gmail.com> wrote:

> Hello,
>
> I have been looking at the population structure of whitefish in a small
> lake area. I used xvalDapc to determine the optimal number of PCs to retain
> for the dapc analysis. When I look at the output from this xval command I
> see that the following:
>
> $`Median and Confidence Interval for Random Chance`
>       2.5%        50%      97.5%
> 0.08354526 0.12267036 0.17258877
>
> $`Mean Successful Assignment by Number of PCs of PCA`
>        20        40        60        80       100       120       140
>   160       180
> 0.1319444 0.1694444 0.1680556 0.1875000 0.2013889 0.1291667 0.1666667
> 0.1250000 0.1125000
>
> $`Number of PCs Achieving Highest Mean Success`
> [1] "100"
>
> $`Root Mean Squared Error by Number of PCs of PCA`
>        20        40        60        80       100       120       140
>   160       180
> 0.8704578 0.8361065 0.8360719 0.8177360 0.8051124 0.8730468 0.8389395
> 0.8772458 0.8905042
>
> $`Number of PCs Achieving Lowest MSE`
> [1] "100"
>
>
> From this I have interpreted that I should use 100 PCs. From there when I
> run the dapc with 100 PCs and then look at output from summary(dapc) I have
> an assignment probability of 0.8701923. Where I am confused is how to
> interpret the “Mean Successful Assignment” from the above output. Does this
> also correspond to the assignment to populations? If so, is it more
> accurate to assume that the assignment probability is 0.2013889?
>
> Thanks,
>
>
> Carly Graham
> PhD Candidate
> University of Regina
>
>
>
>
>
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>