[adegenet-forum] Fwd: Question about how to interpret Cross validation in my analysis. Thanks!

Thu Oct 16 14:28:13 CEST 2014

---------- Forwarded message ----------
From: Caitlin Collins <caitiecollins at gmail.com>
Date: Thu, Oct 16, 2014 at 1:27 PM
Subject: Re: Question about how to interpret Cross validation in my analysis. Thanks!
To: Angela Merino <Angela.Merino at cawthron.org.nz>
Cc: "Collins, Caitlin" <caitlin.collins12 at imperial.ac.uk>, "Jombart, Thibaut" <t.jombart at imperial.ac.uk>

Hi Angela,

Well, I have two pieces of good news for you, and one piece of mediocre
news.

First, there’s nothing to worry about with respect to the “NULL” that you
are seeing. It just gets printed when xval.plot=TRUE as an artefact of one
of the lines of the printing function. It has no meaning, and certainly
does not imply that your model is not valid. (Given the stress that I now
realise this glaring “NULL” may cause, I’ve changed the way the plots are
printed, so this won’t happen in the next release of adegenet.)

Second, you are absolutely correct in your interpretation of the results of
xvalDapc (which are stored in whatever object you assigned the results to,
in your case, “xval”).
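Because xvalDapc returns a named list, each component printed in the output quoted below can be pulled out of `xval` by its name. The following is a minimal sketch: the helper `xval_summary` is hypothetical (it is not part of adegenet), and only the component names are taken from the output in this thread; they may differ in other adegenet versions.

```r
# Hypothetical helper (not part of adegenet): collect the headline results
# from the list returned by xvalDapc(), using the component names exactly as
# printed in this thread. Names may differ between adegenet versions.
xval_summary <- function(xval) {
  success <- xval[["Mean Successful Assignment by Number of PCs of PCA"]]
  rmse    <- xval[["Root Mean Squared Error by Number of PCs of PCA"]]
  best    <- xval[["Number of PCs Achieving Highest Mean Success"]]
  list(
    best_n_pca   = best,                                     # a character label such as "25"
    best_by_rmse = xval[["Number of PCs Achieving Lowest MSE"]],
    success      = success[[best]],                          # mean success at that n.pca
    rmse         = rmse[[best]]                              # RMSE at the same n.pca
  )
}

# Usage, with `xval` being the object assigned in the script quoted below:
# str(xval_summary(xval))
```

Indexing by the character label (for example "25") works because the success and RMSE vectors are named by number of PCs, as the printed output shows.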

This brings me to the mediocre news: given that your interpretation is
correct, it seems that the best model you can achieve with DAPC, where
n.pca=25, is only able to predict the group membership of validation set
individuals in 63% of the cases, with a 32% root mean squared error.
Arguably, this is not great. Your final comment on the matter, though, is
quite insightful. The fact that you can achieve the same modest level of
success with 20-80 PCs indicates that the optimisation procedure has not
been particularly successful. Ideally, one would like to see an arch, with
a maximum success point somewhere in the middle. In your case, there is a
bit of an arch, but it isn’t particularly striking.
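To make the comparison with random chance concrete, the numbers printed by xvalDapc (quoted below in Angela's message) can be checked directly in R. This is a rough sketch, not adegenet code: the values are copied by hand from the output, and the RMSE step assumes, without confirmation from this thread, that xvalDapc computes RMSE as sqrt(mean((1 - success)^2)) over the replicates for each number of PCs.

```r
# Values copied by hand from the xvalDapc() output in this thread (n.rep = 500).
n_pca   <- c(5, 10, 15, 20, 25, 30, 35, 40)
success <- c(0.5871429, 0.6000000, 0.5819048, 0.6014286,
             0.6952381, 0.6747619, 0.6333333, 0.6109524)
rmse    <- c(0.4301795, 0.4141872, 0.4389381, 0.4131429,
             0.3241735, 0.3531491, 0.3885084, 0.4145894)
names(success) <- names(rmse) <- n_pca

# Random-chance reference: 2.5%, 50% and 97.5% quantiles from the output.
chance <- c(lower = 0.4294840, median = 0.4928747, upper = 0.5962807)

# Number of PCs with the highest mean success (the output reports "25").
best <- names(which.max(success))

# How far each level sits above the chance median, and whether it clears
# the upper (97.5%) random-chance quantile.
margin      <- round(success - chance[["median"]], 3)
above_upper <- success > chance[["upper"]]

# ASSUMPTION: RMSE = sqrt(mean((1 - s)^2)) over the replicate success rates s.
# Then RMSE^2 = (1 - mean(s))^2 + var_pop(s), so RMSE >= 1 - mean success,
# and the remainder gives the (population) SD of success across replicates.
stopifnot(all(rmse >= 1 - success))
implied_sd <- round(sqrt(rmse^2 - (1 - success)^2), 3)

print(best)
print(data.frame(success, margin, above_upper, implied_sd))
```

With these values, 5 and 15 PCs do not clear the 97.5% random-chance quantile, and the best level (25 PCs) sits only about 0.20 above the chance median. Under the RMSE assumption, the gap between RMSE and 1 minus mean success gives a hint of how much success varied between replicates.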

The only thing I might add to your interpretation is that the model is not
poor because a similar level of success can be achieved with different
numbers of PCs; rather, it is the modest level of success itself that is the
concern. If mean success were virtually constant but hovering around 90%, the
interpretation would not be that the model is poor, but rather that most
levels of PC retention can yield a model that effectively discriminates
between groups.

I hope this has helped answer some of your questions. If you have any more,
please feel free to ask.

Best,
Caitlin.

On Mon, Oct 13, 2014 at 11:48 PM, Angela Merino <Angela.Merino at cawthron.org.nz> wrote:

> Hi Caitlin Collins and Thibaut Jombart,
>
> My name is Angela Parody-Merino and I am a PhD student at Massey
> University (New Zealand). I am studying the population genetic structure of
> a migratory bird (the New Zealand Godwit) using 23 microsatellites. This may
> be a very simple question, but I really want to understand, and be sure
> about, the meaning and interpretation of the output of cross-validation. I
> have spent some days searching the internet and reading explanations,
> without really being able to understand what is going on with my analysis.
> Could you help me, please? :)
>
> This is the script of the analysis:
>
> > x <- ELpop
> > mat <- as.matrix(na.replace(x, method="mean"))
>
> Replaced 371 missing values
>
> > grp <- pop(x)
> > xval <- xvalDapc(mat, grp, n.pca.max = 40, training.set = 0.9,
> + result = "groupMean", center = TRUE, scale = FALSE,
> + n.pca = NULL, n.rep = 500, xval.plot = TRUE)
>
> NULL   >>> What does this NULL mean? Does it mean that the model is not
> valid?
>
> $`Median and Confidence Interval for Random Chance`
>      2.5%       50%     97.5%
> 0.4294840 0.4928747 0.5962807
>
> $`Mean Successful Assignment by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.5871429 0.6000000 0.5819048 0.6014286 0.6952381 0.6747619 0.6333333 0.6109524
>
> $`Number of PCs Achieving Highest Mean Success`
> [1] "25"
>
> $`Root Mean Squared Error by Number of PCs of PCA`
>         5        10        15        20        25        30        35        40
> 0.4301795 0.4141872 0.4389381 0.4131429 0.3241735 0.3531491 0.3885084 0.4145894
>
> $`Number of PCs Achieving Lowest MSE`
> [1] "25"
>
> From the screenshot and the cross-validation output (in blue), I would say
> that my model (retaining 25 PCs) can predict group membership with a mean
> success of 63%, but that it is not a very good model, because most of the
> models obtained by retaining 20, 40, 60 or 80 PCs are about equally
> successful. Is my interpretation correct?
>
> Thanks in advance,
>
> Kind regards,
>
> Angela Parody-Merino
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 48953 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20141016/99005459/attachment-0001.jpg>