[adegenet-forum] xcal/optim.a.score consistency
Thibaut Jombart
thibautjombart at gmail.com
Mon Oct 3 12:59:33 CEST 2016
Hi Alexandre,
I would not trust the automatic selection of the optimal space dimension
unless you are looking at simulated data and you need to run the analysis
100s of times. There are 2 questions here:
# stability of xvalDapc output
As this is a stochastic process, changing results are to be expected. It
may be the case that you need to increase the number of replicates for
results to stabilise a bit.
If you haven't yet, check the tutorials for some guidelines on this, but
basically you want to select the smallest number of dimensions that gives
the best classification outcome (i.e. the 'elbow' in the curve). If there
is no elbow, there may be no structure in the data - check that the %
successful re-assignment is better than expected at random. If the %
successful re-assignment plateaus, various numbers of PCs might lead to
equivalent solutions, but at the very least the structures should remain
stable.
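For what it's worth, here is a minimal sketch of what I mean (assuming `str` is your genind object, as in your script; the list component names are those returned by xvalDapc in recent adegenet versions):

```r
library(adegenet)

## Fixing the RNG seed makes a single run reproducible; increasing n.rep
## is what actually stabilises the estimate across runs with different seeds.
set.seed(1)
xval <- xvalDapc(tab(str, NA.method = "mean"), pop(str),
                 training.set = 0.9, n.pca = 5:100, n.rep = 5000)

## xvalDapc returns a list; these components summarise the curve you
## should be looking at for an elbow / plateau:
xval$`Mean Successful Assignment by Number of PCs of PCA`
xval$`Number of PCs Achieving Lowest MSE`
```

If the mean successful assignment is flat across a range of n.pca values, any of them is likely fine as long as the resulting structures are stable.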
# cross validation vs optim.a.score
Simple: go with cross validation. The 'a-score' was meant as a crude
measure of goodness of fit of DAPC results, but cross-validation makes more
sense.
Hope this helps
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: https://repidemicsconsortium.org
https://sites.google.com/site/thibautjombart/
https://github.com/thibautjombart
Twitter: @TeebzR <https://twitter.com/TeebzR>
On 29 September 2016 at 10:02, Alexandre Lemo <
alexandros.lemopoulos at gmail.com> wrote:
> Dear Dr. Jombart and *adegenet* users,
>
> I am trying to run a DAPC on a dataset of 3975 SNPs obtained through RAD
> sequencing. There are 11 populations and 306 individuals examined here
> (minimum 16 ind/pop). Note that I am not using the find.clusters function.
>
> My problem is that I can't get any consistency in the number of PCs that I
> should use for the DAPC. Every time I run *optim.a.score* or *xvalDapc*,
> I get different results. I tried changing the training set (tried 0.7, 0.8
> and 0.9), but the optimal number of PCs retained still changes in each run.
>
>
> Here is an example of my script:
>
> #str is a genind object
>
>
>
> optim_PC <- xvalDapc(tab(str, NA.method = "mean"), pop(str),
>                      training.set = 0.9, n.pca = 5:100, n.rep = 1000,
>                      parallel = "snow", ncpus = 4L)
>
>
>
>
>
>
> optim_PC_2 <- xvalDapc(tab(str, NA.method = "mean"), pop(str),
>                        training.set = 0.9, n.pca = 5:100, n.rep = 1000,
>                        parallel = "snow", ncpus = 4L)
>
> What happens here is that optim_PC will give me an optimal number of PCs
> of (e.g.) 76, while optim_PC_2 will give me 16. I tried running this
> several times and every time the results are different.
>
>
> I also tried using optim.a.score() :
>
>
>
> dapc.str <- dapc(str, var.contrib = TRUE, scale = FALSE, n.pca = 100,
>                  n.da = NULL)
> optim.a.score(dapc.str)
>
> Here, the number of PCs changes every time I run the function.
>
>
> Does anyone have an idea of why this is happening, or has anyone had
> similar issues? I am quite confused, as results obviously change a lot
> depending on how many PCs are used...
>
> Thanks for your help and for this great adegenet package!
>
> Best,
>
> Alexandre
>
>
>