[adegenet-forum] xvalDapc error message
Jombart, Thibaut
t.jombart at imperial.ac.uk
Tue Jun 4 12:13:14 CEST 2013
Hello,
In your case, cross-validation is bound to be problematic. The distribution of your groups is:
> table(grp)
grp
 1  2  3  4  5  6  7
24  6  9  4 22 24 33
With large training sets, chances are that some groups won't be cross-validated - in this case, the devel version now issues a meaningful warning. With small training sets, some groups may not be represented at all, causing the discriminating space to shrink by one or more dimensions, and currently causing a more cryptic error. I am still wondering what the best default behaviour should be in this case.
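To get a rough feel for how often this happens with the group sizes above, here is a minimal base-R sketch. It assumes individuals are assigned to the training set by simple random sampling without regard to group, which may not match how xvalDapc actually draws its training sets; the training proportions (0.1, 0.5, 0.9) and the helper name missing_rate are purely illustrative.

```r
# Group sizes taken from table(grp) above
sizes <- c(24, 6, 9, 4, 22, 24, 33)
grp <- factor(rep(seq_along(sizes), sizes))
n <- length(grp)

# Proportion of replicates in which at least one group is absent
# from the training set ("train") or from the validation set ("valid").
# ASSUMPTION: simple random sampling of individuals, ignoring groups.
missing_rate <- function(prop, n.rep = 2000) {
  n.train <- round(prop * n)
  res <- replicate(n.rep, {
    train <- sample(n, n.train)
    c(train = any(table(grp[train]) == 0),
      valid = any(table(grp[-train]) == 0))
  })
  rowMeans(res)
}

set.seed(1)
props <- c(0.1, 0.5, 0.9)  # hypothetical training-set proportions
rates <- sapply(props, missing_rate)
colnames(rates) <- props
print(round(rates, 2))
```

Under this assumption, the "train" row should be highest for the smallest training set and the "valid" row highest for the largest one, with the groups of 4 and 6 individuals responsible for most of the misses.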
The short solution for you is to specify n.da and fix it at a value of e.g. 4, so that the dimension of your discriminating space does not have to be determined by the number of groups. But the core problem remains: largely unequal group sizes in an overall small dataset are not well suited to cross-validation.
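The mechanism behind the cryptic error can be reproduced outside adegenet. The sketch below uses MASS::lda on the built-in iris data (not the poster's data) to show that the number of discriminant axes drops when a group is missing from the training set, and that asking for the original number of axes then fails. It is only an illustration of the indexing problem, not a copy of what xvalDapc does internally.

```r
library(MASS)  # lda() is the function behind ldaX$scaling in the error

# iris has 3 groups, so a full fit yields 3 - 1 = 2 discriminant axes
full <- lda(iris[, 1:4], grouping = iris$Species)
n.da.full <- ncol(full$scaling)

# Hypothetical training set in which one group happens to be absent
train <- iris$Species != "setosa"
part <- lda(iris[train, 1:4], grouping = droplevels(iris$Species[train]))
n.da.part <- ncol(part$scaling)

# Asking for as many axes as the full data would give now fails
res <- tryCatch(part$scaling[, 1:n.da.full, drop = FALSE],
                error = function(e) conditionMessage(e))
print(res)

# A fixed, smaller number of axes can still be extracted
ok <- part$scaling[, 1:1, drop = FALSE]
```

Applied to the original call, the suggestion would amount to something like xvalDapc(JRD1NoNa@tab, pop(JRD1), n.pca.max=150, n.da=4, n.rep=10). With 7 groups there are at most 6 axes, so a fixed n.da of 4 only tolerates up to two groups missing from a training set.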
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Julian Dupuis [jrdupuis at ualberta.ca]
Sent: 31 May 2013 20:06
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] xvalDapc error message
Hello,
I am trying to use the new xvalDapc function to determine the ideal number of PCs to retain in my DAPC analysis, but am having trouble getting it to work. Here's the code I'm inputting:
xval <- xvalDapc(JRD1NoNa@tab, pop(JRD1), n.pca.max=150, n.da=NULL, n.pca=NULL, center=TRUE, scale=FALSE, n.rep=10)
And this is the error message I receive:
Error in ldaX$scaling[, 1:n.da, drop = FALSE] : subscript out of bounds
I've searched around for similar problems, but haven't found anything relating specifically to the lda function in MASS. I'm wondering if it might just be a problem with MASS being out of date with the new version of R/adegenet?
Any help would be appreciated, and please let me know if I could include anything else to help identify the problem (my R expertise is pretty minimal). Also, if anyone has any insight/opinions on alternate ways to determine the ideal number of PCs to retain in a DAPC (e.g. the optim.a.score function), I would be interested to hear them.
Thanks in advance,
Julian Dupuis
--
Julian Rowe Dupuis
Ph.D. Candidate
Dept of Biological Sciences
CW405, Biol. Sci. Centre
University of Alberta
Edmonton, Alberta, CAN
T6G 2E9
Office: Earth Sciences 1-52A