[adegenet-forum] xvalDapc confusion

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Feb 17 16:31:28 CET 2014


Hello there, 
To reply to the various points:

> First, is xvalDapc running your DAPC or just validating the parameters (PCAs) to use for running a separate DAPC?

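It runs many DAPCs with varying numbers of PCA axes retained. Each time, the DAPC is fitted on a random training subset of the data (typically 90% of the individuals) and then used to predict the groups of the remaining individuals.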
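To make the procedure concrete, here is a minimal Python sketch of repeated training/validation splitting. It is not adegenet's R code. It assumes that the training set is drawn within each group, that the same split is reused for every candidate number of PCs within a replicate, and that success is scored as the overall proportion of correctly assigned individuals. `fit_and_predict` is a hypothetical stand-in for "fit a DAPC with k PCs on the training individuals and predict the held-out ones". The 90% training fraction follows the documentation quoted further down; `n_rep=30` is an arbitrary choice for this sketch.

```python
import random
from collections import defaultdict


def split_within_groups(groups, training_fraction, rng):
    """Split individual indices into training and validation sets,
    sampling separately within each group."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    train, valid = [], []
    for members in by_group.values():
        members = list(members)
        rng.shuffle(members)
        # Aim for training_fraction of the group in training, but keep at
        # least one individual in validation (a group of size 1 ends up
        # entirely in the validation set).
        n_train = min(max(1, round(training_fraction * len(members))),
                      len(members) - 1)
        train.extend(members[:n_train])
        valid.extend(members[n_train:])
    return train, valid


def cross_validate(groups, n_pca_values, fit_and_predict,
                   n_rep=30, training_fraction=0.9, seed=1):
    """Return {n_pca: [success rate of each replicate]}.

    fit_and_predict(train, valid, n_pca) is a hypothetical callback that
    returns predicted groups for the validation indices, in order."""
    rng = random.Random(seed)
    results = {k: [] for k in n_pca_values}
    for _ in range(n_rep):
        train, valid = split_within_groups(groups, training_fraction, rng)
        # The same split is reused for every candidate number of PCs.
        for k in n_pca_values:
            predicted = fit_and_predict(train, valid, k)
            correct = sum(p == groups[i] for p, i in zip(predicted, valid))
            results[k].append(correct / len(valid))  # overall success
    return results


# Toy usage: a hypothetical classifier that only succeeds with >= 3 PCs.
groups = ["A"] * 10 + ["B"] * 10


def toy_fit_and_predict(train, valid, k):
    return [groups[i] if k >= 3 else "A" for i in valid]


res = cross_validate(groups, [1, 5], toy_fit_and_predict)
```

With 10 individuals per group, each replicate holds out one individual from each group. The toy classifier therefore scores 1.0 with 5 PCs and 0.5 with 1 PC in every replicate.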

> And related to that, what can you do with the results from xvalDapc?  For example do you run xvalDapc, see what number of PCAs give you the highest success, then run a 'regular' DAPC choosing the PCA number from xvalDapc results?  

Yes.

> Or do you do the opposite...run DAPC first using what you think is the best number of PCAs, then run xvalDapc to validate the number of PCAs you originally chose?  Or both? (or neither?)

The main use is the one you describe first: finding the right number of PCA axes. That said, once you settle on a number of PCA axes, and thus on a DAPC, xvalDapc still gives you useful information about how reliable your group membership predictions are.

> Ultimately I still want to make a scatter plot of my groups for publication. So I suppose I still need to run a single DAPC to do that and can't use the xvalDapc results somehow...right?

I'd recommend doing the above. Get an idea of the optimal number of PCA axes, then use one DAPC to make the scatterplot. Reliability of the results in terms of group prediction can be assessed by running xvalDapc.
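Picking the number of axes from the replicate results, and checking how stable the success rate is at that number, could look like the following Python sketch. It assumes the results are stored as a mapping from number of PCs to per-replicate success rates. Breaking ties in favour of fewer PCs, and using the standard deviation as the measure of spread, are choices made for this sketch. They are not necessarily what xvalDapc reports.

```python
from statistics import mean, stdev


def summarise(results):
    """results maps number of PCs -> list of per-replicate success rates.

    Returns (best number of PCs, its mean success, its standard deviation)."""
    means = {k: mean(v) for k, v in results.items()}
    best_score = max(means.values())
    # Among ties, prefer the fewest PCs (a parsimony choice of this sketch).
    best_k = min(k for k, m in means.items() if m == best_score)
    reps = results[best_k]
    spread = stdev(reps) if len(reps) > 1 else 0.0
    return best_k, means[best_k], spread


example = {10: [0.80, 0.85, 0.90], 20: [0.90, 0.95, 0.92], 40: [0.88, 0.91, 0.93]}
best_k, best_mean, best_sd = summarise(example)
```

Here 20 PCs has the highest mean success, about 0.92. That value would then be used for the single DAPC that produces the scatterplot.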


> Second, in the output for the xvalDapc function what are the numbers under the success column?  I was thinking they were assignment success, but if you do not specify either result="groupMean" or result="overall", which result are you getting?  I have tried it all three ways (not specifying a result, using groupMean, and using overall) and have gotten very different numbers for each (all are close to or above 0.90, but results become more variable when I specify a result argument).

"groupMean" is the default. As for the difference, from the 'details' section of the doc:
"DAPC is performed on a training set, typically made of 90% of the observations, and then used to predict the groups of the 10% remaining observations. Current method uses the average prediction success per group (result="groupMean"), or the overall prediction success (result="overall")."

Thus "groupMean" gives every group the same weight regardless of its size, while "overall" is dominated by the larger groups. Makes sense?
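Here is a small Python illustration of the two summaries for a single validation set. It sketches the arithmetic described in the documentation and is not adegenet's implementation. Suppose 10 validation individuals come from a large group, 9 of them correctly assigned, and 2 come from a small group, 1 of them correctly assigned. Then "overall" is 10/12, about 0.83, while "groupMean" is (0.9 + 0.5)/2 = 0.7.

```python
from collections import defaultdict


def prediction_success(true_groups, predicted_groups):
    """Return (group_mean, overall) prediction success for one validation set."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_groups, predicted_groups):
        total[t] += 1
        correct[t] += (t == p)  # True counts as 1, False as 0
    # Overall: pooled proportion of correct assignments, so large groups dominate.
    overall = sum(correct.values()) / sum(total.values())
    # Group mean: average of per-group success rates, each group weighted equally.
    per_group = [correct[g] / total[g] for g in total]
    group_mean = sum(per_group) / len(per_group)
    return group_mean, overall


true_groups = ["A"] * 10 + ["B"] * 2
predicted = ["A"] * 9 + ["B"] + ["A", "B"]  # 9/10 of A and 1/2 of B correct
group_mean, overall = prediction_success(true_groups, predicted)
```

The gap between the two numbers widens as group sizes become more unequal. With the small validation sets produced by a 90/10 split, a single misassigned individual in a small group also shifts "groupMean" much more than "overall".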

> I apologize in advance if my confusion is just a result of a brain fart due to the crappy cold weather the northeast US has been having.

I hope nothing that bad happens to your brain. As for the weather, I'm currently working on it; improvements will hopefully be part of the next release of adegenet.

Best
Thibaut


________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Nikki Vollmer [nlv209 at hotmail.com]
Sent: 17 February 2014 15:04
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] xvalDapc confusion

Hi all,


I have used DAPC for my studies a bunch in the past, and am now curious to see how applying xvalDapc to the procedure affects things. 




Second, in the output for the xvalDapc function what are the numbers under the success column?  I was thinking they were assignment success, but if you do not specify either result="groupMean" or result="overall", which result are you getting?  I have tried it all three ways (not specifying a result, using groupMean, and using overall) and have gotten very different numbers for each (all are close to or above 0.90, but results become more variable when I specify a result argument).


Thank you in advance for any help you can offer; I really appreciate it!


Nikki

