[adegenet-forum] Data type/format and admixed individuals using DAPC

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Apr 18 20:29:09 CEST 2011


Hello,

DAPC is meant for quantitative data. One workaround is to transform your data first, i.e. recode them as dummy (indicator) vectors with some centring/scaling. This is done implicitly by multiple correspondence analysis (MCA, dudi.acm in ade4), the multivariate analysis dedicated to categorical data. For instance:
####
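> library(ade4) # provides dudi.acm and s.class
> library(adegenet) # provides find.clusters and dapc
> f1 <- function(){factor(as.vector(replicate(2, sample(letters[1:4], 50, prob=runif(4), replace=TRUE))))} # generates 100 individuals: two groups of 50, each drawn from a different random distribution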
> f1()
  [1] b b b b b c d c d a d c b c d b b b c b c c b b b c b d d b d d d b d d d
 [38] c b b b b c d d b d b b c b c d b c c d b d b d c d b a c a b c b b c b b
 [75] b a b b b d d b b b b d b a b b d b c b d b d b c d
Levels: a b c d

> x <- f1() # one simulated 'locus'
> barplot(unlist(lapply(split(x, rep(1:2, each=50)), table))) # show the differences between the two groups, for one 'locus'
> dat <- data.frame(lapply(1:10, function(i) f1()))
> names(dat) <- paste("variable",1:10)
> mca1 <- dudi.acm(dat, scannf=FALSE, nf=10) # set "nf" to the number of axes you want to retain
> fac <- factor(rep(1:2, each=50)) # in practice, replace with the groups
> s.class(mca1$li, fac=fac) # to see the MCA results

## then in find.clusters and dapc, use mca1$tab as the data, and specify dudi=mca1; e.g.:
> grp <- find.clusters(mca1$tab, dudi=mca1, n.iter=1e5, n.start=30, n.pca=10, n.clust=2) # find.clusters
> table(grp$grp, fac) # I find about 90% accurate classification

> dapc1 <- dapc(mca1$tab, fac, dudi=mca1, n.pca=10, n.da=1) # dapc
> scatter(dapc1) # plot results - here there's just one dimension
####
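
To make the "dummy vectors with some centring" idea above more concrete, here is a minimal sketch in base R. The toy data and the names locus1, locus2, dummy and dummy_centred are made up for illustration. It only shows the indicator coding and a plain centring; dudi.acm applies its own scaling and weights on top of this (see ?dudi.acm), so the result is not identical to mca1$tab.

```r
# Hypothetical toy data: two categorical variables for six individuals
dat <- data.frame(
  locus1 = factor(c("a", "b", "b", "c", "a", "b")),
  locus2 = factor(c("x", "x", "y", "y", "y", "x"))
)

# Build one 0/1 indicator column per level of each factor
dummy <- do.call(cbind, lapply(names(dat), function(nm) {
  m <- sapply(levels(dat[[nm]]), function(l) as.numeric(dat[[nm]] == l))
  colnames(m) <- paste(nm, levels(dat[[nm]]), sep = ".")
  m
}))

# Centre each indicator column (simple centring only, no MCA weighting)
dummy_centred <- scale(dummy, center = TRUE, scale = FALSE)

dim(dummy)      # individuals x total number of levels
rowSums(dummy)  # each individual has exactly one level per variable
```

Each row of the indicator table sums to the number of variables, which is why a purely quantitative method can be applied to it once the columns are centred.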

To ensure that the "dudi" argument will be correctly taken into account, you will need to use the devel version of adegenet (see download section on the website).

Also, be aware that so far uniform weights are used for all variables, meaning that factors with more levels will likely be given more weight in the analysis.
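
As a rough illustration of the point about levels (a sketch on made-up data, not output from adegenet): once factors are recoded as dummy vectors, each factor contributes one column per level, so a factor with five levels occupies more columns of the transformed table than a factor with two. The names two_levels, five_levels, n_cols and share are hypothetical.

```r
n <- 100  # hypothetical number of individuals

# Two made-up categorical variables with different numbers of levels
dat <- data.frame(
  two_levels  = factor(rep_len(letters[1:2], n), levels = letters[1:2]),
  five_levels = factor(rep_len(letters[1:5], n), levels = letters[1:5])
)

# Number of dummy (indicator) columns each variable would contribute
n_cols <- sapply(dat, nlevels)

# Share of the columns of the dummy-coded table taken up by each variable
share <- n_cols / sum(n_cols)

n_cols
share
```

With equal weights on all columns, the five-level factor here accounts for five of the seven columns, which is the imbalance to keep in mind when variables differ a lot in their number of levels.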

All the best,

Thibaut


________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Hugo Gante [hugo.gante at gmail.com]
Sent: 18 April 2011 15:24
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] Data type/format and admixed individuals using DAPC

Hi,
Perhaps someone could help me out with a basic file format question?
To run DAPC can I use qualitative (coded) data or do I have to use quantitative data since it first runs a PCA? I found some information about data file format (matrix vs tabular?) and data type (quantitative vs characters) but some clarification on usage and where to find more detail (examples?) on file formats would be most appreciated.

Also, I was wondering how admixed individuals are treated and if they will be identified by DAPC?

Thanks in advance!
Best,
Hugo
