[adegenet-forum] Data type/format and admixed individuals using DAPC

Mon Apr 18 21:53:46 CEST 2011

Yes, MDS is fine, but you'll lose variable contributions. I don't think admixture plays a role here.
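To illustrate the point about variable contributions, here is a minimal sketch on hypothetical toy data (the matrix `m` and the simple matching distance are assumptions, not something from this thread). It uses classical MDS via cmdscale() from base R; isoMDS() in the MASS package is the usual non-metric counterpart. MDS only sees the distance matrix, so the output is coordinates for individuals, with nothing attached to the original variables:

```r
# Hypothetical toy data: 30 individuals, 10 categorical variables
set.seed(42)
m <- matrix(sample(letters[1:4], 300, replace = TRUE), nrow = 30)

# Simple matching distance: proportion of variables at which two individuals differ
n <- nrow(m)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(m[i, ] != m[j, ])
  }
}

# Classical (metric) MDS on the distance matrix: returns individual coordinates only
coords <- cmdscale(as.dist(d), k = 2)
head(coords)
```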
Cheers
Thibaut
________________________________
From: Hugo Gante [hugo.gante at gmail.com]
Sent: 18 April 2011 20:15
To: Jombart, Thibaut
Cc: adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] Data type/format and admixed individuals using DAPC

Dear Thibaut,
Thanks for the detailed reply!
Along the same lines, would non-metric multidimensional scaling be another alternative to MCA? Which one (if any) would deal better with admixed individuals?
Best,
Hugo

On Mon, Apr 18, 2011 at 8:29 PM, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
Hello,

DAPC is meant for quantitative data. One workaround is to transform your data first, i.e. recode each categorical variable as dummy (0/1) vectors with some centring/scaling. This is done implicitly by multiple correspondence analysis (MCA, dudi.acm in ade4), the multivariate analysis dedicated to categorical data. For instance:
####
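> library(ade4) # dudi.acm, s.class
> library(adegenet) # find.clusters, dapc
> f1 <- function(){factor(as.vector(replicate(2, sample(letters[1:4], 50, prob=runif(4), replace=TRUE))))} # generates 100 indiv: two groups of 50, each drawn from its own random distribution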
> f1()
  [1] b b b b b c d c d a d c b c d b b b c b c c b b b c b d d b d d d b d d d
 [38] c b b b b c d d b d b b c b c d b c c d b d b d c d b a c a b c b b c b b
 [75] b a b b b d d b b b b d b a b b d b c b d b d b c d
Levels: a b c d

> x <- f1() # one simulated 'locus'
> barplot(unlist(lapply(split(x,rep(1:2,each=50)),table))) # show the differences between the two groups, for one 'locus'
> dat <- data.frame(lapply(1:10, function(i) f1()))
> names(dat) <- paste("variable",1:10)
> mca1 <- dudi.acm(dat,scannf=FALSE, nf=10) # replace "nf " by the nb of factors you want
> fac <- factor(rep(1:2, each=50)) # in practice, replace with the groups
> s.class(mca1$li, fac=fac) # to see the MCA results

## then in find.clusters and dapc, use mca1$tab as the data, and specify dudi=mca1; e.g.:
> grp <- find.clusters(mca1$tab, dudi=mca1, n.iter=1e5, n.start=30, n.pca=10, n.clust=2) # find.clusters
> table(grp$grp, fac) # I find about 90% accurate classification

> dapc1 <- dapc(mca1$tab, fac, dudi=mca1, n.pca=10, n.da=1) # dapc
> scatter(dapc1) # plot results - here there's just one dimension
####
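To make the "dummy vectors with some centring/scaling" idea concrete, here is a minimal base-R sketch on hypothetical toy data (`locus1` and `locus2` are made-up names). It assumes plain 0/1 indicator coding followed by standard centring and scaling with scale(); it is only meant to show the general idea and is not the exact transformation that dudi.acm applies:

```r
# Hypothetical toy data: two categorical variables with fixed level sets
set.seed(1)
dat <- data.frame(
  locus1 = factor(sample(letters[1:4], 20, replace = TRUE), levels = letters[1:4]),
  locus2 = factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3])
)

# One 0/1 indicator column per level of each factor
dummy <- do.call(cbind, lapply(names(dat), function(v) {
  m <- sapply(levels(dat[[v]]), function(l) as.numeric(dat[[v]] == l))
  colnames(m) <- paste(v, levels(dat[[v]]), sep = ".")
  m
}))

# Drop levels never observed (constant columns), then centre and scale
keep <- apply(dummy, 2, sd) > 0
scaled <- scale(dummy[, keep, drop = FALSE])
head(scaled)
```

Each individual has exactly one 1 per variable in `dummy`, and `scaled` is an ordinary quantitative table of the kind PCA-based methods expect.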

To ensure that the "dudi" argument will be correctly taken into account, you will need to use the devel version of adegenet (see download section on the website).

Also, be aware that so far uniform weights are used for all variables, meaning that factors with more levels will likely be given stronger weight in the analysis.
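A rough way to see the weighting issue, as a sketch under the assumption that each dummy column is standardized and all columns are weighted equally (the data and the helper `dummy_scaled` are hypothetical): each column then carries unit variance, so a factor's share of the total variance grows with its number of levels.

```r
# Hypothetical toy data: one 2-level and one 5-level factor, 20 individuals
dat <- data.frame(
  two  = factor(rep(c("a", "b"), times = 10)),
  five = factor(rep(c("a", "b", "c", "d", "e"), times = 4))
)

# Dummy-code one factor and standardize each 0/1 column (hypothetical helper)
dummy_scaled <- function(f) {
  scale(sapply(levels(f), function(l) as.numeric(f == l)))
}

# Total variance carried by each factor = sum of its column variances
inertia <- sapply(dat, function(f) sum(apply(dummy_scaled(f), 2, var)))
print(inertia)
```

Here the 5-level factor carries 5 units of variance against 2 for the 2-level factor, which is the imbalance described above.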

All the best,

Thibaut

________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Hugo Gante [hugo.gante at gmail.com]
Sent: 18 April 2011 15:24
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] Data type/format and admixed individuals using DAPC

Hi,
Perhaps someone could help me out with a basic file format question?
To run DAPC can I use qualitative (coded) data or do I have to use quantitative data since it first runs a PCA? I found some information about data file format (matrix vs tabular?) and data type (quantitative vs characters) but some clarification on usage and where to find more detail (examples?) on file formats would be most appreciated.

Also, I was wondering how admixed individuals are treated and if they will be identified by DAPC?

Thanks in advance!
Best,
Hugo
