Hugo Gante hugo.gante at gmail.com
Mon Apr 18 21:15:56 CEST 2011

```
Dear Thibaut,
Along the same lines, would non-metric multidimensional scaling be another
alternative to MCA? Which one, if either, would deal better with admixed
individuals?
Best,
Hugo

On Mon, Apr 18, 2011 at 8:29 PM, Jombart, Thibaut
<t.jombart at imperial.ac.uk> wrote:

>  Hello,
>
> DAPC is meant for quantitative data. One workaround is to transform your
> data first, e.g. by coding each variable as dummy (0/1) vectors with some
> centring/scaling. This is done implicitly by multiple correspondence
> analysis (MCA, dudi.acm in ade4), the multivariate method dedicated to
> categorical data. For instance:
> ####
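> > library(adegenet) # also attaches ade4 (dudi.acm, s.class)
> > # f1() simulates one locus for 100 individuals: two groups of 50, each drawn from its own random allele frequencies
> > f1 <- function(){factor(as.vector(replicate(2, sample(letters[1:4], 50, prob=runif(4), replace=TRUE))))}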
> > f1()
>   [1] b b b b b c d c d a d c b c d b b b c b c c b b b c b d d b d d d b d d d
>  [38] c b b b b c d d b d b b c b c d b c c d b d b d c d b a c a b c b b c b b
>  [75] b a b b b d d b b b b d b a b b d b c b d b d b c d
> Levels: a b c d
>
> > x <- f1() # one simulated locus
> > barplot(unlist(lapply(split(x, rep(1:2, each=50)), table))) # show the differences between the two groups, for one locus
> > dat <- data.frame(lapply(1:10, function(i) f1()))
> > names(dat) <- paste("variable",1:10)
> > mca1 <- dudi.acm(dat, scannf=FALSE, nf=10) # set nf to the number of axes you want to keep
> > fac <- factor(rep(1:2, each=50)) # in practice, replace with the groups
> > s.class(mca1$li, fac=fac) # to see the MCA results
>
> ## then in find.clusters and dapc, use mca1$tab as the data, and specify dudi=mca1, e.g.:
> > grp <- find.clusters(mca1$tab, dudi=mca1, n.iter=1e5, n.start=30, n.pca=10, n.clust=2) # k-means on the retained MCA axes
> > table(grp$grp, fac) # I find about 90% accurate classification
>
> > dapc1 <- dapc(mca1$tab, fac, dudi=mca1, n.pca=10, n.da=1) # dapc
> > scatter(dapc1) # plot results - here there's just one dimension
> ####
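# A minimal base-R sketch of the dummy-vector coding that MCA applies
# implicitly. Illustration only: dudi.acm adds its own column weighting,
# which is not reproduced here. Each categorical variable with k levels
# becomes k 0/1 columns. to_dummy() and toy are hypothetical names.
to_dummy <- function(df) {
  cols <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    # one 0/1 indicator column per level of this variable
    m <- matrix(sapply(levels(f), function(l) as.integer(f == l)),
                nrow = length(f))
    colnames(m) <- paste(v, levels(f), sep = ".")
    m
  })
  do.call(cbind, cols)
}
toy <- data.frame(locus1 = c("a", "b", "a", "c"),
                  locus2 = c("x", "x", "y", "y"))
X <- to_dummy(toy)
rowSums(X)  # each row has one 1 per variable, so every sum equals ncol(toy)
Xc <- scale(X, center = TRUE, scale = FALSE)  # column-centred dummy table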
>
> To ensure that the "dudi" argument is correctly taken into account, [...]
> the website).
>
> Also, be aware that, so far, uniform weights are used for all variables,
> so factors with more levels will likely carry more weight in your
> analysis.
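# A minimal sketch, assuming standard MCA on the complete disjunctive
# table: with p variables, a variable with k_j levels contributes
# (k_j - 1) / p to the total inertia, so variables with more levels
# weigh more. inertia_share() is a hypothetical helper, not an ade4 function.
inertia_share <- function(df) {
  k <- sapply(df, function(x) nlevels(factor(x)))  # levels per variable
  contrib <- (k - 1) / length(k)                   # (k_j - 1) / p
  contrib / sum(contrib)                           # share of total inertia
}
toy <- data.frame(two_alleles  = c("a", "b", "a", "b"),
                  four_alleles = c("a", "b", "c", "d"))
inertia_share(toy)  # two_alleles: 0.25, four_alleles: 0.75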
>
> All the best,
>
> Thibaut
>
>
>  ------------------------------
> *From:* adegenet-forum-bounces at r-forge.wu-wien.ac.at [
> adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of Hugo Gante [
> hugo.gante at gmail.com]
> *Sent:* 18 April 2011 15:24
> *Subject:* DAPC
>
>  Hi,
> Perhaps someone could help me out with a basic file format question?
> To run DAPC, can I use qualitative (coded) data, or do I have to use
> quantitative data since it first runs a PCA? I found some information about
> data file format (matrix vs tabular?) and data type (quantitative vs
> character), but some clarification on usage and on where to find more
> detail (examples?) on file formats would be most appreciated.
>
>  Also, I was wondering how admixed individuals are treated and whether they
> will be identified by DAPC.
>