[adegenet-forum] 'PCA on genind' figure output
Thibaut Jombart
jombart at biomserv.univ-lyon1.fr
Thu May 22 19:45:42 CEST 2008
Dear Colin,
sorry again that this email has been held for so long.
> Hello,
>
> I was recently introduced to this package and it looks very useful. I
> have a few questions regarding 'PCA on genind objects' as in the
> tutorial.
>
>
> I have 2 populations with some a priori putative hybrid individuals.
> I have run a PCA on my genind data object and the resulting figure is
> interesting. However, I would like to be able to identify these a
> priori putative hybrids on the plot (colour/shape/label?) but cannot
> figure out how to do this.
To do this, you need to have a factor identifying the putative type of
each individual: first pop, second pop, or hybrid.
Here is an example:
#### R code
library(adegenet)
library(ade4)
data(microbov)
## first, isolate each breed
temp <- seppop(microbov)
names(temp)
salers <- temp$Salers
zebu <- temp$Zebu
## we simulate hybrids here
zebler <- hybridize(salers, zebu, n=40)
## repool data: "dat" is a genind with individuals from 2 populations
## (salers, zebu) and hybrids (zebler)
dat <- repool(salers,zebu,zebler)
## perform the PCA after replacing NAs
dat <- na.replace(dat,method="mean")
pca1 <- dudi.pca(dat@tab, scannf=FALSE, scale=FALSE)
## build a factor identifying the type of the individuals
fac <- dat@pop
levels(fac) <- dat@pop.names
head(fac)
## scatterplot
s.class(pca1$li,fac)
#### end R code
> I would also like to be able to turn off the lines from individuals to
> centroids and perhaps colour code (or use shapes) to identify which
> population the individuals are from.
These features are handled through arguments of s.class (in particular
axesell and col).
For instance:
#### R code
levels(fac)
col <- c("red", "blue", "darkviolet")
s.class(pca1$li, fac, col=col, axesell=FALSE)
add.scatter.eig(pca1$eig, 2,1,2)
#### end R code
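As far as I know, the lines joining individuals to their centroid are governed by the cstar argument of s.class (cstar=0 should remove them), while axesell only toggles the axes of the ellipses. Here is a minimal sketch of colouring and shaping points by group. It uses simulated scores as a stand-in for pca1$li so that it runs on its own; the names scores, type, cols and shapes are made up for illustration, and the s.class call is only run if ade4 is installed.

```r
#### R code
## stand-in for pca1$li and fac (simulated, for illustration only)
set.seed(1)
type <- factor(rep(c("Salers", "Zebu", "hybrid"), each = 20),
               levels = c("Salers", "Zebu", "hybrid"))
centre <- c(Salers = -3, Zebu = 3, hybrid = 0)
scores <- data.frame(Axis1 = rnorm(60, mean = centre[as.character(type)]),
                     Axis2 = rnorm(60))

## one colour and one plotting symbol per level of the factor
cols   <- c("red", "blue", "darkviolet")
shapes <- c(16, 17, 15)

## base graphics: colour and shape by group, no stars, no ellipses
plot(scores$Axis1, scores$Axis2,
     col = cols[as.integer(type)], pch = shapes[as.integer(type)],
     xlab = "Axis 1", ylab = "Axis 2")
legend("topright", legend = levels(type), col = cols, pch = shapes)

## with ade4: cstar = 0 drops the lines to the centroids
if (requireNamespace("ade4", quietly = TRUE)) {
  ade4::s.class(scores, type, col = cols, cstar = 0, axesell = FALSE)
}
#### end R code
```

With the real data, scores and type would simply be pca1$li and fac from the example above.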
>
> Also, it looks as though there is some interesting substructure that
> was not apparent in other analyses. Can I extract individuals from
> clusters?
It depends on what the 'clusters' are. You can use a clustering
algorithm on the PCA scores ($li or $l1) to define your groups (see the
function hclust in stats package). If clusters are groups of points
clustered together on the factorial map, then you may try finding the
corresponding range on both axes using 'locator', and then isolate the
corresponding individuals.
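Both approaches can be sketched as follows. This is only an illustration under stated assumptions: the scores are simulated as a stand-in for pca1$li, the number of groups k is arbitrary, and the rectangle corners are hard-coded where an interactive session would take them from locator(2). The names scores, grp, corners, ind.grp1 and ind.box are hypothetical.

```r
#### R code
## stand-in for the PCA scores pca1$li (simulated, for illustration only)
set.seed(1)
scores <- data.frame(Axis1 = c(rnorm(20, -3), rnorm(20, 3)),
                     Axis2 = rnorm(40))
rownames(scores) <- paste("ind", 1:40, sep = "")

## approach 1: hierarchical clustering on the scores, cut into k groups
k <- 2                          # arbitrary choice for this example
tree <- hclust(dist(scores))    # default (complete) linkage
grp <- cutree(tree, k = k)      # named vector: group of each individual
table(grp)
ind.grp1 <- names(grp)[grp == 1]

## approach 2: individuals falling inside a rectangle on the factorial map
## interactively, after plotting: corners <- locator(2)
corners <- list(x = c(-6, 0), y = c(-4, 4))   # hard-coded stand-in
inside <- scores$Axis1 >= min(corners$x) & scores$Axis1 <= max(corners$x) &
          scores$Axis2 >= min(corners$y) & scores$Axis2 <= max(corners$y)
ind.box <- rownames(scores)[inside]
#### end R code
```

Provided the rows of the scores are in the same order as the individuals in the genind object, a vector such as grp == 1 or inside should then be usable to isolate the corresponding individuals from the data.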
Maybe there is also a way to do this interactively using ade4TkGUI.
Does anyone know how?
Anyway, both these approaches could be described in the tutorial... I'll
add this when I can.
Best regards,
Thibaut.
--
######################################
Thibaut JOMBART
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
jombart at biomserv.univ-lyon1.fr
http://biomserv.univ-lyon1.fr/%7Ejombart/
http://adegenet.r-forge.r-project.org/