[adegenet-forum] PCA with tetraploid data

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue May 10 11:16:07 CEST 2011


Hello,

df2genind does that annoying job for you. All you need to do is read your data into R as a data.frame with one column per locus, each genotype being coded as a series of alleles separated by a given character (here "/"). For instance:
####
> dat = data.frame(loc1=c("80/80/78/60","60/60/60/60","78/80/80/82"), loc2=c("50/55/60/75","50/50/50/50","55/55/55/55"))
> dat
         loc1        loc2
1 80/80/78/60 50/55/60/75
2 60/60/60/60 50/50/50/50
3 78/80/80/82 55/55/55/55

> x=df2genind(dat, sep="/", ploidy=4)
> x

   #####################
   ### Genind object ###
   #####################
- genotypes of individuals -

S4 class:  genind
@call: df2genind(X = dat, sep = "/", ploidy = 4)

@tab:  3 x 8 matrix of genotypes

@ind.names: vector of  3 individual names
@loc.names: vector of  2 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  8 columns of @tab
@all.names: list of  2 components yielding allele names for each locus
@ploidy:  4
@type:  codom

Optionnal contents:
@pop:  - empty -
@pop.names:  - empty -

@other: - empty -

> truenames(x)
  loc1.60 loc1.78 loc1.80 loc1.82 loc2.50 loc2.55 loc2.60 loc2.75
1    0.25    0.25     0.5    0.00    0.25    0.25    0.25    0.25
2    1.00    0.00     0.0    0.00    1.00    0.00    0.00    0.00
3    0.00    0.25     0.5    0.25    0.00    1.00    0.00    0.00
####

You can then perform a PCA on truenames(x) or, better, on a centred/scaled version of this matrix obtained using scaleGen(x).
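
As a rough illustration of that last step, here is a minimal sketch using the same toy data. It assumes the ade4 and adegenet packages are installed; the choice of dudi.pca (from ade4), of centring without scaling, and of keeping two axes (nf = 2) are my own assumptions for the example, not the only way to do it:

```r
# Assumes the ade4 and adegenet packages are installed.
library(ade4)
library(adegenet)

# Same toy tetraploid data as in the example above
dat <- data.frame(
  loc1 = c("80/80/78/60", "60/60/60/60", "78/80/80/82"),
  loc2 = c("50/55/60/75", "50/50/50/50", "55/55/55/55")
)
x <- df2genind(dat, sep = "/", ploidy = 4)

# Centre the allele frequencies; scale = FALSE leaves the variances untouched
X <- scaleGen(x, center = TRUE, scale = FALSE)

# X is already centred, so dudi.pca is asked not to centre or scale again.
# scannf = FALSE skips the interactive screeplot; nf = 2 is an arbitrary choice.
pca1 <- dudi.pca(X, center = FALSE, scale = FALSE, scannf = FALSE, nf = 2)

pca1$eig   # eigenvalues
pca1$li    # coordinates of the individuals on the retained axes
```

With a real dataset, dat would simply be the data.frame read from your file, with one column per locus as above.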

Best

Thibaut

--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - Faculty of Medicine
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/
________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of AVIK RAY [avik.ray.kol at gmail.com]
Sent: 09 May 2011 20:00
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] PCA with tetraploid data

Dear Dr Jombart
I want to do PCA and other analyses in adegenet. However, my data is a tetraploid dataset (204 individuals, 7 microsatellite loci), so it cannot be read using read.structure, as you mentioned in your earlier mail to Sarah Castillo (19/10/2010, RE: Looking for help with a PCA using adegenet in R).
What I have understood from the code so far is that, instead of coding each individual for each locus as in read.structure (diploid data), the idea is to get the allele frequency for each allele (whether present or absent) and then code each individual's genotype accordingly. However, I did not understand the last part of the code, e.g.
………….
$pop
[1] ON ON ON ON ON ON ON ON
Levels: ON
> genind2df(x, sep="/")
pop gen
……………
Moreover, it seems extremely cumbersome for large datasets like mine (204 individuals, 7 microsatellite loci). Can you give any suggestions?
Thanks
best regards
AVIK
 --

AVIK RAY
Visiting Fellow
National Center for Biological Sciences
Tata Institute of Fundamental Research
GKVK Campus
Bellary Road
Bangalore-560065
India
Ph 91-80-23666340
Fax 91-80-2363 6662
