<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7638.1">
<TITLE>RE: Looking for help with a PCA using adegenet in R</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Hello again<BR>
About the ploidy: we are looking at MHC genes, specifically MHC DRB exon 2. We found that the locus is duplicated, with individuals possessing between 2 and 4 alleles (Castillo et al. 2010). Individuals with 2 alleles are presumed to be homozygous at each locus (hence a genotype such as 1 2 -9 -9).<BR>
We are having trouble running our data in the usual genetic software because of the ploidy issue. We are sure that the locus is duplicated, so ploidy is up to tetraploid, but individuals carry 2, 3, or 4 alleles.<BR>
<BR>
I am unsure how else to represent this other than as missing data (these genotypes are still important).<BR>
<BR>
Therefore, the differences in the number of alleles per individual are important for the structure and should be used in the analyses. Any advice would be appreciated.<BR>
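<BR>
One possible way to keep individuals with 2 or 3 alleles in the analysis, sketched here in base R, is to code each allele as present (1) or absent (0) per individual, so that no assumption about ploidy or allele dosage is needed. This coding is an assumption made for illustration and is not something proposed elsewhere in this thread; the object names (dat, al, pa) are hypothetical, and only the first six individuals of the example file quoted below are used.<BR>

```r
# First six individuals of the example file; -9 marks an empty allele slot.
dat = data.frame(
  Ind = c(271, 273, 272, 465, 472, 489),
  Al1 = c(7, 2, 4, 1, 3, 2),
  Al2 = c(10, 10, 11, 2, 6, 3),
  Al3 = c(11, 13, 12, -9, 11, 4),
  Al4 = c(-9, -9, -9, -9, 19, 19)
)

al = as.matrix(dat[, c("Al1", "Al2", "Al3", "Al4")])
al[al == -9] = NA                      # treat -9 as "no allele recorded"

# Number of distinct alleles observed in each individual (2, 3 or 4).
n.alleles = apply(al, 1, function(a) length(unique(a[!is.na(a)])))
names(n.alleles) = as.character(dat$Ind)

# Presence/absence table: one row per individual, one column per allele.
alleles = sort(unique(al[!is.na(al)]))
pa = t(apply(al, 1, function(a) as.integer(alleles %in% a)))
dimnames(pa) = list(as.character(dat$Ind), paste("allele", alleles, sep = "."))

print(n.alleles)
print(pa)
```

A table like pa could then be passed to a PCA routine such as prcomp. Whether presence/absence coding is appropriate depends on how much allele dosage matters for the question being asked.<BR>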
<BR>
Sarrah<BR>
<BR>
<BR>
____________________<BR>
Sarrah Castillo<BR>
MSc Candidate<BR>
Environmental &amp; Life Sciences Graduate Program<BR>
Trent University, 2140 East Bank Drive,<BR>
Peterborough, Ontario, K9J 7B8, Canada<BR>
e-mail:scastillo@nrdpfc.ca<BR>
<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: Jombart, Thibaut [<A HREF="mailto:t.jombart@imperial.ac.uk">mailto:t.jombart@imperial.ac.uk</A>]<BR>
Sent: Tue 19/10/2010 12:37<BR>
To: Sarrah Castillo; adegenet-forum@lists.r-forge.r-project.org<BR>
Subject: RE: Looking for help with a PCA using adegenet in R<BR>
<BR>
Dear Sarrah,<BR>
<BR>
In this case read.structure should not be used - it is designed for diploid individuals only. Fortunately, you can still read your data in adegenet using df2genind.<BR>
<BR>
The trick consists in merging the 4 alleles into a single character string:<BR>
#####<BR>
> foo=read.table("foo.txt", head=TRUE)<BR>
> head(foo)<BR>
Ind Reg Al1 Al2 Al3 Al4<BR>
1 271 ON 7 10 11 -9<BR>
2 273 ON 2 10 13 -9<BR>
3 272 ON 4 11 12 -9<BR>
4 465 ON 1 2 -9 -9<BR>
5 472 ON 3 6 11 19<BR>
6 489 ON 2 3 4 19<BR>
> gen=apply(foo[,3:6],1,paste,collapse="/")<BR>
> gen<BR>
[1] "7/10/11/-9" "2/10/13/-9" "4/11/12/-9" "1/2/-9/-9" "3/6/11/19"<BR>
[6] "2/3/4/19" "7/12/-9/-9" "7/14/43/-9" "4/5/15/19" "7/14/20/26"<BR>
[11] "5/7/8/-9" "4/11/21/-9" "7/21/24/-9" "1/20/26/49" "7/16/20/26"<BR>
[16] "7/25/27/49" "3/19/25/49" "7/9/12/-9"<BR>
#####<BR>
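<BR>
For readers without foo.txt at hand, the same merging step can be tried with the first six rows typed in directly. This is only a sketch: the inline data frame is a stand-in for the file and is not part of the original session.<BR>

```r
# Stand-in for the table read from foo.txt: first six rows only.
foo = data.frame(
  Ind = c(271, 273, 272, 465, 472, 489),
  Reg = "ON",
  Al1 = c(7, 2, 4, 1, 3, 2),
  Al2 = c(10, 10, 11, 2, 6, 3),
  Al3 = c(11, 13, 12, -9, 11, 4),
  Al4 = c(-9, -9, -9, -9, 19, 19)
)

# For each row, paste the four allele columns (columns 3 to 6)
# into a single "a/b/c/d" string.
gen = apply(foo[, 3:6], 1, paste, collapse = "/")
print(unname(gen))
```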
<BR>
A problem in your data is that, for a single locus and individual, some but not all alleles can be missing (e.g. "1/2/-9/-9"). Are these actual tetraploid data, or is the actual ploidy unknown?<BR>
For now, I assume that allele frequencies cannot be inferred for a genotype as soon as it contains at least one NA.<BR>
<BR>
#####<BR>
> isNA=grep("-9",gen)<BR>
> gen[isNA] <- NA<BR>
> gen<BR>
[1] NA NA NA NA "3/6/11/19"<BR>
[6] "2/3/4/19" NA NA "4/5/15/19" "7/14/20/26"<BR>
[11] NA NA NA "1/20/26/49" "7/16/20/26"<BR>
[16] "7/25/27/49" "3/19/25/49" NA <BR>
#####<BR>
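<BR>
The effect of this rule on sample size is worth checking. The sketch below applies the same grep step to the 18 merged genotypes printed above and counts how many remain. The argument fixed=TRUE is an adjustment added here, not part of the original call, so that "-9" is matched as a literal string.<BR>

```r
# The 18 merged genotype strings printed above.
gen = c("7/10/11/-9", "2/10/13/-9", "4/11/12/-9", "1/2/-9/-9", "3/6/11/19",
        "2/3/4/19", "7/12/-9/-9", "7/14/43/-9", "4/5/15/19", "7/14/20/26",
        "5/7/8/-9", "4/11/21/-9", "7/21/24/-9", "1/20/26/49", "7/16/20/26",
        "7/25/27/49", "3/19/25/49", "7/9/12/-9")

# Any genotype containing the missing-data code becomes NA as a whole.
isNA = grep("-9", gen, fixed = TRUE)
gen[isNA] = NA

n.kept = sum(!is.na(gen))
cat(n.kept, "of", length(gen), "genotypes retained\n")
```

With these data, 8 of the 18 genotypes remain, which matches the 8 genotypes reported by summary(x) further down.<BR>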
<BR>
We can now obtain the genind object:<BR>
<BR>
#####<BR>
> x=df2genind(data.frame(gen), ind.names=foo$Ind, pop=foo$Reg, sep="/", ploidy=4)<BR>
Warning message:<BR>
In df2genind(data.frame(gen), ind.names = foo$Ind, pop = foo$Reg, :<BR>
entirely non-type individual(s) deleted<BR>
> truenames(x)<BR>
$tab<BR>
gen.01 gen.02 gen.03 gen.04 gen.05 gen.06 gen.07 gen.11 gen.14 gen.15<BR>
472 0.00 0.00 0.25 0.00 0.00 0.25 0.00 0.25 0.00 0.00<BR>
489 0.00 0.25 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.00<BR>
466 0.00 0.00 0.00 0.25 0.25 0.00 0.00 0.00 0.00 0.25<BR>
749 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.25 0.00<BR>
319 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00<BR>
323 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.00 0.00<BR>
341 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.00 0.00 0.00<BR>
385 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00<BR>
gen.16 gen.19 gen.20 gen.25 gen.26 gen.27 gen.49<BR>
472 0.00 0.25 0.00 0.00 0.00 0.00 0.00<BR>
489 0.00 0.25 0.00 0.00 0.00 0.00 0.00<BR>
466 0.00 0.25 0.00 0.00 0.00 0.00 0.00<BR>
749 0.00 0.00 0.25 0.00 0.25 0.00 0.00<BR>
319 0.00 0.00 0.25 0.00 0.25 0.00 0.25<BR>
323 0.25 0.00 0.25 0.00 0.25 0.00 0.00<BR>
341 0.00 0.00 0.00 0.25 0.00 0.25 0.25<BR>
385 0.00 0.25 0.00 0.25 0.00 0.00 0.25<BR>
<BR>
$pop<BR>
[1] ON ON ON ON ON ON ON ON<BR>
Levels: ON<BR>
<BR>
> genind2df(x, sep="/")<BR>
pop gen<BR>
472 ON 03/06/11/19<BR>
489 ON 02/03/04/19<BR>
466 ON 04/05/15/19<BR>
749 ON 07/14/20/26<BR>
319 ON 01/20/26/49<BR>
323 ON 07/16/20/26<BR>
341 ON 07/25/27/49<BR>
385 ON 03/19/25/49<BR>
#####<BR>
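<BR>
The 0.25 entries in $tab follow from the ploidy: each of the four allele copies of an individual contributes 1/4 to the column of its allele. The base R sketch below reproduces that arithmetic for two of the genotypes without using adegenet. The helper name freq.row is hypothetical, and the column labels are plain allele numbers, not the gen.XX names that adegenet produces.<BR>

```r
ploidy = 4
gen = c("472" = "3/6/11/19", "489" = "2/3/4/19")

# Split each string into its allele copies (names 472 and 489 are kept).
copies = strsplit(gen, "/", fixed = TRUE)
alleles = sort(unique(as.numeric(unlist(copies))))

# freq.row (hypothetical helper): share of one individual's allele
# copies that carry each allele.
freq.row = function(a) {
  as.vector(table(factor(as.numeric(a), levels = alleles))) / ploidy
}

tab = t(sapply(copies, freq.row))
colnames(tab) = as.character(alleles)
print(tab)
```

Each row sums to 1, as expected for within-individual allele frequencies.<BR>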
<BR>
Now you can use 'x' like any other genind object:<BR>
#####<BR>
> Hs(x)<BR>
1<BR>
0.9238281<BR>
> summary(x)<BR>
# Total number of genotypes: 8<BR>
<BR>
# Population sample sizes: <BR>
ON<BR>
8<BR>
<BR>
# Number of alleles per locus: <BR>
L1<BR>
17<BR>
<BR>
[etc.]<BR>
#####<BR>
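<BR>
The value returned by Hs(x) can be checked by hand, assuming it is the expected heterozygosity 1 - sum(p^2) computed from the pooled allele frequencies of the population. The sketch below does this in base R for the 8 complete genotypes listed by genind2df above.<BR>

```r
# The 8 genotypes without missing alleles, as listed by genind2df above.
gen = c("3/6/11/19", "2/3/4/19", "4/5/15/19", "7/14/20/26",
        "1/20/26/49", "7/16/20/26", "7/25/27/49", "3/19/25/49")

# Pool all allele copies (8 individuals x 4 copies = 32).
copies = as.numeric(unlist(strsplit(gen, "/", fixed = TRUE)))

p = table(copies) / length(copies)   # allele frequencies in the population
hs = 1 - sum(p^2)                    # expected heterozygosity

cat("alleles:", length(p), "\n")
cat("Hs:", format(hs, digits = 7), "\n")
```

Run as written, this gives 17 alleles and Hs = 0.9238281, in agreement with the output of Hs(x) and summary(x) above.<BR>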
<BR>
Best regards,<BR>
<BR>
Thibaut<BR>
<BR>
<BR>
________________________________________<BR>
From: adegenet-forum-bounces@lists.r-forge.r-project.org [adegenet-forum-bounces@lists.r-forge.r-project.org] On Behalf Of Sarrah Castillo [scastillo@nrdpfc.ca]<BR>
Sent: 19 October 2010 16:20<BR>
To: adegenet-forum@lists.r-forge.r-project.org<BR>
Subject: [adegenet-forum] Looking for help with a PCA using adegenet in R<BR>
<BR>
Hello Dr. Jombart<BR>
I was wondering if you could help me with an issue I am having with your package (adegenet) in R.<BR>
I am attempting to perform a PCA using a STRUCTURE file. The difference is that my data are tetraploid. STRUCTURE allows for multiple ploidy levels; however, I am unsure how to have the program read my data as tetraploid instead of diploid. The genetic information is for a single locus with between 2 and 4 alleles per individual.<BR>
<BR>
Here is an example of my file (with -9 representing missing data):<BR>
<BR>
Ind Reg Al1 Al2 Al3 Al4<BR>
271 ON 7 10 11 -9<BR>
273 ON 2 10 13 -9<BR>
272 ON 4 11 12 -9<BR>
465 ON 1 2 -9 -9<BR>
472 ON 3 6 11 19<BR>
489 ON 2 3 4 19<BR>
519 ON 7 12 -9 -9<BR>
551 ON 7 14 43 -9<BR>
466 ON 4 5 15 19<BR>
749 ON 7 14 20 26<BR>
111 ON 5 7 8 -9<BR>
173 ON 4 11 21 -9<BR>
318 ON 7 21 24 -9<BR>
319 ON 1 20 26 49<BR>
323 ON 7 16 20 26<BR>
341 ON 7 25 27 49<BR>
385 ON 3 19 25 49<BR>
485 ON 7 9 12 -9<BR>
<BR>
Ind = individual<BR>
Reg = region<BR>
Al1 = allele 1<BR>
Al2 = allele 2<BR>
Al3 = allele 3<BR>
Al4 = allele 4<BR>
<BR>
<BR>
Any help would be much appreciated.<BR>
<BR>
Thank you<BR>
Sarrah Castillo<BR>
<BR>
<BR>
<BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>