[adegenet-forum] SNP alleles
Andrea Garavito
neagef at gmail.com
Tue Jun 17 11:12:40 CEST 2014
Hi everybody!
I'm currently trying to run a PCA on a SNP matrix from a diploid
organism; most of the SNPs are bi-allelic.
Although the results I obtain are consistent with what was previously known
about the groups, I'm confused by the genind object I get, and I
want to be sure about what's going on in the analysis.
My data file uses the nucleotides as alleles with a "/" separator, and
missing data are coded as "NA":
ind mk1 mk2
ind1 G/A C/T
ind2 G/G C/T
After loading my data matrix with the df2genind function, my data are stored
as a matrix with four times the number of columns of the original file:
ind mk1.A mk1.G mk1.A mk1.G mk2.C mk2.T mk2.C mk2.T
ind1 0.5 0.0 0 0.5 0.0 0.5 0.5 0
ind2 0.0 0.5 0 0.5 0.0 0.5 0.5 0
Is that correct? I thought I would get two columns per marker locus instead
of four.
From there I obtain doubled statistics for each of the alleles. Since I
don't know the phase, an A/G is the same as a G/A, so how can I get
unified statistics for each allele?
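
To make the layout I was expecting concrete, here is a minimal sketch in plain Python. It is not adegenet code, and the function name allele_frequency_table is made up for illustration. It assumes each genotype is an unordered pair of alleles, counts each allele per locus, and divides by the ploidy, so every locus gets one column per allele:

```python
from collections import Counter


def allele_frequency_table(genotypes, sep="/", ploidy=2):
    """Build one column per allele per locus from unphased genotype strings.

    genotypes: dict mapping individual -> dict mapping locus -> "G/A"-style
    string, or None for a missing call. (Hypothetical helper, for illustration.)
    """
    # Keep loci in the order they first appear.
    loci = []
    for calls in genotypes.values():
        for locus in calls:
            if locus not in loci:
                loci.append(locus)

    # Collect the set of alleles observed at each locus.
    alleles = {locus: set() for locus in loci}
    for calls in genotypes.values():
        for locus, call in calls.items():
            if call is not None:
                alleles[locus].update(call.split(sep))

    columns = [f"{locus}.{a}" for locus in loci for a in sorted(alleles[locus])]

    table = {}
    for ind, calls in genotypes.items():
        row = []
        for locus in loci:
            call = calls.get(locus)
            # Counting alleles ignores their order, so "G/A" and "A/G" agree.
            counts = Counter(call.split(sep)) if call is not None else None
            for a in sorted(alleles[locus]):
                row.append(None if counts is None else counts[a] / ploidy)
        table[ind] = row
    return columns, table


genotypes = {
    "ind1": {"mk1": "G/A", "mk2": "C/T"},
    "ind2": {"mk1": "G/G", "mk2": "C/T"},
}
columns, table = allele_frequency_table(genotypes)
print(columns)
for ind, row in table.items():
    print(ind, row)
```

With this encoding the two example individuals give four columns in total (mk1.A, mk1.G, mk2.C, mk2.T), and swapping the order of the alleles within a genotype does not change the row.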
Thank you for your answer
Best regards
Andrea