[adegenet-forum] SNP alleles

Tue Jun 17 11:12:40 CEST 2014

Hi everybody!

I'm currently trying to run a PCA on a SNP matrix from a diploid
organism; most of the SNPs are bi-allelic.
Although the results I obtain are consistent with previous knowledge
of the groups, I'm confused by the genind object that I get, and I
want to be sure about what the analysis is doing.
My data file uses the nucleotides as alleles with a "/" separator, and
missing data are coded as "NA":
ind    mk1    mk2
ind1  G/A    C/T
ind2  G/G    C/T
After loading my data matrix with the df2genind function, my data are stored
as a matrix with four times as many columns as the original file:

ind    mk1.A  mk1.G  mk1.A  mk1.G  mk2.C  mk2.T  mk2.C  mk2.T
ind1   0.5    0.0    0      0.5    0.0    0.5    0.5    0
ind2   0.0    0.5    0      0.5    0.0    0.5    0.5    0

Is that correct? I thought I would get two columns per marker locus instead
of four.
From there, I obtain duplicated statistics for each allele. Since I
don't know the phase, an A/G is the same as a G/A, so how can I get
unified statistics for each allele?
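
To make the question concrete, here is a minimal base R sketch of the kind of table I expected, with one column per allele and the order of the alleles within a genotype ignored. It does not use adegenet and is not meant to reproduce what the package does internally; `allele_freq` is a hypothetical helper written only for this illustration, and the toy data are the two individuals from my example.

```r
# Toy data: one row per individual, one column per marker,
# alleles separated by "/" (same layout as my input file).
geno <- data.frame(
  mk1 = c("G/A", "G/G"),
  mk2 = c("C/T", "C/T"),
  row.names = c("ind1", "ind2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an adegenet function): per-individual allele
# frequencies at one locus. Alleles are counted, so "G/A" and "A/G"
# give exactly the same row.
allele_freq <- function(x, locus, sep = "/") {
  parts <- strsplit(x, sep, fixed = TRUE)
  alleles <- sort(unique(unlist(parts)))  # sort() also drops NA
  freq <- do.call(rbind, lapply(parts, function(p) {
    # Missing genotype: keep the row, but fill it with NA
    if (anyNA(p)) return(rep(NA_real_, length(alleles)))
    as.numeric(table(factor(p, levels = alleles))) / length(p)
  }))
  colnames(freq) <- paste(locus, alleles, sep = ".")
  freq
}

# Bind the per-locus tables side by side: one column per allele.
tab <- do.call(cbind, lapply(names(geno), function(l) {
  allele_freq(geno[[l]], l)
}))
rownames(tab) <- rownames(geno)
print(tab)

# Unphased check: the two orderings of a heterozygote are identical.
same <- identical(allele_freq("A/G", "mk1"), allele_freq("G/A", "mk1"))
print(same)
```

With this toy input the table has two columns per bi-allelic marker (mk1.A, mk1.G, mk2.C, mk2.T), which is the layout I was expecting from the genind object.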

Thank you for your answer

Best regards
Andrea