<div dir="ltr"><div>Hi everybody!<br><br>I'm currently trying to run a PCA on a SNP matrix from a diploid organism; most of the SNPs are bi-allelic.<br></div><div>Although the results I obtain make sense given what is already known about the groups, I'm confused by the genind object I get, and I want to be sure I understand what is going on in the analysis.<br>


</div><div>My data file is formatted using the nucleotides as alleles and a "/" separator, and missing data coded as "NA".<br></div>ind    mk1    mk2     <br>ind1  G/A    C/T       <br>ind2  G/G    C/T        <div>


After loading my data matrix with the df2genind function, my data are stored as a matrix with four times the number of columns of the original file:<br><br>ind    mk1.A    mk1.G    mk1.A    mk1.G   mk2.C    mk2.T    mk2.C    mk2.T<br>


ind1    0.5           0.0         0            0.5         0.0         0.5         0.5         0<br></div><div>ind2    0.0           0.5         0            0.5         0.0         0.5         0.5         0<br><br></div>


<div>Is that correct? I expected two columns per marker locus instead of four.<br>As a result, I get duplicated statistics for each allele. Since I don't know the phase, A/G is the same as G/A, so how can I get unified statistics for each allele?<br>
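<div>To make clear what I expected, here is a minimal sketch in plain Python (not adegenet code; the function name allele_frequency_table and its arguments are made up for illustration). It treats genotypes as unphased and builds one column per distinct allele per locus, with each value equal to the allele count divided by the ploidy. It also assumes that stray whitespace around an allele should be ignored:<br></div>
<pre>
```python
from collections import Counter


def allele_frequency_table(genotypes, sep="/", na="NA", ploidy=2):
    """Build per-individual allele frequencies from unphased genotypes.

    genotypes maps each individual to a dict of locus name to genotype
    string, e.g. {"ind1": {"mk1": "G/A"}}. Hypothetical helper for
    illustration only.
    """
    # Collect the distinct alleles seen at each locus. A set ignores the
    # order of alleles within a genotype, so G/A and A/G contribute the same.
    alleles = {}
    for loci in genotypes.values():
        for locus, gt in loci.items():
            seen = alleles.setdefault(locus, set())
            if gt != na:
                seen.update(a.strip() for a in gt.split(sep))

    # One column per (locus, allele) pair, in a stable sorted order.
    columns = [
        f"{locus}.{a}" for locus in sorted(alleles) for a in sorted(alleles[locus])
    ]

    table = {}
    for ind, loci in genotypes.items():
        row = {}
        for locus in sorted(alleles):
            gt = loci.get(locus, na)
            # Missing genotypes stay missing (None) for every allele column.
            counts = None if gt == na else Counter(a.strip() for a in gt.split(sep))
            for a in sorted(alleles[locus]):
                row[f"{locus}.{a}"] = None if counts is None else counts[a] / ploidy
        table[ind] = row
    return columns, table


data = {
    "ind1": {"mk1": "G/A", "mk2": "C/T"},
    "ind2": {"mk1": "G/G", "mk2": "C/T"},
}
columns, table = allele_frequency_table(data)
print(columns)
for ind in table:
    print(ind, [table[ind][c] for c in columns])
```
</pre>
<div>With the two example individuals above, this gives four columns in total (mk1.A, mk1.G, mk2.C, mk2.T), which is the layout I expected to find in the genind object.<br></div>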


<br></div><div>Thank you for your answer<br><br></div><div>Best regards<br></div><div>Andrea<br></div></div>