<div dir="ltr"><div>Hi everybody!<br><br>I'm currently trying to run a PCA on a SNP matrix from a diploid organism; most of the markers are bi-allelic.<br></div><div>Although the results I obtain make sense given previous knowledge of the groups, I'm confused by the genind object that I get, and I want to be sure about what is going on in the analysis. <br>
</div><div>My data file uses the nucleotides as alleles with a "/" separator, and missing data are coded as "NA":<br></div>ind mk1 mk2 <br>ind1 G/A C/T <br>ind2 G/G C/T <div>
After loading my data matrix with the df2genind function, my data are stored as a matrix with four times the number of columns of the original file:<br><br>ind mk1.A mk1.G mk1.A mk1.G mk2.C mk2.T mk2.C mk2.T<br>
ind1 0.5 0.0 0 0.5 0.0 0.5 0.5 0<br></div><div>ind2 0.0 0.5 0 0.5 0.0 0.5 0.5 0<br><br></div>
<div>Is that correct? I expected two columns per locus instead of four.<br>As a result, I obtain duplicated statistics for each allele. Since I don't know the phase, A/G is the same as G/A, so how can I get a single set of statistics per allele? <br>
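<br>To make clear what I expected, here is a minimal sketch in plain Python (not adegenet, and not a claim about how df2genind works internally). The function name allele_frequency_table and its parameters are hypothetical, made up for this illustration. It splits each genotype on the separator, ignores allele order, and produces one column per allele per locus:<br>

```python
from collections import Counter


def allele_frequency_table(genotypes, sep="/", missing="NA"):
    """Hypothetical helper: turn unphased genotype strings such as 'G/A'
    into per-individual allele frequencies, one column per allele per locus.

    genotypes maps individual name to a dict of locus name to genotype string.
    """
    # Collect the distinct alleles observed at each locus.
    alleles = {}
    for loci in genotypes.values():
        for locus, call in loci.items():
            if call != missing:
                alleles.setdefault(locus, set()).update(call.split(sep))

    # Sorted order gives a stable, phase-independent column layout.
    columns = [
        f"{locus}.{allele}"
        for locus in sorted(alleles)
        for allele in sorted(alleles[locus])
    ]

    table = {}
    for ind, loci in genotypes.items():
        row = {}
        for locus in sorted(alleles):
            call = loci.get(locus, missing)
            if call == missing:
                # Missing genotype: no frequency can be computed.
                for allele in alleles[locus]:
                    row[f"{locus}.{allele}"] = None
                continue
            # Counting alleles discards their order, so G/A equals A/G.
            counts = Counter(call.split(sep))
            ploidy = sum(counts.values())
            for allele in alleles[locus]:
                row[f"{locus}.{allele}"] = counts[allele] / ploidy
        table[ind] = row
    return columns, table


# The two individuals from my example file.
genotypes = {
    "ind1": {"mk1": "G/A", "mk2": "C/T"},
    "ind2": {"mk1": "G/G", "mk2": "C/T"},
}
columns, table = allele_frequency_table(genotypes)
print(columns)
for ind, row in table.items():
    print(ind, [row[c] for c in columns])
```

With this encoding the example file gives four columns in total (mk1.A, mk1.G, mk2.C, mk2.T), and a heterozygote gets 0.5 for each of its two alleles whichever way it is written.<br>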
<br></div><div>Thank you for your answer<br><br></div><div>Best regards<br></div><div>Andrea<br></div></div>