[adegenet-forum] SNP alleles

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Jun 17 13:59:04 CEST 2014


Hi there, 

yes, as Caitlin said, the problem is probably in the conversion. With sep="/" I get two columns per marker:

> dat=data.frame(mk1=c("G/A","G/G"), km2=c("C/T","C/T"))
> dat
  mk1 km2
1 G/A C/T
2 G/G C/T
> x=df2genind(dat,sep="/",ploidy=2)
> truenames(x)
  mk1.A mk1.G km2.C km2.T
1   0.5   0.5   0.5   0.5
2   0.0   1.0   0.5   0.5
> 
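
For anyone who wants to see what the separator is doing, here is a minimal base-R sketch of the same idea for a single marker. Note that allele_freq is a hypothetical helper written only for illustration; it is not an adegenet function and does not reproduce how df2genind works internally. It just splits each genotype on the separator and computes per-individual allele frequencies, which also shows that an unphased "A/G" and "G/A" end up as the same row.

```r
# Hypothetical helper (NOT part of adegenet): per-individual allele
# frequencies for one marker, from unphased genotypes such as "G/A".
allele_freq <- function(genotypes, sep = "/") {
  # Split each genotype into its alleles, e.g. "G/A" -> c("G", "A")
  parts <- strsplit(as.character(genotypes), sep, fixed = TRUE)
  # Sorted set of alleles seen at this marker (sort() drops NA)
  alleles <- sort(unique(unlist(parts)))
  rows <- lapply(parts, function(p) {
    # A missing genotype gives a row of NA
    if (anyNA(p)) return(rep(NA_real_, length(alleles)))
    # Count each allele and divide by the number of alleles carried
    as.numeric(table(factor(p, levels = alleles))) / length(p)
  })
  freq <- do.call(rbind, rows)
  colnames(freq) <- alleles
  freq
}

# Same genotypes as mk1 above, plus "A/G" to check that order is ignored
mk1 <- allele_freq(c("G/A", "G/G", "A/G"))
print(mk1)
# Rows 1 and 3 are identical: G/A and A/G are treated the same way
identical(mk1[1, ], mk1[3, ])
```

The result has one column per allele (two for a bi-allelic SNP), so a table with four columns per marker suggests the genotypes were not split on "/" as intended.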


Cheers
Thibaut


________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Caitlin Collins [caitiecollins at gmail.com]
Sent: 17 June 2014 12:36
To: Andrea Garavito
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] SNP alleles

Hi Andrea,

I'm afraid that without seeing the exact code you used to generate the results you have presented, it is a bit difficult to say for certain what the origin of your problem is. So please forgive me if the following suggestion misses the mark. (If so, can I ask you to reply with the functions and arguments you used to generate that output?)

I notice you've stated that your original data file is formatted using a "/" separator. One way to end up with the df2genind output you are seeing is to forget to tell df2genind that you are using that separator. If you have not done so already, try adding the argument sep="/" to your df2genind call. Let me know if that does the trick. If not, please post back with the code you are using and we can go from there.

Best,
Caitlin.


On Tue, Jun 17, 2014 at 10:12 AM, Andrea Garavito <neagef at gmail.com> wrote:
Hi everybody!

I'm currently trying to run a PCA on a SNP matrix from a diploid organism; most of the SNPs are bi-allelic.
Although the results I obtain make sense given previous knowledge of the groups, I'm confused by the genind object I get, and I want to be sure about what's going on in the analysis.
My data file is formatted using nucleotides as alleles with a "/" separator, and missing data are coded as "NA":
ind    mk1    mk2
ind1  G/A    C/T
ind2  G/G    C/T
After loading my data matrix with the df2genind function, my data are stored as a matrix with four times the number of columns of the original file:

ind     mk1.A   mk1.G   mk1.A   mk1.G   mk2.C   mk2.T   mk2.C   mk2.T
ind1    0.5     0.0     0       0.5     0.0     0.5     0.5     0
ind2    0.0     0.5     0       0.5     0.0     0.5     0.5     0

Is that correct? I thought I would get two columns per marker locus instead of four.
From there I obtain doubled statistics for each of the alleles. Since I don't know the phase, an A/G is the same as a G/A, so how can I get unified statistics for each allele?

Thank you for your answer

Best regards
Andrea

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum