[adegenet-forum] SNP alleles

Andrea Garavito neagef at gmail.com
Tue Jun 17 14:47:13 CEST 2014


Hi Caitlin and Thibaut,
Thanks for your answers.
I did use the sep argument. My code to generate the genind object is:

>myData_genid <- df2genind(myData, sep="/")

The weird thing is that when I try the same code with a test object that I
created:

>dat = data.frame(loc1=c("A/A","T/A","T/A","T/T","T/A","A/T"),
loc2=c("C/G","G/C","C/C","G/G","C/G","G/C"))
>x=df2genind(dat, sep="/")

I get two columns per locus (as Thibaut does):

>truenames(x)
  loc1.A loc1.T loc2.C loc2.G
1    1.0    0.0    0.5    0.5
2    0.5    0.5    0.5    0.5
3    0.5    0.5    1.0    0.0
4    0.0    1.0    0.0    1.0
5    0.5    0.5    0.5    0.5
6    0.5    0.5    0.5    0.5

But when I test a subset of my data:

>test<-myData[1:10,1:10]
>test
    loc_29      loc_7       loc_43  etc...
1  "G / A"      "C / T"     "T / T"
2  "G / G"      "C / T"     "T/ T"
etc...

> test_genid <- df2genind(test,sep="/")

I again get three or four columns per locus:

>truenames(test_genid)
  loc_29.A loc_29.G loc_29.G loc_7.C loc_7.T loc_7.C loc_43.C loc_43.T loc_43.C loc_43.T etc..
1      0.5      0.0      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
2      0.0      0.5      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
etc...

When I carry out the PCA on all my data:

>X <- scaleGen(myData_genid, scale=F, missing="mean")
>pca_myData<-dudi.pca(X,center=F,scale=F)

I get the following message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,4,...

I really don't understand what is causing this. Is there a hidden character
in my data file that makes df2genind split my columns? Does that
affect the results I get afterwards?
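
One check I can think of, assuming the extra columns come from the stray
spaces around the alleles (this is only a guess on my part), is to count the
distinct allele labels per locus before and after removing all whitespace.
The data frame below is a hypothetical stand-in for my subset, and
allele_labels is a helper name I made up; only base R functions are used.

```r
# Hypothetical stand-in for my subset, keeping the inconsistent spacing seen above
test <- data.frame(loc_29 = c("G / A", "G / G"),
                   loc_43 = c("T / T", "T/ T"),
                   stringsAsFactors = FALSE)

# Distinct allele labels per locus, obtained by splitting on the separator only
allele_labels <- function(df) {
  lapply(df, function(col) unique(unlist(strsplit(col, "/", fixed = TRUE))))
}

# Same data with every whitespace character removed from each genotype
test_clean <- as.data.frame(
  lapply(test, function(col) gsub("[[:space:]]+", "", col)),
  stringsAsFactors = FALSE
)

labels_before <- allele_labels(test)        # "G " and " G" count as different labels
labels_after  <- allele_labels(test_clean)  # only the bare nucleotides remain

print(lengths(labels_before))
print(lengths(labels_after))
```

If a locus has more labels before cleaning than after, the spacing would
explain the duplicated columns, and I could then pass test_clean to
df2genind with sep="/" to see whether the problem goes away.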

By the way, I tried both scale=F and scale=T in the scaleGen function, but I
get two radically different results. With scale=T my individuals are
separated into only two groups, while with scale=F they are more
"harmoniously" distributed over the two axes. Which one would be more
appropriate for my data type? Because both seem to agree
with the origin of the individuals, I'm not sure which one represents the "real
picture".

Thanks for your comments,
Andrea