[adegenet-forum] SNP alleles

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Jun 17 14:57:33 CEST 2014


What is "myData"? 

BTW it is safer to specify the ploidy when constructing a genind object.

Try:

alleles(test_genid) # btw the name is 'genind' - genotype of individuals

to see if it is a problem of empty characters.
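
A sketch of that check, using the spacing visible in the quoted message below. It assumes the stray characters are whitespace around the "/" separator, as the printed values "G / A" and "T/ T" suggest. `toy` and `strip_ws` are made-up names for illustration, and the df2genind call is left as a comment so that the block runs in base R:

```r
# Toy data imitating the spacing seen in the printed subset.
# 'toy' and 'strip_ws' are hypothetical names, not part of adegenet.
toy <- data.frame(loc_29 = c("G / A", "G / G", "A/G"),
                  loc_7  = c("C / T", "C/ T", "T /T"),
                  stringsAsFactors = FALSE)

# Splitting on "/" alone keeps the blanks, so "G " and "G" count as
# two different alleles.
raw_alleles <- unique(unlist(strsplit(toy$loc_29, "/", fixed = TRUE)))
print(raw_alleles)

# Remove every whitespace character from each genotype string.
strip_ws <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  df[] <- lapply(df, function(col) gsub("[[:space:]]+", "", as.character(col)))
  df
}

clean <- strip_ws(toy)
clean_alleles <- unique(unlist(strsplit(clean$loc_29, "/", fixed = TRUE)))
print(clean_alleles)

# With adegenet loaded, the cleaned table could then be converted, e.g.:
# test_genind <- df2genind(clean, sep = "/", ploidy = 2)
# alleles(test_genind)
```

If the cleaned table gives one column per allele, the extra columns were most likely caused by the blanks rather than by df2genind itself.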

Cheers
Thibaut
________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 13:47
To: Jombart, Thibaut
Cc: Caitlin Collins; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] SNP alleles

Hi Caitlin and Thibaut,
Thanks for your answers.
I did use the sep argument. My code to generate the genind object is:

>myData_genid <- df2genind(myData, sep="/")

The weird thing is that when I try the same code with a test object that I created:

>dat = data.frame(loc1=c("A/A","T/A","T/A","T/T","T/A","A/T"), loc2=c("C/G","G/C","C/C","G/G","C/G","G/C"))
>x=df2genind(dat, sep="/")

I get two columns per locus (as Thibaut does):

>truenames(x)
loc1.A loc1.T loc2.C loc2.G
1    1.0    0.0    0.5    0.5
2    0.5    0.5    0.5    0.5
3    0.5    0.5    1.0    0.0
4    0.0    1.0    0.0    1.0
5    0.5    0.5    0.5    0.5
6    0.5    0.5    0.5    0.5

But when I test a subset of my data:

>test<-myData[1:10,1:10]
>test
    loc_29      loc_7       loc_43  etc...
1  "G / A"      "C / T"     "T / T"
2  "G / G"      "C / T"     "T/ T"
etc...

> test_genid <- df2genind(test,sep="/")

I again get three or four columns per locus:

>truenames(test_genid)
    loc_29.A  loc_29.G  loc_29.G loc_7.C  loc_7.T  loc_7.C  loc_43.C  loc_43.T  loc_43.C  loc_43.T etc..
1     0.5           0.0            0.5          0.0          0.5         0.5          0.0           0.5          0.0            0.5
2     0.0           0.5            0.5          0.0          0.5         0.5          0.0            0.5         0.0            0.5
etc...

When I carry out my PCA analysis on all my data:

>X <- scaleGen(myData_genid, scale=F, missing="mean")
>pca_myData<-dudi.pca(X,center=F,scale=F)

I get the following message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,4,...

I really don't understand what is causing that. Is there a hidden character in my data file that makes df2genind split my columns? Does that affect the results I get thereafter?

By the way, I tried both scale=F and scale=T in the scaleGen function, but I get two radically different results. With scale=T my individuals are separated into only two groups, while with scale=F they are more "harmoniously" distributed over the two axes. Which one would be more appropriate for my data type? Because both seem to agree with the origin of the individuals, I'm not sure which one represents the "real picture".
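
The difference between the two settings can be sketched in base R. This assumes that scale=T divides each centred allele column by its standard deviation, which is how base `scale()` behaves; scaleGen may use a different denominator. The `freq` matrix is made up for illustration:

```r
# Made-up allele frequencies for 6 individuals: one common allele
# and one rare allele (hypothetical data, not taken from myData).
freq <- cbind(common = c(1, 0.5, 0.5, 0, 0.5, 0.5),
              rare   = c(0, 0, 0, 0, 0, 0.5))

# Centring only, analogous to scale=F.
centred <- scale(freq, center = TRUE, scale = FALSE)
# Centring and dividing by the standard deviation, analogous to
# scale=T under the assumption stated above.
scaled <- scale(freq, center = TRUE, scale = TRUE)

# Centred only: each allele keeps its own variance, so the more
# variable alleles contribute more to the PCA.
var_centred <- apply(centred, 2, var)
print(var_centred)

# Centred and scaled: every allele has variance 1, so a rare allele
# weighs as much as a common one.
var_scaled <- apply(scaled, 2, var)
print(var_scaled)
```

Under that assumption, a split into two groups with scale=T could be driven by a few rare alleles that are given the same weight as the common ones.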

Thanks for your comments
Andrea

