[adegenet-forum] SNP alleles

Tue Jun 17 15:24:07 CEST 2014

Me neither. But so far: 

- instructions we can verify all behave normally

- we don't have reproducible code for the stated problem

If you can send a small subset of the data, the commands used to create *myData*, and the commands that show the problem on this dataset, then we can try to figure it out.
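A reproducible example of this kind can be very small. The sketch below assumes *myData* is a character matrix of genotype strings; the toy matrix is a made-up stand-in, not the real data, and only base R is used. `dput()` prints an object as R code that can be pasted straight into an email.

```r
# Hypothetical stand-in for myData: a small character matrix of genotypes.
myData <- matrix(
  c("G / A", "G / G", "C / T", "C / T", "T / T", "T/ T"),
  nrow = 2,
  dimnames = list(c("1", "2"), c("loc_29", "loc_7", "loc_43"))
)

# Take a small subset and print it as pasteable R code.
test <- myData[1:2, 1:2]
dput(test)
```

Whoever receives the `dput()` output can assign it to a variable and rerun the same commands on exactly the same object.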

Best

Thibaut

________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 14:15
To: Jombart, Thibaut
Subject: Re: [adegenet-forum] SNP alleles

Hi Thibaut,
myData is a matrix of 162 individuals with 10806 biallelic SNPs, coded as I already mentioned.

I've run df2genind with both ploidy=as.integer(2) and ploidy=2, and I get exactly the same result.

It doesn't seem to be an empty character problem. I really don't understand.
> alleles(test_genid)
$L01
  1   2   3
"A" "G" "G"
$L02
  1   2   3
"C" "T" "C"
$L03
  1   2
"G" "C"
$L04
  1   2   3
"A" "C" "A"
$L05
  1   2
"G" "A"
$L06
  1   2
"G" "C"
$L07
  1   2   3   4
"C" "T" "C" "T"
$L08
  1   2   3
"C" "C" "T"
$L09
  1   2   3   4
"G" "T" "G" "T"
$L10
  1   2   3
"C" "T" "T"

Thanks again
Andrea

2014-06-17 14:57 GMT+02:00 Jombart, Thibaut <t.jombart at imperial.ac.uk>:

What is "myData"?

BTW it is safer to specify the ploidy when constructing a genind.

Try:

alleles(test_genid) # btw the name is 'genind' - genotype of individuals

to see if it is a problem of empty characters.

Cheers
Thibaut
________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 13:47
To: Jombart, Thibaut
Cc: Caitlin Collins; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] SNP alleles

Hi Caitlin and Thibaut,
Thanks for your answers.
I did use the sep argument. My code to generate the genind object is:

>myData_genid <- df2genind(myData, sep="/")

The weird thing is that when I try the same code with a test object that I created:

>dat = data.frame(loc1=c("A/A","T/A","T/A","T/T","T/A","A/T"), loc2=c("C/G","G/C","C/C","G/G","C/G","G/C"))
>x=df2genind(dat, sep="/")

I get two columns per locus (as Thibaut does):

>truenames(x)
  loc1.A loc1.T loc2.C loc2.G
1    1.0    0.0    0.5    0.5
2    0.5    0.5    0.5    0.5
3    0.5    0.5    1.0    0.0
4    0.0    1.0    0.0    1.0
5    0.5    0.5    0.5    0.5
6    0.5    0.5    0.5    0.5

But when I test a subset of my data:

>test<-myData[1:10,1:10]
>test
    loc_29      loc_7       loc_43  etc...
1  "G / A"      "C / T"     "T / T"
2  "G / G"      "C / T"     "T/ T"
etc...

> test_genid <- df2genind(test,sep="/")

I again get three or four columns per locus:

>truenames(test_genid)
  loc_29.A loc_29.G loc_29.G loc_7.C loc_7.T loc_7.C loc_43.C loc_43.T loc_43.C loc_43.T etc...
1      0.5      0.0      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
2      0.0      0.5      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
etc...

When I carry out my PCA analysis with all my data:

>X <- scaleGen(myData_genid, scale=F, missing="mean")
>pca_myData<-dudi.pca(X,center=F,scale=F)

I get the following message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 3,4,...

I really don't understand what is causing this. Is there a hidden character in my data file that makes df2genind split my columns? Does it affect the results I get afterwards?
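One way to test the hidden-character idea, sketched in base R. The printed subset above shows genotypes written both as "T / T" and "T/ T", so stray spaces around the separator are a plausible culprit; this is an assumption, not a confirmed diagnosis. The toy matrix and the helper `alleles_per_locus` are hypothetical names introduced for illustration.

```r
# Hypothetical stand-in mirroring the printed subset of myData.
test <- matrix(
  c("G / A", "G / G", "T / T", "T/ T"),
  nrow = 2,
  dimnames = list(c("1", "2"), c("loc_29", "loc_43"))
)

# Split every genotype of a locus on "/" and keep the distinct allele strings.
alleles_per_locus <- function(m) {
  out <- lapply(seq_len(ncol(m)), function(j) {
    unique(unlist(strsplit(m[, j], "/", fixed = TRUE)))
  })
  names(out) <- colnames(m)
  out
}

# Printing quoted strings makes leading or trailing spaces visible.
raw_alleles <- alleles_per_locus(test)
print(raw_alleles)

# Remove all whitespace; assigning into clean[] keeps dimensions and names.
clean <- test
clean[] <- gsub("[[:space:]]+", "", test)

clean_alleles <- alleles_per_locus(clean)
print(clean_alleles)
```

If the cleaned matrix shows at most two distinct alleles per SNP while the raw one shows more, rerunning df2genind(clean, sep="/") on the cleaned matrix would be the next thing to try.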

By the way, I tried both scale=F and scale=T in the scaleGen function, and I get two radically different results. With scale=T my individuals are separated into only two groups, while with scale=F they are more "harmoniously" distributed over the two axes. Which one is more appropriate for my data type? Because both seem to agree with the origin of the individuals, I'm not sure which one represents the "real picture".
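To make the difference between the two settings concrete, here is a sketch that uses base R's `scale()` as a stand-in for scaleGen. That substitution is an assumption: it illustrates centring versus centring plus dividing each column by its standard deviation, and may not match adegenet's internals exactly. The allele frequencies are made up.

```r
# Made-up allele frequencies for 4 individuals at two alleles:
# one whose frequency barely varies, one whose frequency varies a lot.
freq <- cbind(
  low_variance  = c(0, 0, 0, 0.5),
  high_variance = c(0, 0.5, 0.5, 1)
)

# Centring only: each column keeps its own variance.
centred <- scale(freq, center = TRUE, scale = FALSE)

# Centring and scaling: each column is divided by its standard deviation.
standardised <- scale(freq, center = TRUE, scale = TRUE)

print(apply(centred, 2, var))
print(apply(standardised, 2, var))
```

Without scaling, alleles whose frequencies vary more carry more weight in the PCA; with scaling, every allele column contributes the same variance, which can give weakly variable alleles more influence on the axes.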

Thanks for your comments
Andrea