[adegenet-forum] SNP alleles
Jombart, Thibaut
t.jombart at imperial.ac.uk
Tue Jun 17 15:24:07 CEST 2014
Me neither. But so far:
- instructions we can verify all behave normally
- we don't have reproducible code for the stated problem
If you can send a small subset of the data, the commands used to create *myData*, and the commands showing the problem for this dataset, then we can try to figure it out.
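
One possible way to share such a subset is base R's dput(), which prints an R expression that recreates the object exactly, including any stray spaces inside the strings. This is only a sketch: the toy matrix below is a hypothetical stand-in for myData, and the name "small" is made up for illustration.

```r
# Hypothetical stand-in for myData (the real matrix is 162 x 10806).
myData <- matrix(c("G / A", "G / G", "C / T", "C / T"),
                 nrow = 2,
                 dimnames = list(c("1", "2"), c("loc_29", "loc_7")))

# Keep a small corner of the data; drop = FALSE preserves the matrix shape.
small <- myData[1:2, 1:2, drop = FALSE]

# dput() writes an expression that can be pasted into an email.
dput(small)

# Sanity check: evaluating the deparsed text gives back the same object.
rebuilt <- eval(parse(text = deparse(small)))
identical(rebuilt, small)
```

Pasting the dput() output together with the df2genind call should be enough for someone else to rerun the exact same input.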
Best
Thibaut
________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 14:15
To: Jombart, Thibaut
Subject: Re: [adegenet-forum] SNP alleles
Hi Thibaut,
myData is a matrix of 162 individuals with 10806 biallelic SNPs, coded as I already mentioned.
I've run df2genind with both ploidy=as.integer(2) and ploidy=2, and I get exactly the same result.
It doesn't seem to be an empty character problem. I really don't understand.
> alleles(test_genid)
$L01
1 2 3
"A" "G" "G"
$L02
1 2 3
"C" "T" "C"
$L03
1 2
"G" "C"
$L04
1 2 3
"A" "C" "A"
$L05
1 2
"G" "A"
$L06
1 2
"G" "C"
$L07
1 2 3 4
"C" "T" "C" "T"
$L08
1 2 3
"C" "C" "T"
$L09
1 2 3 4
"G" "T" "G" "T"
$L10
1 2 3
"C" "T" "T"
Thanks again
Andrea
2014-06-17 14:57 GMT+02:00 Jombart, Thibaut <t.jombart at imperial.ac.uk>:
What is "myData"?
BTW it is safer to specify the ploidy when constructing a genind.
Try:
alleles(test_genid) # btw the name is 'genind' - genotype of individuals
to see if it is a problem of empty characters.
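
A minimal base R sketch of such a check, assuming the allele labels are available as a character vector. The vector "labels" below is hypothetical, with a leading space planted in the third entry; labels that print alike can still differ by whitespace.

```r
# Hypothetical allele labels: the second "G" carries a leading space.
labels <- c("A", "G", " G")

is_empty  <- labels == ""               # empty-string alleles
has_space <- labels != trimws(labels)   # leading or trailing whitespace
n_chars   <- nchar(labels)              # lengths expose invisible characters

print(data.frame(label = labels, n_chars, is_empty, has_space))
```

Any TRUE in is_empty or has_space, or a length other than 1 for a single-nucleotide allele, would point to a labelling problem rather than a genuine third allele.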
Cheers
Thibaut
________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 13:47
To: Jombart, Thibaut
Cc: Caitlin Collins; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] SNP alleles
Hi Caitlin and Thibaut,
Thanks for your answers.
I did use the sep argument. My code to generate the genind object is:
>myData_genid <- df2genind(myData, sep="/")
The weird thing is that when I try the same code with a test object that I created:
>dat = data.frame(loc1=c("A/A","T/A","T/A","T/T","T/A","A/T"), loc2=c("C/G","G/C","C/C","G/G","C/G","G/C"))
>x=df2genind(dat, sep="/")
I get two columns per locus (as Thibaut does):
>truenames(x)
  loc1.A loc1.T loc2.C loc2.G
1    1.0    0.0    0.5    0.5
2    0.5    0.5    0.5    0.5
3    0.5    0.5    1.0    0.0
4    0.0    1.0    0.0    1.0
5    0.5    0.5    0.5    0.5
6    0.5    0.5    0.5    0.5
But when I test a subset of my data
>test<-myData[1:10,1:10]
>test
  loc_29  loc_7   loc_43  etc...
1 "G / A" "C / T" "T / T"
2 "G / G" "C / T" "T/ T"
etc...
> test_genid <- df2genind(test,sep="/")
I again get three or four columns per locus:
>truenames(test_genid)
  loc_29.A loc_29.G loc_29.G loc_7.C loc_7.T loc_7.C loc_43.C loc_43.T loc_43.C loc_43.T etc..
1      0.5      0.0      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
2      0.0      0.5      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
etc...
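
The subset printed above has spaces around the "/" separator, and not consistently ("T / T" versus "T/ T"). One hypothesis, not confirmed anywhere in this thread, is that splitting on "/" alone leaves "G " and " G" as distinct allele strings. The base R sketch below only illustrates that effect on the two rows shown; count_alleles is a hypothetical helper, and it does not call df2genind.

```r
# Two genotype rows copied from the subset shown above; note "T/ T".
test <- matrix(c("G / A", "G / G",
                 "C / T", "C / T",
                 "T / T", "T/ T"),
               nrow = 2,
               dimnames = list(c("1", "2"), c("loc_29", "loc_7", "loc_43")))

# Hypothetical helper: distinct allele strings per locus after splitting on sep.
count_alleles <- function(m, sep = "/") {
  apply(m, 2, function(g) {
    length(unique(unlist(strsplit(g, sep, fixed = TRUE))))
  })
}

before <- count_alleles(test)   # loc_29 = 3, loc_7 = 2, loc_43 = 3

# Remove every whitespace character while keeping dim and dimnames.
clean <- test
clean[] <- gsub("[[:space:]]+", "", test)

after <- count_alleles(clean)   # loc_29 = 2, loc_7 = 2, loc_43 = 1
```

If the same pattern holds in the full matrix, passing the cleaned matrix to df2genind(clean, sep="/") would be the natural next thing to try.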
When I run my PCA on all my data:
>X <- scaleGen(myData_genid, scale=F, missing="mean")
>pca_myData<-dudi.pca(X,center=F,scale=F)
I get the following message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 3,4,...
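
This warning may be related to the repeated column labels seen earlier (for example loc_29.G appearing twice), since column labels can end up as row names further down the analysis; that link is an assumption, not something established in this thread. A quick base R check on the ten labels printed above:

```r
# Column labels copied from the truenames(test_genid) output above.
nms <- c("loc_29.A", "loc_29.G", "loc_29.G", "loc_7.C", "loc_7.T",
         "loc_7.C", "loc_43.C", "loc_43.T", "loc_43.C", "loc_43.T")

dup_idx <- which(duplicated(nms))   # positions of repeated labels
dup_idx                             # 3 6 9 10
nms[dup_idx]                        # the labels that occur more than once
```

The same call on the column names of the full matrix would show whether every affected locus follows this pattern.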
I really don't understand what is causing that. Is there a hidden character in my data file that makes df2genind split my columns? Does that affect the results I get afterwards?
By the way, I tried both scale=F and scale=T in the scaleGen function, but I get two radically different results. With scale=T my individuals are separated into only two groups, while with scale=F the individuals are more "harmoniously" distributed over the two axes. Which one would be more appropriate for my data type? Because both seem to agree with the origin of the individuals, I'm not sure which one represents the "real picture".
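
To illustrate why the two settings can differ, here is a base R sketch on a made-up allele frequency matrix. It assumes that scale=T divides each centred allele column by its standard deviation, as base scale() does; whether scaleGen uses exactly this scaling is an assumption here, and the matrix "freq" is hypothetical.

```r
# Hypothetical allele frequencies for six individuals and two alleles.
freq <- cbind(common = c(0, 0.5, 0.5, 1, 0.5, 0.5),
              rare   = c(0, 0,   0,   0, 0,   0.5))

# Centring only: each column keeps its own variance.
centred <- scale(freq, center = TRUE, scale = FALSE)

# Centring and scaling: each column is rescaled to unit variance.
scaled <- scale(freq, center = TRUE, scale = TRUE)

var_centred <- apply(centred, 2, var)   # unequal: the common allele varies more
var_scaled  <- apply(scaled, 2, var)    # both equal to 1
```

Under this assumption, scaling gives a rare allele as much weight in the PCA as a common one, while centring alone lets the more variable alleles dominate, which is one way the two pictures could come apart.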
Thanks for your comments
Andrea