[adegenet-forum] SNP alleles
Jombart, Thibaut
t.jombart at imperial.ac.uk
Tue Jun 17 14:57:33 CEST 2014
What is "myData"?
BTW it is safer to specify the ploidy when constructing a genind.
Try:
alleles(test_genid) # btw the name is 'genind' - genotype of individuals
to see if it is a problem of empty characters.
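The quoted subset below shows genotypes such as "G / A" and "T/ T", so stray spaces around the separator are a plausible culprit. Here is a minimal base-R sketch of how one might check for them and strip them before building the genind. The toy data frame `test` and the names `raw_alleles`, `has_space`, `clean` and `clean_alleles` are made up for illustration, and it assumes the genotypes sit in a data.frame of character columns:

```r
# Toy data mimicking the quoted subset (hypothetical values): note the spaces around "/"
test <- data.frame(loc_29 = c("G / A", "G / G", "A/G"),
                   loc_7  = c("C / T", "C / T", "T/ T"),
                   stringsAsFactors = FALSE)

# Splitting on "/" alone keeps the spaces, so "G", "G " and " G"
# are treated as three different allele labels
raw_alleles <- unique(unlist(strsplit(test$loc_29, "/", fixed = TRUE)))
print(raw_alleles)

# Count, per locus, how many genotypes contain any whitespace
has_space <- sapply(test, function(col) grepl("[[:space:]]", col))
print(colSums(has_space))

# Strip all whitespace from every column, keeping character columns
clean <- as.data.frame(lapply(test, function(col) gsub("[[:space:]]", "", col)),
                       stringsAsFactors = FALSE)
clean_alleles <- unique(unlist(strsplit(clean$loc_29, "/", fixed = TRUE)))
print(clean_alleles)

# With adegenet loaded, the cleaned table could then be converted with, e.g.:
# x <- df2genind(clean, sep = "/", ploidy = 2)
```

If the whitespace is the cause, the cleaned locus should show only two allele labels where the raw one shows several.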
Cheers
Thibaut
________________________________________
From: Andrea Garavito [neagef at gmail.com]
Sent: 17 June 2014 13:47
To: Jombart, Thibaut
Cc: Caitlin Collins; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] SNP alleles
Hi Caitlin and Thibaut,
Thanks for your answers.
I did use the sep argument. My code to generate the genind object is:
>myData_genid <- df2genind(myData, sep="/")
The weird thing is that when I try the same code with a test object that I created:
>dat = data.frame(loc1=c("A/A","T/A","T/A","T/T","T/A","A/T"), loc2=c("C/G","G/C","C/C","G/G","C/G","G/C"))
>x=df2genind(dat, sep="/")
I get the two columns per locus (as Thibaut does):
>truenames(x)
  loc1.A loc1.T loc2.C loc2.G
1    1.0    0.0    0.5    0.5
2    0.5    0.5    0.5    0.5
3    0.5    0.5    1.0    0.0
4    0.0    1.0    0.0    1.0
5    0.5    0.5    0.5    0.5
6    0.5    0.5    0.5    0.5
But when I test a subset of my data
>test<-myData[1:10,1:10]
>test
loc_29 loc_7 loc_43 etc...
1 "G / A" "C / T" "T / T"
2 "G / G" "C / T" "T/ T"
etc...
> test_genid <- df2genind(test,sep="/")
I again get three or four columns per locus:
>truenames(test_genid)
  loc_29.A loc_29.G loc_29.G loc_7.C loc_7.T loc_7.C loc_43.C loc_43.T loc_43.C loc_43.T etc..
1      0.5      0.0      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
2      0.0      0.5      0.5     0.0     0.5     0.5      0.0      0.5      0.0      0.5
etc...
When I carry out my PCA analysis with all my data:
>X <- scaleGen(myData_genid, scale=F, missing="mean")
>pca_myData<-dudi.pca(X,center=F,scale=F)
I get the following message:
In data.row.names(row.names, rowsi, i) :
some row.names duplicated: 3,4,...
I really don't understand what is causing that. Is there a hidden character in my data file that makes df2genind split my columns? Does that affect the results I get thereafter?
By the way, I tried both scale=F and scale=T in the scaleGen function, but I get two radically different results. With scale=T my individuals get separated into only two groups, while with scale=F individuals are more "harmoniously" distributed over the 2 axes. Which one would be more appropriate for my data type? Because both seem to agree with the origin of the individuals, I'm not sure which one represents the "real picture".
Thanks for your comments
Andrea
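On the scale=F versus scale=T question above, a rough base-R sketch may help show why the two PCAs can look so different. It assumes that scale=T in scaleGen amounts to dividing each centred allele column by its standard deviation, and it uses base R's scale() on a made-up frequency table (`freq`) as a stand-in for scaleGen itself:

```r
# Hypothetical allele frequencies for 6 individuals:
# one common allele and one rare allele seen in a single individual
freq <- cbind(common = c(0, 0.5, 1, 0.5, 1, 0),
              rare   = c(0, 0,   0, 0,   0, 0.5))

# Centring only (stand-in for scale=F): columns keep their own variance
centred <- scale(freq, center = TRUE, scale = FALSE)

# Centring and unit-variance scaling (stand-in for scale=T, under the assumption above)
scaled <- scale(freq, center = TRUE, scale = TRUE)

var_centred <- apply(centred, 2, var)
var_scaled  <- apply(scaled, 2, var)
print(var_centred)
print(var_scaled)
```

Without scaling, the common allele carries most of the variance and so dominates the axes. With scaling, every allele carries the same variance, so rare alleles weigh as much as common ones, which can pull a few individuals away from the rest.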