[adegenet-forum] Trouble reading data
Jombart, Thibaut
t.jombart at imperial.ac.uk
Thu Feb 6 15:43:40 CET 2014
Hi there,
you're close, but there's a non-trivial glitch with using "|" as a separator. The separator is processed as a regular expression, and "|" is a special character in regular expressions, so it needs to be placed within brackets, i.e. sep="[|]":
#### start R code
> library(adegenet)
## read the data table
> tab <- read.table("testdata.txt", header=TRUE)
> head(tab)
geno pop M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
1 Ind1 pop1 44|44 44|44 22|22 33|33 33|33 22|22 22|22 11|11 11|11 <NA>
2 Ind2 pop1 44|44 44|44 44|44 33|33 33|33 22|22 22|22 11|11 22|22 11|11
3 Ind3 pop2 44|44 44|44 44|44 33|33 11|11 44|44 44|44 33|33 11|11 33|33
4 Ind4 pop2 44|44 22|22 22|22 33|33 33|33 22|22 22|22 33|33 22|22 11|11
5 Ind5 pop2 44|44 44|44 22|22 44|44 33|33 22|22 44|44 11|11 22|22 11|11
6 Ind6 pop3 44|44 22|22 22|22 <NA> 33|33 22|22 22|22 11|11 11|11 11|11
## convert to genind
> x <- df2genind(tab[,-(1:2)], ind.names=tab$geno, pop=tab$pop, sep="[|]")
## check conversion by reverting back to table
> genind2df(x,sep="/")
pop M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
Ind1 pop1 44/44 44/44 22/22 33/33 33/33 22/22 22/22 11/11 11/11 <NA>
Ind2 pop1 44/44 44/44 44/44 33/33 33/33 22/22 22/22 11/11 22/22 11/11
Ind3 pop2 44/44 44/44 44/44 33/33 11/11 44/44 44/44 33/33 11/11 33/33
Ind4 pop2 44/44 22/22 22/22 33/33 33/33 22/22 22/22 33/33 22/22 11/11
Ind5 pop2 44/44 44/44 22/22 44/44 33/33 22/22 44/44 11/11 22/22 11/11
Ind6 pop3 44/44 22/22 22/22 <NA> 33/33 22/22 22/22 11/11 11/11 11/11
Ind7 pop3 44/44 22/22 22/22 33/33 11/11 22/22 44/44 11/11 22/22 11/11
Ind8 pop3 44/44 44/44 44/44 44/44 33/33 22/22 22/22 11/11 11/11 33/33
Ind9 pop4 44/44 22/22 22/22 44/44 11/11 44/44 22/22 33/33 11/11 11/11
Ind10 pop4 44/44 44/44 44/44 44/44 33/33 22/22 22/22 11/11 11/11 33/33
#### end R code
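Why the brackets matter can be seen with base R alone. The sketch below uses strsplit(), which by default also treats its split pattern as a regular expression; it illustrates the regex behaviour only and is not a copy of df2genind's internals. An unescaped "|" is an alternation of two empty patterns, so it matches everywhere, whereas "[|]" (or fixed = TRUE) matches only the literal bar.

```r
# A genotype as stored in the input table: two alleles joined by "|"
geno <- "44|44"

# Unescaped "|" is regex alternation between two empty patterns, so it
# matches everywhere and the string is cut between every character
split.raw <- strsplit(geno, "|")[[1]]

# "[|]" is a character class containing only the literal bar
split.class <- strsplit(geno, "[|]")[[1]]

# Alternative: disable regular-expression matching altogether
split.fixed <- strsplit(geno, "|", fixed = TRUE)[[1]]

print(split.raw)
print(split.class)  # "44" "44"
print(split.fixed)  # "44" "44"
```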
You can now run DAPC on "x", as well as any other analysis that takes a genind object as input.
Best
Thibaut
--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Bulli, Peter [peter.bulli at wsu.edu]
Sent: 06 February 2014 03:18
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Trouble reading data
Hello everybody,
This is my first post to the mailing list, although I've spent some time reading through posts on specific topics. I still have a problem reading my data into R for a DAPC analysis of population structure with adegenet, and I would greatly appreciate it if anyone could help me out. My data set has 983 individuals, with individual labels in the first column, population groups in the second column, and 548 SNP markers in the remaining columns. I have attached a sample file with 10 individuals (first column), their population groupings (second column) and 10 SNP markers, so you can see the data format I used.
For the full data set (983 individuals and 548 SNP markers), I used the following code and got the following error messages when trying to read my data into R:
> setwd("c:\\myDAPC")
> data <- read.table("c:\\myDAPC\\data02052014.txt", header=TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 680 did not have 550 elements
> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA", :
more columns than column names
> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", rows=1, col.lab=1, col.pop=2, header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA", :
unused arguments (rows = 1, col.lab = 1, col.pop = 2)
>
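The first error means that, with the default whitespace separator, at least one row does not split into the 550 fields (2 label columns plus 548 markers) that read.table() expected. Base R's count.fields() reports the number of fields on every line of a file, which makes it easy to locate such rows; on the real data it would be called on "c:\\myDAPC\\data02052014.txt". The file below is a hypothetical miniature with one deliberately short row. Possible culprits (an assumption, not diagnosed from the actual file) are empty cells, labels containing spaces, or stray quote or "#" characters. The third error is different: col.lab and col.pop are arguments of adegenet's read.structure(), not of read.table().

```r
# Hypothetical miniature of the input file: a header plus three rows,
# the last of which is missing one genotype
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "geno pop M1 M2",
  "Ind1 pop1 44|44 22|22",
  "Ind2 pop1 44|44 22|22",
  "Ind3 pop2 44|44"
), tmp)

# Number of whitespace-separated fields on each line (header included)
n.fields <- count.fields(tmp)
print(n.fields)

# File lines whose field count differs from the header's
bad.lines <- which(n.fields != n.fields[1])
print(bad.lines)

# read.table() stops on the short row with a "did not have ... elements" error
res <- tryCatch(read.table(tmp, header = TRUE), error = conditionMessage)
print(res)
```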
For the test data set of 10 individuals and 10 SNP markers, here is one attempt that gave an error and one that seems to have worked:
> data <- read.table("c:\\myDAPC\\testdata.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\testdata.txt", na.strings = "NA", sep = "|", :
more columns than column names
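The "more columns than column names" error comes from passing sep="|" to read.table(): the header line contains no "|", so it yields a single column name, while each data row is cut at every "|" and yields several fields. count.fields() with the same separator shows the mismatch. The two-line file below is hypothetical and only mirrors the layout of testdata.txt.

```r
# Hypothetical two-line file laid out like testdata.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("geno pop M1 M2",
             "Ind1 pop1 44|44 22|22"), tmp)

# Fields per line when splitting on whitespace (the default) ...
n.space <- count.fields(tmp)
# ... and when splitting on "|" only: the header has no bar at all
n.bar <- count.fields(tmp, sep = "|")
print(n.space)  # 4 4
print(n.bar)    # 1 3

# read.table() therefore finds 1 column name but 3 data columns
msg <- tryCatch(read.table(tmp, header = TRUE, sep = "|"),
                error = conditionMessage)
print(msg)
```

Reading with the default whitespace separator, as in the command that worked, keeps "44|44" intact as one cell, and the "|" is then handled by df2genind().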
> data <- read.table("c:\\myDAPC\\testdata.txt", header=TRUE)
> data
geno pop M1 M2 M3 M4 M5 M6 M7 M8 M9 M10
1 Ind1 pop1 44|44 44|44 22|22 33|33 33|33 22|22 22|22 11|11 11|11 <NA>
2 Ind2 pop1 44|44 44|44 44|44 33|33 33|33 22|22 22|22 11|11 22|22 11|11
3 Ind3 pop2 44|44 44|44 44|44 33|33 11|11 44|44 44|44 33|33 11|11 33|33
4 Ind4 pop2 44|44 22|22 22|22 33|33 33|33 22|22 22|22 33|33 22|22 11|11
5 Ind5 pop2 44|44 44|44 22|22 44|44 33|33 22|22 44|44 11|11 22|22 11|11
6 Ind6 pop3 44|44 22|22 22|22 <NA> 33|33 22|22 22|22 11|11 11|11 11|11
7 Ind7 pop3 44|44 22|22 22|22 33|33 11|11 22|22 44|44 11|11 22|22 11|11
8 Ind8 pop3 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33
9 Ind9 pop4 44|44 22|22 22|22 44|44 11|11 44|44 22|22 33|33 11|11 11|11
10 Ind10 pop4 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33
This last command seems to work for the sample data, but the same code doesn't work for the full data set of 983 individuals grouped into 6 populations and genotyped with 548 SNP markers.
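Once the full table reads without errors, it may also be worth confirming that every marker cell has the allele|allele form before converting it to genind. The helper check_genotypes() below is hypothetical (not part of adegenet); it assumes, as in this file, that the first two columns hold individual and population labels and that alleles are coded as digits.

```r
# Hypothetical helper: list marker cells that are neither NA nor "digits|digits"
check_genotypes <- function(tab, label.cols = 1:2) {
  # Drop the label columns; factors become character in the matrix
  geno <- as.matrix(tab[, -label.cols, drop = FALSE])
  # A cell is acceptable if missing or two integers joined by a literal "|"
  ok <- is.na(geno) | grepl("^[0-9]+[|][0-9]+$", geno)
  bad <- which(!ok, arr.ind = TRUE)
  data.frame(row = unname(bad[, "row"]),
             marker = colnames(geno)[bad[, "col"]],
             value = unname(geno[bad]),
             stringsAsFactors = FALSE)
}

# Usage (file name from the original post):
# tab <- read.table("testdata.txt", header = TRUE)
# check_genotypes(tab)
```

An empty result means every genotype cell is either missing or well formed.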
Any help that would enable me to get started with the DAPC analysis of this data set (983 individuals in 6 populations, genotyped with 548 SNP markers) would be highly appreciated.
Thank you for your time and help.
Peter