[adegenet-forum] Trouble reading data

Thu Feb 6 15:43:40 CET 2014

Hi there, 

You're close, but there's a non-trivial glitch with using "|" as a separator. "|" is a special character in regular expressions (it means "or"), and the separator is processed as a regular expression, so it has to be written within "[]" to be read as a literal bar:

#### start R code
> library(adegenet)

## read the data table
> tab <- read.table("testdata.txt", header=TRUE)
> head(tab)
  geno  pop    M1    M2    M3    M4    M5    M6    M7    M8    M9   M10
1 Ind1 pop1 44|44 44|44 22|22 33|33 33|33 22|22 22|22 11|11 11|11  <NA>
2 Ind2 pop1 44|44 44|44 44|44 33|33 33|33 22|22 22|22 11|11 22|22 11|11
3 Ind3 pop2 44|44 44|44 44|44 33|33 11|11 44|44 44|44 33|33 11|11 33|33
4 Ind4 pop2 44|44 22|22 22|22 33|33 33|33 22|22 22|22 33|33 22|22 11|11
5 Ind5 pop2 44|44 44|44 22|22 44|44 33|33 22|22 44|44 11|11 22|22 11|11
6 Ind6 pop3 44|44 22|22 22|22  <NA> 33|33 22|22 22|22 11|11 11|11 11|11

## convert to genind
> x <- df2genind(tab[,-(1:2)], ind.names=tab$geno, pop=tab$pop, sep="[|]")

## check conversion by reverting back to table
> genind2df(x,sep="/")
       pop    M1    M2    M3    M4    M5    M6    M7    M8    M9   M10
Ind1  pop1 44/44 44/44 22/22 33/33 33/33 22/22 22/22 11/11 11/11  <NA>
Ind2  pop1 44/44 44/44 44/44 33/33 33/33 22/22 22/22 11/11 22/22 11/11
Ind3  pop2 44/44 44/44 44/44 33/33 11/11 44/44 44/44 33/33 11/11 33/33
Ind4  pop2 44/44 22/22 22/22 33/33 33/33 22/22 22/22 33/33 22/22 11/11
Ind5  pop2 44/44 44/44 22/22 44/44 33/33 22/22 44/44 11/11 22/22 11/11
Ind6  pop3 44/44 22/22 22/22  <NA> 33/33 22/22 22/22 11/11 11/11 11/11
Ind7  pop3 44/44 22/22 22/22 33/33 11/11 22/22 44/44 11/11 22/22 11/11
Ind8  pop3 44/44 44/44 44/44 44/44 33/33 22/22 22/22 11/11 11/11 33/33
Ind9  pop4 44/44 22/22 22/22 44/44 11/11 44/44 22/22 33/33 11/11 11/11
Ind10 pop4 44/44 44/44 44/44 44/44 33/33 22/22 22/22 11/11 11/11 33/33

#### end R code
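
To see why the bare "|" misbehaves, here is a small base-R sketch that does not use adegenet at all. The variable names are only illustrative, and whether df2genind accepts the alternative spellings shown at the end is an assumption; only "[|]" is confirmed above.

```r
geno <- "44|44"

## a bare "|" is an alternation between two empty patterns, so it matches
## the empty string and the genotype is cut into single characters
bad <- strsplit(geno, "|")[[1]]
print(bad)

## inside a character class the bar is an ordinary character
good <- strsplit(geno, "[|]")[[1]]
print(good)

## in base R, escaping the bar or switching regular expressions off
## gives the same split as "[|]"
escaped <- strsplit(geno, "\\|")[[1]]
literal <- strsplit(geno, "|", fixed = TRUE)[[1]]
```

The first call returns five one-character pieces, the others return the two alleles.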

And you can now run DAPC on your dataset "x", alongside any other analysis using genind objects as inputs.
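
As a rough sketch of that next step, the code below runs DAPC on a genind object with predefined groups. To stay self-contained it uses the "microbov" example dataset shipped with adegenet as a stand-in for "x"; the values of n.pca and n.da are arbitrary choices for illustration, not recommendations for your data.

```r
library(adegenet)

## stand-in for the genind object built above (assumption: any genind
## object with populations defined can be used the same way)
data(microbov)
x <- microbov

## DAPC using the populations stored in the object as groups;
## n.pca and n.da are given explicitly to avoid the interactive prompts
dapc1 <- dapc(x, pop(x), n.pca = 20, n.da = 5)

## scatterplot of individuals on the first two discriminant functions
scatter(dapc1)
```

If n.pca and n.da are left out in an interactive session, dapc asks you to choose them from plots, which is usually a better way to pick values for a real dataset.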

Best
Thibaut

--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - School of Public Health
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Bulli, Peter [peter.bulli at wsu.edu]
Sent: 06 February 2014 03:18
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Trouble reading data

Hello everybody,

This is my first post to the mailing list, although I've spent some time scanning through posts on specific topics. However, I am still stuck on a problem with reading my data into R for a "structure" DAPC analysis using "adegenet", and I would greatly appreciate it if anyone could help me out. My data has 983 individuals, labeled in the first column, followed by population groups in the 2nd column and then 548 columns of SNP markers. I have attached a sample file of 10 individuals (1st column), their population groupings (2nd column) and 10 SNP markers, so you can have an idea of the data format I used.

For the 983 individuals and the 548 SNP markers, I used the following code and got the following error messages when trying to read my data into R:

> setwd("c:\\myDAPC")

> data <- read.table("c:\\myDAPC\\data02052014.txt", header=TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 680 did not have 550 elements

> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA",  :
  more columns than column names

> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", rows=1, col.lab=1, col.pop=2, header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA",  :
  unused arguments (rows = 1, col.lab = 1, col.pop = 2)
>
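
The first error above says that one line of the file does not split into the same number of fields as the others. One way to locate such lines is base R's count.fields, sketched here on a small made-up file (the file contents and the variable names are hypothetical, not taken from the real data):

```r
## hypothetical stand-in file: three columns, with a stray space
## inside the individual's name on the third line
path <- tempfile(fileext = ".txt")
writeLines(c("geno pop M1",
             "Ind1 pop1 44|44",
             "Ind 2 pop1 44|44",
             "Ind3 pop2 44|44"), path)

## number of whitespace-separated fields on each line of the file
n_fields <- count.fields(path, sep = "")

## lines whose field count differs from the header line
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
print(readLines(path)[bad_lines])
```

The positions returned here count the header as line 1, so they may be offset from the line number reported in the read.table error. Besides stray spaces, quote characters or "#" inside a field are possible causes worth checking, since read.table treats both specially by default.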

For the test sample data of 10 individuals and 10 SNP markers, below are a call that gives an error message and a call that seems to have worked:

> datadata <- read.table("c:\\myDAPC\\testdata.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\testdata.txt", na.strings = "NA", sep = "|",  :
  more columns than column names

> data <- read.table("c:\\myDAPC\\testdata.txt", header=TRUE)
> data
    geno  pop    M1    M2    M3    M4    M5    M6    M7    M8    M9   M10
1   Ind1 pop1 44|44 44|44 22|22 33|33 33|33 22|22 22|22 11|11 11|11  <NA>
2   Ind2 pop1 44|44 44|44 44|44 33|33 33|33 22|22 22|22 11|11 22|22 11|11
3   Ind3 pop2 44|44 44|44 44|44 33|33 11|11 44|44 44|44 33|33 11|11 33|33
4   Ind4 pop2 44|44 22|22 22|22 33|33 33|33 22|22 22|22 33|33 22|22 11|11
5   Ind5 pop2 44|44 44|44 22|22 44|44 33|33 22|22 44|44 11|11 22|22 11|11
6   Ind6 pop3 44|44 22|22 22|22  <NA> 33|33 22|22 22|22 11|11 11|11 11|11
7   Ind7 pop3 44|44 22|22 22|22 33|33 11|11 22|22 44|44 11|11 22|22 11|11
8   Ind8 pop3 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33
9   Ind9 pop4 44|44 22|22 22|22 44|44 11|11 44|44 22|22 33|33 11|11 11|11
10 Ind10 pop4 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33

The last call seems to work for the sample data. But the same code doesn't work on the full data set of 983 individuals, grouped into 6 populations and genotyped with 548 SNP markers.

Any help that would enable me to get started with the "DAPC" analyses on that full data set would be highly appreciated.

Thank you for your time and help.

Peter