[adegenet-forum] Trouble reading data

Thu Feb 6 04:18:57 CET 2014

Hello everybody,

This is my first post to the mailing list, although I have spent some time searching earlier posts on specific topics. I still have a problem reading my data into R for "structure" DAPC analysis using "adegenet", and I would greatly appreciate any help.

My data set has 983 individuals, with individual labels in the first column, population groups in the 2nd column, and then 548 columns of SNP markers. I have attached a sample file with 10 individuals (1st column), their population groupings (2nd column) and 10 SNP markers to give an idea of the data format I used.

For the full data set of 983 individuals and 548 SNP markers, I used the following code and got the following error messages when trying to read my data into R:

> setwd("c:\\myDAPC")

> data <- read.table("c:\\myDAPC\\data02052014.txt", header=TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 680 did not have 550 elements

> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA",  :
  more columns than column names

> data <- read.table("c:\\myDAPC\\data02052014.txt", na.strings="NA", sep="|", rows=1, col.lab=1, col.pop=2, header=TRUE)
Error in read.table("c:\\myDAPC\\data02052014.txt", na.strings = "NA",  :
  unused arguments (rows = 1, col.lab = 1, col.pop = 2)
>
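
The first error above suggests that at least one row splits into a different number of fields than the header. A minimal sketch of one way to locate such rows with base R's count.fields() follows. The file contents and the name demo_file are made-up stand-ins, not the real data02052014.txt, and the cause shown (a space inside an individual's name) is only one possibility; stray quote or "#" characters could have a similar effect.

```r
# Build a small stand-in file (hypothetical data) with one malformed row:
# the name "Ind 3" contains a space, so that row splits into 5 fields, not 4.
demo_file <- tempfile(fileext = ".txt")
writeLines(c(
  "geno pop M1 M2",
  "Ind1 pop1 44|44 22|22",
  "Ind2 pop1 44|44 44|44",
  "Ind 3 pop2 44|44 22|22",
  "Ind4 pop2 44|44 22|22"
), demo_file)

# Count whitespace-separated fields on every line, with the same quote and
# comment characters that read.table() uses by default. Blank lines are kept
# (and counted as 0 fields) so that positions match readLines() below.
n_fields <- count.fields(demo_file, sep = "", quote = "\"'",
                         comment.char = "#", blank.lines.skip = FALSE)

# Take the header's field count as the expected one and list lines that differ.
expected <- n_fields[1]
bad_lines <- which(is.na(n_fields) | n_fields != expected)
print(bad_lines)

# Show the raw text of the offending lines for inspection.
print(readLines(demo_file)[bad_lines])
```

Pointing the same calls at the real file path instead of demo_file should list the rows worth inspecting by eye.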

For the test sample data of 10 individuals and 10 SNP markers, the first call below gives an error message and the second seems to have worked:

> data <- read.table("c:\\myDAPC\\testdata.txt", na.strings="NA", sep="|", header=TRUE)
Error in read.table("c:\\myDAPC\\testdata.txt", na.strings = "NA", sep = "|",  :
  more columns than column names

> data <- read.table("c:\\myDAPC\\testdata.txt", header=TRUE)
> data
    geno  pop    M1    M2    M3    M4    M5    M6    M7    M8    M9   M10
1   Ind1 pop1 44|44 44|44 22|22 33|33 33|33 22|22 22|22 11|11 11|11  <NA>
2   Ind2 pop1 44|44 44|44 44|44 33|33 33|33 22|22 22|22 11|11 22|22 11|11
3   Ind3 pop2 44|44 44|44 44|44 33|33 11|11 44|44 44|44 33|33 11|11 33|33
4   Ind4 pop2 44|44 22|22 22|22 33|33 33|33 22|22 22|22 33|33 22|22 11|11
5   Ind5 pop2 44|44 44|44 22|22 44|44 33|33 22|22 44|44 11|11 22|22 11|11
6   Ind6 pop3 44|44 22|22 22|22  <NA> 33|33 22|22 22|22 11|11 11|11 11|11
7   Ind7 pop3 44|44 22|22 22|22 33|33 11|11 22|22 44|44 11|11 22|22 11|11
8   Ind8 pop3 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33
9   Ind9 pop4 44|44 22|22 22|22 44|44 11|11 44|44 22|22 33|33 11|11 11|11
10 Ind10 pop4 44|44 44|44 44|44 44|44 33|33 22|22 22|22 11|11 11|11 33|33

The last call seems to work for the sample data. But the same code does not work when the full data set of 983 individuals, grouped into 6 populations and genotyped with 548 SNP markers, is used.
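
In the table printed above, "|" appears to separate the two alleles within each genotype rather than the columns, which may be why sep="|" fails in read.table(). A possible next step, sketched here under several assumptions, is to pass the genotype columns to adegenet's df2genind(). The data frame dat copies only the first four rows and three markers from the table above, the data are assumed to be diploid, and the swap of "|" for "/" is a precaution rather than a documented requirement. The call is guarded so the sketch still runs where adegenet is not installed.

```r
# Small stand-in for the test table (first four rows, markers M1 to M3).
dat <- data.frame(
  geno = c("Ind1", "Ind2", "Ind3", "Ind4"),
  pop  = c("pop1", "pop1", "pop2", "pop2"),
  M1   = c("44|44", "44|44", "44|44", "44|44"),
  M2   = c("44|44", "44|44", "44|44", "22|22"),
  M3   = c("22|22", "44|44", "44|44", "22|22"),
  stringsAsFactors = FALSE
)

# Keep the genotype columns only; labels and populations are passed separately.
geno <- dat[, -(1:2)]

# Precaution (an assumption): replace "|" with "/" so the allele separator
# cannot be misread as a regular-expression metacharacter.
geno[] <- lapply(geno, function(x) gsub("|", "/", x, fixed = TRUE))

# Build the genind object only if adegenet is available.
if (requireNamespace("adegenet", quietly = TRUE)) {
  obj <- adegenet::df2genind(geno, sep = "/", ploidy = 2,
                             ind.names = dat$geno, pop = factor(dat$pop))
  print(obj)
}
```

If this works on the sample, the same steps could be tried on the full table once read.table() reads it without error.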

Any help that would enable me to get started with the "DAPC" analyses for the full data set of 983 individuals, grouped into 6 populations and genotyped with 548 SNP markers, would be highly appreciated.

Thank you for your time and help.

Peter

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testdata.txt
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20140206/4f3e9c3a/attachment.txt>