[adegenet-forum] Discrepancy in NA counts

Roman Luštrik roman.lustrik at biolitika.si
Mon Nov 28 10:03:39 CET 2016


Hi, 

I think the problem is that adegenet, for consistency, sets every allele column of a locus to NA when a genotype is missing at that locus. Take C_KH1238, for example (bottom row in the output pasted below).
In the raw file it has missing values for locus 1378_53, but this locus has three alleles, hence 3 NAs and not 2. I can't go through all the NAs right now, but I think there's a pretty good chance this is what causes the discrepancy between what you see in Excel and in adegenet.

1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33 
... 
C_KH1238 0 1 0 1 0 1 0 NA NA NA 1 0 1 # notice 3 NAs for all available alleles for 1378_53, not just two (as expected for diploid) 


Here is the code I used to explore this: 

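library(adegenet)

# Read the raw file as a plain table, drop the label columns
# (individual, population), and tabulate the genotype codes.
raw <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
raw <- raw[, c(-1, -2)]
table(as.matrix(raw))

#   0   1   2   3   4
#  16 467 618 760 867

# Read the same file with adegenet, treating "0" as missing.
gen <- read.structure("Sub_batch_1.stru", NA.char = "0",
                      n.ind = 44, n.loc = 31, onerowperind = FALSE,
                      col.lab = 1, col.pop = 2, row.marknames = 1,
                      sep = "\t", col.others = 0)

# Allele-count table; show all columns of locus 1378_53 for C_KH1238.
gen_tab <- tab(gen)
gen_tab[grepl("C_KH1238", rownames(gen_tab)), grepl("1378_53", colnames(gen_tab))]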
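
To compare like with like, one option is to count missing genotypes (individual x locus) rather than NA cells. Below is a minimal base-R sketch on a made-up table; the matrix, its labels, and the assumption that column names follow the "locus.allele" pattern are illustrative only. For a real genind object, adegenet's locFac() accessor should give the locus of each column directly.

```r
# Toy allele-count table (made up for illustration): 2 individuals, 2 loci.
# locA has three alleles, locB has two; ind2 is missing at locA.
toy <- matrix(
  c(1, 1, 0, 2, 0,
    NA, NA, NA, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"),
                  c("locA.1", "locA.2", "locA.3", "locB.1", "locB.2"))
)

# Locus of each column, assuming names follow the "locus.allele" pattern.
locus <- sub("\\.[^.]*$", "", colnames(toy))

# NA cells: one per allele column of the affected locus.
na_cells <- sum(is.na(toy))

# Number of NA columns per locus and individual (loci in rows after rowsum()).
na_per_locus <- rowsum(t(is.na(toy)) * 1, group = locus)

# A genotype counts as missing if any allele column of its locus is NA.
missing_genotypes <- sum(na_per_locus > 0)

na_cells           # 3
missing_genotypes  # 1
```

For diploid data stored with two rows per individual, each missing genotype corresponds to two zeros in the raw file, so 2 * missing_genotypes is the figure to compare with the zero count from read.table.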

Cheers, 
Roman 

---- 
In god we trust, all others bring data. 


From: "Biz Sheedy" <biz.sheedy at gmail.com> 
To: "Roman Luštrik" <roman.lustrik at biolitika.si> 
Sent: Monday, November 28, 2016 9:11:39 AM 
Subject: Re: [adegenet-forum] Discrepancy in NA counts 

My apologies. First time posting to a forum so I am a little unsure of things. I have attached a subset of the data, which includes the locus that I saw had problems. 

In this case there are 31 loci, with 16 zeroes counted in Excel and 20 NAs counted in adegenet. The additional NAs occur in locus 1401_25.

Thanks so much, 
Elizabeth 

On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si> wrote:

Hi, 

Can you share a subset of the dataset? It's hard to pinpoint where things might be going wrong without some data in hand.

Cheers, 
Roman 



From: "Biz Sheedy" <biz.sheedy at gmail.com>
To: adegenet-forum at lists.r-forge.r-project.org 
Sent: Friday, November 25, 2016 10:44:16 AM 
Subject: [adegenet-forum] Discrepancy in NA counts 

Dear All, 

I am trying to read SNP data from Stacks into adegenet. I have tried read.structure and read.genepop, but both give the same NA count, which is higher than expected. Using read.table on the STRUCTURE-formatted file (with "ind" and "pop" inserted into the first two columns of row one) gives the expected number of missing values.

I looked at a single-population subset (both the original and the converted data) in Excel and found a locus where all nine individuals were "3" in the original data, but one individual was "NA" in the converted data. The loci before and after this one matched.

I am not sure what I have missed for this to happen; my R skills are beginner at best. Any help with reading the data in correctly would be greatly appreciated!

Thank you, 
Elizabeth 


R version 3.3.2 
adegenet version 2.0.1 

Data: 44 individuals, diploid, 4279 loci. 

all <- read.structure("all_batch_1.stru", NA.char = "0")

Total cells in Excel: 376552
After read.structure/genepop: 44*8558=376552

0s in Excel: 3952
0s after read.table; length(which(X==0)): 3952
NA after read.structure/genepop; sum(is.na(all$tab)): 4008
Difference: 56 

Subset Chichi 
Total cells: 77022 
After read.structure/genepop: 9*8558=77022 

0s in Excel: 742
NA after read.structure/genepop; sum(is.na(chi$tab)): 756
Difference: 14 
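
The totals above can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted in this message; the last part is a hypothetical illustration (the allele counts are invented) of how a missing genotype at a locus with more than two allele columns would give more NA cells than raw zeros.

```r
# Figures quoted in the message.
n_ind  <- 44
n_loc  <- 4279
ploidy <- 2

total_cells <- n_ind * n_loc * ploidy   # 44 * 8558
zeros_raw   <- 3952                     # zeros counted in the raw file
na_tab      <- 4008                     # NAs counted in the genind table
excess_all  <- na_tab - zeros_raw

# Chichi subset: 756 NAs against 742 zeros.
excess_chichi <- 756 - 742

# Hypothetical illustration: number of allele columns at the locus of each
# missing genotype (values invented for this example).
n_allele_cols      <- c(2, 2, 3, 3)
raw_zeros_expected <- length(n_allele_cols) * ploidy  # 2 zeros per genotype
na_cells_expected  <- sum(n_allele_cols)              # 1 NA per allele column
excess_expected    <- na_cells_expected - raw_zeros_expected

total_cells      # 376552
excess_all       # 56
excess_chichi    # 14
excess_expected  # 2
```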



-- 
4-1-1 Amakubo 
Department of Botany 
National Museum of Nature and Science 
Tsukuba, Ibaraki 305-0005 
Japan 

biz.sheedy at gmail.com 

