[adegenet-forum] Discrepancy in NA counts

Mon Nov 28 13:40:24 CET 2016

Hi Elizabeth, 

It would appear there is something funky happening in the code because the locus names are numeric. This has happened before in another function. Until we fix this, you can work around it by changing your locus names so that they start with a letter. 

Here is the excerpt from the genind object showing that both samples have "33" at locus 1401_25 (column X1401_25.33): 

         X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
C_KH1059           0           1           1           0           0           0           1           0
M_KH1834           0           1           1           0           0           1           0           0
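
The workaround of making locus names start with a letter could be sketched roughly as follows. This is only an illustration in base R: the helper name prefix_numeric_names and the "L" prefix are made up for this example, and it assumes the names are fixed (for instance in the marker-name row of the input file) before the data are read into adegenet.

```r
# Hypothetical helper (not part of adegenet): prepend a letter to every
# name that starts with a digit, leaving all other names untouched.
prefix_numeric_names <- function(x, prefix = "L") {
  starts_with_digit <- grepl("^[0-9]", x)
  x[starts_with_digit] <- paste0(prefix, x[starts_with_digit])
  x
}

# Locus names taken from the excerpt above
loci <- c("1401_25", "1403_13", "1404_17")
print(prefix_numeric_names(loci))
# "L1401_25" "L1403_13" "L1404_17"
```

Names that already start with a letter pass through unchanged, so the helper can be applied to the whole vector of marker names.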

Cheers, 
Roman 

---- 
In god we trust, all others bring data. 

From: "Biz Sheedy" <biz.sheedy at gmail.com> 
To: "Roman Luštrik" <roman.lustrik at biolitika.si> 
Cc: adegenet-forum at lists.r-forge.r-project.org 
Sent: Monday, November 28, 2016 11:00:53 AM 
Subject: Re: [adegenet-forum] Discrepancy in NA counts 

Thanks for looking into this. 

One thing I did differently from the code you provided was that I only answered the prompts of the read.structure function. This meant I did not use sep="\t", and the number of alleles was 62 instead of 72, which I think should be comparable to the Excel count. Following the code you provided, is.na finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in Excel). 

Your explanation makes sense to me for the additional three NAs in adegenet, but I still don't understand how in locus 1401_25 the data for two individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to being "NA"? 

I would really appreciate any further help on this. 

Thanks again, 
Elizabeth 

On 28 November 2016 at 18:03, Roman Luštrik < roman.lustrik at biolitika.si > wrote: 

Hi, 

I think the problem is that adegenet, for consistency, adds NAs to accommodate the extra alleles present for a particular locus. Take for example C_KH1238 (bottom row in the example pasted below). 
In the raw file, it has missing values for locus 1378_53, but this locus has three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right now, but I think there's a pretty good chance this is what is causing the discrepancy between what you see in Excel and in adegenet. 

         1369_41.11 1372_14.22 1372_14.24  1373_9.44  1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
...
C_KH1238          0          1          0          1          0          1          0         NA         NA         NA          1          0          1
# notice 3 NAs for all available alleles for 1378_53, not just two (as expected for diploid)
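
To make the bookkeeping concrete, here is a small base R sketch of how NA cells in an allele table could be converted back into the number of zeros expected in the raw file. The toy matrix, its locus names and the ploidy of 2 are made up for illustration; it assumes column names of the form "locus.allele" and that a missing genotype shows up as NA in every allele column of that locus, as in the excerpt above.

```r
# Toy allele-count table laid out like tab(genind): one row per individual,
# one column per "locus.allele". All values are invented for this example.
tab_toy <- matrix(
  c(1, 1, NA, NA, NA,
    0, 2,  1,  1,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"),
                  c("locA.2", "locA.4", "locB.2", "locB.4", "locB.3"))
)

# Locus name = everything before the last dot in the column name
locus <- sub("\\.[^.]*$", "", colnames(tab_toy))

# NA cells per locus and number of allele columns per locus
na_cells  <- tapply(colSums(is.na(tab_toy)), locus, sum)
n_alleles <- table(locus)

# Each missing genotype contributes one NA per allele column of its locus
missing_genotypes <- na_cells / as.vector(n_alleles[names(na_cells)])

# In the raw file a missing diploid genotype is two zeros (assumed ploidy)
ploidy <- 2
expected_raw_zeros <- sum(missing_genotypes) * ploidy

print(sum(is.na(tab_toy)))   # NA cells in the allele table: 3
print(expected_raw_zeros)    # zeros expected in the raw file: 2
```

In this toy case one missing genotype at a three-allele locus gives 3 NA cells but only 2 zeros in the raw data, which is the kind of gap described above.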

Here is the code I used to explore this: 

library(adegenet) 

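# Count the raw values in the file (0 = missing) after dropping the
# first two (label and population) columns
xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t") 
xy <- xy[, c(-1, -2)] 
table(as.matrix(xy)) 

#   0   1   2   3   4 
#  16 467 618 760 867 

# Read the same file into a genind object, treating 0 as missing
xy <- read.structure("Sub_batch_1.stru", NA.char = "0", 
                     n.ind = 44, n.loc = 31, onerowperind = FALSE, 
                     col.lab = 1, col.pop = 2, row.marknames = 1, 
                     sep = "\t", col.others = 0) 

# Extract the allele table and inspect one individual at locus 1378_53
xy <- tab(xy) 
xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))] 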

Cheers, 
Roman 

From: "Biz Sheedy" < biz.sheedy at gmail.com > 
To: "Roman Luštrik" < roman.lustrik at biolitika.si > 
Sent: Monday, November 28, 2016 9:11:39 AM 
Subject: Re: [adegenet-forum] Discrepancy in NA counts 

My apologies. First time posting to a forum so I am a little unsure of things. I have attached a subset of the data, which includes the locus that I saw had problems. 

In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs counted (adegenet). The additional NAs occur in locus 1401_25. 

Thanks so much, 
Elizabeth 

On 28 November 2016 at 16:31, Roman Luštrik < roman.lustrik at biolitika.si > wrote: 

Hi, 

Can you share (a subset of) the dataset? It's hard to pinpoint where things might be going wrong without some data in hand. 

Cheers, 
Roman 

From: "Biz Sheedy" < biz.sheedy at gmail.com > 
To: adegenet-forum at lists.r-forge.r-project.org 
Sent: Friday, November 25, 2016 10:44:16 AM 
Subject: [adegenet-forum] Discrepancy in NA counts 

Dear All, 

I am trying to read SNP data from Stacks into adegenet. I have tried read.structure and read.genepop but they both give (the same) NA counts that are higher than expected. Using read.table on the structure-formatted file (with "ind" and "pop" inserted into the first two columns of row one) gave the expected number of missing data. 

I looked at a single population subset (both the original and the converted data) in excel and found a locus where in the original data, all nine individuals were "3", but in the converted data one individual was "NA". The loci before and after this one both matched/were correct. 

I am not sure what I have missed for this to happen; my R skills are beginner at best. Any help with reading the data in correctly would be greatly appreciated! 

Thank you, 
Elizabeth 

R version 3.3.2 
adegenet version 2.0.1 

Data: 44 individuals, diploid, 4279 loci. 

all<-read.structure("all_batch_1.stru", NA.char="0") 

Total cells in excel: 376552 
After read.structure/genepop: 44*8558=376552 

0s in excel: 3952 
0s after read.table; length(which(X==0)): 3952 
NA after read.structure/genepop; sum(is.na(all$tab)): 4008 
Difference: 56 

Subset Chichi 
Total cells: 77022 
After read.structure/genepop: 9*8558=77022 

0s in excel: 742 
NA after read.structure/genepop; sum(is.na(chi$tab)): 756 
Difference: 14 

-- 
4-1-1 Amakubo 
Department of Botany 
National Museum of Nature and Science 
Tsukuba, Ibaraki 305-0005 
Japan 

biz.sheedy at gmail.com 

_______________________________________________ 
adegenet-forum mailing list 
adegenet-forum at lists.r-forge.r-project.org 
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum 
