[adegenet-forum] Discrepancy in NA counts
Roman Luštrik
roman.lustrik at biolitika.si
Mon Nov 28 10:03:39 CET 2016
Hi,
I think the problem is that adegenet, for consistency, adds NAs for every allele present at a particular locus, including any extra ones. Take for example C_KH1238 (bottom row in the example pasted below).
In the raw file, it has missing values for locus 1378_53, but this locus has three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right now, but I think there's a pretty good chance this is what is causing the discrepancy between what you see in "excel" and in adegenet.
# notice 3 NAs for all available alleles for 1378_53, not just two (as expected for diploid)
         1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
...
C_KH1238          0          1          0         1         0          1          0         NA         NA         NA          1          0          1
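To make the arithmetic concrete, here is a minimal base-R sketch on toy data (not the attached file; the individual and locus names are made up). It compares the number of 0s in a two-rows-per-individual table with the number of NAs you would expect if every allele column of a locus is set to NA for a missing genotype. It assumes a genotype counts as missing only when both of its entries are 0.

```r
# Toy STRUCTURE-style data: two rows per diploid individual, 0 = missing.
# Individual and locus names are hypothetical.
raw <- data.frame(
  ind  = rep(c("A", "B", "C", "D"), each = 2),
  locX = c(2, 4,  2, 2,  4, 4,  0, 0),  # alleles 2 and 4; D is missing
  locY = c(2, 4,  4, 1,  0, 0,  2, 2)   # alleles 1, 2 and 4; C is missing
)
loci <- setdiff(names(raw), "ind")

# Number of distinct non-missing alleles per locus
n_alleles <- sapply(raw[loci], function(x) length(unique(x[x != 0])))

# Number of individuals whose genotype is entirely missing at each locus
# (assumption: both entries must be 0)
n_missing <- sapply(raw[loci], function(x) sum(tapply(x == 0, raw$ind, all)))

zeros_raw    <- sum(raw[loci] == 0)       # what you would count in the raw file
expected_nas <- sum(n_alleles * n_missing)  # one NA per allele column

zeros_raw
expected_nas
```

Here the raw table holds 4 zeros, while the expected NA count is 5, because the missing genotype at the three-allele locus contributes 3 NAs instead of 2.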
Here is the code I used to explore this:
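library(adegenet)

# Read the raw file as a plain table and count the allele codes (0 = missing)
xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
xy <- xy[, c(-1, -2)]  # drop the individual and population columns
table(as.matrix(xy))
#   0   1   2   3   4
#  16 467 618 760 867

# Read the same file with adegenet, treating 0 as missing
xy <- read.structure("Sub_batch_1.stru", NA.char = "0",
                     n.ind = 44, n.loc = 31, onerowperind = FALSE,
                     col.lab = 1, col.pop = 2, row.marknames = 1,
                     sep = "\t", col.others = 0)
xy <- tab(xy)  # allele-count matrix, one column per allele

# Inspect the 1378_53 columns for individual C_KH1238
xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]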
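If you want to see which loci carry the NAs, something along these lines might help. It is only a sketch on a small stand-in matrix: the column names and the C_KH1238 values are taken from the example above, the second row ("other_ind") is invented, and it assumes the columns are named "locus.allele" as in the output of tab(). With real data you would use the matrix returned by tab() in place of m.

```r
# Stand-in for the allele-count matrix; "other_ind" is a hypothetical row.
m <- matrix(
  c(0, 1, 1,  0,  0, 1,
    1, 0, NA, NA, NA, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("other_ind", "C_KH1238"),
    c("1377_42.44", "1377_42.24",
      "1378_53.22", "1378_53.24", "1378_53.44",
      "1379_10.33")
  )
)

# Strip the ".allele" suffix to recover the locus name of each column
locus <- sub("\\.[^.]*$", "", colnames(m))

# NAs per locus, summed over all allele columns of that locus
na_per_locus <- tapply(colSums(is.na(m)), locus, sum)

# Allele columns per locus (same sorted order as na_per_locus)
alleles_per_locus <- as.vector(table(locus))

# Assumption: a missing genotype puts NA in every allele column of its locus,
# so dividing recovers the number of missing genotypes per locus
missing_genotypes <- na_per_locus / alleles_per_locus

# For diploids, each missing genotype shows up as two 0s in the raw file
expected_zeros <- 2 * sum(missing_genotypes)

na_per_locus
expected_zeros
```

Loci where the NA count is not twice the number of missing genotypes are the ones with more (or fewer) than two allele columns.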
Cheers,
Roman
----
In god we trust, all others bring data.
From: "Biz Sheedy" <biz.sheedy at gmail.com>
To: "Roman Luštrik" <roman.lustrik at biolitika.si>
Sent: Monday, November 28, 2016 9:11:39 AM
Subject: Re: [adegenet-forum] Discrepancy in NA counts
My apologies. First time posting to a forum so I am a little unsure of things. I have attached a subset of the data, which includes the locus that I saw had problems.
In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs counted (adegenet). The additional NAs occur in locus 1401_25.
Thanks so much,
Elizabeth
On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si> wrote:
Hi,
Can you share (a subset of) the dataset? It's hard to pinpoint where things might be going wrong without some data in hand.
Cheers,
Roman
----
In god we trust, all others bring data.
From: "Biz Sheedy" <biz.sheedy at gmail.com>
To: adegenet-forum at lists.r-forge.r-project.org
Sent: Friday, November 25, 2016 10:44:16 AM
Subject: [adegenet-forum] Discrepancy in NA counts
Dear All,
I am trying to read SNP data from Stacks into adegenet. I have tried read.structure and read.genepop but they both give (the same) NA counts that are higher than expected. Using read.table on the structure-formatted file (with "ind" and "pop" inserted into the first two columns of row one) gave the expected number of missing data.
I looked at a single population subset (both the original and the converted data) in excel and found a locus where in the original data, all nine individuals were "3", but in the converted data one individual was "NA". The loci before and after this one both matched/were correct.
I am not sure what I have missed for this to happen; my R skills are beginner at best. Any help with reading the data in correctly would be greatly appreciated!
Thank you,
Elizabeth
R version 3.3.2
adegenet version 2.0.1
Data: 44 individuals, diploid, 4279 loci.
all<-read.structure("all_batch_1.stru", NA.char="0")
Total cells in excel: 376552
After read.structure/genepop: 44*8558=376552
0s in excel: 3952
0s after read.table; length(which(X==0)): 3952
NA after read.structure/genepop; sum(is.na(all$tab)): 4008
Difference: 56
Subset Chichi
Total cells: 77022
After read.structure/genepop: 9*8558=77022
0s in excel: 742
NA after read.structure/genepop; sum(is.na(chi$tab)): 756
Difference: 14
--
4-1-1 Amakubo
Department of Botany
National Museum of Nature and Science
Tsukuba, Ibaraki 305-0005
Japan
biz.sheedy at gmail.com
_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum