[adegenet-forum] Discrepancy in NA counts
Biz Sheedy
biz.sheedy at gmail.com
Mon Nov 28 11:00:53 CET 2016
Thanks for looking into this.
One thing I did differently from the code you provided was that I only
answered the prompts of the read.structure function. This meant I did not
use sep="\t", and the number of alleles was 62 instead of 72, which I think
should be comparable to the Excel count. Following the code you provided,
is.na finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in
Excel).
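To compare the two counts, here is a minimal sketch of the bookkeeping as I understand the explanation quoted below: a missing diploid genotype is two zeroes in the two-row file, but one NA per allele column of that locus in the adegenet table. The helper name count_expected_na and the toy input are made up for illustration; only the allele-column counts for 1372_14 (two) and 1378_53 (three) are taken from the printout below.

```r
# Hypothetical helper (not part of adegenet): expected number of NAs in the
# allele table, given which genotypes are missing and how many allele
# columns each locus has.
count_expected_na <- function(missing_loci, alleles_per_locus) {
  # each missing genotype contributes one NA per allele column of its locus
  sum(alleles_per_locus[missing_loci])
}

# Allele columns per locus, as seen in the printout
alleles_per_locus <- c("1372_14" = 2, "1378_53" = 3)

# Toy case (assumption): a single individual missing at 1378_53 only
missing_loci <- c("1378_53")

zeroes_in_file <- 2 * length(missing_loci)  # two rows per diploid individual
na_in_table <- count_expected_na(missing_loci, alleles_per_locus)

cat("zeroes in file:", zeroes_in_file, "\n")
cat("NAs in table:", na_in_table, "\n")
```

So one missing genotype at a three-allele locus is 2 zeroes in the file but 3 NAs in the table, which is why the two counts need not agree.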
Your explanation makes sense to me for the additional three NAs in
adegenet, but I still don't understand how in locus 1401_25 the data for
two individuals (C_KH1059 and M_KH1834) changed from being homozygous for
"3" to being "NA"?
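In case it helps, this is roughly how those two rows could be pulled out, following the subsetting idea in the code quoted below. It is only a sketch: locus_rows is a made-up helper name, and the small matrix is a stand-in for tab(xy) with invented values and invented allele labels for 1401_25. The only thing assumed about the real table is the "locus.allele" column naming seen in the printout.

```r
# Hypothetical helper: allele columns of one locus for selected individuals.
locus_rows <- function(allele_tab, locus, inds) {
  # column names are assumed to look like "<locus>.<allele>"
  cols <- startsWith(colnames(allele_tab), paste0(locus, "."))
  rows <- rownames(allele_tab) %in% inds
  allele_tab[rows, cols, drop = FALSE]
}

# Toy stand-in for tab(xy); all values here are invented for illustration
toy <- matrix(
  c(NA, NA, 1,
    NA, NA, 0,
     0,  2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("C_KH1059", "M_KH1834", "C_KH1238"),
                  c("1401_25.1", "1401_25.3", "1378_53.22"))
)

hits <- locus_rows(toy, "1401_25", c("C_KH1059", "M_KH1834"))
print(hits)
print(rowSums(is.na(hits)))  # number of NA allele columns per individual

# With the real data the call would be:
# locus_rows(tab(xy), "1401_25", c("C_KH1059", "M_KH1834"))
```

Matching on the locus name plus the trailing dot avoids also picking up a locus such as 1401_251, which a bare grepl("1401_25", ...) would match.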
I would really appreciate any further help on this.
Thanks again,
Elizabeth
On 28 November 2016 at 18:03, Roman Luštrik <roman.lustrik at biolitika.si>
wrote:
> Hi,
>
> I think the problem is that adegenet, for consistency, adds NAs to
> accommodate the extra alleles present for a particular locus. Take, for
> example, C_KH1238 (bottom row in the example pasted below).
> In the raw file, it has missing values for locus 1378_53, but this locus has
> three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right
> now, but I think there's a pretty good chance this is what is causing the
> discrepancy between what you see in Excel and in adegenet.
>
> 1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24
> 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
> ...
> C_KH1238 0 1 0 1 0 1 0 NA NA NA 1 0 1
> # Notice the 3 NAs, one for each available allele of 1378_53, not just two
> # (as expected for a diploid).
>
>
> Here is the code I used to explore this:
>
> library(adegenet)
>
> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
> xy <- xy[, c(-1, -2)]
> table(as.matrix(xy))
>
> # 0 1 2 3 4
> # 16 467 618 760 867
>
>
> xy <- read.structure("Sub_batch_1.stru", NA.char="0",
> n.ind = 44, n.loc = 31, onerowperind = FALSE,
> col.lab = 1, col.pop = 2, row.marknames = 1,
> sep = "\t", col.others = 0)
>
> xy <- tab(xy)
> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]
>
> Cheers,
> Roman
>
> ----
> In god we trust, all others bring data.
>
> ------------------------------
> From: "Biz Sheedy" <biz.sheedy at gmail.com>
> To: "Roman Luštrik" <roman.lustrik at biolitika.si>
> Sent: Monday, November 28, 2016 9:11:39 AM
> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>
> My apologies. First time posting to a forum so I am a little unsure of
> things. I have attached a subset of the data, which includes the locus that
> I saw had problems.
>
> In this case there are 31 loci, with 16 zeroes counted in Excel and 20 NAs
> counted in adegenet. The additional NAs occur in locus 1401_25.
>
> Thanks so much,
> Elizabeth
>
> On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si>
> wrote:
>
>> Hi,
>>
>> Can you share a subset of the dataset? It's hard to pinpoint where
>> things might be going wrong without some data in hand.
>>
>> Cheers,
>> Roman
>>
>>
>> ------------------------------
>> From: "Biz Sheedy" <biz.sheedy at gmail.com>
>> To: adegenet-forum at lists.r-forge.r-project.org
>> Sent: Friday, November 25, 2016 10:44:16 AM
>> Subject: [adegenet-forum] Discrepancy in NA counts
>>
>> Dear All,
>>
>> I am trying to read SNP data from Stacks into adegenet. I have tried
>> read.structure and read.genepop but they both give (the same) NA counts
>> that are higher than expected. Using read.table on the structure-formatted
>> file (with "ind" and "pop" inserted into the first two columns of row one)
>> gave the expected number of missing data.
>>
>> I looked at a single-population subset (both the original and the
>> converted data) in Excel and found a locus where, in the original data,
>> all nine individuals were "3", but in the converted data one individual
>> was "NA". The loci before and after this one both matched.
>>
>> I am not sure what I have missed for this to happen; my R skills are
>> beginner level at best. Any help with reading the data in correctly would
>> be greatly appreciated!
>>
>> Thank you,
>> Elizabeth
>>
>>
>> R version 3.3.2
>> adegenet version 2.0.1
>>
>> Data: 44 individuals, diploid, 4279 loci.
>>
>> all<-read.structure("all_batch_1.stru", NA.char="0")
>>
>> Total cells in excel: 376552
>> After read.structure/genepop: 44*8558=376552
>>
>> 0s in excel: 3952
>> 0s after read.table; length(which(X==0)): 3952
>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008
>> Difference: 56
>>
>> Subset Chichi
>> Total cells: 77022
>> After read.structure/genepop: 9*8558=77022
>>
>> 0s in excel: 742
>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756
>> Difference: 14
>>
>>
>>
>> --
>> 4-1-1 Amakubo
>> Department of Botany
>> National Museum of Nature and Science
>> Tsukuba, Ibaraki 305-0005
>> Japan
>>
>> biz.sheedy at gmail.com
>>
>>
>
>
>
>
>