<div dir="ltr"><div><div><div><div><div>Thanks for looking into this.<br><br></div>One thing I did differently from the code you provided was that I only answered the prompts of the read.structure function. This meant I did not use sep="\t", and the number of alleles was 62 instead of 72, which I think should be comparable to the Excel count. Following the code you provided, is.na() finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in Excel).<br><br>Your explanation makes sense to me for the three additional NAs in adegenet, but I still don't understand how, in locus 1401_25, the data for two individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to
 being "NA"?<br><br></div></div>I would really appreciate any further help on this.<br><br></div>Thanks again,<br></div>Elizabeth<br><div><div><div><div><br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 28 November 2016 at 18:03, Roman Luštrik <span dir="ltr"><<a href="mailto:roman.lustrik@biolitika.si" target="_blank">roman.lustrik@biolitika.si</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:trebuchet ms,sans-serif;font-size:12pt;color:#000000"><div>Hi,</div><div><br></div><div>I think the problem is that adegenet, for consistency, adds NAs to accommodate the extra alleles present for a particular locus. Take, for example, C_KH1238 (the bottom row in the example pasted below).</div><div>In the raw file, it has missing values for locus 1378_53, but this locus has three alleles, ergo 3 NAs and not 2. I can't go through all the NAs right now, but I think there's a pretty good chance this is what is causing the discrepancy between what you see in "excel" and in adegenet.</div><div><br></div><div><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"> 1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33</span><br>...<br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">C_KH1238 0 1 0 1 0 1 0 <b>NA NA NA</b> 1 0 1 # notice 3 NAs for all available alleles for 1378_53, not just two (as expected for diploid)</span></div><div><br></div><div><br></div><div>Here is the code I used to explore this:</div><div><br></div><div><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">library(adegenet)</span><br><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")</span><br><span style="font-family:"courier 
new",courier,monaco,monospace,sans-serif">xy <- xy[, c(-1, -2)]</span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">table(as.matrix(xy))</span><br><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"># 0 1 2 3 4 </span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"># 16 467 618 760 867</span><br><br><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">xy <- read.structure("Sub_batch_1.<wbr>stru", NA.char="0",</span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"> n.ind = 44, n.loc = 31, onerowperind = FALSE,</span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"> col.lab = 1, col.pop = 2, row.marknames = 1,</span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif"> sep = "\t", col.others = 0)</span><br><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">xy <- tab(xy)</span><br><span style="font-family:"courier new",courier,monaco,monospace,sans-serif">xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]</span><br></div><span class=""><div><br></div><div>Cheers,</div><div>Roman</div><div><br></div><div>----<br>In god we trust, all others bring data.</div><br><hr id="m_-2885502866782724293zwchr"></span><div><b>From: </b>"Biz Sheedy" <<a href="mailto:biz.sheedy@gmail.com" target="_blank">biz.sheedy@gmail.com</a>><br><b>To: </b>"Roman Luštrik" <<a href="mailto:roman.lustrik@biolitika.si" target="_blank">roman.lustrik@biolitika.si</a>><br><b>Sent: </b>Monday, November 28, 2016 9:11:39 AM<br><b>Subject: </b>Re: [adegenet-forum] Discrepancy in NA counts<br></div><div><div class="h5"><br><div><div dir="ltr"><div>My apologies. First time posting to a forum so I am a little unsure of things. I have attached a subset of the data, which includes the locus that I saw had problems. 
<br><br>In this case there are 31 loci with 16 zeroes counted (Excel) and 20 NAs counted (adegenet). The additional NAs occur in locus 1401_25.<br><br></div><div>Thanks so much,<br></div><div>Elizabeth<br></div><div class="gmail_extra"><br><div class="gmail_quote">On 28 November 2016 at 16:31, Roman Luštrik <span dir="ltr"><<a href="mailto:roman.lustrik@biolitika.si" target="_blank">roman.lustrik@biolitika.si</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="font-family:trebuchet ms,sans-serif;font-size:12pt;color:#000000"><div>Hi,</div><br><div>Can you share a subset of the dataset? It's hard to pinpoint where things might be going wrong without some data in hand.</div><br><div>Cheers,</div><div>Roman</div><br><div>----<br>In god we trust, all others bring data.</div><br><hr id="m_-2885502866782724293m_6274320645841491781zwchr"><div><b>From: </b>"Biz Sheedy" <<a href="mailto:biz.sheedy@gmail.com" target="_blank">biz.sheedy@gmail.com</a>><br><b>To: </b><a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.<wbr>r-project.org</a><br><b>Sent: </b>Friday, November 25, 2016 10:44:16 AM<br><b>Subject: </b>[adegenet-forum] Discrepancy in NA counts<br></div><br><div><div><div class="m_-2885502866782724293h5"><div dir="ltr"><div><div><div><div><div>Dear All,<br><br></div>I am trying to read SNP data from Stacks into adegenet. I have tried read.structure and read.genepop, but both give the same NA counts, which are higher than expected. Using read.table on the structure-formatted file (with "ind" and "pop" inserted into the first two columns of row one) gave the expected number of missing values. <br><br>I looked at a single population subset (both the original and the converted data) in Excel and found a locus where, in the original data, all nine individuals were "3", but in the converted data one individual was "NA". 
The loci before and after this one both matched and were correct.<br><br>I am not sure what I have missed for this to happen; my R skills are beginner at best. Any help with reading the data in correctly would be greatly appreciated!<br></div><div><br>Thank you,<br></div><div>Elizabeth<br></div><div><br><br></div>R version 3.3.2<br>adegenet version 2.0.1<br><br></div><div>Data: 44 individuals, diploid, 4279 loci.<br></div><br>all <- read.structure("all_batch_1.stru", NA.char = "0")<br><br>Total cells in Excel: 376552<br></div><div>After read.structure/genepop: 44*8558=376552<br></div><br><div>0s in Excel: 3952<br></div><div>0s after read.table; length(which(X==0)): 3952<br></div><div>NA after read.structure/genepop; sum(is.na(all$tab)): 4008<br></div><div>Difference: 56<br></div><br><div>Subset Chichi<br></div><div>Total cells: 77022<br></div><div>After read.structure/genepop: 9*8558=77022<br></div><br><div>0s in Excel: 742<br></div>NA after read.structure/genepop; sum(is.na(chi$tab)): 756<br></div>Difference: 14<br><div><div><div><span style="color:#009900;font-weight:bold"></span><br><br><br><span style="color:#009900;font-weight:bold"></span><div><div><div><div><div>-- <br><div class="m_-2885502866782724293m_6274320645841491781gmail_signature"><div dir="ltr"><div>4-1-1 Amakubo<br></div><div>Department of Botany<br></div><div>National Museum of Nature and Science<br></div><div>Tsukuba, Ibaraki 305-0005<br></div><div>Japan<br><br></div><div><a href="mailto:biz.sheedy@gmail.com" target="_blank">biz.sheedy@gmail.com</a><br></div></div></div>

</div></div></div></div></div></div></div></div></div>

<br></div></div>______________________________<wbr>_________________<br>adegenet-forum mailing list<br><a href="mailto:adegenet-forum@lists.r-forge.r-project.org" target="_blank">adegenet-forum@lists.r-forge.<wbr>r-project.org</a><br><a href="https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum" target="_blank">https://lists.r-forge.r-<wbr>project.org/cgi-bin/mailman/<wbr>listinfo/adegenet-forum</a><br></div></div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="m_-2885502866782724293gmail_signature"><div dir="ltr"><div>4-1-1 Amakubo<br></div><div>Department of Botany<br></div><div>National Museum of Nature and Science<br></div><div>Tsukuba, Ibaraki 305-0005<br></div><div>Japan<br><br></div><div><a href="mailto:biz.sheedy@gmail.com" target="_blank">biz.sheedy@gmail.com</a><br></div></div></div>

</div></div><br></div></div></div></div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div></div><div>4-1-1 Amakubo<br></div><div>Department of Botany<br></div><div>National Museum of Nature and Science<br></div><div>Tsukuba, Ibaraki 305-0005<br></div><div>Japan<br><br></div><div><a href="mailto:biz.sheedy@gmail.com" target="_blank">biz.sheedy@gmail.com</a><br></div></div></div>

</div></div>
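<div><br>As a rough illustration of the explanation quoted in this thread (a locus with three alleles yields 3 NAs for one missing genotype, not 2), here is a minimal base-R sketch. It has not been run against the actual data. The matrix <b>tab</b>, the individual names and the locus/allele labels are all hypothetical; they only imitate the shape of a per-allele table with one column per "locus.allele". The final step assumes a diploid file with two rows per individual, where a missing genotype is coded as "0" in both cells.<br></div>
<pre>
```r
# Toy allele-count table shaped like the output of tab(): one row per
# individual, one column per "locus.allele". All names and values are made up.
tab <- matrix(
  c(2,  0,  1,  1,  0,
    0,  2, NA, NA, NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("ind1", "ind2"),
    c("locA.1", "locA.3", "locB.2", "locB.4", "locB.3")
  )
)

# Counting NA cells counts every allele column of a missing genotype.
na_cells <- sum(is.na(tab))

# Recover the locus name by dropping the ".allele" suffix.
locus <- sub("\\.[^.]*$", "", colnames(tab))

# A genotype counts as missing when all allele columns of its locus are NA.
missing <- sapply(split(seq_along(locus), locus), function(cols) {
  rowSums(is.na(tab[, cols, drop = FALSE])) == length(cols)
})
missing_genotypes <- sum(missing)

# Assumption: diploid data, two cells per genotype in the raw file.
expected_zeros <- 2 * missing_genotypes

print(c(na_cells = na_cells,
        missing_genotypes = missing_genotypes,
        expected_zeros = expected_zeros))
```
</pre>
<div>In this toy case the single missing genotype at the three-allele locus gives three NA cells but would correspond to only two zeros in the raw file, so comparing counts per genotype rather than per cell may be the fairer check. It would not, by itself, explain genotypes that change from "3" to NA.<br></div>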