[adegenet-forum] Discrepancy in NA counts

Biz Sheedy biz.sheedy at gmail.com
Wed Dec 7 02:00:27 CET 2016


Hi Thibaut and Roman,

Yes, the fix has solved the issue for me. Thanks so much to you both!

(Sorry for the delayed response, I couldn't get the devel version at work.)

Cheers,
Elizabeth


On 5 December 2016 at 21:10, Thibaut Jombart <thibautjombart at gmail.com>
wrote:

> Hello,
>
> Roman has fixed this bug in the current devel version of adegenet. See
> https://github.com/thibautjombart/adegenet
> for guidelines on installing it. Can you confirm it solves your issue?
>
> Best
> Thibaut
>
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College
> London
> Head of RECON: repidemicsconsortium.org
> sites.google.com/site/thibautjombart/
> github.com/thibautjombart
> Twitter: @TeebzR
>
>
> On 28 November 2016 at 12:40, Roman Luštrik <roman.lustrik at biolitika.si>
> wrote:
> > Hi Elizabeth,
> >
> > it would appear there is something funky happening with the code due to
> > locus names being numeric. This has happened before in some other function.
> > Until we fix this, you can change your locus names so that they start with a
> > letter.
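The workaround above (renaming loci so they start with a letter) could be sketched roughly as follows. This is only a minimal sketch in base R, assuming the marker names sit on the first line of the STRUCTURE file and are tab-separated; the helper name `prefix_marker_names`, the `"L"` prefix and the toy file contents are hypothetical and not part of adegenet.

```r
# Hypothetical helper (not part of adegenet): prefix every marker name on the
# first line of a STRUCTURE file that starts with a digit.
prefix_marker_names <- function(infile, outfile, prefix = "L", sep = "\t") {
  lines <- readLines(infile)
  markers <- strsplit(lines[1], sep, fixed = TRUE)[[1]]
  # Only touch names that begin with a digit; labels such as "ind" stay as-is
  numeric_start <- grepl("^[0-9]", markers)
  markers[numeric_start] <- paste0(prefix, markers[numeric_start])
  lines[1] <- paste(markers, collapse = sep)
  writeLines(lines, outfile)
  invisible(markers)
}

# Toy input written to a temporary file (made-up genotypes, two rows per individual)
demo_in <- tempfile(fileext = ".stru")
demo_out <- tempfile(fileext = ".stru")
writeLines(c("ind\tpop\t1401_25\t1403_13",
             "C_KH1059\t1\t3\t1",
             "C_KH1059\t1\t3\t3"), demo_in)

new_names <- prefix_marker_names(demo_in, demo_out)
print(new_names)
```

The renamed file can then be passed to read.structure in place of the original; only the header line changes, the genotype rows are written back untouched.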
> >
> > Here is the excerpt from the genind object indicating that these two samples
> > have alleles 33:
> >
> >          X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
> > C_KH1059           0           1           1           0           0           0           1           0
> > M_KH1834           0           1           1           0           0           1           0           0
> >
> >
> > Cheers,
> > Roman
> >
> >
> > ----
> > In god we trust, all others bring data.
> >
> > ________________________________
> > From: "Biz Sheedy" <biz.sheedy at gmail.com>
> > To: "Roman Luštrik" <roman.lustrik at biolitika.si>
> > Cc: adegenet-forum at lists.r-forge.r-project.org
> > Sent: Monday, November 28, 2016 11:00:53 AM
> >
> > Subject: Re: [adegenet-forum] Discrepancy in NA counts
> >
> > Thanks for looking into this.
> >
> > Something I did differently from the code you provided was that I only
> > answered the prompts for the read.structure function. This meant I did not
> > use sep="\t", and the number of alleles was 62 instead of 72, which I think
> > should be comparable to the excel count. Following the code you provided,
> > 'is.na' finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in
> > excel).
> >
> > Your explanation makes sense to me for the additional three NAs in adegenet,
> > but I still don't understand how in locus 1401_25 the data for two
> > individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to
> > being "NA"?
> >
> > I would really appreciate any further help on this.
> >
> > Thanks again,
> > Elizabeth
> >
> >
> > On 28 November 2016 at 18:03, Roman Luštrik <roman.lustrik at biolitika.si>
> > wrote:
> >>
> >> Hi,
> >>
> >> I think the problem is that adegenet, for consistency, adds NAs to
> >> accommodate the extra alleles present for a particular locus. Take for
> >> example C_KH1238 (bottom row in the example pasted below).
> >> In the raw file, it has missing values for locus 1378_53, but this locus has
> >> three alleles, ergo 3 NAs and not 2. Can't go through all the NAs right now,
> >> but I think there's a pretty good chance this is what is causing the
> >> discrepancy between what you see in "excel" and in adegenet.
> >>
> >>          1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33 ...
> >> C_KH1238          0          1          0         1         0          1          0         NA         NA         NA          1          0          1
> >> # notice 3 NAs for all available alleles for 1378_53, not just two
> >> # (as expected for diploid)
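The difference between counting NA cells and counting missing genotypes, as described above, can be illustrated with a toy table. This is a sketch in base R under stated assumptions: `tab_toy` is a made-up matrix shaped like the output of tab() (one column per "locus.allele"), and the locus of each column is recovered by stripping everything after the last dot in the column name.

```r
# Toy allele table shaped like the output of tab(); values are made up.
tab_toy <- matrix(
  c(1, 0, NA, NA, NA,
    0, 1,  1,  0,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("C_KH1238", "ind_2"),
    c("1377_42.44", "1377_42.24", "1378_53.22", "1378_53.24", "1378_53.44")
  )
)

# Locus of each column: strip the trailing ".allele" part of the name
locus <- sub("\\.[^.]*$", "", colnames(tab_toy))

# NA cells: one per allele column of a locus with a missing genotype
na_cells <- sum(is.na(tab_toy))

# Missing genotypes: count each individual/locus pair once if any of its
# allele columns is NA
missing_by_locus <- sapply(split(seq_along(locus), locus), function(cols) {
  rowSums(is.na(tab_toy[, cols, drop = FALSE])) > 0
})
missing_genotypes <- sum(missing_by_locus)

print(c(na_cells = na_cells, missing_genotypes = missing_genotypes))
```

In this toy table there are 3 NA cells but only one missing genotype, so the NA count from is.na() on the allele table is not directly comparable to the number of zeroes in the raw file whenever a locus has more than two alleles.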
> >>
> >>
> >> Here is the code I used to explore this:
> >>
> >> library(adegenet)
> >>
> >> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
> >> xy <- xy[, c(-1, -2)]
> >> table(as.matrix(xy))
> >>
> >> #   0   1   2   3   4
> >> #  16 467 618 760 867
> >>
> >>
> >> xy <- read.structure("Sub_batch_1.stru", NA.char="0",
> >> n.ind = 44, n.loc = 31, onerowperind = FALSE,
> >> col.lab = 1, col.pop = 2, row.marknames = 1,
> >> sep = "\t", col.others = 0)
> >>
> >> xy <- tab(xy)
> >> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]
> >>
> >> Cheers,
> >> Roman
> >>
> >> ________________________________
> >> From: "Biz Sheedy" <biz.sheedy at gmail.com>
> >> To: "Roman Luštrik" <roman.lustrik at biolitika.si>
> >> Sent: Monday, November 28, 2016 9:11:39 AM
> >> Subject: Re: [adegenet-forum] Discrepancy in NA counts
> >>
> >> My apologies. First time posting to a forum so I am a little unsure of
> >> things. I have attached a subset of the data, which includes the locus that
> >> I saw had problems.
> >>
> >> In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs
> >> counted (adegenet). The additional NAs occur in locus 1401_25.
> >>
> >> Thanks so much,
> >> Elizabeth
> >>
> >> On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> can you share (a subset of) the dataset? It's hard to pinpoint where
> >>> things might be going wrong without some data in hand.
> >>>
> >>> Cheers,
> >>> Roman
> >>>
> >>> ________________________________
> >>> From: "Biz Sheedy" <biz.sheedy at gmail.com>
> >>> To: adegenet-forum at lists.r-forge.r-project.org
> >>> Sent: Friday, November 25, 2016 10:44:16 AM
> >>> Subject: [adegenet-forum] Discrepancy in NA counts
> >>>
> >>> Dear All,
> >>>
> >>> I am trying to read SNP data from Stacks into adegenet. I have tried
> >>> read.structure and read.genepop but they both give (the same) NA counts that
> >>> are higher than expected. Using read.table on the structure-formatted file
> >>> (with "ind" and "pop" inserted into the first two columns of row one) gave
> >>> the expected number of missing data.
> >>>
> >>> I looked at a single population subset (both the original and the
> >>> converted data) in excel and found a locus where in the original data, all
> >>> nine individuals were "3", but in the converted data one individual was
> >>> "NA". The loci before and after this one both matched/were correct.
> >>>
> >>> I am not sure what I have missed for this to happen; my R skills are
> >>> beginner at best. Any help with reading the data in correctly would be
> >>> greatly appreciated!
> >>>
> >>> Thank you,
> >>> Elizabeth
> >>>
> >>>
> >>> R version 3.3.2
> >>> adegenet version 2.0.1
> >>>
> >>> Data: 44 individuals, diploid, 4279 loci.
> >>>
> >>> all<-read.structure("all_batch_1.stru", NA.char="0")
> >>>
> >>> Total cells in excel: 376552
> >>> After read.structure/genepop: 44*8558=376552
> >>>
> >>> 0s in excel: 3952
> >>> 0s after read.table; length(which(X==0)): 3952
> >>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008
> >>> Difference: 56
> >>>
> >>> Subset Chichi
> >>> Total cells: 77022
> >>> After read.structure/genepop: 9*8558=77022
> >>>
> >>> 0s in excel: 742
> >>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756
> >>> Difference: 14
> >>>
> >>>
> >>>
> >>> --
> >>> 4-1-1 Amakubo
> >>> Department of Botany
> >>> National Museum of Nature and Science
> >>> Tsukuba, Ibaraki 305-0005
> >>> Japan
> >>>
> >>> biz.sheedy at gmail.com
> >>>
> >>> _______________________________________________
> >>> adegenet-forum mailing list
> >>> adegenet-forum at lists.r-forge.r-project.org
> >>>
> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>



-- 
Dr Elizabeth Sheedy
JSPS Postdoctoral fellow

4-1-1 Amakubo
Department of Botany
National Museum of Nature and Science
Tsukuba, Ibaraki 305-0005
Japan

biz.sheedy at gmail.com