[adegenet-forum] Discrepancy in NA counts

Thibaut Jombart thibautjombart at gmail.com
Wed Dec 7 11:50:35 CET 2016


Awesome. Thanks for the report, and many thanks to Roman for solving the
issue!

Best

Thibaut


--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR <http://twitter.com/TeebzR>

On 7 December 2016 at 01:00, Biz Sheedy <biz.sheedy at gmail.com> wrote:

> Hi Thibaut and Roman,
>
> Yes, the fix has solved the issue for me. Thanks so much to you both!
>
> (Sorry for the delayed response, I couldn't get the devel version at work.)
>
> Cheers,
> Elizabeth
>
>
> On 5 December 2016 at 21:10, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
>
>> Hello,
>>
>> Roman has fixed this bug in the current devel version of adegenet. See
>> https://github.com/thibautjombart/adegenet
>> for guidelines on installing it. Can you confirm it solves your issue?
>>
>> Best
>> Thibaut
>>
>>
>> On 28 November 2016 at 12:40, Roman Luštrik <roman.lustrik at biolitika.si>
>> wrote:
>> > Hi Elizabeth,
>> >
>> > it would appear there is something funky happening with the code due to
>> > locus names being numeric. This has happened before in some other
>> > function. Until we fix this, you can change your locus names so that
>> > they start with a letter.
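
The workaround described above (locus names that start with a letter) can be applied to the marker-name row of the input file before it is read. The following is only a minimal base R sketch, assuming a tab-separated file whose first row holds the marker names. The helper `prefix_numeric_names()`, the prefix "L", and the toy file contents are hypothetical and not part of adegenet or of the original data.

```r
# Hypothetical helper: prepend a letter to every name that starts with a digit.
prefix_numeric_names <- function(x, prefix = "L") {
  starts_with_digit <- grepl("^[0-9]", x)
  x[starts_with_digit] <- paste0(prefix, x[starts_with_digit])
  x
}

# Toy stand-in for a structure-formatted file (made-up values, tab-separated,
# marker names in row one with "ind" and "pop" in the first two columns).
infile <- tempfile(fileext = ".stru")
writeLines(c("ind\tpop\t1401_25\t1403_13",
             "C_KH1059\t1\t3\t1",
             "C_KH1059\t1\t3\t1"), infile)

# Rewrite only the header row; genotype rows are left untouched.
lines <- readLines(infile)
header <- strsplit(lines[1], "\t", fixed = TRUE)[[1]]
lines[1] <- paste(prefix_numeric_names(header), collapse = "\t")

outfile <- tempfile(fileext = ".stru")
writeLines(lines, outfile)

# Read the header back to confirm the renaming.
new_header <- strsplit(readLines(outfile)[1], "\t", fixed = TRUE)[[1]]
print(new_header)
```

The rewritten file could then be passed to read.structure in place of the original. With a version of adegenet that contains the fix, this step should no longer be necessary.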
>> >
>> > Here is the excerpt from the genind object indicating that these two
>> > samples have alleles 33:
>> >
>> >          X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
>> > C_KH1059           0           1           1           0           0           0           1           0
>> > M_KH1834           0           1           1           0           0           1           0           0
>> >
>> >
>> > Cheers,
>> > Roman
>> >
>> >
>> > ----
>> > In god we trust, all others bring data.
>> >
>> > ________________________________
>> > From: "Biz Sheedy" <biz.sheedy at gmail.com>
>> > To: "Roman Luštrik" <roman.lustrik at biolitika.si>
>> > Cc: adegenet-forum at lists.r-forge.r-project.org
>> > Sent: Monday, November 28, 2016 11:00:53 AM
>> >
>> > Subject: Re: [adegenet-forum] Discrepancy in NA counts
>> >
>> > Thanks for looking into this.
>> >
>> > One thing I did differently from the code you provided was that I only
>> > answered the prompts for the read.structure function. This meant I did
>> > not use sep="\t", and the number of alleles was 62 instead of 72, which I
>> > think should be comparable to the excel count. Following the code you
>> > provided, 'is.na' finds 23 NAs (instead of 20 NAs at 62 alleles and 16
>> > zeroes in excel).
>> >
>> > Your explanation makes sense to me for the additional three NAs in
>> > adegenet, but I still don't understand how in locus 1401_25 the data for
>> > two individuals (C_KH1059 and M_KH1834) changed from being homozygous
>> > for "3" to being "NA"?
>> >
>> > I would really appreciate any further help on this.
>> >
>> > Thanks again,
>> > Elizabeth
>> >
>> >
>> > On 28 November 2016 at 18:03, Roman Luštrik <roman.lustrik at biolitika.si>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> I think the problem is that adegenet, for consistency, adds NAs to
>> >> accommodate the extra alleles present for a particular locus. Take for
>> >> example C_KH1238 (bottom row in the example pasted below).
>> >> In the raw file, it has missing values for locus 1378_53, but this
>> >> locus has three alleles, ergo 3 NAs and not 2. Can't go through all the
>> >> NAs right now, but I think there's a pretty good chance this is what is
>> >> causing the discrepancy between what you see in "excel" and in adegenet.
>> >>
>> >>          1369_41.11 1372_14.22 1372_14.24  1373_9.44  1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
>> >> ...
>> >> C_KH1238          0          1          0          1          0          1          0         NA         NA         NA          1          0          1
>> >> # notice 3 NAs for all available alleles for 1378_53, not just two
>> >> # (as expected for diploid)
>> >>
>> >>
>> >> Here is the code I used to explore this:
>> >>
>> >> library(adegenet)
>> >>
>> >> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
>> >> xy <- xy[, c(-1, -2)]
>> >> table(as.matrix(xy))
>> >>
>> >> # 0 1 2 3 4
>> >> # 16 467 618 760 867
>> >>
>> >>
>> >> xy <- read.structure("Sub_batch_1.stru", NA.char="0",
>> >> n.ind = 44, n.loc = 31, onerowperind = FALSE,
>> >> col.lab = 1, col.pop = 2, row.marknames = 1,
>> >> sep = "\t", col.others = 0)
>> >>
>> >> xy <- tab(xy)
>> >> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]
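
The explanation above suggests comparing missing genotypes instead of raw NA cells. The following is a minimal base R sketch on a made-up allele table that uses the same "locus.allele" column naming as the excerpt. The toy values, the object names, the ploidy of 2, and the assumption that a missing genotype is coded as 0 for both alleles in the raw file are illustrative assumptions, not taken from the real data.

```r
# Toy allele table in the "locus.allele" column layout (values are made up).
tab <- matrix(
  c(0, 1, NA, NA, NA,
    1, 0,  1,  0,  0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind_A", "ind_B"),
                  c("loc1.2", "loc1.4", "loc2.22", "loc2.24", "loc2.44"))
)

# Strip the trailing ".allele" suffix to recover the locus of each column.
locus <- sub("\\.[^.]*$", "", colnames(tab))

# Raw NA cells: one per allele column of every missing genotype.
na_cells <- sum(is.na(tab))

# Missing genotypes: an individual/locus pair counts once,
# however many alleles the locus has.
missing_by_locus <- sapply(split(seq_along(locus), locus), function(cols) {
  sum(rowSums(is.na(tab[, cols, drop = FALSE])) > 0)
})
missing_genotypes <- sum(missing_by_locus)

# Assumption: diploid data with both alleles of a missing genotype coded 0.
ploidy <- 2
expected_zeros <- missing_genotypes * ploidy

print(c(na_cells = na_cells,
        missing_genotypes = missing_genotypes,
        expected_zeros = expected_zeros))
```

In this toy table the single missing genotype at the three-allele locus produces three NA cells but would correspond to two zeros in a raw diploid file, which is the kind of gap described above.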
>> >>
>> >> Cheers,
>> >> Roman
>> >>
>> >> ________________________________
>> >> From: "Biz Sheedy" <biz.sheedy at gmail.com>
>> >> To: "Roman Luštrik" <roman.lustrik at biolitika.si>
>> >> Sent: Monday, November 28, 2016 9:11:39 AM
>> >> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>> >>
>> >> My apologies. First time posting to a forum so I am a little unsure of
>> >> things. I have attached a subset of the data, which includes the locus
>> >> that I saw had problems.
>> >>
>> >> In this case there are 31 loci with 16 zeroes counted (excel), and 20
>> >> NAs counted (adegenet). The additional NAs occur in locus 1401_25.
>> >>
>> >> Thanks so much,
>> >> Elizabeth
>> >>
>> >> On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> can you share (a subset of) the dataset? It's hard to pinpoint where
>> >>> things might be going wrong without some data in hand.
>> >>>
>> >>> Cheers,
>> >>> Roman
>> >>>
>> >>> ________________________________
>> >>> From: "Biz Sheedy" <biz.sheedy at gmail.com>
>> >>> To: adegenet-forum at lists.r-forge.r-project.org
>> >>> Sent: Friday, November 25, 2016 10:44:16 AM
>> >>> Subject: [adegenet-forum] Discrepancy in NA counts
>> >>>
>> >>> Dear All,
>> >>>
>> >>> I am trying to read SNP data from Stacks into adegenet. I have tried
>> >>> read.structure and read.genepop but they both give (the same) NA
>> >>> counts that are higher than expected. Using read.table on the
>> >>> structure-formatted file (with "ind" and "pop" inserted into the first
>> >>> two columns of row one) gave the expected number of missing data.
>> >>>
>> >>> I looked at a single population subset (both the original and the
>> >>> converted data) in excel and found a locus where in the original
>> >>> data, all nine individuals were "3", but in the converted data one
>> >>> individual was "NA". The loci before and after this one both
>> >>> matched/were correct.
>> >>>
>> >>> I am not sure what I have missed for this to happen; my R skills are
>> >>> beginner at best. Any help with reading the data in correctly would be
>> >>> greatly appreciated!
>> >>>
>> >>> Thank you,
>> >>> Elizabeth
>> >>>
>> >>>
>> >>> R version 3.3.2
>> >>> adegenet version 2.0.1
>> >>>
>> >>> Data: 44 individuals, diploid, 4279 loci.
>> >>>
>> >>> all<-read.structure("all_batch_1.stru", NA.char="0")
>> >>>
>> >>> Total cells in excel: 376552
>> >>> After read.structure/genepop: 44*8558=376552
>> >>>
>> >>> 0s in excel: 3952
>> >>> 0s after read.table; length(which(X==0)): 3952
>> >>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008
>> >>> Difference: 56
>> >>>
>> >>> Subset Chichi
>> >>> Total cells: 77022
>> >>> After read.structure/genepop: 9*8558=77022
>> >>>
>> >>> 0s in excel: 742
>> >>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756
>> >>> Difference: 14
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> 4-1-1 Amakubo
>> >>> Department of Botany
>> >>> National Museum of Nature and Science
>> >>> Tsukuba, Ibaraki 305-0005
>> >>> Japan
>> >>>
>> >>> biz.sheedy at gmail.com
>> >>>
>> >>> _______________________________________________
>> >>> adegenet-forum mailing list
>> >>> adegenet-forum at lists.r-forge.r-project.org
>> >>>
>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>> >>
>> >
>>
>
>
>
> --
> Dr Elizabeth Sheedy
> JSPS Postdoctoral fellow
>
> 4-1-1 Amakubo
> Department of Botany
> National Museum of Nature and Science
> Tsukuba, Ibaraki 305-0005
> Japan
>
> biz.sheedy at gmail.com
>