[adegenet-forum] Discrepancy in NA counts
Thibaut Jombart
thibautjombart at gmail.com
Mon Dec 5 13:10:03 CET 2016
Hello,
Roman has fixed this bug in the current devel version of adegenet. See
https://github.com/thibautjombart/adegenet
for guidelines on installing it. Can you confirm it solves your issue?
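
For reference, here is a minimal sketch of installing the devel version from GitHub. It assumes the devtools package and its install_github() function; the wrapper name install_adegenet_devel is hypothetical and only there so that nothing gets installed until you call it. The README at the link above remains the authoritative guide.

```r
# Hypothetical wrapper (not part of adegenet): installs the development
# version from GitHub. Assumes devtools can be installed from CRAN.
install_adegenet_devel <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("thibautjombart/adegenet")
  # Report the version that is now installed.
  packageVersion("adegenet")
}

# install_adegenet_devel()  # uncomment to run; needs internet access
```

Restart R after installing so that the new version is the one loaded by library(adegenet).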
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
sites.google.com/site/thibautjombart/
github.com/thibautjombart
Twitter: @TeebzR
On 28 November 2016 at 12:40, Roman Luštrik <roman.lustrik at biolitika.si> wrote:
> Hi Elizabeth,
>
> it would appear there is something funky happening with the code due to
> locus names being numeric. This has happened before in some other function.
> Until we fix this, you can change your locus names so that they start with a
> letter.
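
One way to apply this workaround before import is sketched below. It assumes the marker names sit on the first line of the file and are tab-separated, as in the read.structure call quoted further down (row.marknames = 1, sep = "\t"). The helper name prefix_numeric_names, the "L" prefix and the output file name are hypothetical.

```r
# Hypothetical helper: add a letter prefix to every field of the marker-name
# line that starts with a digit. Fields such as "ind", "pop" or empty
# placeholders are left untouched.
prefix_numeric_names <- function(header_line, prefix = "L", sep = "\t") {
  fields <- strsplit(header_line, sep, fixed = TRUE)[[1]]
  starts_with_digit <- grepl("^[0-9]", fields)
  fields[starts_with_digit] <- paste0(prefix, fields[starts_with_digit])
  paste(fields, collapse = sep)
}

# Small demonstration on a made-up header line.
demo_header <- prefix_numeric_names("ind\tpop\t1401_25\t1403_13")

# Usage on a real file (file names are assumptions, adjust as needed):
# lines <- readLines("Sub_batch_1.stru")
# lines[1] <- prefix_numeric_names(lines[1])
# writeLines(lines, "Sub_batch_1_renamed.stru")
```

The renamed file can then be passed to read.structure in place of the original, with the same arguments as before.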
>
> Here is the excerpt from the genind object indicating that these two samples
> have alleles 33:
>
>          X1401_25.13 X1401_25.33 X1403_13.11 X1403_13.13 X1403_13.33 X1404_17.13 X1404_17.33 X1404_17.11
> C_KH1059           0           1           1           0           0           0           1           0
> M_KH1834           0           1           1           0           0           1           0           0
>
>
> Cheers,
> Roman
>
>
> ----
> In god we trust, all others bring data.
>
> ________________________________
> From: "Biz Sheedy" <biz.sheedy at gmail.com>
> To: "Roman Luštrik" <roman.lustrik at biolitika.si>
> Cc: adegenet-forum at lists.r-forge.r-project.org
> Sent: Monday, November 28, 2016 11:00:53 AM
>
> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>
> Thanks for looking into this.
>
> One thing I did differently from the code you provided was that I only
> answered the prompts of the read.structure function. This meant I did not
> use sep="\t", and the number of alleles was 62 instead of 72, which I think
> should be comparable to the Excel count. Following the code you provided,
> 'is.na' finds 23 NAs (instead of 20 NAs at 62 alleles and 16 zeroes in
> Excel).
>
> Your explanation makes sense to me for the additional three NAs in adegenet,
> but I still don't understand how in locus 1401_25 the data for two
> individuals (C_KH1059 and M_KH1834) changed from being homozygous for "3" to
> being "NA"?
>
> I would really appreciate any further help on this.
>
> Thanks again,
> Elizabeth
>
>
> On 28 November 2016 at 18:03, Roman Luštrik <roman.lustrik at biolitika.si>
> wrote:
>>
>> Hi,
>>
>> I think the problem is that adegenet, for consistency, adds NAs to
>> accommodate the extra alleles present for a particular locus. Take for
>> example C_KH1238 (bottom row in the example pasted below).
>> In raw file, it has missing values for locus 1378_53, but this locus has
>> three alleles, ergo 3 NAs and not 2. Can't go through all the NAs right now,
>> but I think there's a pretty good chance this is what is causing the
>> discrepancy between what you see in "excel" and in adegenet.
>>
>>          1369_41.11 1372_14.22 1372_14.24 1373_9.44 1373_9.24 1377_42.44 1377_42.24 1378_53.22 1378_53.24 1378_53.44 1379_10.33 1379_10.13 1382_37.33
>> ...
>> C_KH1238          0          1          0         1         0          1          0         NA         NA         NA          1          0          1
>> # notice 3 NAs, one for each allele available at 1378_53, not just two (as expected for a diploid)
>>
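
To compare like with like, it can help to count missing genotypes rather than NA cells. The sketch below runs on a made-up allele table and assumes the column names follow the "locus.allele" pattern shown in the excerpt above. The function name count_missing_genotypes and the toy data are hypothetical, and a genotype is taken to be missing when every allele column of its locus is NA.

```r
# Hypothetical helper: count individual-by-locus genotypes that are missing,
# i.e. all allele columns of that locus are NA for that individual.
count_missing_genotypes <- function(allele_tab) {
  # Locus name = column name with the trailing ".allele" part removed.
  locus <- sub("\\.[^.]*$", "", colnames(allele_tab))
  missing <- sapply(unique(locus), function(l) {
    rowSums(!is.na(allele_tab[, locus == l, drop = FALSE])) == 0
  })
  sum(missing)
}

# Toy table: locus L1 has three alleles, locus L2 has two.
toy <- matrix(c( 1,  1,  0, 2, 0,
                NA, NA, NA, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ind1", "ind2"),
                              c("L1.22", "L1.24", "L1.44", "L2.33", "L2.13")))

na_cells <- sum(is.na(toy))                        # one NA per allele column
missing_genotypes <- count_missing_genotypes(toy)  # one per individual and locus
```

Here the single missing genotype of ind2 at L1 shows up as three NA cells, whereas a diploid raw file with two rows per individual would hold two zeros for it. With real data, the same function could be applied to the matrix returned by tab().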
>>
>> Here is the code I used to explore this:
>>
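>> library(adegenet)
>>
>> # Read the raw file as a plain table, drop the individual and population
>> # columns, and tabulate the allele codes (0 is the missing-data code).
>> xy <- read.table("Sub_batch_1.stru", header = TRUE, sep = "\t")
>> xy <- xy[, c(-1, -2)]
>> table(as.matrix(xy))
>>
>> # 0 1 2 3 4
>> # 16 467 618 760 867
>>
>>
>> # Import the same file as a genind object, treating 0 as missing.
>> xy <- read.structure("Sub_batch_1.stru", NA.char = "0",
>>     n.ind = 44, n.loc = 31, onerowperind = FALSE,
>>     col.lab = 1, col.pop = 2, row.marknames = 1,
>>     sep = "\t", col.others = 0)
>>
>> # Extract the allele table and inspect one individual at one locus.
>> xy <- tab(xy)
>> xy[grepl("C_KH1238", rownames(xy)), grepl("1378_53", colnames(xy))]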
>>
>> Cheers,
>> Roman
>>
>>
>> ________________________________
>> From: "Biz Sheedy" <biz.sheedy at gmail.com>
>> To: "Roman Luštrik" <roman.lustrik at biolitika.si>
>> Sent: Monday, November 28, 2016 9:11:39 AM
>> Subject: Re: [adegenet-forum] Discrepancy in NA counts
>>
>> My apologies. First time posting to a forum so I am a little unsure of
>> things. I have attached a subset of the data, which includes the locus that
>> I saw had problems.
>>
>> In this case there are 31 loci with 16 zeroes counted (excel), and 20 NAs
>> counted (adegenet). The additional NAs occur in locus 1401_25.
>>
>> Thanks so much,
>> Elizabeth
>>
>> On 28 November 2016 at 16:31, Roman Luštrik <roman.lustrik at biolitika.si>
>> wrote:
>>>
>>> Hi,
>>>
>>> can you share (a subset of) the dataset? It's hard to pinpoint where
>>> things might be going wrong without some data in hand.
>>>
>>> Cheers,
>>> Roman
>>>
>>>
>>> ________________________________
>>> From: "Biz Sheedy" <biz.sheedy at gmail.com>
>>> To: adegenet-forum at lists.r-forge.r-project.org
>>> Sent: Friday, November 25, 2016 10:44:16 AM
>>> Subject: [adegenet-forum] Discrepancy in NA counts
>>>
>>> Dear All,
>>>
>>> I am trying to read SNP data from Stacks into adegenet. I have tried
>>> read.structure and read.genepop but they both give (the same) NA counts that
>>> are higher than expected. Using read.table on the structure-formatted file
>>> (with "ind" and "pop" inserted into the first two columns of row one) gave
>>> the expected number of missing data.
>>>
>>> I looked at a single population subset (both the original and the
>>> converted data) in excel and found a locus where in the original data, all
>>> nine individuals were "3", but in the converted data one individual was
>>> "NA". The loci before and after this one both matched/were correct.
>>>
>>> I am not sure what I have missed for this to happen; my R skills are
>>> beginner at best. Any help with reading the data in correctly would be
>>> greatly appreciated!
>>>
>>> Thank you,
>>> Elizabeth
>>>
>>>
>>> R version 3.3.2
>>> adegenet version 2.0.1
>>>
>>> Data: 44 individuals, diploid, 4279 loci.
>>>
>>> all<-read.structure("all_batch_1.stru", NA.char="0")
>>>
>>> Total cells in excel: 376552
>>> After read.structure/genepop: 44*8558=376552
>>>
>>> 0s in excel: 3952
>>> 0s after read.table; length(which(X==0)): 3952
>>> NA after read.structure/genepop; sum(is.na(all$tab)): 4008
>>> Difference: 56
>>>
>>> Subset Chichi
>>> Total cells: 77022
>>> After read.structure/genepop: 9*8558=77022
>>>
>>> 0s in excel: 742
>>> NA after read.structure/genepop; sum(is.na(chi$tab)): 756
>>> Difference: 14
>>>
>>>
>>>
>>> --
>>> 4-1-1 Amakubo
>>> Department of Botany
>>> National Museum of Nature and Science
>>> Tsukuba, Ibaraki 305-0005
>>> Japan
>>>
>>> biz.sheedy at gmail.com
>>>
>>> _______________________________________________
>>> adegenet-forum mailing list
>>> adegenet-forum at lists.r-forge.r-project.org
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum