[adegenet-forum] missing data not recognized for partially incomplete genotypes
Zhian Kamvar
zkamvar at gmail.com
Sun Mar 1 05:05:18 CET 2020
Hi Chloe,
Because adegenet internally represents genotypes as counts of alleles, it cannot represent a partially missing genotype: there is no way to record that one allele at a locus was observed while the other is missing.
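As a rough illustration of why (plain base R, not adegenet's actual implementation), tabulating allele counts per individual leaves no cell that could mean "one allele unknown"; the missing-data code simply shows up as one more allele column:

```r
# Illustration only: build an individual-by-allele count table by hand.
geno <- c("100/100", "000/000", "200/000")

# Split each genotype string into its alleles
alleles <- strsplit(geno, "/", fixed = TRUE)

# Repeat each individual's index once per allele it carries
ind <- rep(seq_along(geno), lengths(alleles))

# Cross-tabulate: rows are individuals, columns are allele names
counts <- table(individual = ind, allele = unlist(alleles))
counts
```

In this sketch, individual 3 ends up with one copy of "200" and one copy of "000", which mirrors the behaviour described in the original question.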
If you are working in a haplo-diploid system, the poppr package does allow for partially missing data coded as zero (see: https://grunwaldlab.github.io/poppr/articles/poppr_manual.html#intro:import:polyploid). You can remove those zero alleles with the `recode_polyploids()` function, and partial genotypes will then be assigned their observed ploidy. It does not, however, explicitly encode the null allele. Below is an example of what this looks like:
suppressPackageStartupMessages(library("poppr"))
x <- data.frame(LOC = c("100/100", "000/000", "200/000"))
g <- as.genclone(df2genind(x, sep = "/", NA.char = "000"))
#> Warning in df2genind(x, sep = "/", NA.char = "000"): entirely non-type
#> individual(s) deleted
g
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 2 original multilocus genotypes
#> 2 diploid individuals
#> 1 codominant loci
#>
#> Population information:
#>
#> 0 strata.
#> 0 populations defined.
tab(g) # zero alleles still coded
#>   LOC.100 LOC.200 LOC.000
#> 1       2       0       0
#> 3       0       1       1
gr <- recode_polyploids(g, newploidy = TRUE)
gr
#>
#> This is a genclone object
#> -------------------------
#> Genotype information:
#>
#> 2 original multilocus genotypes
#> 2 haploid (1) and diploid (1) individuals
#> 1 codominant loci
#>
#> Population information:
#>
#> 0 strata.
#> 0 populations defined.
tab(gr) # zero alleles removed
#>   LOC.100 LOC.200
#> 1       2       0
#> 3       0       1
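If it is acceptable to treat a partially typed genotype as entirely missing instead, one option is to recode the data frame before import. This is only a minimal base R sketch under that assumption; the helper name `mask_partial` is hypothetical and is not part of adegenet or poppr:

```r
# Hypothetical helper: set a genotype to NA when any of its alleles
# equals the missing-data code.
mask_partial <- function(col, na = "000", sep = "/") {
  col <- as.character(col)
  alleles <- strsplit(col, sep, fixed = TRUE)
  # TRUE for genotypes that contain the missing code at least once
  incomplete <- vapply(alleles, function(a) na %in% a, logical(1))
  col[incomplete] <- NA_character_
  col
}

x <- data.frame(LOC = c("100/100", "000/000", "200/000"),
                stringsAsFactors = FALSE)
x[] <- lapply(x, mask_partial)  # apply to every locus column
x
```

The recoded data frame should then be importable with df2genind(x, sep = "/") as before. Note that this throws away the one allele that was observed in each partial genotype, so it is a trade-off rather than a fix.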
I hope that helps.
Best,
Zhian
> On Feb 29, 2020, at 03:00, adegenet-forum-request at lists.r-forge.r-project.org wrote:
>
> Date: Fri, 28 Feb 2020 22:38:22 -0500
> From: Chloe Chen-Kraus <chloe.chen-kraus at yale.edu>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] missing data not recognized for partially
> incomplete genotypes
> Message-ID:
> <CAONapZceFOaC-Jmnh9T7fhX9zdE4HRkjS4edvon=rNMMhz+XFg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi--
>
> I am creating a genind object for diploid microsatellite data with
> df2genind. I indicate that missing values are coded as "000" with
> NA.char="000". When I examine the resulting genind object I can see that
> genotypes where both alleles are missing are recognized as NA, but "000" is
> still recognized as an allele for genotypes that have only one allele
> missing.
>
> For example:
>
>> genotypes.df
>   Ind Pop     PV1    PV16    PV14    PV15     PV6     PV8
> 1 A01   A 000/000 275/279 300/306 252/262 264/270 214/218
> 2 A02   A 155/159 275/291 300/000 252/262 268/268 216/216
> 3 A03   A 163/163 275/291 000/000 248/256 268/272 216/218
> 4 A04   A 159/159 275/000 300/312 252/262 270/274 212/220
> 5 A05   A 000/000 275/283 306/312 252/252 270/272 214/216
>
>> genotypes <- df2genind(genotypes.df[,-(1:2)], ploidy=2, sep="/", ncode=3,
> NA.char = "000")
>
> When I check to see if this worked by converting back to a dataframe, I see
> that 000/000 genotypes are now recognized as NAs but 000 is still
> recognized as an allele for genotypes with only one missing allele.
>
>> genind2df(genotypes, sep="/")
>         PV1    PV16    PV14    PV15     PV6     PV8
> 001    <NA> 275/279 300/306 252/262 264/270 214/218
> 002 155/159 275/291 300/000 252/262 268/268 216/216
> 003 163/163 275/291    <NA> 248/256 268/272 218/216
> 004 159/159 275/000 300/312 252/262 270/274 212/220
> 005    <NA> 275/283 306/312 252/252 270/272 214/216
>
>> genotypes@all.names
> $PV1
> [1] "155" "159" "163" "157" "161" "169" "000" "165"
>
> $PV16
> [1] "275" "279" "291" "000" "283" "287" "289" "281" "277" "285" "293"
>
> $PV14
> [1] "300" "306" "000" "312" "308" "316" "304" "302" "318"
>
> $PV15
> [1] "252" "262" "248" "256" "250" "258" "246" "266" "254" "264"
>
> $PV6
> [1] "264" "270" "268" "272" "274" "266" "280" "276" "000"
>
> $PV8
> [1] "214" "218" "216" "212" "220" "222" "224" "000"
>
>
> Any insights on how to fix this issue would be greatly appreciated. Thanks!
>
> -*Chloe Chen-Kraus*