[adegenet-forum] missing data not recognized for partially incomplete genotypes

Zhian Kamvar zkamvar at gmail.com
Sun Mar 1 05:05:18 CET 2020


Hi Chloe,

Because adegenet internally represents genotypes as counts of alleles, it cannot represent a partially missing genotype: there is no way to record one observed allele alongside an explicitly missing one.
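
To make the count representation concrete, here is a rough base R sketch. It is only an illustration, not adegenet's actual code, and `count_alleles()` is a hypothetical helper invented for this example. It shows that a half-missing genotype such as "300/000" simply yields a count of one for the "000" allele:

```r
# Hypothetical helper (not part of adegenet): count the alleles in one
# genotype string against a fixed set of allele names.
count_alleles <- function(genotype, alleles, sep = "/") {
  observed <- strsplit(genotype, sep, fixed = TRUE)[[1]]
  # A factor with fixed levels gives one count per known allele, zeros included
  counts <- table(factor(observed, levels = alleles))
  setNames(as.integer(counts), alleles)
}

alleles <- c("300", "306", "000")
count_alleles("300/306", alleles)  # one copy each of 300 and 306
count_alleles("300/000", alleles)  # "000" is counted like any other allele
```

In a table like this, a genotype is either a full set of counts or missing as a whole; there is no cell that means "one allele seen, one allele unknown".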

If you are working in a haplo-diploid system, the poppr package does allow for partially missing data coded as zero (see: https://grunwaldlab.github.io/poppr/articles/poppr_manual.html#intro:import:polyploid). You can remove those zero alleles using the `recode_polyploids()` function, and partial genotypes will then be assigned their observed ploidy. It does not, however, explicitly encode the null allele. Below is an example of what this looks like:

suppressPackageStartupMessages(library("poppr"))
x <- data.frame(LOC = c("100/100", "000/000", "200/000"))
g <- as.genclone(df2genind(x, sep = "/", NA.char = "000"))
#> Warning in df2genind(x, sep = "/", NA.char = "000"): entirely non-type
#> individual(s) deleted
g
#> 
#> This is a genclone object
#> -------------------------
#> Genotype information:
#> 
#>    2 original multilocus genotypes 
#>    2 diploid individuals
#>    1 codominant loci
#> 
#> Population information:
#> 
#>    0 strata. 
#>    0 populations defined.
tab(g) # zero alleles still coded 
#>   LOC.100 LOC.200 LOC.000
#> 1       2       0       0
#> 3       0       1       1
gr <- recode_polyploids(g, newploidy = TRUE)
gr
#> 
#> This is a genclone object
#> -------------------------
#> Genotype information:
#> 
#>    2 original multilocus genotypes 
#>    2 haploid (1) and diploid (1) individuals
#>    1 codominant loci
#> 
#> Population information:
#> 
#>    0 strata. 
#>    0 populations defined.
tab(gr) # zero alleles removed
#>   LOC.100 LOC.200
#> 1       2       0
#> 3       0       1 
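
If you would rather treat a genotype with one missing allele as missing altogether, one possible workaround is to blank those genotypes before import. This is a minimal base R sketch of that idea, not something adegenet or poppr provides; `mask_partial()` is a hypothetical helper, and it assumes "000" is the missing code and "/" the separator, as in the example data:

```r
# Hypothetical helper: set every genotype that contains the missing code to NA.
mask_partial <- function(df, missing = "000", sep = "/") {
  df[] <- lapply(df, function(col) {
    col <- as.character(col)
    # TRUE where at least one allele of the genotype equals the missing code
    has_missing <- vapply(
      strsplit(col, sep, fixed = TRUE),
      function(a) any(a == missing, na.rm = TRUE),
      logical(1)
    )
    col[has_missing] <- NA_character_
    col
  })
  df
}

x <- data.frame(LOC = c("100/100", "000/000", "200/000"),
                stringsAsFactors = FALSE)
masked <- mask_partial(x)
masked$LOC  # both "000/000" and "200/000" are now NA
```

The masked data frame could then be passed to `df2genind()` in the usual way. The trade-off is that the observed allele in "200/000" is discarded along with the missing one.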

I hope that helps.

Best,
Zhian

> On Feb 29, 2020, at 03:00 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
> 
> Date: Fri, 28 Feb 2020 22:38:22 -0500
> From: Chloe Chen-Kraus <chloe.chen-kraus at yale.edu>
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] missing data not recognized for partially
> 	incomplete genotypes
> Message-ID:
> 	<CAONapZceFOaC-Jmnh9T7fhX9zdE4HRkjS4edvon=rNMMhz+XFg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi--
> 
> I am creating a genind object for diploid microsatellite data with
> df2genind. I indicate that missing values are coded as "000" with
> NA.char="000". When I examine the resulting genind object I can see that
> genotypes where both alleles are missing are recognized as NA, but "000" is
> still recognized as an allele for genotypes that have only one allele
> missing.
> 
> For example:
> 
>> genotypes.df
>      Ind Pop     PV1    PV16    PV14    PV15     PV6     PV8
> 1    A01    A 000/000 275/279 300/306 252/262 264/270 214/218
> 2    A02    A 155/159 275/291 300/000 252/262 268/268 216/216
> 3    A03    A 163/163 275/291 000/000 248/256 268/272 216/218
> 4    A04    A 159/159 275/000 300/312 252/262 270/274 212/220
> 5    A05    A 000/000 275/283 306/312 252/252 270/272 214/216
> 
>> genotypes <- df2genind(genotypes.df[,-(1:2)], ploidy=2, sep="/", ncode=3,
> NA.char = "000")
> 
> When I check to see if this worked by converting back to a dataframe, I see
> that 000/000 genotypes are now recognized as NAs but 000 is still
> recognized as an allele for genotypes with only one missing allele.
> 
>> genind2df(genotypes, sep="/")
>        PV1    PV16    PV14    PV15     PV6     PV8
> 001    <NA> 275/279 300/306 252/262 264/270 214/218
> 002 155/159 275/291 300/000 252/262 268/268 216/216
> 003 163/163 275/291    <NA> 248/256 268/272 218/216
> 004 159/159 275/000 300/312 252/262 270/274 212/220
> 005    <NA> 275/283 306/312 252/252 270/272 214/216
> 
>> genotypes@all.names
> $PV1
> [1] "155" "159" "163" "157" "161" "169" "000" "165"
> 
> $PV16
> [1] "275" "279" "291" "000" "283" "287" "289" "281" "277" "285" "293"
> 
> $PV14
> [1] "300" "306" "000" "312" "308" "316" "304" "302" "318"
> 
> $PV15
> [1] "252" "262" "248" "256" "250" "258" "246" "266" "254" "264"
> 
> $PV6
> [1] "264" "270" "268" "272" "274" "266" "280" "276" "000"
> 
> $PV8
> [1] "214" "218" "216" "212" "220" "222" "224" "000"
> 
> 
> Any insights on how to fix this issue would be greatly appreciated. Thanks!
> 
> -*Chloe Chen-Kraus*