[adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
Jombart, Thibaut
t.jombart at imperial.ac.uk
Thu Jul 16 12:50:50 CEST 2015
Looks like a bug indeed. Thanks for spotting it. Will fix today.
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Zhian Kamvar [zkamvar at gmail.com]
Sent: 16 July 2015 01:35
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)
This smells like a bug. After some poking around, it is indeed one, and it affects both read.fstat and read.genepop (read.genetix and read.structure still work):
> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))
Converting data from a Genepop .gen file to a genind object...
File description: Genotypes of cats from 17 colonies of Nancy (France)
...done.
> obj
/// GENIND OBJECT /////////
// 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb
// Basic content
@tab: 237 x 111 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 8-18)
@loc.fac: locus factor for the 111 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))
// Optional content
@pop: population of each individual (group size range: 9-23)
> summary(obj)
# Total number of genotypes: 237
# Population sample sizes:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13
# Number of alleles per locus:
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
   17    11    10    10    12     8    12    13    18
# Number of alleles per population:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37
# Percentage of missing data:
[1] 0
# Observed heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768
# Expected heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686
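A quick way to check whether an object is affected is to look for the missing-data code among the allele names, since the message quoted below reports that missing data end up as "*.00" columns in @tab. The following is only a minimal sketch under that assumption: missing_allele_cols is a hypothetical helper (not part of adegenet), the column names are toy values standing in for colnames(obj@tab), and the missing-data code is a parameter because it depends on how alleles are coded in the input file.

```r
# Hypothetical helper (not part of adegenet): return the columns of an
# allele-count table whose allele name equals the missing-data code.
# Assumes column names follow the "locus.allele" pattern seen in @tab.
missing_allele_cols <- function(col_names, missing_code = "00") {
  alleles <- sub("^.*\\.", "", col_names)  # text after the last dot
  col_names[alleles == missing_code]
}

# Toy column names standing in for colnames(obj@tab)
toy_names <- c("loc1.00", "loc1.01", "loc1.02", "loc2.03", "loc2.00")
bad_cols <- missing_allele_cols(toy_names)
print(bad_cols)
```

On a real object one would presumably call missing_allele_cols(colnames(obj@tab)); a non-empty result would suggest that missing genotypes were read in as alleles.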
This will be reported and fixed.
Cheers,
Zhian
> On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
>
> On closer inspection, it appears the new version stores missing data as
> alleles (i.e. *.00 columns in @tab), so using tab() to replace the allele
> counts doesn't work. For example, x@tab <- tab(x, NA.method="mean") does
> nothing, because the missing data are stored as ordinary alleles. Here's a
> workaround; it is probably not the most elegant method, but it fixed my
> problem. Hopefully this helps someone!
> Paul
>
> # Fix missing values to mimic the deprecated option missing = "mean"
> keep <- !grepl("\\.00$", colnames(x@tab))  # flag the "00" (missing) allele columns
> x@tab <- x@tab[, keep, drop = FALSE]       # unlike -grep(), safe when nothing matches
> loc <- gsub("([^\\.]+)\\.\\d+", "\\1", colnames(x@tab))  # locus name of each column
> loci <- unique(loc)                        # unique locus names
> x@loc.fac <- factor(loc, levels = loci)
> for (i in seq_along(x@all.names))          # remove "00" from allele names
>   x@all.names[[i]] <- x@all.names[[i]][x@all.names[[i]] != "00"]
> for (i in seq_along(x@loc.n.all))          # update the number of alleles per locus
>   x@loc.n.all[[i]] <- length(x@all.names[[i]])
> for (i in seq_along(loci)) {               # replace missing data with mean allele counts
>   cols <- which(loc == loci[i])            # columns of @tab for this locus
>   counts <- x@tab[, cols, drop = FALSE]
>   untyped <- rowSums(counts) == 0          # individuals with no allele at this locus
>   if (any(untyped) && !all(untyped)) {
>     # rounded mean allele counts, computed only from rows with data
>     means <- round(colMeans(counts[!untyped, , drop = FALSE]))
>     x@tab[untyped, cols] <- matrix(means, nrow = sum(untyped),
>                                    ncol = length(cols), byrow = TRUE)
>   }
> }
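
The imputation step of the workaround above can be illustrated on a toy table without adegenet. This is a minimal sketch rather than the package's own method: the matrix, individual names and allele names are made up, and it assumes that an untyped individual shows up as an all-zero row once the "00" column has been dropped. Rounding, as done in the workaround, is left out here; wrap the means in round() if integer counts are needed.

```r
# Toy allele counts for one locus: rows = individuals, columns = alleles.
# "ind3" is untyped at this locus (all zeros after dropping the "00" column).
counts <- matrix(c(2, 0,
                   1, 1,
                   0, 0,
                   0, 2),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(paste0("ind", 1:4), c("loc1.01", "loc1.02")))

untyped <- rowSums(counts) == 0                          # rows with no data
col_means <- colMeans(counts[!untyped, , drop = FALSE])  # means over typed rows only

# Fill every untyped row with the per-allele means
imputed <- counts
imputed[untyped, ] <- matrix(col_means, nrow = sum(untyped),
                             ncol = ncol(counts), byrow = TRUE)
print(imputed)
```

Computing the means once, from the typed rows only, keeps already-imputed individuals from influencing the values assigned to later ones.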