[adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

Zhian Kamvar zkamvar at gmail.com
Thu Jul 16 02:35:25 CEST 2015


This smells like a bug. After poking around some, it is indeed one in read.fstat and read.genepop. (Both read.genetix and read.structure still work):

> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))

 Converting data from a Genepop .gen file to a genind object... 


File description:  Genotypes of cats from 17 colonies of Nancy (France) 

...done.

> obj
/// GENIND OBJECT /////////

 // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb

 // Basic content
   @tab:  237 x 111 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 111 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))

 // Optional content
   @pop: population of each individual (group size range: 9-23)
> summary(obj)

 # Total number of genotypes:  237 

 # Population sample sizes:  
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13 

 # Number of alleles per locus:  
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37 
   17    11    10    10    12     8    12    13    18 

 # Number of alleles per population:  
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37 

 # Percentage of missing data:  
[1] 0

 # Observed heterozygosity:  
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37 
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768 

 # Expected heterozygosity:  
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37 
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686 

This will be reported and fixed.

Cheers,
Zhian

> On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
> 
> On closer inspection, it appears the new version stores missing data as
> alleles (i.e. *.00 in @tab). So using tab to replace the allele counts
> doesn't work. For example, x at tab <- tab(x, NA.method="mean") does nothing
> because missing data is stored as normal data. Here's a workaround I
> created, although probably not the most clever method, it fixed my problem.
> Hopefully this helps someone!
> Paul
> 
> # Fix missing values to reflect depracated option, missing = "mean"
> x at tab <- x at tab[,-grep("\\.00",colnames(x at tab))] #remove "00" alleles
> rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x at tab)) #locus names
> loci <- unique(x at loc.fac) #unique locus names
> x at loc.fac <- as.factor(rep)
> for (i in 1:length(x at all.names))
>  if ("00" %in% x at all.names[[i]]) #remove "00" from allele names
>    x at all.names[[i]] <- x at all.names[[i]][-which(x at all.names[[i]]=="00")]
> for (i in 1:length(x at loc.n.all)) #remove "00" from allele counts
>  x at loc.n.all[[i]] <- length(x at all.names[[i]])
> for (i in 1:length(loci)) { #replace missing data with mean allele counts
>  df <- data.frame(x at tab[,which(loci[i] == rep)]) #df, alleles for one locus
>  for (j in 1:nrow(df)) {
>    if (sum(df[j,]) == 0) {
>      for (k in 1:length(df[j,])) { #mean allele counts from rows with data
>        df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
>      }
>      x at tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
>    }
>  }
> }



More information about the adegenet-forum mailing list