[adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

Thu Jul 30 12:31:01 CEST 2015

Hi Paul,

yes, I think I wrote this a couple of times already, but possibly in another thread. As of adegenet 2.0.0, missing data are always stored as NA in genind objects. NA replacement will take place when extracting information from the object, typically using tab(...). The user is not supposed to change the content of @tab manually.
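
As a rough illustration of the workflow described above, the sketch below leaves the genind object untouched and requests NA replacement only when the allele table is extracted. It assumes adegenet >= 2.0.0 is installed and uses the nancycats dataset shipped with the package; the variable names X_raw and X_mean are made up for this example.

```r
library(adegenet)

# Example dataset shipped with adegenet; it contains some missing genotypes
data(nancycats)

# Extract allele counts as stored: missing data stay NA
X_raw <- tab(nancycats, NA.method = "asis")

# Extract allele counts with NAs replaced by the mean count of each allele
X_mean <- tab(nancycats, NA.method = "mean")

anyNA(X_raw)   # missing genotypes are still NA here
anyNA(X_mean)  # no NA left after replacement

# The genind object itself was never modified, so exporting it still works
df <- genind2df(nancycats)
```

Because the replacement happens on the extracted matrix only, the slots of the object (@tab, @loc.fac, @all.names, ...) stay consistent with one another.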

The suggestion of replacing missing values with a median rather than a mean is interesting. If you know of common / useful practices that are currently not available, feel free to post an issue - this kind of feature is quick to add. Posting an issue is quick to do using:

adegenetIssues()
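
Until something like this exists in the package, median replacement can be sketched on the extracted allele table. This is not an adegenet feature: impute_median is a hypothetical helper written for this example, and it works on any numeric matrix, such as the output of tab(x, NA.method = "asis"). Medians are rounded so that the result stays integer-valued; note that filling each allele column independently does not guarantee that the counts within a locus add up to the ploidy.

```r
# Hypothetical helper (not part of adegenet): replace each NA in a numeric
# matrix by the rounded median of the non-missing values in its column.
impute_median <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X))
  # Per-column median over non-missing entries, rounded to a whole number
  col_med <- round(apply(X, 2, median, na.rm = TRUE))
  # Row/column positions of the missing cells
  missing <- which(is.na(X), arr.ind = TRUE)
  X[missing] <- col_med[missing[, "col"]]
  X
}

# Toy allele-count matrix: 4 individuals, 2 alleles, 2 missing cells
toy <- matrix(c(2, 0, NA, 1,
                0, 2, 2, NA), ncol = 2)
toy_filled <- impute_median(toy)

# With a genind object x, the same helper could be applied as:
# X <- impute_median(adegenet::tab(x, NA.method = "asis"))
```

A column with no observed values at all keeps its NAs, since there is no median to fall back on.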

Best
Thibaut

________________________________
From: Paul Maier [maierpa at gmail.com]
Sent: 30 July 2015 06:34
To: Jombart, Thibaut
Cc: Zhian Kamvar; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

FYI - this is fixed in the tab function itself, but it does not seem to update the rest of the genind object when applied to genind@tab. So for example, if you import to genind, use tab(x, NA.method="mean") on x@tab, then export using genind2df, it will fail. Also, did the old NA.method="mean" replace missing values with median alleles? This version gives a mean, which is not ideal if other programs are expecting integers.

I'm still using my above code as a workaround.

----------------------------------------------
Paul Maier

San Diego State, PhD Student
US Geological Survey, Biologist
The Biodiversity Group, Science Advisor

On Thu, Jul 16, 2015 at 4:25 AM, Jombart, Thibaut <t.jombart at imperial.ac.uk> wrote:
Fixed now:
https://github.com/thibautjombart/adegenet/issues/71#issuecomment-121790358

And readily available in the devel version:

install.packages("devtools")
library(devtools)
install_github("thibautjombart/adegenet")
library("adegenet")

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jombart, Thibaut [t.jombart at imperial.ac.uk]
Sent: 16 July 2015 11:50
To: Zhian Kamvar; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

Looks like a bug indeed. Thanks for spotting it. Will fix today.

Cheers
Thibaut

________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Zhian Kamvar [zkamvar at gmail.com]
Sent: 16 July 2015 01:35
To: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] read.genepop (adegenet 2.0.0 with R v. 3.2.1)

This smells like a bug. After poking around some, it is indeed one in read.fstat and read.genepop. (Both read.genetix and read.structure still work):

> obj <- read.genepop(system.file("files/nancycats.gen",package="adegenet"))

 Converting data from a Genepop .gen file to a genind object...

File description:  Genotypes of cats from 17 colonies of Nancy (France)

...done.

> obj
/// GENIND OBJECT /////////

 // 237 individuals; 9 loci; 111 alleles; size: 138.5 Kb

 // Basic content
   @tab:  237 x 111 matrix of allele counts
   @loc.n.all: number of alleles per locus (range: 8-18)
   @loc.fac: locus factor for the 111 columns of @tab
   @all.names: list of allele names for each locus
   @ploidy: ploidy of each individual  (range: 2-2)
   @type:  codom
   @call: read.genepop(file = system.file("files/nancycats.gen", package = "adegenet"))

 // Optional content
   @pop: population of each individual (group size range: 9-23)
> summary(obj)

 # Total number of genotypes:  237

 # Population sample sizes:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
10 22 12 23 15 11 14 10  9 11 20 14 13 17 11 12 13

 # Number of alleles per locus:
 fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
   17    11    10    10    12     8    12    13    18

 # Number of alleles per population:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
37 53 50 67 48 56 43 54 43 46 73 53 44 62 42 40 37

 # Percentage of missing data:
[1] 0

 # Observed heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.6118143 0.6666667 0.6793249 0.6455696 0.6329114 0.5654008 0.6497890 0.5949367 0.4514768

 # Expected heterozygosity:
     fca8     fca23     fca43     fca45     fca77     fca78     fca90     fca96     fca37
0.8803076 0.7928751 0.7953319 0.7930531 0.8702576 0.6884669 0.8157881 0.7767630 0.6062686

This will be reported and fixed.

Cheers,
Zhian

> On Jul 15, 2015, at 11:16 , adegenet-forum-request at lists.r-forge.r-project.org wrote:
>
> On closer inspection, it appears the new version stores missing data as
> alleles (i.e. *.00 in @tab). So using tab to replace the allele counts
> doesn't work. For example, x@tab <- tab(x, NA.method="mean") does nothing
> because missing data is stored as normal data. Here's a workaround I
> created, although probably not the most clever method, it fixed my problem.
> Hopefully this helps someone!
> Paul
>
> # Fix missing values to reflect deprecated option, missing = "mean"
> # (logical indexing with grepl keeps all columns when no "00" allele is present)
> x@tab <- x@tab[, !grepl("\\.00", colnames(x@tab)), drop = FALSE] #remove "00" alleles
> rep <- gsub("([^\\.]+)\\.\\d+","\\1",colnames(x@tab)) #locus names
> loci <- unique(x@loc.fac) #unique locus names
> x@loc.fac <- as.factor(rep)
> for (i in seq_along(x@all.names))
>  if ("00" %in% x@all.names[[i]]) #remove "00" from allele names
>    x@all.names[[i]] <- x@all.names[[i]][-which(x@all.names[[i]]=="00")]
> for (i in seq_along(x@loc.n.all)) #remove "00" from allele counts
>  x@loc.n.all[[i]] <- length(x@all.names[[i]])
> for (i in seq_along(loci)) { #replace missing data with mean allele counts
>  df <- data.frame(x@tab[,which(loci[i] == rep)]) #df, alleles for one locus
>  for (j in 1:nrow(df)) {
>    if (sum(df[j,]) == 0) {
>      for (k in 1:length(df[j,])) { #mean allele counts from rows with data
>        df[j,k] <- round(mean( df[which(apply(df,1,sum) != 0),k] ))
>      }
>      x@tab[j,which(loci[i] == rep)] <- as.numeric(df[j,])
>    }
>  }
> }

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
