[adegenet-forum] Inbreeding coefficient calculation by adegenet

Jombart, Thibaut t.jombart at imperial.ac.uk
Mon Mar 24 09:51:51 CET 2014


Dear Tohamy

sorry, I am currently teaching at a workshop for the week and won't have time to look into this before April. Meanwhile I would suggest trying on a sample of SNPs with growing size (e.g. 10, 50, 100, 500, etc) to ensure that nothing fishy is happening here.

Cheers
Thibaut

________________________________________
From: Tohamy Yousef [tohamyy at yahoo.com]
Sent: 21 March 2014 13:50
To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Inbreeding coefficient calculation by adegenet

Dear Dr. Jombart
I wrote you before and asked about inbreeding calculation. You suggested that I use less number of markers and other advices. SO, I tried to follow all your comments. For example I decided to use only 2247 SNP markers without missing data (before I had 43231 with missing data). But I still have the same problem. I attached you the new data (new_inbreeding.stru). Thanks in advance.

######### my script

> library(adegenet)
> toto<-read.structure(file="new_inbreeding.stru",n.ind=154,n.loc=2247,onerowperind=FALSE,col.lab=1,row.marknames=1,NA.char="-9",missing="mean")

Which column contains the population factor ('0' if absent)? 2

 Which other optional columns should be read (press 'return' when done)? 1:
> toto

   #####################
   ### Genind object ###
   #####################
- genotypes of individuals -

S4 class:  genind
@call: read.structure(file = "inbreeding.stru",
    n.ind = 154, n.loc = 2247, onerowperind = FALSE, col.lab = 1,
    row.marknames = 1, NA.char = "-9", missing = "mean")

@tab:  154 x 3610 matrix of genotypes

@ind.names: vector of  154 individual names
@loc.names: vector of  2247 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  3610 columns of @tab
@all.names: list of  2247 components yielding allele names for each locus
@ploidy:  2
@type:  codom

Optionnal contents:
@pop:  factor giving the population of each individual
@pop.names:  factor giving the population of each individual

@other: - empty -
> toto$pop.names
  P1   P2
"US" "IP"
>U<- seppop(toto)$US
> U

   #####################
   ### Genind object ###
   #####################
- genotypes of individuals -

S4 class:  genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)

@tab:  81 x 3610 matrix of genotypes

@ind.names: vector of  81 individual names
@loc.names: vector of  2247 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  3610 columns of @tab
@all.names: list of  2247 components yielding allele names for each locus
@ploidy:  2
@type:  codom

Optionnal contents:
@pop:  factor giving the population of each individual
@pop.names:  factor giving the population of each individual

@other: a list containing: elements without names

> U1<- inbreeding(U, N=100)
Error in sample.int(length(x), size, replace, prob) :
  NA in probability vector



On Thursday, December 19, 2013 6:32 AM, "Jombart, Thibaut" <t.jombart at imperial.ac.uk> wrote:

Hello,

this was a tricky one. There is a problem in your command line, and then there is a problem inherent to likelihood and large datasets.

First, the problem is you replace missing data by "zero"s  allele frequencies (missing=0), which means that for some loci and some individuals, you have no allele, but still not treated as missing data. This screws the definition of homozygotes and thus the computations of inbreeding.

Second, for large datasets, the sum of log-likelihoods is so low that reverting them back to likelihoods gives 0. You end up with a density distribution that is approximately zero everywhere. You may still be able to visualize it as a density, but when deriving samples, this results in probabilities of zero and 'sample' complains about it. Note that this is not a theoretical issue, only a numerical precision problem.

I have just committed a patch so that now, a meaningful warning will be issued.

In principle, one could just add a constant to the sum of log-likelihood values as a workaround. But which value to add is not a trivial choice, and may vary from one individual to another. I'll pass on that for now.

As a workaround for your problem:
- don't use "missing=0"
- ask for the distributions, not the samples: "res.type="function""
- alternatively, use a smaller subset of loci
- use the new patch (attached) to replace the error with a warning - you'll get flat (uniform) distributions for the problematic individuals.

Cheers
Thibaut
________________________________________

From: adegenet-forum-bounces at lists.r-forge.r-project.org<mailto:adegenet-forum-bounces at lists.r-forge.r-project.org> [adegenet-forum-bounces at lists.r-forge.r-project.org<mailto:adegenet-forum-bounces at lists.r-forge.r-project.org>] on behalf of Tohamy Yousef [tohamyy at yahoo.com<mailto:tohamyy at yahoo.com>]
Sent: 18 December 2013 08:44
To: adegenet-forum at lists.r-forge.r-project.org<mailto:adegenet-forum at lists.r-forge.r-project.org>
Subject: [adegenet-forum] Inbreeding coefficient calculation by adegenet

Dear Dr. Jombart,
I am trying to calculate the inbreeding coefficient for two populations (US=81 and IP=73)  with 43231 SNPs. I am using your your manual, an introduction to adgenet 1.4-0, but I have a problem in the calculation. It gives me an error message:
Error in sample.int(length(x), size, replace, prob) :
  NA in probability vector
I do not know why? I did it as following:

> toto<-read.structure(file="finalFiltered_noLowCov_e0_LOCUS_POP.stru",n.ind=154,n.loc=43231,onerowperind=FALSE,col.lab=1,row.marknames=1,NA.char="-9",missing=0)
Which column contains the population factor ('0' if absent)? 2

Which other optional columns should be read (press 'return' when done)? 1:
Converting data from a STRUCTURE .stru file to a genind object...
> is.genind(toto)
[1] TRUE
> toto$pop.names
  P1  P2
"US" "IP"
> sa1<- seppop(toto)$US
> sa1
  #####################
  ### Genind object ###
  #####################
- genotypes of individuals -
S4 class:  genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab:  81 x 61169 matrix of genotypes
@ind.names: vector of  81 individual names
@loc.names: vector of  43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  61169 columns of @tab
@all.names: list of  43231 components yielding allele names for each locus
@ploidy:  2
@type:  codom
Optionnal contents:
@pop:  factor giving the population of each individual
@pop.names:  factor giving the population of each individual
@other: a list containing: elements without names

> temp1<- inbreeding(sa1, N=81)
Error in sample.int(length(x), size, replace, prob) :
  NA in probability vector

> sa2<- seppop(toto)$IP
> sa2
  #####################
  ### Genind object ###
  #####################
- genotypes of individuals -
S4 class:  genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab:  73 x 61169 matrix of genotypes
@ind.names: vector of  73 individual names
@loc.names: vector of  43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the  61169 columns of @tab
@all.names: list of  43231 components yielding allele names for each locus
@ploidy:  2
@type:  codom
Optionnal contents:
@pop:  factor giving the population of each individual
@pop.names:  factor giving the population of each individual
@other: a list containing: elements without names

> temp2<- inbreeding(sa2, N=73)
Error in sample.int(length(x), size, replace, prob) :
  NA in probability vector


Could you please tell me how can I fix this problem?
I attached you a part of my data. and I hope that  help me.

Thank you advance
Best regards,
Tohamy
-------------------------------------------
Eltohamy Yousef  M.Sc.
Crop Biodiversity and Breeding Informatics
Institute of Plant Breeding, Seed Science and Population Genetics (350)
University of Hohenheim
Fruwirtstrasse 21, 70599 Stuttgart
Office phone:0049711 459-24437
Email: tohamyy at yahoo.com<mailto:tohamyy at yahoo.com>
            E.yousef at uni-hohenheim.de<mailto:E.yousef at uni-hohenheim.de>
            tohamy_yousef at agr.suez.edu.eg<mailto:tohamy_yousef at agr.suez.edu.eg>
Web.www.evoplant.uni-hohenheim.de



More information about the adegenet-forum mailing list