[adegenet-forum] Inbreeding coefficient calculation by adegenet
Jombart, Thibaut
t.jombart at imperial.ac.uk
Thu Dec 19 06:32:13 CET 2013
Hello,
this was a tricky one. There is a problem in your command line, and then there is a problem inherent to likelihood and large datasets.
First, the problem is you replace missing data by "zero"s allele frequencies (missing=0), which means that for some loci and some individuals, you have no allele, but still not treated as missing data. This screws the definition of homozygotes and thus the computations of inbreeding.
Second, for large datasets, the sum of log-likelihoods is so low that reverting them back to likelihoods gives 0. You end up with a density distribution that is approximately zero everywhere. You may still be able to visualize it as a density, but when deriving samples, this results in probabilities of zero and 'sample' complains about it. Note that this is not a theoretical issue, only a numerical precision problem.
I have just committed a patch so that now, a meaningful warning will be issued.
In principle, one could just add a constant to the sum of log-likelihood values as a workaround. But which value to add is not a trivial choice, and may vary from one individual to another. I'll pass on that for now.
As a workaround for your problem:
- don't use "missing=0"
- ask for the distributions, not the samples: "res.type="function""
- alternatively, use a smaller subset of loci
- use the new patch (attached) to replace the error with a warning - you'll get flat (uniform) distributions for the problematic individuals.
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Tohamy Yousef [tohamyy at yahoo.com]
Sent: 18 December 2013 08:44
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Inbreeding coefficient calculation by adegenet
Dear Dr. Jombart,
I am trying to calculate the inbreeding coefficient for two populations (US=81 and IP=73) with 43231 SNPs. I am using your your manual, an introduction to adgenet 1.4-0, but I have a problem in the calculation. It gives me an error message:
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
I do not know why? I did it as following:
> toto<-read.structure(file="finalFiltered_noLowCov_e0_LOCUS_POP.stru",n.ind=154,n.loc=43231,onerowperind=FALSE,col.lab=1,row.marknames=1,NA.char="-9",missing=0)
Which column contains the population factor ('0' if absent)? 2
Which other optional columns should be read (press 'return' when done)? 1:
Converting data from a STRUCTURE .stru file to a genind object...
> is.genind(toto)
[1] TRUE
> toto$pop.names
P1 P2
"US" "IP"
> sa1<- seppop(toto)$US
> sa1
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab: 81 x 61169 matrix of genotypes
@ind.names: vector of 81 individual names
@loc.names: vector of 43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 61169 columns of @tab
@all.names: list of 43231 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: a list containing: elements without names
> temp1<- inbreeding(sa1, N=81)
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
> sa2<- seppop(toto)$IP
> sa2
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab: 73 x 61169 matrix of genotypes
@ind.names: vector of 73 individual names
@loc.names: vector of 43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 61169 columns of @tab
@all.names: list of 43231 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: a list containing: elements without names
> temp2<- inbreeding(sa2, N=73)
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
Could you please tell me how can I fix this problem?
I attached you a part of my data. and I hope that help me.
Thank you advance
Best regards,
Tohamy
-------------------------------------------
Eltohamy Yousef M.Sc.
Crop Biodiversity and Breeding Informatics
Institute of Plant Breeding, Seed Science and Population Genetics (350)
University of Hohenheim
Fruwirtstrasse 21, 70599 Stuttgart
Office phone:0049711 459-24437
Email: tohamyy at yahoo.com
E.yousef at uni-hohenheim.de
tohamy_yousef at agr.suez.edu.eg
Web.www.evoplant.uni-hohenheim.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: inbreeding.R
Type: application/octet-stream
Size: 5793 bytes
Desc: inbreeding.R
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20131219/9d2a06c6/attachment.obj>
More information about the adegenet-forum
mailing list