[adegenet-forum] Inbreeding coefficient calculation by adegenet
Tohamy Yousef
tohamyy at yahoo.com
Fri Mar 21 14:50:28 CET 2014
Dear Dr. JombartI wrote you before and asked about inbreeding calculation. You suggested that I use less number of markers and other advices. SO, I tried to follow all your comments. For example I decided to use only
2247 SNP markers without missing data (before I had 43231 with missing
data). But I still have the same problem. I attached you the new data (new_inbreeding.stru). Thanks in advance.
######### my script
> library(adegenet)
>
toto<-read.structure(file="new_inbreeding.stru",n.ind=154,n.loc=2247,onerowperind=FALSE,col.lab=1,row.marknames=1,NA.char="-9",missing="mean")
Which column contains the population factor ('0' if absent)? 2
Which other optional columns should be read (press 'return' when done)? 1:
> toto
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: read.structure(file = "inbreeding.stru",
n.ind = 154, n.loc = 2247, onerowperind = FALSE, col.lab = 1,
row.marknames = 1, NA.char = "-9", missing =
"mean")
@tab: 154 x 3610 matrix of genotypes
@ind.names: vector of 154 individual names
@loc.names: vector of 2247 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 3610 columns of @tab
@all.names: list of 2247 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: - empty -
> toto$pop.names
P1 P2
"US" "IP"
>U<- seppop(toto)$US
> U
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab: 81 x
3610 matrix of genotypes
@ind.names: vector of 81 individual names
@loc.names: vector of 2247 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 3610 columns of @tab
@all.names: list of 2247 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: a list containing: elements without names
> U1<- inbreeding(U, N=100)
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
On Thursday, December 19, 2013 6:32 AM, "Jombart, Thibaut" <t.jombart at imperial.ac.uk> wrote:
Hello,
this was a tricky one. There is a problem in your command line, and then there is a problem inherent to likelihood and large datasets.
First, the problem is you replace missing data by "zero"s allele frequencies (missing=0), which means that for some loci and some individuals, you have no allele, but still not treated as missing data. This screws the definition of homozygotes and thus the computations of inbreeding.
Second, for large datasets, the sum of log-likelihoods is so low that reverting them back to likelihoods gives 0. You end up with a density distribution that is approximately zero everywhere. You may still be able to visualize it as a density, but when deriving samples, this results in probabilities of zero and 'sample' complains about it. Note that this is not a theoretical issue, only a numerical precision problem.
I have just committed a patch so that now, a meaningful warning will be issued.
In principle, one could just add a constant to the sum of log-likelihood values as a workaround. But which value to add is not a trivial choice, and may vary from one individual to another. I'll pass on that for now.
As a workaround for your problem:
- don't use "missing=0"
- ask for the distributions, not the samples: "res.type="function""
- alternatively, use a smaller subset of loci
- use the new patch (attached) to replace the error with a warning - you'll get flat (uniform) distributions for the problematic individuals.
Cheers
Thibaut
________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Tohamy Yousef [tohamyy at yahoo.com]
Sent: 18 December 2013 08:44
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Inbreeding coefficient calculation by adegenet
Dear Dr. Jombart,
I am trying to calculate the inbreeding coefficient for two populations (US=81 and IP=73) with 43231 SNPs. I am using your your manual, an introduction to adgenet 1.4-0, but I have a problem in the calculation. It gives me an error message:
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
I do not know why? I did it as following:
> toto<-read.structure(file="finalFiltered_noLowCov_e0_LOCUS_POP.stru",n.ind=154,n.loc=43231,onerowperind=FALSE,col.lab=1,row.marknames=1,NA.char="-9",missing=0)
Which column contains the population factor ('0' if absent)? 2
Which other optional columns should be read (press 'return' when done)? 1:
Converting data from a STRUCTURE .stru file to a genind object...
> is.genind(toto)
[1] TRUE
> toto$pop.names
P1 P2
"US" "IP"
> sa1<- seppop(toto)$US
> sa1
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab: 81 x 61169 matrix of genotypes
@ind.names: vector of 81 individual names
@loc.names: vector of 43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 61169 columns of @tab
@all.names: list of 43231 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: a list containing: elements without names
> temp1<- inbreeding(sa1, N=81)
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
> sa2<- seppop(toto)$IP
> sa2
#####################
### Genind object ###
#####################
- genotypes of individuals -
S4 class: genind
@call: .local(x = x, i = i, j = j, treatOther = ..1, quiet = ..2, drop = drop)
@tab: 73 x 61169 matrix of genotypes
@ind.names: vector of 73 individual names
@loc.names: vector of 43231 locus names
@loc.nall: number of alleles per locus
@loc.fac: locus factor for the 61169 columns of @tab
@all.names: list of 43231 components yielding allele names for each locus
@ploidy: 2
@type: codom
Optionnal contents:
@pop: factor giving the population of each individual
@pop.names: factor giving the population of each individual
@other: a list containing: elements without names
> temp2<- inbreeding(sa2, N=73)
Error in sample.int(length(x), size, replace, prob) :
NA in probability vector
Could you please tell me how can I fix this problem?
I attached you a part of my data. and I hope that help me.
Thank you advance
Best regards,
Tohamy
-------------------------------------------
Eltohamy Yousef M.Sc.
Crop Biodiversity and Breeding Informatics
Institute of Plant Breeding, Seed Science and Population Genetics (350)
University of Hohenheim
Fruwirtstrasse 21, 70599 Stuttgart
Office phone:0049711 459-24437
Email: tohamyy at yahoo.com
E.yousef at uni-hohenheim.de
tohamy_yousef at agr.suez.edu.eg
Web.www.evoplant.uni-hohenheim.de
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20140321/d6e76746/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_inbreeding.stru
Type: application/octet-stream
Size: 1416445 bytes
Desc: not available
URL: <http://lists.r-forge.r-project.org/pipermail/adegenet-forum/attachments/20140321/d6e76746/attachment-0001.obj>
More information about the adegenet-forum
mailing list