[adegenet-forum] Request an example of genetic distance among two individuals
Jombart, Thibaut
t.jombart at imperial.ac.uk
Sun Nov 17 19:52:59 CET 2013
Hello,
just to clarify, 'nj' from APE is agnostic with respect to the distance used.
Here in your code you are using 'dist', thus the Euclidean distance between SNP profiles.
Cheers
Thibaut
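
To illustrate the point above, here is a minimal sketch showing that the tree-building step only receives a distance matrix, so the choice of distance is made beforehand. The matrix geno and the profiles Ind3 and Ind4 are made up for illustration (Ind1 and Ind2 are the two profiles discussed in this thread), and the nj() step assumes the ape package is installed.

```r
# Toy genotype matrix: 4 individuals x 5 biallelic SNPs, coded as allele counts (0/1/2).
# Ind1 and Ind2 come from this thread; Ind3 and Ind4 are hypothetical.
geno <- rbind(
  Ind1 = c(0, 0, 1, 2, 2),
  Ind2 = c(0, 2, 2, 1, 0),
  Ind3 = c(1, 0, 1, 2, 1),
  Ind4 = c(2, 2, 0, 0, 0)
)

d_euclid <- dist(geno)                        # default method is "euclidean"
d_manhat <- dist(geno, method = "manhattan")  # sum of absolute count differences

# Ind1 vs Ind2: Euclidean = sqrt(10) = 3.162278, Manhattan = 6
as.matrix(d_euclid)["Ind1", "Ind2"]
as.matrix(d_manhat)["Ind1", "Ind2"]

# nj() only sees the distance matrix, so either object can be passed to it.
if (requireNamespace("ape", quietly = TRUE)) {
  tree_euclid <- ape::nj(d_euclid)
  tree_manhat <- ape::nj(d_manhat)
}
```

For these 0/1/2 counts, the Manhattan distance between Ind1 and Ind2 is 6, the same value as the number of unshared alleles mentioned further down the thread.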
________________________________________
From: Fernando Cruz [fernando.cruz at ebd.csic.es]
Sent: 17 November 2013 16:03
To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Request an example of genetic distance among two individuals
Hi Thibaut,
I meant the NJ tree from APE. What I basically did was:
library(adegenet)
library(ape)
mygenlight <- read.snp("/Users/Nando/Documents/mydata.snp", chunk=2)
x <- seploc(mygenlight, n.block=100) # ~10000 SNPs each
# dist is used within an lapply loop to compute pairwise distances between individuals for each block
lD <- lapply(x, function(e) dist(as.matrix(e)))
class(lD[[1]])
# The general distance matrix is obtained by summing these:
D <- Reduce("+", lD)
plot(nj(D), type="fan")
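
The block-wise approach above can be sketched in base R without a genlight object. This is only an illustration under stated assumptions: geno is a simulated matrix, and splitting its columns stands in for seploc(). One point worth checking on real data is that summing per-block Euclidean distances is not the same as the Euclidean distance over all SNPs; summing the squared block distances and taking the square root recovers the latter.

```r
set.seed(1)
# Hypothetical data: 6 individuals x 40 SNPs, allele counts 0/1/2.
geno <- matrix(sample(0:2, 6 * 40, replace = TRUE), nrow = 6,
               dimnames = list(paste0("Ind", 1:6), NULL))

# Split the SNP columns into 4 blocks (stand-in for seploc(..., n.block = 4)).
block_id <- rep(1:4, each = 10)
blocks <- lapply(split(seq_len(ncol(geno)), block_id),
                 function(cols) geno[, cols, drop = FALSE])

# Approach from the message above: sum the per-block Euclidean distances.
D_sum <- Reduce("+", lapply(blocks, dist))

# Alternative: sum the squared per-block distances, then take the square root.
D_all <- sqrt(Reduce("+", lapply(blocks, function(b) dist(b)^2)))

# D_all matches the Euclidean distance computed on all SNPs at once;
# D_sum is never smaller than it.
all.equal(as.vector(D_all), as.vector(dist(geno)))
all(D_sum >= D_all - 1e-12)
```

Either matrix can be passed to nj(); which one is appropriate depends on whether the overall Euclidean distance or the sum of block distances is wanted.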
Cheers,
Fernando
On 11/17/13 4:45 PM, Jombart, Thibaut wrote:
> Hi there,
>
> I'm not sure which tree you are referring to.
>
> Cheers
> Thibaut
> ________________________________________
> From: Fernando Cruz [fernando.cruz at ebd.csic.es]
> Sent: 17 November 2013 15:41
> To: Jombart, Thibaut; adegenet-forum at lists.r-forge.r-project.org
> Subject: Re: [adegenet-forum] Request an example of genetic distance among two individuals
>
> Thanks Thibaut,
>
> This clarifies it. In both the Euclidean and the Hamming distances, the
> distance between a pair of individuals depends on the number of
> "unshared alleles".
> By the way, the standardized distance is then what is plotted in the NJ tree,
> instead of the Saitou & Nei (1987) one used by the APE library, right?
>
> Cheers,
> Fernando
>
> On 11/17/13 4:23 PM, Jombart, Thibaut wrote:
>> Just realized a typo:
>>
>> sqrt(\sum_i (x_i - y_i)^2
>>
>> should read
>>
>> sqrt{ \sum_i (x_i - y_i)^2 }
>>
>> Cheers
>> Thibaut
>> ________________________________________
>> From:adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Jombart, Thibaut [t.jombart at imperial.ac.uk]
>> Sent: 17 November 2013 15:07
>> To: Fernando Cruz;adegenet-forum at lists.r-forge.r-project.org
>> Subject: Re: [adegenet-forum] Request an example of genetic distance among two individuals
>>
>> Hello there,
>>
>> there are many different distances that can be computed between allelic profiles, but at the individual level there are somewhat fewer options.
>>
>> One is the Hamming distance, which you mention here (D=6), and which you can deduce from 'propShared'.
>>
>> The usual Euclidean distance is different though. Between two vectors of allelic profiles x=[x_i] and y=[y_i], the Euclidean distance is given by (in LaTeX notation):
>>
>> D(x,y) = || x - y || = sqrt{ (x-y)^T (x-y)} = sqrt(\sum_i (x_i - y_i)^2
>>
>> Using your example:
>> > x <- c(0,0,1,2,2)
>> > y <- c(0,2,2,1,0)
>> > sqrt(sum((x-y)^2))
>> [1] 3.162278
>> > dist(rbind.data.frame(x,y))
>>          1
>> 2 3.162278
>>
>>
>> Note that in adegenet, data in genind objects are standardized to relative frequencies, so that the distance would be different:
>> > x.rel <- x/2
>> > y.rel <- y/2
>> > dist(rbind.data.frame(x.rel,y.rel))
>>          1
>> 2 1.581139
>>
>> That is, the distance between the raw allele count profiles divided by the ploidy.
>>
>> As a last note, there is a particular case for haploid data, where the Hamming distance equals the squared Euclidean distance (it follows that a PCA on the covariance matrix is also the best reduced-space representation of Hamming distances).
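
The haploid special case can be checked by hand. The sketch below uses two hypothetical haploid profiles a and b, coded 0/1 at biallelic SNPs; with that coding every mismatch contributes exactly 1 to both the Hamming count and the squared Euclidean distance.

```r
# Two hypothetical haploid individuals scored at 6 biallelic SNPs (0/1 = allele carried).
a <- c(0, 1, 1, 0, 1, 0)
b <- c(1, 1, 0, 0, 0, 0)

hamming   <- sum(a != b)       # number of loci where the alleles differ: 3
sq_euclid <- sum((a - b)^2)    # squared Euclidean distance: 3

hamming == sq_euclid                 # TRUE
as.numeric(dist(rbind(a, b)))^2      # dist() gives the same squared value, up to rounding
```

With diploid 0/1/2 counts this equality no longer holds in general, because a difference of 2 at a locus contributes 4 to the squared Euclidean distance.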
>>
>> Cheers
>>
>> Thibaut
>>
>>
>> --
>> ######################################
>> Dr Thibaut JOMBART
>> MRC Centre for Outbreak Analysis and Modelling
>> Department of Infectious Disease Epidemiology
>> Imperial College - School of Public Health
>> St Mary’s Campus
>> Norfolk Place
>> London W2 1PG
>> United Kingdom
>> Tel. : 0044 (0)20 7594 3658
>> t.jombart at imperial.ac.uk
>> http://sites.google.com/site/thibautjombart/
>> http://adegenet.r-forge.r-project.org/
>> ________________________________________
>> From:adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Fernando Cruz [fernando.cruz at ebd.csic.es]
>> Sent: 15 November 2013 18:53
>> To:adegenet-forum at lists.r-forge.r-project.org
>> Subject: [adegenet-forum] Request an example of genetic distance among two individuals
>>
>> Hi Thibaut,
>>
>> I built an NJ tree using 1M SNPs with 10 samples, following the
>> instructions in the documentation. However, I would like to know exactly
>> how the genetic distance among individuals is calculated. Is it based on the
>> number of shared alleles?
>>
>> Could you provide a simple example? For instance, for these two individuals using
>> 5 SNPs:
>> Ind1 00122
>> Ind2 02210
>>
>> Using the binary information, they share 2+0+1+1+0= 4 alleles out of 10
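
The count above can be reproduced with a short hand computation. This is a sketch only, assuming the 0/1/2 codes are copies of one allele at biallelic diploid SNPs; the variable names are made up for illustration, and it does not call any adegenet function.

```r
ind1 <- c(0, 0, 1, 2, 2)
ind2 <- c(0, 2, 2, 1, 0)
ploidy <- 2

# At a biallelic locus, the number of shared alleles is ploidy minus the count difference.
shared_per_snp <- ploidy - abs(ind1 - ind2)     # 2 0 1 1 0
n_alleles  <- ploidy * length(ind1)             # 10 alleles per individual
n_shared   <- sum(shared_per_snp)               # 4
n_unshared <- n_alleles - n_shared              # 6
prop_shared <- n_shared / n_alleles             # 0.4
```

The 6 unshared alleles correspond to the Hamming-type distance discussed in the reply.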
>>
>> Thanks in advance,
>> Fernando Cruz
>>
>>
>> --
>> ****************************************
>> Dr. Fernando Cruz
>> Estación Biológica de Doñana (EBD-CSIC)
>> Avd. Americo Vespucio s/n
>> 41092-Seville (Spain)
>> Tel. +34 954466700/Ext. 1079
>> Fax: +34 95 4621125
>> Room: 0/12
>>
>> e-mail:fernando.cruz at ebd.csic.es
>> Website:http://openwetware.org/wiki/User:Fernando_Cruz
>> Web EcoGenes EU-FP7:http://www.ebd.csic.es/ecogenes/news.html
>> ****************************************
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>
>