[adegenet-forum] Monmonier algorithm and individual scores

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Jun 3 11:26:20 CEST 2014


Hi there, 

I would not recommend using all three phylogenetic reconstruction methods, even if with 19 SNPs there shouldn't be major differences. I covered the maximum parsimony for historical reasons, but I can't see it being useful here. 

Other clustering approaches sound like a good idea. If you ever fancy documenting how to use them on genetic data in a small tutorial, I think that would be very handy to others ;) 

As for your last question, it makes a lot of sense, but you will need external information for this. Eigenvalue selection procedures based on inertia will basically fail to detect the structures you talk about. So you will need to be able to test e.g. the correlation of your PCs to a set of traits, or their spatial distribution, etc.
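
As a rough illustration of testing PCs against external information, here is a minimal sketch in base R. Everything in it is simulated and hypothetical (the genotype matrix 'geno', the hidden grouping 'grp', the external variable 'trait'); with real data you would use the scores of your own PCA or sPCA and your own covariates.

```r
# Simulated stand-in data: 170 individuals x 19 biallelic SNPs coded 0/1/2.
set.seed(1)
n <- 170
grp <- rep(c(0, 1), length.out = n)   # hypothetical hidden structure
geno <- matrix(rbinom(n * 19, size = 2, prob = 0.3), nrow = n)

# Let three loci differ in allele frequency between the two groups.
geno[, 1:3] <- matrix(
  rbinom(n * 3, size = 2, prob = ifelse(grp == 1, 0.8, 0.2)),
  nrow = n
)

# Hypothetical external variable (a trait, an environmental covariate, ...).
trait <- grp + rnorm(n, sd = 0.5)

# Centred, unscaled PCA; keep the first five PCs for testing.
pca <- prcomp(geno, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:5]

# Correlation test of each retained PC against the external variable,
# with a Holm correction for testing several PCs.
pvals <- apply(scores, 2, function(pc) cor.test(pc, trait)$p.value)
p.adj <- p.adjust(pvals, method = "holm")
print(round(p.adj, 4))
```

The PCs worth interpreting are then those associated with the external variable, whatever the rank of their eigenvalue. The same loop could be run against spatial coordinates instead of a trait.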

Cheers
Thibaut


________________________________________
From: Manuela [manuelacorreia2 at gmail.com]
Sent: 03 June 2014 10:01
To: Jombart, Thibaut
Cc: adegenet-forum at lists.r-forge.r-project.org
Subject: Re: [adegenet-forum] Monmonier algorithm and individual scores

Doctor Thibaut and dear colleagues,

I would like to thank you for your valuable criticism of this output. The idea behind the IS was solely to get a first draft of the georeferenced clusters. I am well aware that, in species with very low or no mobility, several different genotypes at the same coordinates could be a strong indication that the genetic variability is due only to the environment, while a high genetic diversity nearby may result from short-range, highly spatially correlated dispersal. This needs further confirmation by sPCA and/or clustering techniques.

The identification of spatial clusters by PCA, and particularly by sPCA, is no doubt more reliable than with the Monmonier algorithm in this case. But I would rather study more deeply each of the 3 different methods (distance-based methods, parsimony and maximum likelihood) proposed in your tutorial "Trees": first, to check whether they are appropriate for this dataset; secondly, to see whether they give different information, perhaps with higher resolution, compared to the classic NJ tree after validation by bootstrap. Eventually, if none is appropriate, I will always be able to rely on several clustering techniques better suited to qualitative data, available in the "cluster" package, and to perform the validation with "clValid" following several criteria.

From a very simplistic point of view, PCA (not scaled) provides information on the genetic variability, whereas sPCA informs us about the significance of local and global structures. On the whole, the information provided by these two analyses (Moran's index, variance and allele loadings) enables us to discriminate the loci that are more informative on genetic variability but not spatially structured from those whose variability is spatially structured. This is to be further confirmed through biplots.

Another challenge lies ahead: figuring out how to select the PCs that have biological meaning and are most probably not associated with the highest eigenvalues, particularly in the absence of trait or phenotype information.

Please feel free to make more comments or to give other suggestions.

Cheers,
Manuela


2014-06-02 17:20 GMT+01:00 Jombart, Thibaut <t.jombart at imperial.ac.uk>:
Hi Manuela,

thanks for re-posting on the forum. In this case, it seems that locations are very aggregated - a lot of genotypes were sampled roughly at the same place. Monmonier is unlikely to do well under such circumstances. The algorithm is very sensitive to local differences, and these are unstable for this kind of spatial distribution. I would recommend other approaches. For instance, if you want to define spatial clusters, you could use a basic clustering algorithm based on the principal components of a PCA (if spatial structure is obvious) or sPCA (if not, but there is still a spatial structure). Assuming 'foo' is your analysis (PCA or sPCA), one example would be using something along the lines of:

h1 <- hclust(dist(foo$li)^2)
plot(h1)
cutree(h1, k = 4)  # cutree() needs k (number of groups) or h (cut height); k = 4 is only an example

Etc.
Check ?hclust for different clustering methods.
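
For a self-contained illustration of the same idea, here is a minimal sketch in base R. The data frame 'li' is a simulated stand-in for 'foo$li' (three artificial groups on two axes), and the choices of Ward linkage and k = 3 are assumptions made for this toy example only.

```r
set.seed(42)

# Simulated stand-in for foo$li: scores of 60 individuals on two axes,
# drawn around three hypothetical group centres.
centers <- rbind(c(0, 0), c(5, 5), c(-5, 5))
li <- data.frame(
  Axis1 = rnorm(60, mean = rep(centers[, 1], each = 20)),
  Axis2 = rnorm(60, mean = rep(centers[, 2], each = 20))
)

# Hierarchical clustering on squared Euclidean distances.
# With squared distances, method = "ward.D" applies Ward's criterion.
h1 <- hclust(dist(li)^2, method = "ward.D")

# Inspect the dendrogram, then cut it into k groups.
plot(h1)
grp <- cutree(h1, k = 3)

# Group sizes; grp can be used to colour individuals on a map of xy.
print(table(grp))
```

In practice k would be read off the dendrogram rather than fixed in advance.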

Cheers
Thibaut


________________________________________
From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Manuela [manuelacorreia2 at gmail.com]
Sent: 31 May 2014 21:46
To: adegenet-forum at lists.r-forge.r-project.org
Subject: [adegenet-forum] Monmonier algorithm and individual scores

Dear colleagues of Adegenet forum,

First of all, I must congratulate Doctor Thibaut on the wonderful work he has developed so far. Following his own suggestion, I am sharing with you a specific issue raised by the output of the Monmonier algorithm, used here for boundary detection.
I have a sample of 170 individuals, collected at 9 different places and genotyped for 19 SNPs by real-time PCR.
Here is the line I ran in the R script, followed by an explanation of each of its arguments:
mon1 <- monmonier(xy, D, gab)

xy – spatial coordinates (UTM, km);
D – pairwise allele sharing distance ("prabclus" package);
gab <- chooseCN(xy, ask = FALSE, type = 1)  # Delaunay triangulation

plot(mon1, 1:170, method = "greylevel", add.arr = FALSE, bwd = 6, col = "red")
From the output produced, it can clearly be seen that there are 4 clusters of individuals with four scores (50, 100, 150, 200). But I can't find a way to access the individual scores. As a matter of fact, I checked in detail all the arguments of the plot function, but none of them seemed to let me extract the individual scores (IS).
I'm wondering if you could give me a hint about it. Any help will be appreciated.
Kind regards,
Manuela (Biochemist)


