[adegenet-forum] scatterplot and hybridization

Jombart, Thibaut t.jombart at imperial.ac.uk
Tue Sep 6 15:09:46 CEST 2011


Hello Avik, 

In the case you describe (two demes + hybrids), you would more likely expect K=2, not K=3.

The error in your code is that x1 and x2 are not genind objects, but vectors of integers returned by "which". Correct code would be:
###
x1 <- dat[grp==1]
x2 <- dat[grp==2]
###

where dat is your original genind object and grp is the vector of group memberships (grp$grp if grp is the output of find.clusters); alternatively, if grp has just groups 1 and 2:
###
temp <- seppop(dat, grp)
x1 <- temp[[1]]
x2 <- temp[[2]]
###
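
The difference between integer indices and genind objects can be sketched as follows. This is only an illustration under stated assumptions: the 'microbov' example dataset shipped with adegenet stands in for dat, and two of its breeds (Salers and Zebu) stand in for groups 1 and 2; none of this is your data.

```r
library(adegenet)

# Assumption: 'microbov' (example data shipped with adegenet) stands in for 'dat',
# and its breed labels stand in for the group memberships.
data(microbov)
temp <- seppop(microbov)   # list of genind objects, one per population

x1 <- temp$Salers          # genind object for the first parental group
x2 <- temp$Zebu            # genind object for the second parental group

# which() only returns integer positions, which hybridize() rejects;
# x1 and x2 above are genind objects, which it accepts.
idx <- which(pop(microbov) == "Salers")
print(class(idx))
print(inherits(x1, "genind"))

# Simulate 40 hybrids between the two parental groups.
hyb <- hybridize(x1, x2, n = 40, pop = "hybrid")
print(nInd(hyb))
```

The simulated hybrids come back as a genind object, so they can then be compared with the individuals of a third cluster.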

About your last question: DAPC always has K-1 discriminant functions at most, K being the number of clusters. Therefore, for K=2, you'll end up with only one discriminant function, and no scatterplot, as you'd need two axes to get one. Note that 'scatter' adapts automatically and uses a more appropriate representation when there is just one axis to be plotted.
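
The K-1 limit comes from the discriminant analysis step itself, and can be checked with a plain linear discriminant analysis. The sketch below uses lda from MASS on the iris data as a stand-in (an assumption for illustration: it is not DAPC and not genetic data), with species playing the role of clusters.

```r
library(MASS)   # provides lda(); MASS ships with standard R installations

# Assumption: the iris data stand in for genetic data, species for clusters.
# Count the discriminant functions found by a linear discriminant analysis.
n_axes <- function(data) ncol(lda(Species ~ ., data = data)$scaling)

three_groups <- iris                                            # K = 3
two_groups   <- droplevels(subset(iris, Species != "setosa"))   # K = 2

print(n_axes(three_groups))   # number of discriminant functions for K = 3
print(n_axes(two_groups))     # number of discriminant functions for K = 2
```

With a single discriminant function, individuals can only be placed along one axis, so a representation such as one density curve per group is more suitable than a two-axis scatterplot.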

Cheers

Thibaut.

________________________________________
From: AVIK RAY [avik.ray.kol at gmail.com]
Sent: 06 September 2011 13:15
To: Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] scatterplot and hybridization

Dear Thibaut
I am trying to simulate hybridization. I am considering K=2 and K=3, and I suspect that one of the clusters at K=3 contains hybrid individuals (other programs also suggest this). So, say we have clust 1, 2 and 3, where 3 is mostly derived from 1 and 2 by hybridization (my guess). When I try to simulate hybrids from 1 and 2 and compare them with clust 3 ...
the code went wrong!....

> x1<-which (grp$grp==1)
> x2<-which (grp$grp==2)
> which (grp$grp==1)
....
>x1
.......
hyb <- hybridize(x1, x2, n=40)

Error in hybridize(x1, x2, n = 40) : x1 is not a valid genind object

Actually my data goes through read.table -> obj -> data.frame -> dat -> df2genind etc.,
so my wild guess is that it cannot recognize x1 etc.

One more query: in find.clusters, whenever I choose K=2 as the number of clusters, it does not show any scatterplot but rather shows two discriminant functions. I tried for some time, but all in vain; I have no clue whether I am making a wrong step, though I follow the same steps as for K=3.
hope the problems are understandable
sorry for providing all my mails but this may give you some lost reference!

Thanks in advance
cheers

AVIK



On 8/30/2011 3:58 PM, Jombart, Thibaut wrote:

Hello,

I am afraid the interpretation of the pattern you observe will be up to you, since I don't know a thing about the organism under scrutiny.

As I said, the structure would be compatible with IBD - even with decent genetic structure, STRUCTURE would probably largely underestimate the most likely number of demes. This is the first explanation, but I guess other scenarios would be possible - different processes can lead to the same pattern.

Cheers

Thibaut.

--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - Faculty of Medicine
St Mary’s Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://sites.google.com/site/thibautjombart/
http://adegenet.r-forge.r-project.org/

________________________________________
From: AVIK RAY [avik.ray.kol at gmail.com]
Sent: 29 August 2011 08:06
To: Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] PCA query?

Dear Thibaut and all
resuming the earlier discussion (mails below for reference) :
I want to narrow it down a little: what could be the causal factor(s) for this pattern? As you already mentioned, it is mostly seen under IBD (see your mail below), where no clusters can be found. Or could it result from high gene flow among populations, so that all of them are quite mixed up and show no signature of clusters? Both scenarios are true, at least to some extent, for my data.
So, to summarize, which should I consider?
thanks in advance
cheers
AVIK



On 7/5/2011 2:37 PM, Jombart, Thibaut wrote:
Hello,

actually I doubt there is ever a true K in real biological data, if only for the fact that there is no clear definition of 'genetic clusters'. What we consider as "clusters" are models of reality, and so false by definition.

Anyway. In your case I would stick to the BIC-based choice of K. The reason is that DAPC scatterplots show you only a few dimensions, while k-means + BIC takes much more of the genetic information into account (if not all of it, depending on how many PCs are retained).
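
The k-means + BIC idea can be sketched in base R as follows. Everything here is an assumption made for illustration: the data are simulated (3 groups in 10 dimensions), all PCs are retained, and the BIC is a simplified form, n * log(WSS / n) + K * log(n), which is not necessarily the exact expression used by find.clusters.

```r
set.seed(1)

# Assumption: simulated data, 3 groups of 50 individuals in 10 dimensions.
centers <- matrix(rnorm(3 * 10, sd = 4), nrow = 3)
X <- centers[rep(1:3, each = 50), ] + matrix(rnorm(150 * 10), nrow = 150)

# Step 1: principal components (here all of them are retained).
pcs <- prcomp(X)$x

# Step 2: run k-means for a range of K and compute a simplified BIC.
# Assumption: BIC = n * log(WSS / n) + K * log(n), WSS being the total
# within-cluster sum of squares.
n <- nrow(pcs)
K_range <- 2:8
bic <- sapply(K_range, function(k) {
  km <- kmeans(pcs, centers = k, nstart = 10)
  n * log(km$tot.withinss / n) + k * log(n)
})

print(data.frame(K = K_range, BIC = round(bic, 1)))
```

Because the clustering uses all retained PCs, the BIC curve reflects structure in every dimension, whereas a scatterplot shows only the first one or two discriminant functions.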

Cheers

Thibaut
________________________________
From: AVIK RAY [avik.ray.kol at gmail.com]
Sent: 05 July 2011 07:33
To: Jombart, Thibaut
Subject: Re: [adegenet-forum] PCA query?

Dear Thibaut
It is quite unlikely that there is no true K!
If so, how can I account for the quite divergent clusters obtained in the DAPC analysis? Refer to the attached images: in the image DAPC clust 6, cluster 2, cluster 3 and clusters 4,5,1 seem to be quite divergent genetic groups, and even 6 is well separated from 2 and 3; similarly, in the image DAPC cluster 8, clusters 3,4,7 and 3,8 and 2,6 are widely divergent (however, if you compare the two, they appear very similar, except that some clusters break into sub-clusters, which is quite reasonable).
I think that in my case it may be wise to optimize the number of clusters by looking at the BIC curve as well as the cluster diagram, considering highly divergent clusters.
what do you think?

cheers

AVIK


On 6/22/2011 2:49 PM, Jombart, Thibaut wrote:

Dear Avik,

the BIC plot you sent resembles what we usually get under IBD models. In this case, it is not surprising that STRUCTURE identifies fewer clusters than DAPC (see the paper: STRUCTURE basically failed to identify clusters under the IBD model).

There is probably no "true k", but just a choice of a number of groups useful to summarize the data. You may want to have a look at the section "how many clusters..." in the DAPC vignette, online in "Documents" on the website.

Cheers

Thibaut

________________________________________
From: AVIK RAY [avik.ray.kol at gmail.com]
Sent: 21 June 2011 19:08
To: Jombart, Thibaut; adegenet-forum at r-forge.wu-wien.ac.at
Subject: Re: [adegenet-forum] PCA query?

Dear Thibaut
Thanks for the very effective reply; it seems DAPC is more suitable for my
dataset and for the question I'm looking at!
I did a few mock runs to see the very initial results, and the BIC curve
seems to level off gradually after K=9; however, from STRUCTURE
(Bayesian) and FLOCK (Max Likelihood) the number of putative clusters
appears to be 2/3, so I wonder what made this difference? Or am I
interpreting it wrongly? Anyway, my dataset contains a lot of missing
data: does that matter much? Shall I remove those and then try?
I am attaching the BIC and retained PC curves for reference.
Thanks
cheers

AVIK


On 6/20/2011 6:58 PM, Jombart, Thibaut wrote:


Hello,

Not really: as far as PCoA / MDS are concerned, they do the same as PCA, but just allow for using fancier Euclidean distances. Losing information in terms of total variance does not necessarily imply losing information in terms of group discrimination. But if you're looking for clusters, you don't necessarily need to reduce the dimensionality of the data - most clustering algorithms don't.
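
The equivalence between PCA and PCoA / classical MDS on plain Euclidean distances can be checked numerically. The sketch below uses base R only and simulated data (an assumption for illustration, not genetic data): the coordinates of the individuals on the first axes are the same up to the sign of each axis.

```r
set.seed(1)

# Assumption: simulated data, 30 individuals described by 5 variables.
X <- matrix(rnorm(30 * 5), nrow = 30)

pca  <- prcomp(X)$x[, 1:2]          # PCA scores on the first two axes
pcoa <- cmdscale(dist(X), k = 2)    # PCoA / classical MDS on Euclidean distances

# Axes are defined up to sign, so compare absolute values.
max_diff <- max(abs(abs(unname(pca)) - abs(pcoa)))
print(max_diff)                     # should be numerically zero
```

The two methods differ only once another distance is passed to the PCoA in place of the plain Euclidean one.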

Please have a look at the DAPC paper which is really on these topics. You may also be interested in the DAPC vignette for the next release of adegenet.
DAPC paper is here:
http://www.biomedcentral.com/1471-2156/11/94

DAPC vignette is there:
http://adegenet.r-forge.r-project.org/files/adegenet-dapc.pdf

Cheers

Thibaut

________________________________________
From: adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of AVIK RAY [avik.ray.kol at gmail.com]
Sent: 20 June 2011 13:12
To: adegenet-forum at r-forge.wu-wien.ac.at
Subject: [adegenet-forum] PCA query?

Hi all
A bit of confusion about PCA in general: I did a PCA in adegenet and it
showed a plot with multiple clusters. My data is tetraploid
microsatellite data and I need to find potential clusters, i.e. some
individuals are more similar than others based on allele data. But if I am
not mistaken, PCA converts allele information into some synthetic variables
and does the clustering, so we tend to lose a lot of information since
it will select most but not all alleles; so in that sense, does PCoA /
multidimensional scaling or simply a clustering analysis (e.g. K-means or
hierarchical clustering) make more sense?
Thanks in advance for reply

AVIK

_______________________________________________
adegenet-forum mailing list
adegenet-forum at lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum


--

AVIK RAY
Visiting Fellow
National Center for Biological Sciences
Tata Institute of Fundamental Research
GKVK Campus
Bellary Road
Bangalore-560065
India
Ph 91-80-23666340
Fax 91-80-2363 6662



