[adegenet-forum] PCA query?
AVIK RAY
avik.ray.kol at gmail.com
Mon Aug 29 09:06:27 CEST 2011
Dear Thibaut and all,
Resuming the earlier discussion (mails below for reference):
I want to narrow it down a little. What could be the causal factor(s)
for this pattern? As you already mentioned, it is mostly seen under
IBD (see your mail below), where no clear clusters are found. Or could
it result from high gene flow among populations, so that all of them
are thoroughly mixed and show no signature of clusters? Both scenarios
are true, at least to some extent, for my data.
So, to summarize, which explanation should I consider?
Thanks in advance
cheers
AVIK
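For reference, the BIC-based choice of K recommended in the quoted mails below can be sketched as follows. This is only an illustration on simulated two-dimensional scores standing in for retained PCs, not adegenet's own code (in adegenet this step is handled by find.clusters). The helper names (sq_dist, kmeans_wss, bic), the farthest-point initialisation and the simplified BIC formula n*log(WSS/n) + k*log(n) are all assumptions made for this sketch.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    Returns the within-cluster sum of squares (WSS) of the final partition.
    """
    # Initialisation: start from the first point, then repeatedly add the
    # point that lies farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # partition is stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def bic(wss, n, k):
    """Simplified BIC (an assumption of this sketch); lower is better."""
    return n * math.log(wss / n) + k * math.log(n)


# Simulated data: three well-separated groups of 30 individuals each.
rng = random.Random(42)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in centres
    for _ in range(30)
]
n = len(points)

# BIC for K = 1..6, computed on all dimensions of the data.
bic_by_k = {k: bic(kmeans_wss(points, k), n, k) for k in range(1, 7)}
for k, value in bic_by_k.items():
    print(k, round(value, 1))

# The "elbow": the K reached by the largest single drop in BIC.
elbow = max(range(2, 7), key=lambda k: bic_by_k[k - 1] - bic_by_k[k])
print("largest BIC drop at K =", elbow)
```

With clearly separated groups the curve drops sharply up to the simulated number of groups and changes little afterwards. A curve that keeps decreasing smoothly with no obvious elbow is the kind of pattern described below as typical of IBD, where K is a convenient summary and not a "true" number of clusters.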
On 7/5/2011 2:37 PM, Jombart, Thibaut wrote:
> Hello,
>
> actually I doubt there is ever a true K in real biological data, if
> only for the fact that there is no clear definition of 'genetic
> clusters'. What we consider as "clusters" are models of reality, and
> so false by definition.
>
> Anyway. In your case I would stick to BIC-based choice of K. The
> reason for this is that DAPC scatterplots show you only a few
> dimensions, while k-means+BIC takes much more (if not all, depending
> on how many PCs retained) of the genetic information into account.
>
> Cheers
>
> Thibaut
> ------------------------------------------------------------------------
> *From:* AVIK RAY [avik.ray.kol at gmail.com]
> *Sent:* 05 July 2011 07:33
> *To:* Jombart, Thibaut
> *Subject:* Re: [adegenet-forum] PCA query?
>
> Dear Thibaut
> It is quite unlikely that there is no true K!
> If so, how can I account for the quite divergent clusters obtained
> in the DAPC analysis? Please refer to the attached images. In the
> image _DAPC clust 6_, cluster 2, cluster 3 and clusters 4,5,1 seem
> to be quite divergent genetic groups, and even 6 is well separated
> from 2 and 3. Similarly, in the image _DAPC cluster 8_, clusters
> 3,4,7 and 3,8 and 2,6 are widely divergent. (However, if you compare
> the two images they appear very similar, except that some clusters
> break into sub-clusters, which is quite reasonable.)
> I think that in my case it may be wise to choose the number of
> clusters by looking at the BIC curve as well as the cluster diagram,
> taking the highly divergent clusters into account.
> What do you think?
>
> cheers
>
> AVIK
>
>
> On 6/22/2011 2:49 PM, Jombart, Thibaut wrote:
>> Dear Avik,
>>
>> the BIC plot you sent resembles what we usually get under IBD models. In this case, it is not surprising that STRUCTURE identifies fewer clusters than DAPC (see the paper: STRUCTURE basically failed to identify clusters under the IBD model).
>>
>> There is probably no "true k", but just a choice of a number of groups useful to summarize the data. You may want to have a look at the section "how many clusters..." in the DAPC vignette, online in "Documents" on the website.
>>
>> Cheers
>>
>> Thibaut
>>
>> ________________________________________
>> From: AVIK RAY [avik.ray.kol at gmail.com]
>> Sent: 21 June 2011 19:08
>> To: Jombart, Thibaut;adegenet-forum at r-forge.wu-wien.ac.at
>> Subject: Re: [adegenet-forum] PCA query?
>>
>> Dear Thibaut
>> Thanks for the very effective reply; it seems DAPC is more suitable
>> for my dataset and for the question I'm looking at!
>> I did a few mock runs to see the initial results, and the BIC curve
>> seems to level off gradually after K=9. However, from STRUCTURE
>> (Bayesian) and FLOCK (maximum likelihood) the number of putative
>> clusters appears to be 2 or 3, so I am wondering what causes this
>> difference. Or am I interpreting it wrongly? Also, my dataset
>> contains a lot of missing data. Does that matter much? Should I
>> remove those data and then try again?
>> I am attaching the BIC and retained-PC curves for reference.
>> Thanks
>> cheers
>>
>> AVIK
>>
>>
>> On 6/20/2011 6:58 PM, Jombart, Thibaut wrote:
>>> Hello,
>>>
>>> In none. As far as PCoA / MDS are concerned, they do the same as PCA, but just allow for using fancier Euclidean distances. Losing information in terms of total variance does not necessarily imply losing information in terms of group discrimination. But if you're looking for clusters, you don't necessarily need to reduce the dimensionality of the data: most clustering algorithms don't.
>>>
>>> Please have a look at the DAPC paper which is really on these topics. You may also be interested in the DAPC vignette for the next release of adegenet.
>>> DAPC paper is here:
>>> http://www.biomedcentral.com/1471-2156/11/94
>>>
>>> DAPC vignette is there:
>>> http://adegenet.r-forge.r-project.org/files/adegenet-dapc.pdf
>>>
>>> Cheers
>>>
>>> Thibaut
>>>
>>> ________________________________________
>>> From:adegenet-forum-bounces at r-forge.wu-wien.ac.at [adegenet-forum-bounces at r-forge.wu-wien.ac.at] on behalf of AVIK RAY [avik.ray.kol at gmail.com]
>>> Sent: 20 June 2011 13:12
>>> To:adegenet-forum at r-forge.wu-wien.ac.at
>>> Subject: [adegenet-forum] PCA query?
>>>
>>> Hi all,
>>> I have a bit of confusion about PCA in general. I ran a PCA in
>>> adegenet and it showed a plot with multiple clusters. My data is
>>> tetraploid microsatellite data, and I need to find potential
>>> clusters, i.e. some individuals being more similar than others in
>>> their allele data. But if I am not mistaken, PCA converts allele
>>> information into synthetic variables and does clustering, where we
>>> tend to lose a lot of information since it will select most but not
>>> all alleles. In that sense, does PCoA / multidimensional scaling, or
>>> simply a clustering analysis (e.g. K-means or hierarchical
>>> clustering), make more sense?
>>> Thanks in advance for your reply.
>>>
>>> AVIK
>>>
>>> _______________________________________________
>>> adegenet-forum mailing list
>>> adegenet-forum at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
--
AVIK RAY
Visiting Fellow
National Center for Biological Sciences
Tata Institute of Fundamental Research
GKVK Campus
Bellary Road
Bangalore-560065
India
Ph 91-80-23666340
Fax 91-80-2363 6662