[adegenet-forum] A few basic questions
f.calboli at imperial.ac.uk
Thu Apr 4 11:36:18 CEST 2013
On 4 Apr 2013, at 10:27, "Jombart, Thibaut" <t.jombart at imperial.ac.uk> wrote:
> * message to the list * : I will offer a pint to the person who will implement this feature in adegenet; nothing complicated, but I just don't have time for this at the moment
I can come by your office tomorrow and see if it is something I could do in a nice and reasonably speedy way. Contact me off list. No beer required.
Federico C. F. Calboli
Neuroepidemiology and Ageing Research
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 75941602 Fax +44 (0)20 75943193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
> - DAPC is good at finding an optimal typology of groups; cluster assignment is merely a by-product, useful but limited. This is where model-based classifiers will be better. I recommend using BAPS, especially on microsat data since it should run quite fast.
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary’s Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Thomas Vignaud [thomfromsea at gmail.com]
> Sent: 02 April 2013 09:10
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] A few basic questions
> Hi everyone,
> I find adegenet's DAPC very useful, yet I don't fully understand all its subtleties.
> I'll try to ask a few simple questions here, with associated screenshots. I'll mostly use examples, as I believe that is a very efficient way to ask.
> (I'm working with 17 microsats on animals)
> I'm sorry if all this sounds like a newbie's question; please feel free to redirect me to any .pdf I might have missed.
> I believe the two main questions I want to answer with DAPC are:
> 1 - How different are my clusters? (I know this depends on a lot of things and that I can't compare with other species/genes.)
> I feel like one way to do it is to check whether a few components still find a lot of structure.
> Another is, using alpha scores and the whole classic process, to see visually how strongly the individuals are assigned to their cluster.
> 2 - Are there any genetic sub-clusters in my sample? For example, I have sampled 50 individuals in the same location, but maybe there are two (sub)populations here, and I sampled 40 from the first and 10 from the other. I want to see that (i.e. with compoplot), then go back to my data and check whether I can find patterns related to what the genetics tells me.
> Now here is my problem: depending on the number of discriminant functions I use, I get totally different results with the same sub-dataset.
> And, with the same number of discriminant functions, if I add another (very structured) population to my first sub-dataset, the results for the first sub-dataset change again.
> ---> I'm a little lost about what number of discriminant functions to choose (I understand the alpha score, but sometimes it will tell me "21" when using only "5" gives me the exact same compoplot).
> It would not be such a problem if the differences were small, but here is the issue: often all my individuals are 100% in one color, but it's never the same pattern.
> In one compoplot, ids 1, 2, 5, 6 are 100% red and 3, 4, 7 are 100% blue.
> Then I just redo the analysis, changing the number of discriminant functions, and I get 1, 3, 7 100% red and 2, 4, 5, 6 100% blue.
> See attached screenshots A, B and C from the SAME dataset. (I'm trying to use a small number of DFs, as I don't like my ids being 100% in one color; I feel I'm missing some information.)
> ---> The same thing happens if I add other populations: the whole pattern changes again. See screenshot D.
> So is there any guideline that would give me something a little less absolute than totally different results?
> If I want, for example, to note all my outliers (ids that do not belong to their original geographic cluster) and check their characteristics (size, sex, etc.), how am I supposed to do that if the outliers change depending on priors? Especially with more than 700 individuals and 16 geographic clusters.
> If I want to account for how different 3 clusters are, and using the optimal alpha score gives me three 100% differentiated clusters but a lower number starts to create a mix between two of the clusters: can I just use many different numbers of discriminant functions to explore the dataset? Or is that "wrong"?
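[Editorial sketch, not part of the original message.] One way to check how much the assignments depend on the number of discriminant functions alone is to fix the groups once and vary only `n.da`. The sketch below assumes the `microbov` microsatellite dataset that ships with adegenet as a stand-in for the poster's data; the values of `n.pca` and `n.clust` are arbitrary placeholders, and the names `n_da_values`, `assignments` and `unstable` are hypothetical.

```r
library(adegenet)

# Stand-in data (assumption): 'microbov' ships with adegenet.
# Replace with your own genind object.
data(microbov)
obj <- microbov

set.seed(1)  # k-means inside find.clusters uses random starts

# Supplying n.pca and n.clust skips the interactive prompts.
# Both values are placeholders, not recommendations.
grp <- find.clusters(obj, n.pca = 100, n.clust = 5)

# Same groups, same n.pca; only the number of discriminant functions varies.
# With 5 groups, at most 4 discriminant functions are available.
n_da_values <- 1:4
assignments <- sapply(n_da_values, function(k) {
  fit <- dapc(obj, grp$grp, n.pca = 50, n.da = k)
  as.character(fit$assign)
})
colnames(assignments) <- paste0("n.da=", n_da_values)

# Cross-tabulate assignments from the smallest and the largest n.da
print(table(assignments[, 1], assignments[, ncol(assignments)]))

# Individuals whose assigned cluster is not the same in every run
unstable <- which(apply(assignments, 1, function(a) length(unique(a)) > 1))
length(unstable)
```

If the groups are held fixed like this and the patterns still change a lot, the choice of `n.da` matters; if they only change when `find.clusters` is rerun, the k-means step is the more likely source of the differences.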
> Additional information:
> My 'exploring' workflow looks like:
> > grp <- find.clusters(obj, max.n.clust = 35)
> (PCs retained: 50-150)
> (number of clusters: depends on what I want to see)
> > dapc1 <- dapc(obj, grp$grp)
> (PCs retained: N/3, or 100 if N is large)
> (discriminant functions: either the alpha score number, or smaller because I have a strong structure)
> > compoplot(dapc1, grp$grp)
> Any input or help is more than welcome.
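[Editorial sketch, not part of the original message.] The interactive workflow quoted above can also be scripted, so that every choice is recorded and a run can be repeated exactly. This is a minimal sketch under assumptions: `microbov` stands in for the poster's data, and the numbers passed to `n.pca`, `n.clust` and `n.da` are placeholders. Note that `optim.a.score` reports an optimal number of retained PCs (`n.pca`), not a number of discriminant functions.

```r
library(adegenet)

# Stand-in data (assumption): replace with your own genind object.
data(microbov)
obj <- microbov

set.seed(42)  # makes the k-means step and the a-score simulations repeatable

# Non-interactive equivalent of answering the two find.clusters prompts
grp <- find.clusters(obj, n.pca = 100, n.clust = 5)

# First pass with many PCs, so the a-score can be evaluated over a wide range
dapc_wide <- dapc(obj, grp$grp, n.pca = 100, n.da = 4)

# a-score optimisation: suggests how many PCs to retain
ascore <- optim.a.score(dapc_wide, n.sim = 10, plot = FALSE)
ascore$best

# Refit with the suggested number of PCs and plot membership probabilities
dapc1 <- dapc(obj, grp$grp, n.pca = ascore$best, n.da = 4)
compoplot(dapc1)
```

Fixing the seed and passing the arguments explicitly makes it easier to tell whether two compoplots differ because of the parameters chosen or because the clustering step itself gave a different partition.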