[adegenet-forum] A few basic questions
Federico Calboli
f.calboli at imperial.ac.uk
Thu Apr 4 11:36:18 CEST 2013
On 4 Apr 2013, at 10:27, "Jombart, Thibaut" <t.jombart at imperial.ac.uk> wrote:
> * message to the list * : I will offer a pint to the person who will implement this feature in adegenet; nothing complicated, but I just don't have time for this at the moment
I can come by your office tomorrow and see if it is something I could do in a nice and reasonably speedy way. Contact me off list. No beer required.
BW
F
--
Federico C. F. Calboli
Neuroepidemiology and Ageing Research
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 75941602 Fax +44 (0)20 75943193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
>
> - DAPC is good at finding an optimal typology of groups; cluster assignment is merely a by-product, useful but limited. This is where model-based classifiers will be better. I recommend using BAPS, especially on microsat data since it should run quite fast.
>
> Cheers
> Thibaut
>
> --
> ######################################
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary’s Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Thomas Vignaud [thomfromsea at gmail.com]
> Sent: 02 April 2013 09:10
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] A few basic questions
>
> Hi everyone,
>
> I find adegenet (DAPC in particular) very useful, yet I don't fully understand all its subtleties.
>
> I'll try to ask a few simple questions here, with associated screenshots. I'll mostly use examples, as I believe that is a very efficient way to ask.
> (I'm working with 17 microsats on animals)
>
> I'm sorry if all this sounds newbie - please feel free to redirect me to any .pdf I might have missed.
>
> I believe the two main questions I want to answer with DAPC are:
>
> 1 - How different are my clusters? (I know this depends on a lot of things and that I can't compare with other species/genes.)
> I feel like one way to do it is to check whether a few components still find a lot of structure.
> Another is, using alpha scores and the whole classic process, to see visually how strongly the individuals are assigned to their cluster.
>
> 2 - Are there any (genetic) sub-clusters in my sample? For example, I have sampled 50 ids in the same location. But maybe there are two (sub)populations here, and I sampled 40 from the first one and 10 from the other. I want to see that (i.e. compoplot), go back to my data, and check whether I can find patterns related to what the genetics tell me.
>
> Now here is my problem: depending on the number of discriminant functions I use, I get totally different results with the same sub-dataset.
> And, with the same number of discriminant functions but after adding another (very structured) population to my first sub-dataset, the results for the first sub-dataset are different again.
>
> ---> I'm a little lost about what number of discriminant functions to choose (I understand the alpha-score, but sometimes it will tell me "21" when using only "5" gives me exactly the same compoplot).
> It would not be such a problem if the differences were small, but here is the thing: often all my individuals are 100% in one color, but it's never the same pattern.
> In one compoplot, ids 1, 2, 5, 6 are 100% red and 3, 4, 7 are 100% blue.
> Then I just redo the analysis, changing the number of discriminant functions, and I get 1, 3, 7 100% red and 2, 4, 5, 6 100% blue.
> See attached screenshots A, B and C from the SAME dataset. (I'm trying to use a small number of DFs as I don't like my ids being 100% in one color; I feel I am missing some information.)
>
> ---> The same thing happens if I add other populations. The whole pattern changes again. See screenshot D.
>
>
> So is there any guideline that would give me something a little less absolute than totally different results?
>
> If I want, for example, to note all my outliers (ids that do not belong to their original geographic cluster) and check their characteristics (size, sex, etc.), how am I supposed to do that if the outliers change depending on priors? Especially with more than 700 individuals and 16 geographic clusters.
>
> If I want to account for how different 3 clusters are, and using the optimal alpha score gives me three 100% differentiated clusters, but using a lower one starts to create a mix between two of the clusters: can I just decide to use many different numbers of discriminant functions to explore the dataset? Or is that "wrong"?
>
>
>
> Additional information:
> my 'exploring' workflow looks like this:
>
>> grp <- find.clusters(obj, max.n.clust = 35)
> x (number of PCs retained: 50-150)
> x (number of clusters: depends on what I want to see)
>
>> dapc1 <- dapc(obj, grp$grp)
> x (number of PCs retained: N/3, or 100 if N is large)
> x (number of discriminant functions: either the alpha-score number, or smaller because I have a strong structure)
>
>> compoplot(dapc1, grp$grp)
>
>
>
>
> Any input or help is more than welcome.
>
> Best,
>
> Thomas
>
>
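
For the outlier question above, one option is to fix the analysis parameters in code rather than at the interactive prompts, so that every run is directly comparable. What follows is only a sketch under stated assumptions, not a recommended protocol: it uses `microbov`, an example microsatellite dataset shipped with adegenet, as a stand-in for the real data; the values `n.pca = 100` and `n.da = 5` are arbitrary illustrations; and `pop(obj)` is assumed to hold the geographic origin of each individual. Note that `optim.a.score()` optimises the number of retained PCs (`n.pca`), not the number of discriminant functions.

```r
library(adegenet)

# ASSUMPTION: 'microbov' (example data shipped with adegenet) stands in
# for the real dataset; pop(obj) plays the role of the geographic clusters.
data(microbov)
obj <- microbov

set.seed(1)  # the a-score relies on random permutations of group labels

# First pass with a deliberately generous number of PCs.
# n.pca and n.da are given explicitly, so no interactive prompt appears.
# The values 100 and 5 are arbitrary illustrations.
dapc0 <- dapc(obj, pop(obj), n.pca = 100, n.da = 5)

# Let the a-score suggest how many PCs to retain (it tunes n.pca only).
ascore <- optim.a.score(dapc0, plot = FALSE)

# Second pass with the suggested number of PCs, same n.da as before.
dapc1 <- dapc(obj, pop(obj), n.pca = ascore$best, n.da = 5)

# Individuals whose highest-posterior group differs from their original group.
original <- as.character(dapc1$grp)
assigned <- as.character(dapc1$assign)
outliers <- data.frame(
  id            = indNames(obj),
  original      = original,
  assigned      = assigned,
  max.posterior = apply(dapc1$posterior, 1, max)
)[assigned != original, ]

head(outliers)
compoplot(dapc1)
```

Because the parameters are written down, the `outliers` table can be regenerated for a few different `n.pca` or `n.da` values and the tables compared, which shows which individuals are flagged consistently and which only under particular settings.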