[adegenet-forum] A few basic questions

Thu Apr 4 11:36:18 CEST 2013

On 4 Apr 2013, at 10:27, "Jombart, Thibaut" <t.jombart at imperial.ac.uk> wrote:

> * message to the list * : I will offer a pint to the person who will implement this feature in adegenet; nothing complicated, but I just don't have time for this at the moment

I can come by your office tomorrow and see if it is something I could do in a nice and reasonably speedy way.  Contact me off list.  No beer required.

BW

F

--
Federico C. F. Calboli
Neuroepidemiology and Ageing Research
Imperial College, St. Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 75941602   Fax +44 (0)20 75943193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

> 
> - DAPC is good at finding an optimal typology of groups; cluster assignment is merely a by-product, useful but limited. This is where model-based classifiers will be better. I recommend using BAPS, especially on microsat data since it should run quite fast.
> 
> Cheers
> Thibaut
> 
> --
> ######################################
> Dr Thibaut JOMBART
> MRC Centre for Outbreak Analysis and Modelling
> Department of Infectious Disease Epidemiology
> Imperial College - School of Public Health
> St Mary’s Campus
> Norfolk Place
> London W2 1PG
> United Kingdom
> Tel. : 0044 (0)20 7594 3658
> t.jombart at imperial.ac.uk
> http://sites.google.com/site/thibautjombart/
> http://adegenet.r-forge.r-project.org/
> ________________________________________
> From: adegenet-forum-bounces at lists.r-forge.r-project.org [adegenet-forum-bounces at lists.r-forge.r-project.org] on behalf of Thomas Vignaud [thomfromsea at gmail.com]
> Sent: 02 April 2013 09:10
> To: adegenet-forum at lists.r-forge.r-project.org
> Subject: [adegenet-forum] A few basic questions
> 
> Hi everyone,
> 
> I find adegenet's DAPC very useful, yet I don't fully understand all of its subtleties.
> 
> I'll try to ask a few simple questions here, with associated screenshots. I'll mostly use examples, as I believe that is a very efficient way to ask.
> (I'm working with 17 microsats on animals)
> 
> I'm sorry if all this sounds newbie; please feel free to redirect me to any .pdf I might have missed.
> 
> I believe the two main questions I want to answer with DAPC are:
> 
> 1 - How different are my clusters? (I know this depends on a lot of things and that I can't compare with other species/genes.)
>  I feel like one way to do it is to check whether a few components still find a lot of structure.
>  Another is, using alpha scores and the whole classic process, to visually see how strongly the individuals are assigned to their cluster.
> 
> 2 - Are there any genetic sub-clusters in my sample? For example, I have sampled 50 ids in the same location. But maybe there are two (sub)populations here and I sampled 40 from the first one and 10 from the other. I want to see that (i.e. with compoplot), then go back to my data and check whether I can find patterns related to what the genetics tell me.
> 
> Now here is my problem: depending on the number of discriminant functions I use, I get totally different results with the same sub-dataset.
> And, with the same number of discriminant functions but with another (very structured) population added to my first sub-dataset, the results for the first sub-dataset will be different again.
> 
> --->  I'm a little lost as to what number of discriminant functions to choose (I understand the alpha score, but sometimes it will tell me "21" when using only "5" gives me exactly the same compoplot).
> It would not be such a problem if the differences were small, but here is the issue: often all my individuals are 100% in one color, but it's never the same pattern.
> In one compoplot, ids 1, 2, 5, 6 are 100% red, and 3, 4, 7 are 100% blue.
> Then I just redo the analysis, changing the number of discriminant functions, and I get 1, 3, 7 100% red and 2, 4, 5, 6 100% blue.
> See attached screenshots A, B and C from the SAME dataset. (I'm trying to use a small number of DFs as I don't like my ids to be 100% in one color; I feel I'm missing some information.)
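
The two compoplots described above can be compared directly. A minimal sketch in base R, assuming the colour of each id is recorded as a factor for each run (run1 and run2 are hypothetical names; the values are the ones quoted in the example): cross-tabulating the two assignments shows how much of the change is a swap of colours and how much is a real change of membership.

```r
# Assignments of ids 1-7 in the two runs described in the message.
# run1 and run2 are illustrative names, not adegenet objects.
run1 <- factor(c("red", "red", "blue", "blue", "red", "red", "blue"))
run2 <- factor(c("red", "blue", "red", "blue", "blue", "blue", "red"))

# Rows: colour in run 1; columns: colour in run 2
tab <- table(run1 = run1, run2 = run2)
print(tab)

# For each run-1 cluster, count the ids that fall in its best-matching
# run-2 cluster. This is a simple purity score, not a one-to-one matching.
purity <- sum(apply(tab, 1, max)) / length(run1)
print(purity)
```

In this example, 5 of the 7 ids stay together once red and blue are swapped; only ids 1 and 4 actually change group. Group labels (and hence colours) are arbitrary from one clustering run to the next, so a table like this is usually more informative than comparing colours by eye.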
> 
> ---> The same thing happens if I add other populations. The whole pattern changes again. See screenshot D.
> 
> 
> So is there any guideline that would give me something a little less absolute than totally different results?
> 
> If I want, for example, to note all my outliers (ids that do not belong to their original geographic cluster) and check their characteristics (size, sex, etc.), how am I supposed to do that if the outliers change depending on priors? Especially with more than 700 individuals and 16 geographic clusters.
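
One possible way to list such outliers, sketched in base R under stated assumptions: loc is a hypothetical factor of sampling locations and post is a hypothetical matrix of membership probabilities with one row per id and one column per location. For a DAPC fitted with sampling locations as groups, the posterior element of the dapc object has this shape. The values below are made up for illustration.

```r
# Hypothetical sampling locations for six ids
loc <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical membership probabilities (rows: ids, columns: locations).
# With a real analysis this could be taken from dapc1$posterior.
post <- matrix(c(0.95, 0.05,
                 0.90, 0.10,
                 0.20, 0.80,
                 0.10, 0.90,
                 0.55, 0.45,
                 0.02, 0.98),
               ncol = 2, byrow = TRUE,
               dimnames = list(paste0("id", 1:6), c("A", "B")))

# Most likely group for each id and the probability attached to it
assigned <- colnames(post)[max.col(post, ties.method = "first")]
max.post <- apply(post, 1, max)

res <- data.frame(id = rownames(post), location = loc,
                  assigned = assigned, max.post = max.post)

# Outliers: ids whose most likely group is not their sampling location
mismatch <- as.character(loc) != assigned
outliers <- res[mismatch, ]
print(outliers)
```

The resulting table can then be joined to size, sex or other metadata by id. Keeping max.post alongside the assignment makes it possible to separate confident outliers from borderline ones, which tend to be the ids that flip between runs.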
> 
> If I want to account for how different 3 clusters are, and using the optimal alpha score gives me three 100% differentiated clusters, but using a lower one starts to create a mix between two of the clusters: can I just decide to use a lot of different numbers of discriminant functions to explore the dataset? Or is it "wrong"?
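
Exploring several numbers of discriminant functions can be done systematically rather than by hand. The sketch below is one hedged way to do it, not a recommended procedure from the adegenet authors: it uses the microbov example dataset shipped with adegenet as a stand-in for the poster's data, fixes n.pca at an arbitrary value of 20, and varies only n.da. The grid n.da.values and the two summary statistics are illustrative choices.

```r
library(adegenet)

# microbov is an example microsatellite dataset shipped with adegenet;
# it stands in here for the poster's data. Breeds are used as groups.
data(microbov)

# Hypothetical grid of numbers of discriminant functions to explore
n.da.values <- c(1, 2, 5, 10)

# n.pca is fixed (20 is an arbitrary choice) so that only n.da varies
runs <- lapply(n.da.values,
               function(k) dapc(microbov, n.pca = 20, n.da = k))

# Proportion of individuals assigned back to their prior group
agreement <- sapply(runs, function(d)
  mean(as.character(d$assign) == as.character(d$grp)))
names(agreement) <- n.da.values
print(agreement)

# Mean of each individual's largest membership probability
certainty <- sapply(runs, function(d) mean(apply(d$posterior, 1, max)))
names(certainty) <- n.da.values
print(certainty)
```

Reporting how these summaries change across the grid is more transparent than picking the single value that gives the preferred picture.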
> 
> 
> 
> Additional information :
> my 'exploring' workflow looks like :
> 
>> grp <- find.clusters(obj, max.n.clust = 35)
> x (50-150)
> x (depends on what I want to see)
> 
>> dapc1 <- dapc(obj, grp$grp)
> x (N/3 or 100 if N is large)
> x (either alpha score number or smaller because I have a strong structure)
> 
>> compoplot(dapc1, grp$grp)
> 
> 
> 
> 
> Any input or help is more than welcome.
> 
> Best,
> 
> Thomas
> 
> 