[adegenet-forum] Grouping crossvalidation DAPC
Thibaut Jombart
thibautjombart at gmail.com
Wed Dec 6 18:50:04 CET 2017
Hi Flo,
Sorry about the late reply. I'd make an excuse, but it wouldn't beat "been
busy". In short:
There is no proper solution to the group labelling issue you mention, at
least none that I know of. A workaround I have used with colleagues relies
on two statistics computed over all pairwise comparisons of individuals in
the clusters:
- % of times 2 individuals are in the same cluster when they should be
- % of times 2 individuals are in different clusters when they should be
The average of the two quantities (weighted by the number of pairs in each
category) is, I think, the Rand index:
https://en.wikipedia.org/wiki/Rand_index
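To make the two pairwise statistics concrete, here is a minimal sketch in Python (adegenet itself is an R package, and this function is not part of it). It assumes the group assignments from two runs are available as lists of labels in the same individual order; the name `pairwise_agreement` is hypothetical. Because it only asks whether each pair of individuals shares a group, the arbitrary group numbers that change between runs do not affect the result. The Rand index counts agreeing pairs over all pairs, so it equals the two proportions averaged with weights given by the number of pairs in each category, not their plain mean.

```python
from itertools import combinations


def pairwise_agreement(reference, candidate):
    """Compare two clusterings of the same individuals over all pairs.

    Hypothetical helper: labels need not match between runs, since only
    whether each pair shares a group is compared.
    """
    if len(reference) != len(candidate):
        raise ValueError("both clusterings must cover the same individuals")
    if len(reference) < 2:
        raise ValueError("need at least 2 individuals to form a pair")

    same_total = same_kept = 0  # pairs together in the reference run
    diff_total = diff_kept = 0  # pairs apart in the reference run
    for i, j in combinations(range(len(reference)), 2):
        together_ref = reference[i] == reference[j]
        together_cand = candidate[i] == candidate[j]
        if together_ref:
            same_total += 1
            same_kept += together_cand       # still together in the other run
        else:
            diff_total += 1
            diff_kept += not together_cand   # still apart in the other run

    n_pairs = same_total + diff_total
    return {
        # % of pairs in the same cluster when they should be
        "same_when_should": same_kept / same_total if same_total else float("nan"),
        # % of pairs in different clusters when they should be
        "diff_when_should": diff_kept / diff_total if diff_total else float("nan"),
        # Rand index: agreeing pairs over all pairs
        "rand_index": (same_kept + diff_kept) / n_pairs,
    }
```

For example, `pairwise_agreement([1, 1, 2, 2, 3], [2, 2, 1, 1, 1])` gives 1.0 for the first statistic, 0.75 for the second and a Rand index of 0.8, whereas the plain mean of the two proportions would be 0.875. Two runs that differ only in group numbering, such as `[1, 1, 2, 2]` and `[2, 2, 1, 1]`, score 1.0 throughout. Comparing every pair of repeated runs this way gives a distribution of scores summarising how repeatable the grouping is.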
For your second question, the simple answer: DAPC is *bad* at finding
hybrids. For this, consider using 'snapclust', a (hopefully) soon-to-be-
published method also available in adegenet. Its documentation is somewhat
hidden here:
https://github.com/thibautjombart/adegenet/raw/master/tutorials/tutorial-snapclust.pdf
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658
On 21 November 2017 at 18:45, Florian Leplat <leplat.florian at gmail.com> wrote:
> Dear Adegenet user,
>
> I have been using the adegenet package for genetic applications.
>
> I recently started using the DAPC function. The method is very informative.
> However there are some details that I would like to understand better.
>
> My main objective is to group a number of plant genotypes according to
> their genetic background. Each genotype is a homozygous line.
>
> I have no prior information on any genotype, so the idea is to assign a
> “group number” to each of my plants.
>
> It works fine and agrees well with the information I have for each plant
> (origin, parents...).
>
> However, one limitation I face is the repeatability of the grouping: when
> I run the analysis several times, some plants are assigned to a different
> group. Is there a cross-validation procedure at this step to check the
> percentage of plants that are always grouped together across runs?
>
> Furthermore, even when my groupings are quite consistent “group-wise” from
> one run to the next, the group numbers change. Is there a way to solve
> that (for instance, by giving a fixed group number to the founder genotype
> of our population)?
>
> Then I have a second issue. After the first step, which defines my groups,
> I would like to plot hybrid genotypes, which should (or could) be an
> admixture between 2 groups. I therefore cannot use them to build my
> groups, as they add noise to the model. I wanted to use the procedure
> described in the tutorial for supplementary individuals, but I realized
> that it only works if the supplementary individuals already have grouping
> information, which I don’t have. I only want to see (mostly visually)
> where the hybrids are positioned relative to the groups I previously
> defined. Is there a way to use the script to do that?
>
> Thanks in advance for your help.
>
> Best regards.
>
>
> *Flo*
>
> _______________________________________________
> adegenet-forum mailing list
> adegenet-forum at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>