[adegenet-forum] Fwd: Individuals for genind not plotting dapc

Fri Aug 4 22:53:59 CEST 2017

Yes, I overlooked that; +1 to Zhian's suggestions.

Also a sign I need to get more familiar with poppr ;)

Best
Thibaut

--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658

On 4 August 2017 at 17:22, Zhian Kamvar <zkamvar at gmail.com> wrote:

> Hi Phillip,
>
> I agree with Thibaut that you should first check whether your data contain
> duplicated genotypes within each cluster. If you use the poppr package,
> nmll() returns the number of unique multilocus genotypes and mll() shows
> which genotype each individual is assigned to.
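A minimal sketch of that check, assuming poppr is installed. It uses adegenet's example data set `nancycats` as a stand-in for Phillip's `gen.struct`, so the data and object names are illustrative; `as.genclone()`, `nmll()` and `mll()` are poppr functions.

```r
library(poppr)    # also attaches adegenet
data(nancycats)   # example genind shipped with adegenet (stand-in for gen.struct)

gc <- as.genclone(nancycats)  # genclone objects track multilocus genotypes (MLGs)
n_unique <- nmll(gc)          # number of distinct multilocus genotypes
mlg_id <- mll(gc)             # MLG assigned to each individual

n_unique
nInd(gc)                      # compare with the number of individuals
table(table(mlg_id))          # how many MLGs are seen once, twice, ...
```

If `n_unique` equals `nInd(gc)`, every individual has a distinct genotype and duplicates cannot explain the stacked points.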
>
> One thing I notice in your DAPC output is that you retain 60 PCs (~95% of
> the variance) to discriminate these artificial groups, which may suggest
> that you are over-fitting the model (although I would defer to Thibaut to
> confirm this). Over-fitting can make differences within groups minute
> compared with differences between groups; if the groups are far enough
> apart, all points within a group appear stacked on top of one another.
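To illustrate the over-fitting concern, here is a small simulation in base R. It uses `MASS::lda()` as a stand-in for the discriminant step of DAPC (an assumption for illustration; it does not call adegenet). Genotypes and group labels are random, so there is no real structure to find, yet the proportion of individuals reassigned to their own group rises as more PCs are retained.

```r
library(MASS)   # lda(); MASS ships with standard R installations
set.seed(42)

n <- 83; p <- 500   # 83 individuals as in the thread; 500 random biallelic loci
X <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)  # allele counts, no structure
grp <- factor(rep(1:9, length.out = n)[sample(n)])         # 9 arbitrary groups

pcs <- prcomp(X)$x  # at most n - 1 = 82 informative PCs

# Proportion of individuals reassigned to their own (arbitrary) group
resub_success <- function(k) {
  fit <- lda(pcs[, 1:k, drop = FALSE], grouping = grp)
  mean(predict(fit)$class == grp)
}

success <- sapply(c(5, 60), resub_success)
names(success) <- c("5 PCs", "60 PCs")
success
```

Because the groups are meaningless, any high success with 60 PCs reflects the model fitting noise rather than genuine differentiation.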
>
> My suggestion is to try reducing the number of retained PCs for the DAPC.
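One way to act on this, sketched on adegenet's `nancycats` data rather than Phillip's object (the data set, seed and the choice of 10 PCs are illustrative assumptions): pass `n.pca` and `n.da` explicitly to `dapc()` and compare the a-score of the two fits. adegenet also provides `optim.a.score()` and `xvalDapc()` for choosing the number of PCs more systematically.

```r
library(adegenet)
data(nancycats)   # example genind (stand-in for gen.struct)
set.seed(1)       # k-means in find.clusters() and the a-score use random draws

# Supplying the answers to the interactive prompts as arguments keeps runs reproducible
grp <- find.clusters(nancycats, max.n.clust = 35, n.pca = 80, n.clust = 9)

dapc_many <- dapc(nancycats, grp$grp, n.pca = 60, n.da = 4)  # many PCs, as in the thread
dapc_few  <- dapc(nancycats, grp$grp, n.pca = 10, n.da = 4)  # far fewer PCs

# a-score: reassignment success minus that obtained with randomised groups;
# values near 1 suggest stable discrimination, low values weak or over-fitted
scores <- c(many = a.score(dapc_many)$mean, few = a.score(dapc_few)$mean)
scores
```

Calling `scatter(dapc_few, scree.da = TRUE)` afterwards would show whether individuals spread out around their centroids once fewer PCs are used.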
>
> On Aug 4, 2017, at 08:10, Thibaut Jombart <thibautjombart at gmail.com> wrote:
>
> Dear Phillip,
>
> It looks like all individuals from a given cluster sit at exactly the same
> location, which would be the case if they had identical genotypes.
>
> Assuming your genind object is 'x', can you check what this returns:
>
>  table(table(apply(genind2df(x, sep = "", usepop = FALSE), 1, paste, collapse = "")))
>
> This tabulates how many distinct multilocus genotypes occur once, twice,
> and so on in the data. I think poppr can do this more elegantly, but I will
> let Zhian comment on that.
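To see what that one-liner computes, here it is applied to a small hypothetical genotype table in base R (the data are invented for illustration; in the thread, the rows come from `genind2df(x, sep = "", usepop = FALSE)`).

```r
# Hypothetical toy data: 5 individuals genotyped at 3 loci,
# with the two alleles of each genotype concatenated into one string
geno <- data.frame(
  L1 = c("AA", "AA", "AT", "AA", "TT"),
  L2 = c("CG", "CG", "CC", "CG", "GG"),
  L3 = c("AT", "AT", "AA", "AT", "TT"),
  stringsAsFactors = FALSE
)

# One string per individual: its multilocus genotype
mlg <- apply(geno, 1, paste, collapse = "")

# Inner table: number of individuals carrying each multilocus genotype
# Outer table: number of genotypes seen once, twice, three times, ...
freq_of_counts <- table(table(mlg))
freq_of_counts
```

Here two genotypes occur once and one genotype is shared by three individuals. With no duplicates at all, the output would be a single entry `1` with a count equal to the number of individuals.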
>
> Best
> Thibaut
>
>
> On 22 July 2017 at 05:19, Phillip Skipwith <pskipwith at gmail.com> wrote:
>
>> Hi,
>>
>> I'm pretty new to adegenet, but I have worked through the tutorials and
>> have been more or less successful in getting it to work on my empirical
>> data. This is a phylogenomic dataset of 83 individuals from eight clades
>> and 4,268 loci (I'm using 4,035 SNPs for ordination, etc.). I realize the
>> sample size is small, but this is hard-earned field data. The problem
>> arises when I try to use dapc() after find.clusters() on the genind object
>> below.
>>
>> gen.struct
>> /// GENIND OBJECT /////////
>>
>>  // 83 individuals; 4,035 loci; 8,341 alleles; size: 4.5 Mb
>>
>>  // Basic content
>>    @tab:  83 x 8341 matrix of allele counts
>>    @loc.n.all: number of alleles per locus (range: 2-4)
>>    @loc.fac: locus factor for the 8341 columns of @tab
>>    @all.names: list of allele names for each locus
>>    @ploidy: ploidy of each individual  (range: 2-2)
>>    @type:  codom
>>    @call: read.structure(file = "final_Struct_good_maybe.str", n.ind = 83,
>>     n.loc = 4035, onerowperind = F, col.lab = 1, col.pop = 2,
>>     row.marknames = 0, ask = F)
>>
>>  // Optional content
>>    @pop: population of each individual (group size range: 2-27)
>>
>> grp <- find.clusters(gen.struct, max.n.clust=35)
>>
>> Choose the number PCs to retain (>=1):
>> 80
>> Choose the number of clusters (>=2):
>> 9
>>
>> dapc1 <- dapc(gen.struct, grp$grp)
>>
>> dapc1
>> #################################################
>> # Discriminant Analysis of Principal Components #
>> #################################################
>> class: dapc
>> $call: dapc.genind(x = gen.struct, pop = grp$grp)
>>
>> $n.pca: 60 first PCs of PCA used
>> $n.da: 4 discriminant functions saved
>> $var (proportion of conserved variance): 0.946
>>
>> $eig (eigenvalues): 182000 71010 34130 20710 16790 ...
>>
>>   vector    length content
>> 1 $eig      8      eigenvalues
>> 2 $grp      83     prior group assignment
>> 3 $prior    9      prior group probabilities
>> 4 $assign   83     posterior group assignment
>> 5 $pca.cent 8341   centring vector of PCA
>> 6 $pca.norm 8341   scaling vector of PCA
>> 7 $pca.eig  82     eigenvalues of PCA
>>
>>   data.frame    nrow ncol content
>> 1 $tab          83   60   retained PCs of PCA
>> 2 $means        9    60   group means
>> 3 $loadings     60   4    loadings of variables
>> 4 $ind.coord    83   4    coordinates of individuals (principal components)
>> 5 $grp.coord    9    4    coordinates of groups
>> 6 $posterior    83   9    posterior membership probabilities
>> 7 $pca.loadings 8341 60   PCA loadings of original variables
>> 8 $var.contr    8341 4    contribution of original variables
>>
>> Choose the number PCs to retain (>=1):
>> 60
>> Choose the number discriminant functions to retain (>=1):
>> 4
>>
>> scatter(dapc1, scree.da = T)
>>
>> The resulting plot shows the centroid of each cluster but not the
>> individuals. There is probably something simple I'm missing, or something
>> intrinsically wrong with my code and/or data. I've searched the forum for
>> similar issues, but nothing quite matches what I'm asking here.
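A quick numerical check of what such a plot shows, sketched on adegenet's `nancycats` data with its sampling populations as groups (data, groups and the `n.pca`/`n.da` values are illustrative assumptions; with the thread's objects one would inspect `dapc1` directly). It compares the spread of individuals within each group on each discriminant function with the spread of the group centroids, using the `$ind.coord`, `$grp.coord` and `$grp` components listed in the dapc printout.

```r
library(adegenet)
data(nancycats)
# Sampling populations used as groups; n.pca and n.da are illustrative choices
dapc_pop <- dapc(nancycats, pop(nancycats), n.pca = 60, n.da = 4)

# SD of individual coordinates within each group, per discriminant function
within_sd <- apply(dapc_pop$ind.coord, 2,
                   function(v) tapply(v, dapc_pop$grp, sd))
# SD of the group centroids on the same discriminant functions
between_sd <- apply(dapc_pop$grp.coord, 2, sd)

round(within_sd, 2)
round(between_sd, 2)
```

Within-group values that are tiny relative to the between-group spread would explain individuals hiding under their centroid symbols in `scatter()`.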
>>
>> Any help would be greatly appreciated.
>>
>> Best,
>>
>> Phillip
>>
>>
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>
>
>