[adegenet-forum] Fwd: Individuals for genind not plotting dapc
Thibaut Jombart
thibautjombart at gmail.com
Fri Aug 4 22:53:59 CEST 2017
Yes, I overlooked that; +1 to Zhian's suggestions.
Also a sign that I need to get more familiar with poppr ;)
Best
Thibaut
--
Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology, Imperial College
London
Head of RECON: repidemicsconsortium.org
WHO Consultant - outbreak analysis
sites.google.com/site/thibautjombart/
Twitter: @TeebzR
+44(0)20 7594 3658
On 4 August 2017 at 17:22, Zhian Kamvar <zkamvar at gmail.com> wrote:
> Hi Phillip,
>
> I would agree with Thibaut to first check to make sure you don't have
> duplicated genotypes for each cluster in your data. If you use the poppr
> package, you can use nmll() to check the number of unique genotypes and
> mll() to view their assignment.
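
A minimal sketch of the poppr check described above. It assumes both adegenet and poppr are installed, and it uses the nancycats example data as a stand-in for the poster's genind object; the object names are illustrative only.

```r
library(adegenet)
library(poppr)

data(nancycats)                 # example genind object shipped with adegenet
x <- as.genclone(nancycats)     # poppr's MLG tools work on genclone objects

nmll(x)                         # number of distinct multilocus genotypes
nInd(x)                         # number of individuals, for comparison

# Size of each multilocus genotype; values above 1 flag duplicated genotypes.
mlg_sizes <- table(mll(x))
mlg_sizes[mlg_sizes > 1]
```

If nmll() is much smaller than nInd(), many individuals share a genotype and would be drawn at the same position in a DAPC scatterplot.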
>
> One thing I notice in your DAPC output is that you are retaining ~95% of
> the variance for these artificial groups, which may suggest that you are
> over-fitting the model (although I would defer to Thibaut for confirmation
> on this). Over-fitting can make differences within groups minute compared
> with differences between groups, so that all points within a group appear
> stacked on top of one another when the between-group differences are
> sufficiently large.
>
> My suggestion is to try reducing the number of retained PCs for the DAPC.
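
One way to try this, sketched under assumptions: the microbov example data stands in for the poster's genind object, and the values of n.pca, n.clust and n.da are arbitrary illustrations rather than recommendations. optim.a.score() is adegenet's helper for judging how many PCs to retain.

```r
library(adegenet)

data(microbov)   # stand-in genind object shipped with adegenet
set.seed(1)      # k-means in find.clusters() uses random starts

# Supplying n.pca and n.clust skips the interactive prompts.
grp <- find.clusters(microbov, n.pca = 200, n.clust = 10)

# Deliberately generous fit, then ask how many PCs the a-score favours.
dapc_many <- dapc(microbov, grp$grp, n.pca = 100, n.da = 4)
ascore <- optim.a.score(dapc_many)
ascore$best      # suggested number of PCs to retain

# Refit with far fewer PCs and compare the scatterplots.
dapc_few <- dapc(microbov, grp$grp, n.pca = 15, n.da = 4)
scatter(dapc_few, scree.da = TRUE)
```

Passing n.pca and n.da explicitly also makes the analysis reproducible, since nothing depends on what was typed at the prompts.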
>
> On Aug 4, 2017, at 08:10, Thibaut Jombart <thibautjombart at gmail.com>
> wrote:
>
> Dear Phillip,
>
> it looks like all individuals from a given cluster are exactly at the same
> location, which would be the case if they are identical genotypes.
>
> Assuming your genind object is 'x', can you check what this returns:
>
> table(table(apply(genind2df(x, sep="", usepop = FALSE), 1, paste,
> collapse = "")))
>
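> This counts how many times each multilocus genotype (haplotype) occurs in
> the data, then tabulates those counts; if every genotype is unique, the
> only entry will be under 1. I think there is a more elegant way to do this
> in poppr, but I will let Zhian comment if this is the case.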
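
To show what the nested table() call reports, here is a base-R sketch on a toy data frame. The data frame is hypothetical and only stands in for the output of genind2df(x, sep = "", usepop = FALSE).

```r
# Toy genotype table: one row per individual, one column per locus
# (hypothetical values, not taken from the thread).
geno <- data.frame(
  loc1 = c("AA", "AA", "AG", "AA", "GG"),
  loc2 = c("CC", "CC", "CT", "CC", "TT"),
  stringsAsFactors = FALSE
)

# Collapse each row into one string: a multilocus genotype per individual.
mlg <- apply(geno, 1, paste, collapse = "")

# First table(): number of individuals carrying each distinct genotype.
copies <- table(mlg)

# Second table(): how many distinct genotypes occur once, twice, and so on.
copy_spectrum <- table(copies)
print(copy_spectrum)
```

In the toy data three individuals share one genotype, so the result has an entry under 3 as well as under 1. On real data, any entry under a value other than 1 indicates duplicated genotypes.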
>
> Best
> Thibaut
>
> On 22 July 2017 at 05:19, Phillip Skipwith <pskipwith at gmail.com> wrote:
>
>> Hi,
>>
>> I'm pretty new to adegenet, but I have been through the tutorials and
>> have been more or less successful getting it to work on my empirical data.
>> This is a phylogenomic dataset of 83 individuals from eight clades and
>> 4,268 loci (I'm using 4,035 SNPs for ordination, etc.). I realize the
>> sample size is small, but this is hard-earned field data. The problem
>> arises when I try to use dapc after find.clusters on the genind object
>> below.
>>
>> gen.struct
>> /// GENIND OBJECT /////////
>>
>> // 83 individuals; 4,035 loci; 8,341 alleles; size: 4.5 Mb
>>
>> // Basic content
>> @tab: 83 x 8341 matrix of allele counts
>> @loc.n.all: number of alleles per locus (range: 2-4)
>> @loc.fac: locus factor for the 8341 columns of @tab
>> @all.names: list of allele names for each locus
>> @ploidy: ploidy of each individual (range: 2-2)
>> @type: codom
>> @call: read.structure(file = "final_Struct_good_maybe.str", n.ind = 83,
>>     n.loc = 4035, onerowperind = F, col.lab = 1, col.pop = 2,
>>     row.marknames = 0, ask = F)
>>
>> // Optional content
>> @pop: population of each individual (group size range: 2-27)
>>
>> grp <- find.clusters(gen.struct, max.n.clust=35)
>>
>> Choose the number PCs to retain (>=1):
>> 80
>> Choose the number of clusters (>=2:
>> 9
>>
>> dapc1 <- dapc(gen.struct, grp$grp)
>>
>> dapc1
>> #################################################
>> # Discriminant Analysis of Principal Components #
>> #################################################
>> class: dapc
>> $call: dapc.genind(x = gen.struct, pop = grp$grp)
>>
>> $n.pca: 60 first PCs of PCA used
>> $n.da: 4 discriminant functions saved
>> $var (proportion of conserved variance): 0.946
>>
>> $eig (eigenvalues): 182000 71010 34130 20710 16790 ...
>>
>>   vector    length content
>> 1 $eig      8      eigenvalues
>> 2 $grp      83     prior group assignment
>> 3 $prior    9      prior group probabilities
>> 4 $assign   83     posterior group assignment
>> 5 $pca.cent 8341   centring vector of PCA
>> 6 $pca.norm 8341   scaling vector of PCA
>> 7 $pca.eig  82     eigenvalues of PCA
>>
>>   data.frame    nrow ncol content
>> 1 $tab          83   60   retained PCs of PCA
>> 2 $means        9    60   group means
>> 3 $loadings     60   4    loadings of variables
>> 4 $ind.coord    83   4    coordinates of individuals (principal components)
>> 5 $grp.coord    9    4    coordinates of groups
>> 6 $posterior    83   9    posterior membership probabilities
>> 7 $pca.loadings 8341 60   PCA loadings of original variables
>> 8 $var.contr    8341 4    contribution of original variables
>>
>> Choose the number PCs to retain (>=1):
>> 60
>> Choose the number discriminant functions to retain (>=1):
>> 4
>>
>> scatter(dapc1, scree.da = T)
>>
>> The end result is a plot showing the centroid of each cluster but not the
>> individuals. I know there is probably something simple that I'm missing,
>> or something intrinsically wrong with my code and/or data. I've perused
>> the forum for similar issues, but nothing quite matches what I'm asking
>> here.
>>
>> Any help would be greatly appreciated.
>>
>> Best,
>>
>> Phillip
>>
>>
>>
>> _______________________________________________
>> adegenet-forum mailing list
>> adegenet-forum at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/adegenet-forum
>>
>
>