[adegenet-forum] Fwd: Individuals for genind not plotting dapc

Fri Aug 4 18:22:55 CEST 2017

Hi Phillip,

I agree with Thibaut: first check that you don't have duplicated genotypes within each cluster in your data. If you use the poppr package, nmll() gives the number of unique multilocus genotypes and mll() shows which genotype each individual is assigned to.
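
For reference, here is a minimal sketch of that check. The toy table `geno` and the names `mlg`, `n_unique` and `mlg_sizes` are made up for illustration. The poppr calls only run if poppr and adegenet are installed, and on your data you would pass your own genind object rather than the toy one.

```r
# Toy genotype table (hypothetical data): 4 diploid individuals x 2 loci,
# one character per allele.
geno <- data.frame(
  loc1 = c("11", "11", "12", "11"),
  loc2 = c("22", "22", "12", "22"),
  row.names = c("ind1", "ind2", "ind3", "ind4"),
  stringsAsFactors = FALSE
)

# Base R: collapse each row into a single multilocus genotype string.
mlg <- apply(geno, 1, paste, collapse = "")
n_unique <- length(unique(mlg))  # number of distinct genotypes
mlg_sizes <- table(mlg)          # number of individuals sharing each genotype

# The same check with poppr on a genind object, if the packages are available.
if (requireNamespace("poppr", quietly = TRUE) &&
    requireNamespace("adegenet", quietly = TRUE)) {
  x <- adegenet::df2genind(geno, ncode = 1, ploidy = 2)
  gc <- poppr::as.genclone(x)
  print(poppr::nmll(gc))  # number of multilocus genotypes
  print(poppr::mll(gc))   # genotype assignment of each individual
}
```

If the number of unique genotypes is much smaller than the number of individuals, identical genotypes would explain the overlapping points.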

One thing I notice in your DAPC output is that you are retaining ~95% of the variance (60 PCs for 83 individuals) for these artificial groups, which may suggest that you are over-fitting the model (although I would defer to Thibaut for confirmation on this). Over-fitting could make the differences within groups minute compared to the differences between groups, so that all the points within a group appear stacked on top of one another if the differences between groups are sufficiently large.

My suggestion is to try reducing the number of PCs retained for the DAPC.
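
As a rough sketch of what I mean, you can pass n.pca and n.da to dapc() directly instead of answering the interactive prompts, and compare fits with different numbers of PCs. The example below uses the microbov dataset that ships with adegenet (not your data), and the values 100 and 20 are arbitrary choices for illustration. adegenet also provides optim.a.score() and xvalDapc() to help choose the number of PCs, which may be worth a look.

```r
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  data(microbov)  # example dataset from adegenet, groups taken from pop(microbov)

  # Many retained PCs: keeps more of the variance but risks over-fitting.
  dapc_many <- dapc(microbov, n.pca = 100, n.da = 4)

  # Fewer retained PCs: keeps less of the variance, usually a more stable fit.
  dapc_few <- dapc(microbov, n.pca = 20, n.da = 4)

  # Proportion of variance conserved by the retained PCs in each fit.
  print(c(many = dapc_many$var, few = dapc_few$var))

  # Individuals should now be visibly spread around their group centroids.
  scatter(dapc_few, scree.da = TRUE)
}
```

On your own object the call would look like dapc(gen.struct, grp$grp, n.pca = ..., n.da = 4), with n.pca well below the 60 you used.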


> On Aug 4, 2017, at 08:10, Thibaut Jombart <thibautjombart at gmail.com> wrote:
> 
> Dear Phillip, 
> 
> It looks like all individuals from a given cluster are at exactly the same location, which would be the case if they have identical genotypes. 
> 
> Assuming your genind object is 'x', can you check what this returns:
> 
>  table(table(apply(genind2df(x, sep="", usepop = FALSE), 1, paste, collapse = "")))
> 
> This will derive the frequencies of haplotypes in the data. I think there is something to do this more elegantly in poppr but I will let Zhian comment if this is the case.
> 
> Best
> Thibaut
> 
> 
> 
> 
> --
> Dr Thibaut Jombart
> Lecturer, Department of Infectious Disease Epidemiology, Imperial College London
> Head of RECON: repidemicsconsortium.org
> WHO Consultant - outbreak analysis
> sites.google.com/site/thibautjombart/
> Twitter: @TeebzR
> +44(0)20 7594 3658
> 
>> On 22 July 2017 at 05:19, Phillip Skipwith <pskipwith at gmail.com> wrote:
>> Hi,
>> 
>> I'm pretty new to adegenet, but I have been through the tutorials and have been more or less successful getting it to work on my empirical data. This is a phylogenomic dataset of 83 individuals from eight clades and 4,268 loci (I'm using 4,035 SNPs for ordination, etc.). I realize the sample size is small, but this is hard-earned field data. The problem arises when I try to use dapc after find.clusters on the genind object below.
>> 
>> gen.struct
>> /// GENIND OBJECT /////////
>> 
>>  // 83 individuals; 4,035 loci; 8,341 alleles; size: 4.5 Mb
>> 
>>  // Basic content
>>    @tab:  83 x 8341 matrix of allele counts
>>    @loc.n.all: number of alleles per locus (range: 2-4)
>>    @loc.fac: locus factor for the 8341 columns of @tab
>>    @all.names: list of allele names for each locus
>>    @ploidy: ploidy of each individual  (range: 2-2)
>>    @type:  codom
>>    @call: read.structure(file = "final_Struct_good_maybe.str", n.ind = 83, 
>>     n.loc = 4035, onerowperind = F, col.lab = 1, col.pop = 2, 
>>     row.marknames = 0, ask = F)
>> 
>>  // Optional content
>>    @pop: population of each individual (group size range: 2-27)
>> 
>> grp <- find.clusters(gen.struct, max.n.clust=35)
>> 
>> Choose the number PCs to retain (>=1): 
>> 80
>> Choose the number of clusters (>=2: 
>> 9
>> 
>> dapc1 <- dapc(gen.struct, grp$grp)
>> 
>> dapc1
>> 	#################################################
>> 	# Discriminant Analysis of Principal Components #
>> 	#################################################
>> class: dapc
>> $call: dapc.genind(x = gen.struct, pop = grp$grp)
>> 
>> $n.pca: 60 first PCs of PCA used
>> $n.da: 4 discriminant functions saved
>> $var (proportion of conserved variance): 0.946
>> 
>> $eig (eigenvalues): 182000 71010 34130 20710 16790 ...
>> 
>>   vector    length content                   
>> 1 $eig      8      eigenvalues               
>> 2 $grp      83     prior group assignment    
>> 3 $prior    9      prior group probabilities 
>> 4 $assign   83     posterior group assignment
>> 5 $pca.cent 8341   centring vector of PCA    
>> 6 $pca.norm 8341   scaling vector of PCA     
>> 7 $pca.eig  82     eigenvalues of PCA        
>> 
>>   data.frame    nrow ncol content                                          
>> 1 $tab          83   60   retained PCs of PCA                              
>> 2 $means        9    60   group means                                      
>> 3 $loadings     60   4    loadings of variables                            
>> 4 $ind.coord    83   4    coordinates of individuals (principal components)
>> 5 $grp.coord    9    4    coordinates of groups                            
>> 6 $posterior    83   9    posterior membership probabilities               
>> 7 $pca.loadings 8341 60   PCA loadings of original variables              
>> 8 $var.contr    8341 4    contribution of original variables   
>> 
>> Choose the number PCs to retain (>=1): 
>> 60
>> Choose the number discriminant functions to retain (>=1): 
>> 4
>> 
>> scatter(dapc1, scree.da = T)
>> 
>> The end result is a plot with the centroid points for each of the clusters but not the individuals. I know there is probably something simple that I'm missing, or there's something intrinsically wrong with my code and/or data. I've perused the forum for similar issues, but nothing quite matches what I'm asking here.
>> 
>> Any help would be greatly appreciated.
>> 
>> Best,
>> 
>> Phillip
>> 