[adegenet-forum] [poppr] PCA results

Zhian Kamvar zkamvar at gmail.com
Wed Feb 5 18:07:03 CET 2020


Hello Matthew,

I'm glad you found the solution, and thank you for posting the answer to this forum! You are correct that it's not necessary to retain all 900 PCs for downstream analysis, since there will generally be a long tail of PCs that each explain <1% of the variance. I'm also forwarding this to the adegenet forum, as it concerns non-poppr users too.
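
For anyone who finds this thread later, here is a minimal sketch of the arithmetic behind the discrepancy. It is not how PLINK or ade4 are implemented; it only illustrates the effect Matthew describes, where the same leading eigenvalue gives a different "% variance" depending on whether the denominator is the sum of all eigenvalues or only of the top 20. The function name percent_variance and the eigenvalues are made up for illustration, and it is written in plain Python only so that it runs without any packages.

```python
def percent_variance(eigenvalues, top_n=None):
    """Return the percent of variance attributed to each retained PC.

    eigenvalues: PCA eigenvalues, in any order.
    top_n: if given, normalise by the sum of the top_n largest eigenvalues
           only (the behaviour described for PLINK in this thread);
           if None, normalise by the sum of all eigenvalues.
    """
    ordered = sorted(eigenvalues, reverse=True)
    retained = ordered if top_n is None else ordered[:top_n]
    total = sum(retained)
    return [100.0 * ev / total for ev in retained]


# Made-up eigenvalues for illustration only: 20 leading values followed
# by a long tail of 880 small ones (900 in total, as in the thread).
eigenvalues = [60.0, 30.0, 20.0] + [10.0] * 17 + [1.0] * 880

pc1_all = percent_variance(eigenvalues)[0]
pc1_top20 = percent_variance(eigenvalues, top_n=20)[0]

print(f"PC1, all 900 eigenvalues as denominator: {pc1_all:.1f}%")
print(f"PC1, top 20 eigenvalues as denominator:  {pc1_top20:.1f}%")
```

With these invented numbers PC1 comes out at roughly 5% in the first case and roughly 21% in the second, even though the eigenvalue itself and the sample coordinates are unchanged. That is why the plots can cluster identically while the axis labels disagree.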

Best,
Zhian


> On Feb 5, 2020, at 08:57 , Matthew Haas <haasx092 at umn.edu> wrote:
> 
> Hi Zhian,
> 
> Thank you for such a quick response. I believe I found the root cause last night. I have ~900 individuals, so there are 900 eigenvalues. PLINK calculates the % variance based on the top 20 eigenvalues only, while poppr uses all of them. When I altered my code to use only the top 20 eigenvalues generated by poppr, I got the same result. This is somewhat unfamiliar territory for me, but I think it is not necessary to use all of the eigenvalues when so many of them are this small.
> 
> Thanks again.
> 
> Kind regards,
> Matthew
> 
> On Wed, Feb 5, 2020 at 10:49 AM Zhian Kamvar <zkamvar at gmail.com> wrote:
> Hello Matthew,
> 
> I'm afraid that I don't have a good answer for you at the moment. I'm not familiar with how PLINK processes the GBS data for PCA. Without seeing your code, it's also difficult to figure out what may be going on, so I can only make guesses. One common mistake is centering and scaling the data beforehand (which is the default in ade4). It's a common procedure in PCA to account for different variable types, but it is not necessary when all the data are allele counts, and it can dampen the signal. If that's not the issue, then try making sure you have the most recent version of adegenet/ade4 (poppr doesn't perform PCA, ade4 does).
> 
> Hope that helps.
> 
> Best,
> Zhian
> 
>> On Feb 1, 2020, at 16:16 , Matthew Haas <haasx092 at umn.edu> wrote:
>> 
>> Hello fellow poppr users,
>> 
>> I am confused by some results that I am getting from poppr. I have attached two PCA plots: one was created with PLINK and explains a greater proportion of phenotypic variation (PC1=23%). According to poppr, the first PC explains only about 6% of the variation. The PLINK results make more biological sense, but I am troubled that poppr isn't in better agreement. The samples are clustering as expected, but I would think the % variation should agree more closely. Initially, I attributed the difference to different methods of handling missing data, but after imputing missing SNPs, I am seeing the same plots. I am not 100% sure I have imputation figured out, but I now think there must be some other cause that I'm not thinking of. I also filter to remove indels and retain only biallelic SNPs. I set the minor allele frequency threshold to 0.05 and have tried a few different thresholds for Hardy-Weinberg inclusion.
>> 
>> Is there some obvious parameter I'm not considering? I am new to population genetics, having previously worked as a plant geneticist/bioinformatician. One last point that might be important: I am working with an outcrossing species, so I expect a fair amount of heterozygosity in my population. How much does mating system matter to poppr?
>> 
>> Thank you all in advance for your help.
>> 
>> Kind regards,
>> Matthew
>> 
>> <200130_main_gbs_imputed.pdf><200130_main_GBS_imputed_PCA_poppr.pdf>
> 
> 
> 
> -- 
> Matthew Haas, PhD
> Post-doctoral research associate
> Department of Agronomy and Plant Genetics
> University of Minnesota
> 1991 Upper Buford Circle
> 411 Borlaug Hall
> Saint Paul, MN 55108
> 
> Mobile: (651) 356-9305
