[GenABEL-dev] ld-based pruning
P.S. de Vries
p.s.devries at erasmusmc.nl
Fri Mar 15 15:03:11 CET 2013
Dear all,
In my research I constantly work with genetic risk scores based on a large number of SNPs. Sometimes it is appropriate to prune the SNPs by LD before constructing the genetic risk score, to obtain a set of independent SNPs. The most widely used function for this (plink --indep-pairwise) uses a sliding window to find pairs of SNPs in high LD with each other according to their r^2 and then removes one SNP from each pair. However, it removes the SNP with the lower minor allele frequency. In practice we may instead want to keep the SNP with the lower p-value, i.e. the one most important to us. This should lead to higher quality genetic risk scores.
I have written a function that reads in a list of SNPs and then prunes it for LD, choosing which SNP to remove from a pair in high LD according to its position in the SNP list. So the SNP that comes first in the SNP list will never be pruned out, and the second one only if it is in LD with the first one. Sorting the list by ascending p-value therefore keeps the more significant SNP of each pair. The LD structure is based on your own data: the function relies on the ProbABEL configuration file for file paths, and on DatABEL for accessing the dosages.
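To make the idea concrete, here is a minimal sketch of list-order pruning in Python. It is not the function described above, which works through ProbABEL and DatABEL. The name prune_by_priority, the 0.5 threshold, and the toy dosage matrix are all made up for illustration. The sketch also assumes that each SNP is compared only against SNPs already kept, and that no sliding window is used.

```python
import numpy as np

def prune_by_priority(dosages, r2_threshold=0.5):
    """Greedy LD pruning that respects column order (hypothetical helper).

    dosages: (n_samples, n_snps) array whose columns are already sorted
             by priority, e.g. ascending p-value.
    Returns the indices of the columns that survive pruning.
    """
    dosages = np.asarray(dosages, dtype=float)
    kept = []
    for j in range(dosages.shape[1]):
        in_ld = False
        for k in kept:
            # Squared Pearson correlation of dosages as the LD measure.
            # A constant column gives NaN here, which never exceeds the
            # threshold, so such a SNP would be kept.
            r = np.corrcoef(dosages[:, j], dosages[:, k])[0, 1]
            if r * r > r2_threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept

# Toy data: 8 samples, 3 SNPs. Column 1 is nearly a copy of column 0.
dosages = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 0],
    [0, 0, 1],
    [1, 1, 1],
    [2, 2, 1],
    [0, 0, 2],
    [1, 2, 2],
])
kept = prune_by_priority(dosages, r2_threshold=0.5)
print(kept)
```

Because the first column is never tested against anything, it always survives; later columns survive only if they are not in high LD with a SNP that was kept before them.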
I initially did this just to apply it to my own data, but Lennart suggested it might be a nice addition to GenABEL. I am interested in developing this further if there is indeed interest in such a function. If so, I will share the code and we can go from there.
Let me know what you think,
Paul
P.S. de Vries
PhD Scientist
Cardiovascular group
Department of Epidemiology
Erasmus MC, University Medical Center Rotterdam
Office Ee 21-33
PO Box 2040, 3000 CA Rotterdam,
The Netherlands
Email: p.s.devries at erasmusmc.nl
http://www.erasmus-epidemiology.nl/