Paul, I think this may indeed be interesting for other people as well; at least, I have run across a similar problem a couple of times.<div><br></div><div>Could you please let us know more details about the implementation and your thoughts about the 'architecture': is it, or should it become, a part of GenABEL, ProbABEL, or something else? For example, we have something called "GenABEL-suite general" for smaller things not tied specifically to any package.</div>

<div><div><br></div><div>For general information, I also refer you to our set of devel-tutorials (<a href="http://genabel.r-forge.r-project.org/">http://genabel.r-forge.r-project.org/</a>) and some documents about our policies (at <a href="http://www.genabel.org/developers">http://www.genabel.org/developers</a>).</div>

<div><br></div><div>best wishes,</div><div>YA<br><br><div class="gmail_quote">On Fri, Mar 15, 2013 at 3:03 PM, P.S. de Vries <span dir="ltr">&lt;<a href="mailto:p.s.devries@erasmusmc.nl" target="_blank">p.s.devries@erasmusmc.nl</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div lang="NL" link="blue" vlink="purple">

<div>

<p class="MsoNormal">Dear all,<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">In my research I constantly work with genetic risk scores based on a large number of SNPs. Sometimes it is appropriate to prune the SNPs by LD before constructing the genetic risk score, to obtain a set of independent SNPs. The most widely used tool for this (plink --indep-pairwise) uses a sliding window to find pairs of SNPs in high LD with each other according to their R^2 and then removes one SNP of each pair. However, it removes the SNP with the lowest minor allele frequency. In practice we may instead want to keep the SNP with the lowest p-value, i.e. the one most important to us. This should lead to higher-quality genetic risk scores.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I have written a function that reads in a list of SNPs and prunes it for LD, choosing which SNP of a high-LD pair to remove according to its position in the SNP list. So the SNP that comes first in the list will never be pruned out, and the second one only if it is in LD with the first. The LD structure is based on your own data: the function relies on the ProbABEL configuration file for file paths, and on DatABEL for accessing the dosages.</p>
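<p class="MsoNormal">The position-based pruning described above could be sketched roughly as follows. This is only an illustration in plain Python, not the function mentioned in this message: the names r_squared and prune_by_priority, the 0.5 threshold, and the toy dosages are all assumptions, it compares every SNP against all kept SNPs instead of using a sliding window, and it reads dosages from an in-memory dict rather than through DatABEL.</p>

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    cov = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = fsum((a - mean_x) ** 2 for a in x)
    var_y = fsum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as no LD
    return cov * cov / (var_x * var_y)


def prune_by_priority(snp_names, dosages, r2_threshold=0.5):
    """Greedy pruning: keep a SNP unless it is in high LD with a kept SNP.

    snp_names must already be in priority order (e.g. ascending p-value).
    dosages maps each SNP name to its list of per-sample dosages.
    The default threshold is an arbitrary illustrative value.
    """
    kept = []
    for name in snp_names:
        in_ld = any(
            r_squared(dosages[name], dosages[other]) >= r2_threshold
            for other in kept
        )
        if not in_ld:
            kept.append(name)
    return kept


# Hypothetical toy data: rs2 is strongly correlated with rs1, rs3 is not.
dosages = {
    "rs1": [0, 1, 2, 1, 0, 2],
    "rs2": [0, 1, 2, 1, 0, 1],
    "rs3": [1, 0, 1, 2, 1, 0],
}
kept = prune_by_priority(["rs1", "rs2", "rs3"], dosages)
print(kept)
```

<p class="MsoNormal">Because the list order decides which SNP of a pair survives, sorting the input by p-value before calling such a function would keep the most significant SNP from each group in high LD.</p>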


<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I initially did this just to apply it to my own data, but Lennart suggested it might be a nice addition to GenABEL. I am interested in developing this further if there is indeed interest in such a function. If so, I will share the code and we can go from there.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Let me know what you think,<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Paul<u></u><u></u></p>

<p class="MsoNormal"><u></u> </p></div></div></blockquote></div>

</div></div>