[GenABEL-dev] ld-based pruning

Mon Mar 18 14:56:06 CET 2013

Dear all,

Although I like the idea of making it available as a regular R function (i.e. in GenABEL), I think it would also fit very well with ProbABEL: after all, it relies completely on the existence of a ProbABEL configuration file. I have attached the code; it is by no means fully annotated yet.

Something else that I have worked on in the past is a script to extract dosages of specific SNPs using DatABEL objects. Again, I needed this to make genetic risk scores. I think this may be faster than Perl scripts when we want to extract many (hundreds of thousands of) SNPs. In any case it may also be a nice feature for the GenABEL suite. It would rely on DatABEL objects and the ProbABEL configuration file in the same way as the LD pruning function. This extraction script was my first real R project and is therefore very messy at the moment, but maybe it is also something interesting to develop further?

Kind regards,

Paul 

-----Original Message-----
From: genabel-devel-bounces at lists.r-forge.r-project.org [mailto:genabel-devel-bounces at lists.r-forge.r-project.org] On Behalf Of L.C. Karssen
Sent: Monday 18 March 2013 09:42
To: genabel-devel at lists.r-forge.r-project.org
Subject: Re: [GenABEL-dev] ld-based pruning

Dear all,

On 16/03/13 14:20, Yurii Aulchenko wrote:
> Paul, I think this may indeed be interesting for other people as well;
> at least I have run across a similar problem a couple of times.
> 
> Could you please give us more details about the implementation and
> your thoughts about the 'architecture' (is it / should it become a part
> of GenABEL, ProbABEL, or something else? E.g. we have something called
> "GenABEL-suite general": smaller things not tied specifically to any
> package).

This is a point Paul and I discussed and I think it's quite open. It seems to me that this function is at the intersection of GenABEL, ProbABEL (and maybe DatABEL).

Technically, if we make it part of GenABEL, would that mean that GenABEL has to depend on DatABEL (currently it is only a suggested dependency), since Paul's function 'require()'s DatABEL? Are there any other R packages with functionality that only works if a given package is installed (one that is not in the dependency list but only in the suggested list)?

We could distribute it as part of ProbABEL as well. ProbABEL has a "soft" dependency on DatABEL and we already provide some accompanying R scripts.
I'm not sure if making it a part of the general scripts will work out well as we don't have a way to distribute these (yet).

Having taken a quick look at Paul's code, the only major thing that still needs to be implemented, as Paul pointed out, is how to deal with chunked DatABEL files (when the data of a single chromosome is split into smaller chunks). The present implementation assumes LD is broken at chromosome (= file) borders.
My guess is that this is quite easy to implement by simply asking the file system to list all files with a given chromosome number. It does require a bit of pattern matching though (you don't want it to return chr 10, 11, 12, etc. when looking for chr 1 only).
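
To illustrate the kind of pattern matching meant here, a minimal sketch follows. It is written in Python rather than R purely for illustration. The file naming scheme ("chr<N>.dose.fvi" or "chr<N>.chunk<M>.dose.fvi") and the function name files_for_chromosome are assumptions made for this example; they are not taken from Paul's code or from DatABEL.

```python
import re

# Hypothetical naming scheme (an assumption, not a DatABEL convention):
#   "chr<N>.dose.fvi"  or  "chr<N>.chunk<M>.dose.fvi"
CHUNK_NAME = re.compile(
    r"^chr(?P<chrom>[0-9]+|X|Y)(?:\.chunk(?P<chunk>[0-9]+))?\."
)


def files_for_chromosome(filenames, chrom):
    """Return the files of one chromosome, ordered by chunk number."""
    hits = []
    for name in filenames:
        match = CHUNK_NAME.match(name)
        # The chromosome label is captured as a whole and compared for
        # equality, so a request for "1" never returns chr10, chr11, ...
        if match and match.group("chrom") == str(chrom):
            # Files without a chunk part sort first (chunk 0).
            chunk = int(match.group("chunk") or 0)
            hits.append((chunk, name))
    # Sort numerically so that chunk10 comes after chunk2.
    return [name for _, name in sorted(hits)]


names = [
    "chr1.chunk2.dose.fvi",
    "chr10.chunk1.dose.fvi",
    "chr1.chunk1.dose.fvi",
    "chr11.dose.fvi",
    "chr1.chunk10.dose.fvi",
]
chr1_files = files_for_chromosome(names, 1)
```

In R, a similarly anchored regular expression could presumably be passed to the pattern argument of list.files().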

> 
> I also refer you to our set of devel-tutorials
> (http://genabel.r-forge.r-project.org/) and some documents about our 
> policies (at http://www.genabel.org/developers) for general 
> information
> 

Good point!

Lennart.

> best wishes,
> YA
> 
> On Fri, Mar 15, 2013 at 3:03 PM, P.S. de Vries
> <p.s.devries at erasmusmc.nl> wrote:
> 
>     Dear all,
> 
>     In my research I constantly work with genetic risk scores based on a
>     large number of SNPs. Sometimes it is appropriate to prune the SNPs
>     by LD before constructing the genetic risk score to obtain a set of
>     independent SNPs. The most widely used function for this (plink
>     --indep-pairwise) uses a sliding window to find pairs of SNPs in
>     high LD with each other according to their R^2 and then removes one.
>     However, it removes the SNP with the lowest minor allele frequency.
>     In practice we may instead want to keep the SNP with the lowest
>     p-value: i.e. the most important to us. This should lead to higher
>     quality genetic risk scores.
> 
>     I have written a function that reads in a list of SNPs and then
>     prunes it for LD by choosing which SNP to remove from a pair in high
>     LD according to its position in the SNP list. So the SNP that comes
>     first in the SNP list will never be pruned out, and the second one
>     only if it is in LD with the first one. The LD structure is based on
>     your own data: it relies on the ProbABEL configuration file for
>     filepaths, and on DatABEL for accessing the dosages.
> 
>     I initially did this just to apply it to my own data, but Lennart
>     suggested it might be a nice addition to GenABEL. I am interested in
>     developing this further if there is indeed interest in such a
>     function. If so I will share the code and we can go from there.
> 
>     Let me know what you think,
> 
>     Paul
> 
> 
> 
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 
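
For reference, the list-order pruning rule described in Paul's quoted message can be sketched roughly as below. This is not Paul's code: it is a simplified illustration in Python, and it rests on several assumptions. The function names, the threshold of 0.5, and the toy dosage data are hypothetical; LD is taken to be the squared Pearson correlation of the dosages; each SNP is compared only against SNPs that have already been kept; and window size and chromosome borders are ignored.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP has no defined correlation; treat it as no LD.
        return 0.0
    return sxy * sxy / (sxx * syy)


def prune_by_list_order(snps, dosages, threshold=0.5):
    """Keep a SNP unless it is in high LD with an earlier, kept SNP.

    `snps` is ordered by priority (e.g. ascending p-value), so the first
    SNP is always kept.
    """
    kept = []
    for snp in snps:
        if all(r_squared(dosages[snp], dosages[k]) < threshold for k in kept):
            kept.append(snp)
    return kept


# Toy dosages for six individuals (made-up values).
dosages = {
    "rs1": [0, 1, 2, 1, 0, 2],
    "rs2": [0, 1, 2, 1, 0, 1],  # strongly correlated with rs1
    "rs3": [2, 0, 1, 0, 2, 1],
}
kept_a = prune_by_list_order(["rs1", "rs2", "rs3"], dosages)  # ['rs1', 'rs3']
kept_b = prune_by_list_order(["rs2", "rs1", "rs3"], dosages)  # ['rs2', 'rs3']
```

Swapping the order of rs1 and rs2 changes which of the two survives, which is the point of ranking the list by p-value instead of letting minor allele frequency decide.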

--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ldprunning.R
Type: application/octet-stream
Size: 9857 bytes
Desc: ldprunning.R
URL: <http://lists.r-forge.r-project.org/pipermail/genabel-devel/attachments/20130318/abb47d5a/attachment.obj>