[GenABEL-dev] ld-based pruning

L.C. Karssen lennart at karssen.org
Mon Mar 18 09:41:37 CET 2013


Dear all,

On 16/03/13 14:20, Yurii Aulchenko wrote:
> Paul, I think this may indeed be interesting for other people as well;
> at least I have run into a similar problem a couple of times.
> 
> Could you please tell us more about the implementation and
> your thoughts about 'architecture' (is it / should it become a part of
> GenA, ProbA, or something else, e.g. we have something called
> "GenABEL-suite general" - smaller things not tied specifically to any
> package)?

This is a point Paul and I discussed and I think it's quite open. It
seems to me that this function is at the intersection of GenABEL,
ProbABEL (and maybe DatABEL).

Technically, if we make it part of GenABEL, would that mean that GenABEL
depends on DatABEL (now it is only a suggested dependency), since Paul's
function 'require()'s DatABEL? Are there any other R packages that have
functionality that only works if a given package is installed (i.e. a
package listed under Suggests rather than Depends)?

We could distribute it as part of ProbABEL as well. ProbABEL has a
"soft" dependency on DatABEL and we already provide some accompanying R
scripts.
I'm not sure that making it a part of the general scripts will work out
well, as we don't have a way to distribute these (yet).


Having taken a quick look at Paul's code, I think the only major thing
that still needs to be implemented, as Paul pointed out, is how to deal
with chunked DatABEL files (when the data of a single chromosome is split
into smaller chunks). The present implementation assumes LD is broken at
chromosome (= file) borders.
My guess is that this is quite easy to implement by simply asking the
file system to list all files with a given chromosome number. It does
require a bit of pattern matching, though (you don't want it to return
chr 10, 11, 12, etc. when looking for chr 1 only).
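
A minimal sketch of that pattern matching, in Python for illustration only. The file naming scheme (chr<N>_chunk<k>.fvi, extension included) and the function names are hypothetical assumptions, not an existing DatABEL or ProbABEL convention. The idea is to capture the chromosome as a whole token and compare it exactly, so a request for chr 1 cannot pick up chr 10, 11 or 12:

```python
import os
import re
from typing import Iterable, List, Union

# Hypothetical naming scheme (an assumption, not a DatABEL convention):
#   chr<chromosome>[_chunk<k>].fvi, e.g. chr1.fvi, chr1_chunk2.fvi, chr10_chunk1.fvi
CHUNK_RE = re.compile(r"^chr(?P<chrom>[0-9]+|X|Y)(?:_chunk(?P<chunk>[0-9]+))?\.fvi$")


def files_for_chromosome(filenames: Iterable[str], chrom: Union[int, str]) -> List[str]:
    """Return the files that belong to `chrom`, sorted by chunk number.

    The chromosome is captured as a whole token and compared exactly,
    so asking for chromosome 1 does not return chr10, chr11, chr12, ...
    """
    matches = []
    for name in filenames:
        m = CHUNK_RE.match(name)
        if m and m.group("chrom") == str(chrom):
            chunk = int(m.group("chunk") or 0)  # an unchunked file counts as chunk 0
            matches.append((chunk, name))
    return [name for _, name in sorted(matches)]


def list_chromosome_files(directory: str, chrom: Union[int, str]) -> List[str]:
    """Same as files_for_chromosome, but takes the names from a directory listing."""
    return files_for_chromosome(os.listdir(directory), chrom)


if __name__ == "__main__":
    names = ["chr1_chunk2.fvi", "chr10_chunk1.fvi", "chr1_chunk1.fvi",
             "chr11.fvi", "chr12_chunk3.fvi"]
    print(files_for_chromosome(names, 1))
    # -> ['chr1_chunk1.fvi', 'chr1_chunk2.fvi']
```

Chunks are sorted numerically (chunk10 comes after chunk2), so the caller could treat consecutive chunks of one chromosome as a single stretch instead of assuming LD is broken at every file border.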

> 
> I also refer you to our set of devel-tutorials
> (http://genabel.r-forge.r-project.org/) and some documents about our
> policies (at http://www.genabel.org/developers) for general information.
> 

Good point!



Lennart.

> best wishes,
> YA
> 
> On Fri, Mar 15, 2013 at 3:03 PM, P.S. de Vries <p.s.devries at erasmusmc.nl> wrote:
> 
>     Dear all,
> 
>     In my research I constantly work with genetic risk scores based on a
>     large number of SNPs. Sometimes it is appropriate to prune the SNPs
>     by LD before constructing the genetic risk score, to obtain a set of
>     independent SNPs. The most widely used tool for this (plink
>     --indep-pairwise) uses a sliding window to find pairs of SNPs in
>     high LD with each other according to their R^2 and then removes one
>     of them. However, it removes the SNP with the lower minor allele
>     frequency. In practice we may instead want to keep the SNP with the
>     lower p-value, i.e. the one most important to us. This should lead
>     to higher-quality genetic risk scores.
> 
>     I have written a function that reads in a list of SNPs and then
>     prunes it for LD, choosing which SNP of a high-LD pair to remove
>     according to its position in the SNP list. So the SNP that comes
>     first in the SNP list will never be pruned out, and the second one
>     only if it is in LD with the first one. The LD structure is based on
>     your own data: the function relies on the ProbABEL configuration file
>     for file paths, and on DatABEL for accessing the dosages.
> 
>     I initially did this just to apply it to my own data, but Lennart
>     suggested it might be a nice addition to GenABEL. I am interested in
>     developing this further if there is indeed interest in such a
>     function. If so, I will share the code and we can go from there.
> 
>     Let me know what you think,
> 
>     Paul
> 
> 
> 
> _______________________________________________
> genabel-devel mailing list
> genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 
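
The ordering rule Paul describes can be sketched as a greedy pass over the SNP list. This is Python for illustration, not Paul's code, which reads dosages through DatABEL. Each SNP is compared with the SNPs already kept on the same chromosome and dropped if its r^2 with any of them exceeds a threshold; sorting the list by ascending p-value first gives the "keep the lowest p-value" behaviour. The in-memory dosages and chrom_of dictionaries, the function names and the threshold value are assumptions. The sketch also compares against every kept SNP on the chromosome rather than using a sliding window, and, like the present implementation, treats chromosome borders as LD borders:

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two dosage vectors (same individuals, same order)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treat it as "not in LD"
    return (sxy * sxy) / (sxx * syy)


def prune_by_priority(snps: Sequence[str],
                      chrom_of: Dict[str, str],
                      dosages: Dict[str, Sequence[float]],
                      threshold: float) -> List[str]:
    """Keep SNPs in list order; drop a SNP whose r^2 with an already kept
    SNP on the same chromosome exceeds `threshold`.

    The first SNP in `snps` is never pruned, and each later SNP can only
    be pruned because of SNPs that come before it in the list.
    """
    kept: List[str] = []
    kept_per_chrom: Dict[str, List[str]] = {}
    for snp in snps:
        same_chrom = kept_per_chrom.setdefault(chrom_of[snp], [])
        if all(r_squared(dosages[snp], dosages[k]) <= threshold for k in same_chrom):
            same_chrom.append(snp)
            kept.append(snp)
    return kept


if __name__ == "__main__":
    # Toy data (made up for illustration): dosages for 6 individuals per SNP.
    dosages = {
        "rs1": [0, 1, 2, 0, 1, 2],
        "rs2": [0, 1, 2, 0, 1, 2],  # identical to rs1 -> r^2 = 1
        "rs3": [0, 0, 0, 2, 2, 2],  # uncorrelated with rs1 -> r^2 = 0
        "rs4": [0, 1, 2, 0, 1, 2],  # identical to rs1, but on another chromosome
    }
    chrom_of = {"rs1": "1", "rs2": "1", "rs3": "1", "rs4": "2"}
    pvalues = {"rs1": 1e-8, "rs2": 1e-6, "rs3": 1e-5, "rs4": 1e-4}
    ordered = sorted(pvalues, key=pvalues.get)  # lowest p-value first
    print(prune_by_priority(ordered, chrom_of, dosages, threshold=0.5))
    # -> ['rs1', 'rs3', 'rs4']
```

rs2 is pruned because it is in perfect LD with rs1, which has the lower p-value, while rs4 survives despite having the same dosages as rs1, because LD is only checked within a chromosome.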

-- 
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------
