[GenABEL-dev] ld-based pruning

L.C. Karssen lennart at karssen.org
Mon Mar 18 21:07:20 CET 2013


Dear Paul,

On 18/03/13 14:56, P.S. de Vries wrote:
> Dear all,
>
> Although I like the idea of it being available as a regular R
> function (so GenABEL), I think it would also fit very well with
> probabel: after all, it completely relies on the existence of a
> probabel configuration file. I have attached the code: it is by no
> means fully annotated yet.

I'm fine with incorporating it into the ProbABEL package. Your
argument about the PA config file is a good one.

In order to inform people of the function's existence we should add a
section to the ProbABEL manual.

>
> Something else that I have worked on in the past is a script to
> extract dosages of specific SNPs using databel objects. Again this
> was needed for me to make genetic risk scores. I think this may be
> faster than perl scripts when we want to extract many (hundreds of
> thousands) SNPs. In any case it may also be a nice feature for the
> GenABEL suite.

I totally agree. I've worked on a C/C++ version of such a script, but
never got to finish it.

> It would rely on databel objects and the probabel
> configuration file in the same way as the LD pruning function. This
> extraction syntax was my first real R project and therefore it is very
> messy at the moment, but maybe it is also something interesting to
> develop further?

I think that will earn you lots of kudos from our users.

Maybe we should combine these companion functions into an R package
"ProbABEL-tools" (or something similar) that PA users can simply
install. What does the list think about that?

The current R scripts distributed as part of ProbABEL are quite simple
and merely serve as examples, so I don't think they need to be part of
such a package.


Lennart.

>
> Kind regards,
>
> Paul
>
> -----Original Message-----
> From: genabel-devel-bounces at lists.r-forge.r-project.org
> [mailto:genabel-devel-bounces at lists.r-forge.r-project.org] On Behalf
> Of L.C. Karssen
> Sent: Monday 18 March 2013 09:42
> To: genabel-devel at lists.r-forge.r-project.org
> Subject: Re: [GenABEL-dev] ld-based pruning
>
> Dear all,
>
> On 16/03/13 14:20, Yurii Aulchenko wrote:
>> Paul, I think this indeed may be interesting for other people as
>> well; at least I ran across a similar problem a couple of times.
>>
>> Could you please let us know more details about the implementation
>> and your thoughts about 'architecture' (is it / should it become a
>> part of GenA, ProbA, or something else, e.g. we have something
>> called "GenABEL-suite general" - smaller things not tied
>> specifically to any package)?
>
> This is a point Paul and I discussed and I think it's quite open. It
> seems to me that this function is at the intersection of GenABEL,
> ProbABEL (and maybe DatABEL).
>
> Technically, if we make it part of GenABEL would that mean that
> GenABEL depends on DatABEL (now it is only a suggested dependency)
> since Paul's function 'require()'s DatABEL? Are there any other R
> packages that have functionality that only works if a given package
> is installed (that is not in the dependency list but only in the
> suggested list)?
>
> We could distribute it as part of ProbABEL as well. ProbABEL has a
> "soft" dependency on DatABEL and we already provide some accompanying
> R scripts. I'm not sure if making it a part of the general scripts
> will work out well as we don't have a way to distribute these (yet).
>
>
> Having taken a quick look at Paul's code the only major thing that
> needs to be implemented, as Paul pointed out, is how to deal with
> chunked DatABEL files (when the data of a single chromosome is split
> into smaller chunks). The present implementation assumes LD is broken
> at chromosome (= file) borders. My guess is that that is quite easy
> to implement by simply asking the file system to list all files with
> a given chromosome number. It does require a bit of pattern matching
> though (you don't want it to return chr 10, 11, 12 etc, when looking
> for chr 1 only).
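As an illustration of the pattern matching mentioned above, here is a minimal sketch in Python (not R, and not taken from Paul's code). The helper name `files_for_chromosome` and the chunked file names are hypothetical; real DatABEL file names may follow a different scheme, in which case the regular expression would need adjusting. In practice the list of names would come from a directory listing such as `os.listdir()`.

```python
import re


def files_for_chromosome(filenames, chrom):
    """Return the names in `filenames` that belong to chromosome `chrom`.

    Assumes (hypothetically) that each file name contains "chr<number>".
    """
    # (?!\d) rejects a digit right after the chromosome number,
    # so a search for chr1 does not also pick up chr10, chr11, ...
    pattern = re.compile(r"chr" + re.escape(str(chrom)) + r"(?!\d)")
    return sorted(name for name in filenames if pattern.search(name))


# Hypothetical file names, with chromosome 1 split into two chunks.
names = [
    "chr1.chunk1.dose.fvi",
    "chr1.chunk2.dose.fvi",
    "chr10.chunk1.dose.fvi",
    "chr11.dose.fvi",
    "chr12.dose.fvi",
]

chr1_files = files_for_chromosome(names, 1)
print(chr1_files)
```

The negative lookahead is what keeps chromosomes 10, 11 and 12 out of the result for chromosome 1.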
>
>>
>> I also refer you to our set of devel-tutorials
>> (http://genabel.r-forge.r-project.org/) and some documents about
>> our policies (at http://www.genabel.org/developers) for general
>> information
>>
>
> Good point!
>
>
>
> Lennart.
>
>> best wishes, YA
>>
>> On Fri, Mar 15, 2013 at 3:03 PM, P.S. de Vries
>> <p.s.devries at erasmusmc.nl> wrote:
>>
>> Dear all,
>>
>> In my research I constantly work with genetic risk scores based on
>> a large number of SNPs. Sometimes it is appropriate to prune the
>> SNPs by LD before constructing the genetic risk score to obtain a
>> set of independent SNPs. The most widely used function for this
>> (plink --indep-pairwise) uses a sliding window to find pairs of
>> SNPs in high LD with each other according to their R^2 and then
>> removes one. However, it removes the SNP with the lowest minor
>> allele frequency. In practice we may instead want to keep the SNP
>> with the lowest p-value, i.e. the one most important to us. This
>> should lead to higher quality genetic risk scores.
>>
>> I have written a function that reads in a list of SNPs and then
>> prunes it for LD by choosing which SNP to remove from a pair in
>> high LD according to its position in the SNP list. So the SNP that
>> comes first in the SNP list will never be pruned out, and the
>> second one only if it is in LD with the first one. The LD structure
>> is based on your own data: it relies on the probabel configuration
>> file for filepaths, and on databel for accessing the dosages.
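The order-based pruning described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Paul's R function: the names `r_squared` and `prune_by_order`, the R^2 threshold of 0.5, the SNP names, and the toy dosages are all hypothetical. It also compares every SNP against every SNP already kept, without a sliding window and without the ProbABEL configuration file or DatABEL access.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a SNP without variation carries no LD information
    return cov * cov / (var_x * var_y)


def prune_by_order(snp_order, dosages, r2_threshold=0.5):
    """Keep a SNP only if it is not in high LD with any SNP kept before it.

    The first SNP in `snp_order` is therefore never pruned out.
    The default threshold is an arbitrary assumption.
    """
    kept = []
    for snp in snp_order:
        if all(r_squared(dosages[snp], dosages[k]) < r2_threshold for k in kept):
            kept.append(snp)
    return kept


# Toy dosages for six individuals; rs2 is almost a copy of rs1.
dosages = {
    "rs1": [0.0, 1.0, 2.0, 1.0, 0.0, 2.0],
    "rs2": [0.1, 1.0, 1.9, 1.1, 0.0, 2.0],
    "rs3": [2.0, 0.0, 1.0, 1.0, 0.0, 2.0],
}
p_values = {"rs1": 1e-8, "rs2": 1e-5, "rs3": 1e-3}

# Ordering the list by p-value makes the most significant SNP win each pair.
snp_order = sorted(p_values, key=p_values.get)
kept = prune_by_order(snp_order, dosages)
print(kept)
```

Because the list order decides which SNP of a pair survives, sorting by p-value first gives the behaviour described in the previous paragraph, and any other ranking can be substituted without changing the pruning step.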
>>
>> I initially did this just to apply it to my own data, but Lennart
>> suggested it might be a nice addition to genabel. I am interested
>> in developing this further if there is indeed interest in such a
>> function. If so I will share the code and we can go from
>> there.
>>
>> Let me know what you think,
>>
>> Paul
>>
>> _______________________________________________ genabel-devel
>> mailing list genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
>
> -- -----------------------------------------------------------------
> L.C. Karssen Utrecht The Netherlands
>
> lennart at karssen.org http://blog.karssen.org
>
> Please do not send me Word or PowerPoint files! See
> http://www.gnu.org/philosophy/no-word-attachments.nl.html
> ------------------------------------------------------------------
>
>
>
> _______________________________________________ genabel-devel mailing
> list genabel-devel at lists.r-forge.r-project.org
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
>
--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org

Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------
