[GenABEL-dev] function for conversion a plink format file to a GenABEL format file

Yurii Aulchenko yurii.aulchenko at gmail.com
Fri Nov 22 17:51:53 CET 2013


Great idea

I know nothing of the plink binary format, but many packages make use of it,
so it should not be that complicated. Also, plink is GNU GPL if I remember
correctly, so we can use the code if needed.

Y
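
Note on the .bed layout referred to in [1] below: as the PLINK 1
documentation describes it, a .bed file starts with a three-byte header and
then stores genotypes as 2-bit codes, four per byte, with the first sample in
the lowest-order bits. The sketch below is an illustration in Python, not
GenABEL or DatABEL code. The function name read_bed and the choice to report
copies of the second .bim allele are assumptions made for this example, and
the sample and SNP counts are assumed to come from the .fam and .bim files.

```python
import io

BED_MAGIC = b"\x6c\x1b"  # first two bytes of a PLINK 1 .bed file
SNP_MAJOR = 0x01         # third byte: all samples for SNP 1, then SNP 2, ...

# 2-bit code -> copies of the second (.bim) allele; None marks a missing call.
# Reporting allele counts this way is a choice made for this sketch.
CODE_TO_COUNT = {0b00: 0, 0b01: None, 0b10: 1, 0b11: 2}


def read_bed(stream, n_samples, n_snps):
    """Yield one list of genotype codes per SNP from a SNP-major .bed stream."""
    header = stream.read(3)
    if len(header) != 3 or header[:2] != BED_MAGIC:
        raise ValueError("not a PLINK .bed file (bad magic number)")
    if header[2] != SNP_MAJOR:
        raise ValueError("only SNP-major files are handled in this sketch")

    bytes_per_snp = (n_samples + 3) // 4  # four genotypes per byte, padded
    for _ in range(n_snps):
        block = stream.read(bytes_per_snp)
        if len(block) != bytes_per_snp:
            raise ValueError("truncated .bed file")
        genotypes = []
        for i in range(n_samples):
            # Sample i sits in byte i // 4, at bit offset 2 * (i % 4).
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            genotypes.append(CODE_TO_COUNT[code])
        yield genotypes


# Tiny in-memory example: 5 samples, 2 SNPs (hand-built bytes, not real data).
demo = bytes([0x6C, 0x1B, 0x01, 0xE4, 0x03, 0x8F, 0x01])
for snp in read_bed(io.BytesIO(demo), n_samples=5, n_snps=2):
    print(snp)
```

Since the data are already SNP-major and 2-bit packed, a converter along
these lines would mostly be remapping codes rather than parsing text, which
is why reading .bed directly could be quicker than going through a text
export.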

On Friday, November 22, 2013, L.C. Karssen wrote:

> How difficult would it be to import .bed files [1] instead of the text
> conversion? Given the binary data of both the .bed and the GenABEL
> format, wouldn't conversion be much quicker?
>
>
> Lennart.
>
> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>
>
> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
> > Too slow, too difficult for the user, or both? :)
> >
> > On Friday, November 22, 2013, Maksim Struchalin wrote:
> >
> >     Yes. It looks like it was a bad idea to use the plink R-plugin for
> >     converting plink files to *ABEL format.
> >     Maksim
> >
> >     On 18/11/2013 18:48, Yury Aulchenko wrote:
> >>     I would say that in principle DatABEL::text2databel is the
> >>     "natural" way to go from text-files to DatABEL-files
> >>
> >>     The problem is that 'regular' text input may be allele by allele,
> >>     not genotype by genotype... (e.g. data are in format "A G", or
> >>     "A/G", not "0" or "1" or "2").
> >>
> >>     Y
> >>
> >>     On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
> >>     wrote:
> >>
> >>>     Hi Maksim,
> >>>
> >>>     On 15-11-13 05:53, Maksim Struchalin wrote:
> >>>>     An easy way to write a function for converting a plink format
> >>>>     file to a GenABEL format file:
> >>>>
> >>>>     Use plink support of 'plug-in' functions
> >>>
> >>>     Nice find. I didn't know that existed.
> >>>
> >>>>     (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
> >>>>     This allows us to write a simple R script (myscript.R) which is
> >>>>     called by plink (plink --file mydata --R myscript.R). plink reads
> >>>>     the file mydata (which is in plink format) and iteratively, SNP
> >>>>     by SNP, transfers all the data to the script myscript.R. This
> >>>>     script contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which
> >>>>     will take every SNP (GENO variable) and store it in *flv format
> >>>>     by calling DatABEL functions.
> >>>>
> >>>>     The whole process of conversion will look like this:
> >>>>
> >>>>     1) User asks GenA to convert a plink file to a GenA file
> >>>>     2) GenA checks whether plink is installed. If it is not
> >>>>     installed, GenA goes to the plink site and downloads/installs
> >>>>     it itself (using the R function "download.file" from the
> >>>>     "utils" package)
> >>>>     3) GenA runs a single line: system('plink --file mydata --R
> >>>>     myscript.R')
> >>>>     4) The Rplink function (from myscript.R) gets every SNP and
> >>>>     stores it in *flv format. This function creates an flv file and
> >>>>     then opens and closes it to save every single SNP.
> >>>>     5) Work is done
> >>>
> >>>     I'm not sure how portable it is to download and run plink. Also,
> >>>     the plink page says: Currently, there is only support for
> >>>     R-plugins for Linux-based and Mac OS PLINK distributions.
> >>>
> >>>>
> >>>>     The only issue is how fast the conversion will run: how much
> >>>>     time does it take to open a filevector file, store one SNP and
> >>>>     close it? I cannot find a DatABEL R function for adding a SNP to
> >>>>     an flv file. Is there a C DatABEL function which can do it?
> >>>
> >>>     Wouldn't it be easier/possible to use plink to export to text
> >>>     (.csv) and
> >>>     then use filevector's txt2fvf binary (of course this could be
> >>>     done from
> >>>     R using system())?
> >>>
> >>>     I'm also wondering if going per SNP is really necessary. If I
> >>>     understand it correctly, the R script (myscript.R) has to have a
> >>>     function called:
> >>>     Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> >>>     where GENO is the matrix of genotypes. So we could write that
> >>>     into a DatABEL file at once. Of course you may want to do this
> >>>     per chromosome to reduce memory consumption (not sure how plink/R
> >>>     would handle large data sets).
> >>>
> >
> >
> > --
> > -----------------------------------------------------
> > Yurii S. Aulchenko
> >
> > [ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
> > <http://twitter.com/YuriiAulchenko> ] [ Blog
> > <http://yurii-aulchenko.blogspot.nl/> ]
> >
> >
> >
> > _______________________________________________
> > genabel-devel mailing list
> > genabel-devel at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> >
>
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>
>

-- 
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
<http://twitter.com/YuriiAulchenko> ] [ Blog
<http://yurii-aulchenko.blogspot.nl/> ]

