[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
Maksim Struchalin
m.v.struchalin at mail.ru
Mon Nov 25 15:39:21 CET 2013
I checked read.plink from snpMatrix (Nicola) and snpStats (Maarten).
The code underneath is quite simple (about 40 lines of C code behind
snpMatrix's read.plink).
The PLINK bed format is very similar to the GenABEL format
(http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml). It looks like
the main difference is that a PLINK bed file starts with 3 bytes that
have a special meaning (a two-byte magic number and one byte for the
SNP-major/individual-major mode). The remaining bytes store genotypes
(0, 1, 2 or NA) using 2 bits per genotype (as in GenABEL).
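As an illustration of the layout described above, here is a minimal C
sketch of decoding one SNP from a SNP-major bed file. It assumes the
layout documented on the PLINK page: magic bytes 0x6C 0x1B, a mode byte
of 0x01 for SNP-major, and four genotypes per byte starting at the
low-order bits, with codes 00 = homozygous first allele, 01 = missing,
10 = heterozygous, 11 = homozygous second allele. The function names,
the GENO_NA value and the choice to count copies of the second allele
are made up for this example; they are not GenABEL or DatABEL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header bytes documented for PLINK .bed files. */
#define BED_MAGIC1    0x6C
#define BED_MAGIC2    0x1B
#define BED_SNP_MAJOR 0x01

/* Hypothetical marker for a missing genotype (NA). */
#define GENO_NA (-1)

/* Returns 1 if the 3-byte header describes a SNP-major .bed file. */
int bed_header_is_snp_major(const uint8_t header[3])
{
    return header[0] == BED_MAGIC1 &&
           header[1] == BED_MAGIC2 &&
           header[2] == BED_SNP_MAJOR;
}

/* Number of bytes used by one SNP: 4 genotypes per byte, rounded up. */
size_t bed_bytes_per_snp(size_t n_samples)
{
    return (n_samples + 3) / 4;
}

/* Map a 2-bit bed code to 0/1/2 copies of the second allele, or NA.
 * Counting the second allele is an assumption of this sketch. */
int bed_code_to_geno(unsigned code)
{
    switch (code & 0x3u) {
    case 0x0: return 0;        /* homozygous first allele  */
    case 0x1: return GENO_NA;  /* missing genotype         */
    case 0x2: return 1;        /* heterozygous             */
    default:  return 2;        /* homozygous second allele */
    }
}

/* Decode the packed genotypes of one SNP into out[0..n_samples-1]. */
void bed_decode_snp(const uint8_t *packed, size_t n_samples, int *out)
{
    assert(packed != NULL && out != NULL);
    for (size_t i = 0; i < n_samples; ++i) {
        /* Sample i lives in byte i/4, at bit offset 2*(i%4). */
        unsigned shift = (unsigned)(i % 4) * 2;
        out[i] = bed_code_to_geno((unsigned)(packed[i / 4] >> shift));
    }
}
```

For 5 samples, the two bytes 0xE4 0x02 decode to 0, NA, 1, 2, 1. Each
SNP occupies (n_samples + 3) / 4 bytes, because the block of every SNP
is padded to a whole byte.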
I think it would be easy to write a C function which converts bed to
databel format. We could also think about making bed a format that is
natively supported by GenABEL. For this, we only need a function which
extracts an array from a bed file, and an iterator that uses this
function.
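A rough sketch of such an iterator in C could look like the code below.
It assumes a SNP-major bed file with the documented 3-byte header
(0x6C 0x1B 0x01) and 2-bit genotype codes packed low-order bits first.
bed_iter and its functions are hypothetical names, not existing GenABEL
or DatABEL code. The number of samples and SNPs has to be supplied by
the caller (for example from the line counts of the .fam and .bim
files), because the bed file itself does not store them. Writing each
decoded array to a DatABEL file is left out, since no real DatABEL C
call is assumed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical iterator over the SNPs of a SNP-major .bed file. */
typedef struct {
    FILE    *fp;            /* owned by the caller, not closed here */
    size_t   n_samples;
    size_t   n_snps;
    size_t   next_snp;      /* index of the SNP returned next */
    size_t   bytes_per_snp; /* ceil(n_samples / 4) */
    uint8_t *buf;           /* packed genotypes of one SNP */
} bed_iter;

/* Check the header and position the stream on the first SNP.
 * Returns 0 on success, -1 on any error. */
int bed_iter_open(bed_iter *it, FILE *fp, size_t n_samples, size_t n_snps)
{
    uint8_t hdr[3];

    if (it == NULL || fp == NULL || n_samples == 0) return -1;
    if (fseek(fp, 0L, SEEK_SET) != 0) return -1;
    if (fread(hdr, 1, 3, fp) != 3) return -1;
    if (hdr[0] != 0x6C || hdr[1] != 0x1B || hdr[2] != 0x01) return -1;

    it->fp = fp;
    it->n_samples = n_samples;
    it->n_snps = n_snps;
    it->next_snp = 0;
    it->bytes_per_snp = (n_samples + 3) / 4;
    it->buf = malloc(it->bytes_per_snp);
    return it->buf != NULL ? 0 : -1;
}

/* Decode the next SNP into out[0..n_samples-1] as 0/1/2, or -1 for NA.
 * Returns 1 if a SNP was decoded, 0 at the end, -1 on a read error. */
int bed_iter_next(bed_iter *it, int *out)
{
    /* bed code -> genotype: 00 -> 0, 01 -> NA, 10 -> 1, 11 -> 2 */
    static const int geno[4] = {0, -1, 1, 2};

    assert(it != NULL && out != NULL);
    if (it->next_snp >= it->n_snps) return 0;
    if (fread(it->buf, 1, it->bytes_per_snp, it->fp) != it->bytes_per_snp)
        return -1;

    for (size_t i = 0; i < it->n_samples; ++i) {
        unsigned shift = (unsigned)(i % 4) * 2;
        out[i] = geno[(it->buf[i / 4] >> shift) & 0x3u];
    }
    it->next_snp++;
    return 1;
}

/* Release the buffer; the FILE is left open for the caller. */
void bed_iter_close(bed_iter *it)
{
    free(it->buf);
    it->buf = NULL;
}
```

The caller would loop with bed_iter_next() until it returns 0 and hand
each decoded array to whatever routine stores a SNP. Only one SNP is
held in memory at a time, so memory use does not grow with the number
of SNPs.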
best,
Maksim
On 22/11/2013 23:51, Yurii Aulchenko wrote:
> Great idea
>
> I know nothing about the plink binary format, but many packages make
> use of it, so it should not be that complicated. Also, plink is GNU
> GPL if I remember correctly, so we can use the code if needed
>
> Y
>
> On Friday, November 22, 2013, L.C. Karssen wrote:
>
> How difficult would it be to import .bed files [1] instead of the text
> conversion? Given the binary data of both the .bed and the GenABEL
> format, wouldn't conversion be much quicker?
>
>
> Lennart.
>
> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>
>
> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
> > Too slow, too difficult for the user, or both? :)
> >
> > On Friday, November 22, 2013, Maksim Struchalin wrote:
> >
> > Yes. It looks like it was a bad idea to use the plink R-plugin for
> > converting plink files to *ABEL format.
> > Maksim
> >
> > On 18/11/2013 18:48, Yury Aulchenko wrote:
> >> I would say that in principle DatABEL::text2databel is the
> >> "natural" way to go from text-files to DatABEL-files
> >>
> >> The problem is that 'regular' text input may be allele by allele,
> >> not genotype by genotype... (e.g. data are in format "A G", or
> >> "A/G", not "0" or "1" or "2").
> >>
> >> Y
> >>
> >> On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
> >> wrote:
> >>
> >>> Hi Maksim,
> >>>
> >>> On 15-11-13 05:53, Maksim Struchalin wrote:
> >>>> An easy way to write a function for converting a plink format
> >>>> file to a GenABEL format file:
> >>>>
> >>>> Use plink support of 'plug-in' functions
> >>>
> >>> Nice find. I didn't know that existed.
> >>>
> >>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
> >>>> This allows us to write a simple R script (myscript.R) which is
> >>>> called by plink (plink --file mydata --R myscript.R). plink reads
> >>>> the file mydata (which is in plink format) and iteratively, SNP by
> >>>> SNP, transfers all the data to the script myscript.R. This script
> >>>> contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which will
> >>>> take every SNP (GENO variable) and store it in *flv format by
> >>>> calling DatABEL functions.
> >>>>
> >>>> The whole process of conversion will look like this:
> >>>>
> >>>> 1) The user asks GenA to convert a plink file to a GenA file.
> >>>> 2) GenA checks whether plink is installed. If it is not, GenA
> >>>>    goes to the plink site and downloads/installs it itself
> >>>>    (using the R function "download.file" from the "utils" package).
> >>>> 3) GenA runs a single line:
> >>>>    system('plink --file mydata --R myscript.R')
> >>>> 4) The Rplink function (from myscript.R) gets every SNP and stores
> >>>>    it in *flv format. This function creates an flv file and then
> >>>>    opens and closes it to save every single SNP.
> >>>> 5) Work is done.
> >>>
> >>> I'm not sure how portable it is to download and run plink. Also, the
> >>> plink page says: Currently, there is only support for R-plugins for
> >>> Linux-based and Mac OS PLINK distributions.
> >>>
> >>>>
> >>>> The only issue is how fast the conversion will run: how much
> >>>> time does it take to open a filevector file, store one SNP and
> >>>> close it? I cannot find a DatABEL R function for adding a SNP to
> >>>> an flv file. Is there a C DatABEL function which can do it?
> >>>
> >>> Wouldn't it be easier/possible to use plink to export to text
> >>> (.csv) and then use filevector's txt2fvf binary (of course this
> >>> could be done from R using system())?
> >>>
> >>> I'm also wondering if going per SNP is really necessary. If I
> >>> understand it correctly the R script (myscript.R) has to have a
> >>> function called:
> >>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> >>> where GENO is the matrix of genotypes. So we could write that into a
> >>> DatABEL file at once. Of course you may want to do this per
> >>> chromosome to reduce memory consumption (not sure how plink/R would
> >>> handle large data sets).
> >>>
> >
> >
> > --
> > -----------------------------------------------------
> > Yurii S. Aulchenko
> >
> > [ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
> > <http://twitter.com/YuriiAulchenko> ] [ Blog
> > <http://yurii-aulchenko.blogspot.nl/> ]
> >
> >
> >
> > _______________________________________________
> > genabel-devel mailing list
> > genabel-devel at lists.r-forge.r-project.org
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> >
>
> --
> *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
> GPG key ID: A88F554A
> -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
>
>
>