[GenABEL-dev] function for converting a plink format file to a GenABEL format file

Maarten Kooyman m.kooijman at erasmusmc.nl
Fri Nov 22 10:54:16 CET 2013


There is also a function to read .bed files in the Bioconductor
snpStats package. This might be a starting point.

See: search.bioconductor.jp/codes/6594

Maarten Kooyman
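Direct .bed import could work roughly as follows. This is a minimal Python sketch that decodes a SNP-major PLINK .bed byte string, following the binary format linked as [1] in the quoted thread. The file starts with three header bytes: 0x6C and 0x1B, then 0x01 for SNP-major mode. Each SNP then takes ceil(n/4) bytes, with each individual's genotype packed into 2 bits, least significant bits first. The function name `decode_bed` is hypothetical, and so is the choice to report copies of allele A1 (.bim column 5) with None for missing. This is not the snpStats or GenABEL API.

```python
BED_MAGIC = b"\x6c\x1b"   # first two bytes of every PLINK 1 .bed file
SNP_MAJOR = 0x01          # third byte: 1 = SNP-major, 0 = individual-major

# 2-bit genotype code -> copies of allele A1 (.bim column 5); 0b01 = missing.
A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data: bytes, n_samples: int) -> list:
    """Decode SNP-major .bed bytes into one list of A1 counts per SNP."""
    if n_samples < 1:
        raise ValueError("n_samples must be positive")
    if len(data) < 3 or data[:2] != BED_MAGIC:
        raise ValueError("not a PLINK .bed file (bad magic bytes)")
    if data[2] != SNP_MAJOR:
        raise ValueError("this sketch only handles SNP-major .bed files")
    bytes_per_snp = (n_samples + 3) // 4  # 4 genotypes per byte, last byte padded
    body = data[3:]
    if len(body) % bytes_per_snp != 0:
        raise ValueError("file size does not match n_samples")
    snps = []
    for start in range(0, len(body), bytes_per_snp):
        block = body[start:start + bytes_per_snp]
        row = []
        for i in range(n_samples):
            # sample i sits in byte i // 4, bit pair i % 4, lowest bits first
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            row.append(A1_COUNT[code])
        snps.append(row)
    return snps


# One SNP, five samples: codes 00, 10, 11, 01 | 11 packed into 0x78, 0x03.
print(decode_bed(b"\x6c\x1b\x01\x78\x03", 5))
```

Here n_samples is the number of individuals, i.e. the number of lines in the matching .fam file. The bytes could come from `open(path, "rb").read()`. Decoding byte by byte in pure Python is slow for genome-wide data. The point is only that the format maps directly onto a 0/1/2 genotype matrix without a text round trip.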

On 11/22/2013 10:04 AM, L.C. Karssen wrote:
> How difficult would it be to import .bed files [1] instead of the text
> conversion? Given the binary data of both the .bed and the GenABEL
> format, wouldn't conversion be much quicker?
>
>
> Lennart.
>
> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>
>
> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>> Too slow, too difficult for the user, or both? :)
>>
>> On Friday, November 22, 2013, Maksim Struchalin wrote:
>>
>>     Yes. Looks like it was a bad idea to use the plink R plug-in for
>>     converting plink files to *ABEL format.
>>     Maksim
>>
>>     On 18/11/2013 18:48, Yury Aulchenko wrote:
>>>     I would say that in principle DatABEL::text2databel is the
>>>     "natural" way to go from text-files to DatABEL-files
>>>
>>>     The problem is that 'regular' text input may be allele by allele,
>>>     not genotype by genotype... (e.g. data are in format "A G", or
>>>     "A/G", not "0" or "1" or "2").
>>>
>>>     Y
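The following minimal Python sketch illustrates the difference Yury describes. It turns allele-by-allele text ("A G", "A/G" or "AG") into the 0/1/2 genotype coding by counting copies of a chosen allele. It assumes biallelic SNPs with single-character allele codes. The function name and the set of missing-allele tokens are hypothetical choices, not part of DatABEL or text2databel.

```python
# Assumed missing-allele tokens; real data sets may use other codes.
MISSING_TOKENS = {"0", "N", "-", "?"}


def allele_text_to_dosage(genotype: str, counted_allele: str):
    """Return 0, 1 or 2 copies of counted_allele, or None if missing."""
    tokens = genotype.replace("/", " ").split()  # "A/G" -> ["A", "G"]
    if len(tokens) == 1 and len(tokens[0]) == 2:
        tokens = list(tokens[0])                 # "AG"  -> ["A", "G"]
    if len(tokens) != 2 or any(t in MISSING_TOKENS for t in tokens):
        return None
    return sum(t == counted_allele for t in tokens)


for g in ["A G", "G/G", "AG", "A A", "0 0"]:
    print(g, allele_text_to_dosage(g, "G"))
```

The counted allele has to be fixed per SNP (for example from a map or annotation file). Otherwise the 0/1/2 codes are not comparable across individuals.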
>>>
>>>     On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
>>>     wrote:
>>>
>>>>     Hi Maksim,
>>>>
>>>>     On 15-11-13 05:53, Maksim Struchalin wrote:
>>>>>     An easy way to write a function for converting a plink format
>>>>>     file to a GenABEL format file:
>>>>>
>>>>>     Use plink's support for 'plug-in' functions
>>>>
>>>>     Nice find. I didn't know that existed.
>>>>
>>>>>     (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>>>     This allows us to write a simple R script (myscript.R) which is
>>>>>     called by plink (plink --file mydata --R myscript.R). plink reads
>>>>>     the file mydata (which is in plink format) and iteratively, SNP by
>>>>>     SNP, transfers all the data to the script myscript.R. This script
>>>>>     contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which will
>>>>>     take every SNP (GENO variable) and store it in *flv format by
>>>>>     calling DatABEL functions.
>>>>>
>>>>>     The whole process of conversion will look like this:
>>>>>
>>>>>     1) The user asks GenA to convert a plink file to a GenA file.
>>>>>     2) GenA checks whether plink is installed. If it is not
>>>>>     installed, GenA goes to the plink site and downloads and installs
>>>>>     it itself (using the R function "download.file" from the "utils"
>>>>>     package).
>>>>>     3) GenA runs a simple line: system('plink --file mydata --R
>>>>>     myscript.R')
>>>>>     4) The Rplink function (from myscript.R) gets every SNP and stores
>>>>>     it in *flv format. This function creates an flv file and then
>>>>>     opens and closes it to save every single SNP.
>>>>>     5) Work is done.
>>>>
>>>>     I'm not sure how portable it is to download and run plink. Also,
>>>>     the plink page says: "Currently, there is only support for
>>>>     R-plugins for Linux-based and Mac OS PLINK distributions."
>>>>
>>>>>
>>>>>     The only issue is how fast the conversion will run: how much time
>>>>>     does it take to open a filevector file, store one SNP and close
>>>>>     it? I cannot find a DatABEL R function for adding a SNP to an flv
>>>>>     file. Is there a C DatABEL function which can do it?
>>>>
>>>>     Wouldn't it be easier/possible to use plink to export to text
>>>>     (.csv) and
>>>>     then use filevector's txt2fvf binary (of course this could be
>>>>     done from
>>>>     R using system())?
>>>>
>>>>     I'm also wondering if going per SNP is really necessary. If I
>>>>     understand it correctly, the R script (myscript.R) has to have a
>>>>     function called:
>>>>     Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>>>     where GENO is the matrix of genotypes. So we could write that into
>>>>     a DatABEL file at once. Of course you may want to do this per
>>>>     chromosome to reduce memory consumption (not sure how plink/R
>>>>     would handle large data sets).
>>>>
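Following Lennart's suggestion to write per chromosome, here is a minimal Python sketch that groups SNP indices by chromosome using the .bim file. The .bim file is whitespace-separated, with the chromosome code in the first column. The sketch assumes SNPs are ordered so that each chromosome forms one contiguous run. `chromosome_blocks` is a hypothetical helper name.

```python
from itertools import groupby


def chromosome_blocks(bim_lines):
    """Yield (chromosome, start, stop) SNP index ranges, stop exclusive.

    Assumes each chromosome's SNPs are contiguous in the .bim file.
    """
    index = 0
    # first whitespace-separated column of each non-blank line = chromosome
    chroms = (line.split()[0] for line in bim_lines if line.strip())
    for chrom, group in groupby(chroms):
        n = sum(1 for _ in group)
        yield chrom, index, index + n
        index += n


bim = ["1 rs1 0 100 A G", "1 rs2 0 200 C T", "2 rs3 0 50 A C"]
print(list(chromosome_blocks(bim)))
```

In a SNP-major .bed file, SNP k starts at byte offset 3 + k * ceil(n/4), where n is the number of individuals. Each (start, stop) range therefore corresponds to one contiguous byte range. That range could be read with seek(), decoded, and written out before moving to the next chromosome.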
>>
>>
>> --
>> -----------------------------------------------------
>> Yurii S. Aulchenko
>>
>> [ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
>> <http://twitter.com/YuriiAulchenko> ] [ Blog
>> <http://yurii-aulchenko.blogspot.nl/> ]
>>
>>
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>
>
>
>



