[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
Maarten Kooyman
m.kooijman at erasmusmc.nl
Fri Nov 22 10:54:16 CET 2013
There is also a function to read .bed files in the Bioconductor
snpStats package. This might be a good starting point.
See: search.bioconductor.jp/codes/6594
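
To give an idea of what any .bed reader has to do, here is a minimal base-R sketch of the decoding step. It is not the snpStats code. It assumes the SNP-major layout described on the plink binary format page: three header bytes (0x6c, 0x1b, 0x01), then ceiling(N/4) bytes per SNP, four genotypes per byte with the lowest-order bit pair first. The function name decode_bed and the output coding (copies of the second .bim allele, NA for missing) are choices made for this illustration only.

```r
# Decode the raw bytes of a SNP-major PLINK .bed file into an
# individuals x SNPs integer matrix (hypothetical helper, base R only).
decode_bed <- function(bytes, n_ind, n_snp) {
  bytes <- as.integer(bytes)
  # Magic number plus the SNP-major flag.
  stopifnot(length(bytes) >= 3,
            identical(bytes[1:3], c(0x6cL, 0x1bL, 0x01L)))
  bytes_per_snp <- ceiling(n_ind / 4)
  stopifnot(length(bytes) == 3 + bytes_per_snp * n_snp)

  # 2-bit code -> copies of the second allele:
  # 0 = homozygous first allele, 1 = missing, 2 = heterozygous,
  # 3 = homozygous second allele.
  lookup <- c(0L, NA_integer_, 1L, 2L)

  geno <- matrix(NA_integer_, nrow = n_ind, ncol = n_snp)
  for (j in seq_len(n_snp)) {
    start <- 3 + (j - 1) * bytes_per_snp
    block <- bytes[start + seq_len(bytes_per_snp)]
    # Split every byte into four 2-bit codes, lowest-order pair first.
    codes <- as.vector(rbind(
      bitwAnd(block, 3L),
      bitwAnd(bitwShiftR(block, 2L), 3L),
      bitwAnd(bitwShiftR(block, 4L), 3L),
      bitwAnd(bitwShiftR(block, 6L), 3L)
    ))
    # Drop the padding codes at the end of the last byte.
    geno[, j] <- lookup[codes[seq_len(n_ind)] + 1L]
  }
  geno
}

# Three individuals, two SNPs, built by hand for illustration.
example_bytes <- as.raw(c(0x6c, 0x1b, 0x01, 0x38, 0x0d))
decode_bed(example_bytes, n_ind = 3, n_snp = 2)
```

For a real file the bytes could be obtained with readBin(path, what = "raw", n = file.size(path)), with n_ind and n_snp taken from the line counts of the .fam and .bim files.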
Maarten Kooyman
On 11/22/2013 10:04 AM, L.C. Karssen wrote:
> How difficult would it be to import .bed files [1] instead of the text
> conversion? Given the binary data of both the .bed and the GenABEL
> format, wouldn't conversion be much quicker?
>
>
> Lennart.
>
> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>
>
> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>> Too slow, too difficult for the user, or both? :)
>>
>> On Friday, November 22, 2013, Maksim Struchalin wrote:
>>
>> Yes. It looks like it was a bad idea to use the plink R plug-in for
>> converting plink files to *ABEL format.
>> Maksim
>>
>> On 18/11/2013 18:48, Yury Aulchenko wrote:
>>> I would say that in principle DatABEL::text2databel is the
>>> "natural" way to go from text-files to DatABEL-files
>>>
>>> The problem is that 'regular' text input may be allele by allele,
>>> not genotype by genotype... (e.g. data are in format "A G", or
>>> "A/G", not "0" or "1" or "2").
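
The allele-by-allele problem described above can be illustrated with a small base-R sketch that turns genotype strings such as "A G" or "A/G" into 0/1/2 counts of a chosen allele. The function name alleles_to_dosage, the effect_allele argument and the use of "0" as the missing-allele code are assumptions made for this example; they are not part of DatABEL::text2databel.

```r
# Convert allele-pair genotype strings for one SNP ("A G", "A/G", ...)
# into counts of a chosen allele: 0, 1, 2, or NA (hypothetical helper).
alleles_to_dosage <- function(genotypes, effect_allele, missing_code = "0") {
  # Split on a slash or on whitespace: "A/G" and "A G" both give c("A", "G").
  pairs <- strsplit(trimws(genotypes), "[/[:space:]]+")
  vapply(pairs, function(p) {
    # Anything that is not exactly two called alleles is treated as missing.
    if (length(p) != 2 || anyNA(p) || any(p == missing_code)) {
      return(NA_integer_)
    }
    sum(p == effect_allele)
  }, integer(1))
}

alleles_to_dosage(c("A G", "G/G", "A A", "0 0"), effect_allele = "G")
```

Which allele is counted has to be decided per SNP (for example from the map file), which is exactly the extra information a plain text-to-matrix converter does not have.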
>>>
>>> Y
>>>
>>> On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
>>> wrote:
>>>
>>>> Hi Maksim,
>>>>
>>>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>>>> An easy way to write a function for converting a plink format
>>>>> file to a GenABEL format file:
>>>>>
>>>>> Use plink's support for 'plug-in' functions
>>>>
>>>> Nice find. I didn't know that existed.
>>>>
>>>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>>> This allows us to write a simple R script (myscript.R) which is
>>>>> called by plink (plink --file mydata --R myscript.R). plink reads
>>>>> the file mydata (which is in plink format) and iteratively, SNP by
>>>>> SNP, transfers all the data to the script myscript.R. This script
>>>>> contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which will
>>>>> take every SNP (GENO variable) and store it in *flv format by
>>>>> calling DatABEL functions.
>>>>>
>>>>> The whole process of conversion will look like this:
>>>>>
>>>>> 1) The user asks GenA to convert a plink file to a GenA file.
>>>>> 2) GenA checks whether plink is installed. If it is not installed,
>>>>> GenA goes to the plink site and downloads/installs it itself
>>>>> (using the R function "download.file" from the "utils" package).
>>>>> 3) GenA runs a single line: system('plink --file mydata --R
>>>>> myscript.R')
>>>>> 4) The Rplink function (from myscript.R) gets every SNP and stores
>>>>> it in *flv format. This function creates an flv file and then
>>>>> opens and closes it to save every single SNP.
>>>>> 5) The work is done.
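
Steps 2 and 3 of the list above could look roughly like the following base-R sketch. It only checks whether a plink executable is on the PATH and stops with an error otherwise; the automatic download is left out. The function names find_plink, plink_rplugin_args and run_plink_conversion are hypothetical; the --file and --R options are the ones quoted in the list.

```r
# Return the path to a plink executable on the PATH, or "" if none is found.
find_plink <- function(exe = "plink") {
  unname(Sys.which(exe))
}

# Build the argument vector for the call quoted in step 3.
plink_rplugin_args <- function(prefix, script) {
  c("--file", prefix, "--R", script)
}

# Run the conversion, failing early with a clear message if plink is absent.
run_plink_conversion <- function(prefix, script, exe = "plink") {
  plink <- find_plink(exe)
  if (!nzchar(plink)) {
    stop("plink executable not found on PATH: ", exe)
  }
  status <- system2(plink, plink_rplugin_args(prefix, script))
  if (status != 0) {
    stop("plink exited with status ", status)
  }
  invisible(status)
}
```

A call would then be run_plink_conversion("mydata", "myscript.R"). Using system2() with an argument vector instead of a single system() string keeps the file names separate from the command, which makes the call easier to check before it is run.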
>>>>
>>>> I'm not sure how portable it is to download and run plink. Also, the
>>>> plink page says: "Currently, there is only support for R-plugins for
>>>> Linux-based and Mac OS PLINK distributions."
>>>>
>>>>>
>>>>> The only issue is how fast the conversion will run: how much time
>>>>> does it take to open a filevector file, store one SNP and close it?
>>>>> I cannot find a DatABEL R function for adding a SNP to a flv file.
>>>>> Is there a C DatABEL function which can do it?
>>>>
>>>> Wouldn't it be easier/possible to use plink to export to text
>>>> (.csv) and
>>>> then use filevector's txt2fvf binary (of course this could be
>>>> done from
>>>> R using system())?
>>>>
>>>> I'm also wondering if going per SNP is really necessary. If I
>>>> understand it correctly, the R script (myscript.R) has to have a
>>>> function called:
>>>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>>> where GENO is the matrix of genotypes. So we could write that into a
>>>> DatABEL file at once. Of course you may want to do this per
>>>> chromosome to reduce memory consumption (not sure how plink/R would
>>>> handle large data sets).
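
The idea of writing the whole GENO matrix per call, rather than one SNP at a time, can be sketched as follows. This is only an illustration under several assumptions: GENO is taken to be an individuals x SNPs matrix coded 0/1/2 with NA for missing; a flat binary file (the hypothetical genotypes.bin) stands in for the DatABEL file; and the value returned by Rplink is a placeholder, since the return convention plink expects is not discussed in this thread and should be checked against the plink R plug-in page.

```r
# Append one genotype chunk (individuals x SNPs) to a flat binary file,
# one byte per genotype, SNP by SNP. Missing genotypes are stored as 3.
append_geno_chunk <- function(GENO, path) {
  codes <- as.integer(GENO)        # column-major, i.e. one SNP after another
  codes[is.na(codes)] <- 3L
  con <- file(path, open = "ab")   # append, binary
  on.exit(close(con))
  writeBin(codes, con, size = 1L)
  invisible(ncol(GENO))
}

# Read the accumulated file back as an individuals x SNPs matrix.
read_geno_file <- function(path, n_ind) {
  codes <- readBin(path, what = "integer", n = file.size(path), size = 1L)
  codes[codes == 3L] <- NA_integer_
  matrix(codes, nrow = n_ind)
}

# Plug-in entry point with the signature quoted in the thread.
Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  # "genotypes.bin" is a hypothetical output file in the working directory.
  append_geno_chunk(GENO, "genotypes.bin")
  # Placeholder return value (an assumption): one zero per SNP.
  rep(0, ncol(GENO))
}
```

The file is opened and closed once per chunk instead of once per SNP, so the cost of opening the file is spread over all SNPs in the chunk.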
>>>>
>>
>>
>> --
>> -----------------------------------------------------
>> Yurii S. Aulchenko
>>
>> [ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
>> <http://twitter.com/YuriiAulchenko> ] [ Blog
>> <http://yurii-aulchenko.blogspot.nl/> ]
>>
>>
>>
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>