[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
L.C. Karssen
lennart at karssen.org
Fri Nov 22 10:04:30 CET 2013
How difficult would it be to import .bed files [1] directly instead of going
through a text conversion? Since both the .bed format and the GenABEL format
store binary data, wouldn't a direct conversion be much quicker?
Lennart.
[1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
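For reference, the SNP-major .bed layout from [1] is simple enough to decode directly. The file starts with the magic bytes 0x6C 0x1B, the third byte is 0x01 for SNP-major order, and each SNP then occupies ceil(N/4) bytes, four individuals per byte starting at the lowest-order bit pair. Read as an integer, each 2-bit field means 0 = homozygous for the first .bim allele, 1 = missing, 2 = heterozygous, 3 = homozygous for the second .bim allele. Below is a minimal Python sketch, not GenABEL or DatABEL code. The function name `decode_bed` and the choice to count copies of the second .bim allele are assumptions made for illustration.

```python
from typing import List, Optional

BED_MAGIC = bytes([0x6C, 0x1B])  # first two bytes of every PLINK .bed file
SNP_MAJOR = 0x01                 # third byte: 1 = SNP-major, 0 = individual-major

# Integer value of a 2-bit field -> copies of the second .bim allele (None = missing).
# Counting the second allele is an assumption of this sketch.
CODE_TO_COUNT = {0: 0, 1: None, 2: 1, 3: 2}


def decode_bed(data: bytes, n_individuals: int, n_snps: int) -> List[List[Optional[int]]]:
    """Decode a SNP-major .bed byte string into one genotype list per SNP."""
    if data[:2] != BED_MAGIC:
        raise ValueError("not a PLINK .bed file (bad magic bytes)")
    if len(data) < 3 or data[2] != SNP_MAJOR:
        raise ValueError("only SNP-major .bed files are handled in this sketch")
    bytes_per_snp = (n_individuals + 3) // 4  # 4 genotypes per byte, last byte padded
    expected = 3 + bytes_per_snp * n_snps
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    snps: List[List[Optional[int]]] = []
    for s in range(n_snps):
        block = data[3 + s * bytes_per_snp: 3 + (s + 1) * bytes_per_snp]
        # Individual i sits in byte i // 4, starting at the lowest-order bit pair.
        snps.append([CODE_TO_COUNT[(block[i // 4] >> (2 * (i % 4))) & 0b11]
                     for i in range(n_individuals)])
    return snps
```

On real data, N is the number of lines in the .fam file and the SNP count is the number of lines in the .bim file. For large files one would read and decode one SNP block at a time rather than loading the whole file.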
On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
> Too slow, too difficult for the user, or both? :)
>
> On Friday, November 22, 2013, Maksim Struchalin wrote:
>
> Yes. Looks like it was a bad idea to use plink R-plugin for
> converting plink files to *ABEL format.
> Maksim
>
> On 18/11/2013 18:48, Yury Aulchenko wrote:
>> I would say that in principle DatABEL::text2databel is the
>> "natural" way to go from text-files to DatABEL-files
>>
>> The problem is that 'regular' text input may be allele by allele,
>> not genotype by genotype... (e.g. data are in format "A G", or
>> "A/G", not "0" or "1" or "2").
>>
>> Y
>>
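To illustrate the allele-by-allele problem raised above, here is a minimal Python sketch that turns an allele pair such as "A G" or "A/G" into a 0/1/2 count of a chosen coded allele. The function name and the set of missing-allele markers (plink's "0" plus a few guessed extras) are assumptions.

```python
import re
from typing import Optional

# Assumed missing-allele markers: "0" is plink's default, the others are guesses.
MISSING = {"0", "N", "-", "?"}


def allele_pair_to_count(genotype: str, coded_allele: str) -> Optional[int]:
    """Convert 'A G' or 'A/G' into the number of copies of coded_allele (0, 1 or 2).

    Returns None for missing or malformed genotypes.
    """
    # Split on whitespace and/or slashes, dropping empty pieces.
    alleles = [a for a in re.split(r"[\s/]+", genotype.strip()) if a]
    if len(alleles) != 2 or any(a.upper() in MISSING for a in alleles):
        return None
    return sum(a.upper() == coded_allele.upper() for a in alleles)
```

The coded allele has to be fixed per SNP (for example from a per-SNP allele list); choosing it inconsistently across files would swap 0 and 2.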
>> On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
>> wrote:
>>
>>> Hi Maksim,
>>>
>>> On 15-11-13 05:53, Maksim Struchalin wrote:
>>>> An easy way to write a function for converting a plink format file to a
>>>> GenABEL format file:
>>>>
>>>> Use plink's support for 'plug-in' functions
>>>
>>> Nice find. I didn't know that existed.
>>>
>>>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>> This allows us to write a simple R script (myscript.R) which is called
>>>> by plink (plink --file mydata --R myscript.R). plink reads the file
>>>> mydata (which is in plink format) and iteratively, SNP by SNP,
>>>> transfers all the data to the script myscript.R. This script contains a
>>>> function Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP
>>>> (the GENO variable) and store it in *flv format by calling DatABEL
>>>> functions.
>>>>
>>>> The whole conversion process will look like this:
>>>>
>>>> 1) The user asks GenA to convert a plink file to a GenA file.
>>>> 2) GenA checks whether plink is installed. If it is not, GenA goes to
>>>>    the plink site and downloads and installs it itself (using the R
>>>>    function "download.file" from the "utils" package).
>>>> 3) GenA runs a simple line: system('plink --file mydata --R myscript.R')
>>>> 4) The Rplink function (from myscript.R) gets every SNP and stores it
>>>>    in *flv format. This function creates an flv file and then opens
>>>>    and closes it to save every single SNP.
>>>> 5) Work is done.
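Steps 2 and 3 are meant to run from R via system(). Purely to illustrate the control flow (check that plink is on the PATH, then invoke it without going through a shell), here is an equivalent Python sketch. It uses only the flags quoted in the thread; the function names are hypothetical, and the automatic download from step 2 is deliberately left out.

```python
import shutil
import subprocess
from typing import List


def plink_plugin_command(prefix: str, r_script: str) -> List[str]:
    """Build the command from step 3: plink --file <prefix> --R <r_script>."""
    return ["plink", "--file", prefix, "--R", r_script]


def run_plink_plugin(prefix: str, r_script: str) -> None:
    """Run plink with the R plug-in script, failing early if plink is not on PATH."""
    if shutil.which("plink") is None:
        # Step 2 (automatic download/installation) is not implemented here.
        raise FileNotFoundError("plink executable not found on PATH")
    # check=True raises CalledProcessError if plink exits with a non-zero status.
    subprocess.run(plink_plugin_command(prefix, r_script), check=True)
```

Passing the command as a list avoids shell quoting issues with file names containing spaces.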
>>>
>>> I'm not sure how portable it is to download and run plink. Also, the
>>> plink page says: Currently, there is only support for R-plugins for
>>> Linux-based and Mac OS PLINK distributions.
>>>
>>>>
>>>> The only issue is how fast the conversion will run: how much time does
>>>> it take to open a filevector file, store one SNP and close it? I cannot
>>>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>>>> DatABEL function which can do it?
>>>
>>> Wouldn't it be easier (or at least possible) to use plink to export to
>>> text (.csv) and then use filevector's txt2fvf binary (of course this
>>> could be done from R using system())?
>>>
>>> I'm also wondering if going per SNP is really necessary. If I understand
>>> it correctly, the R script (myscript.R) has to have a function called:
>>> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>> where GENO is the matrix of genotypes. So we could write that into a
>>> DatABEL file at once. Of course you may want to do this per chromosome
>>> to reduce memory consumption (not sure how plink/R would handle large
>>> data sets).
>>>
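If the genotypes were read straight from a SNP-major .bed file instead of through the plug-in, per-chromosome chunking becomes a matter of byte offsets: SNP k (0-based, in .bim order) starts at 3 + k * ceil(N/4). The Python sketch below groups those offsets by the chromosome column of the .bim file (columns: chromosome, SNP id, genetic distance, bp position, allele 1, allele 2). The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def snp_chunks_by_chromosome(bim_lines: List[str],
                             n_individuals: int) -> Dict[str, List[Tuple[int, int]]]:
    """Map chromosome -> list of (byte_offset, n_bytes), one entry per SNP in .bed."""
    bytes_per_snp = (n_individuals + 3) // 4
    rows = [line for line in bim_lines if line.strip()]  # ignore blank lines
    chunks: Dict[str, List[Tuple[int, int]]] = {}
    for k, line in enumerate(rows):
        chrom = line.split()[0]
        # Skip the 3-byte header, then k whole SNP blocks.
        chunks.setdefault(chrom, []).append((3 + k * bytes_per_snp, bytes_per_snp))
    return chunks


def read_chromosome(bed_path: str, chunks: List[Tuple[int, int]]) -> List[bytes]:
    """Read the raw SNP blocks for one chromosome, keeping only that chunk in memory."""
    blocks = []
    with open(bed_path, "rb") as fh:
        for offset, n_bytes in chunks:
            fh.seek(offset)
            blocks.append(fh.read(n_bytes))
    return blocks
```

Each chromosome's blocks could then be decoded and written out in one go, which keeps memory use bounded by the largest chromosome rather than the whole data set.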
>
>