[GenABEL-dev] function for conversion a plink format file to a GenABEL format file

L.C. Karssen lennart at karssen.org
Fri Nov 22 11:08:16 CET 2013


Thanks Maarten, that's a good find. It does seem to return the data
(including the genotype data) in a list. I'm not sure how well that will
work RAM-wise for large data sets. On the other hand, the function does
allow SNP selection, so maybe the conversion could be done per chromosome.
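
For reference, a minimal sketch of reading .bed data directly in base R,
a few SNPs at a time. It assumes a SNP-major .bed file and that the number
of samples is already known (e.g. from the .fam file). The function names
are made up for illustration and are not part of snpStats, GenABEL or
DatABEL; writing the result into a GenABEL/DatABEL object is not shown.

```r
# Decode the bytes of one SNP from a SNP-major PLINK .bed file.
# Returns copies of the second .bim allele (0, 1, 2), or NA if missing.
decode_bed_bytes <- function(bytes, n_samples) {
  codes <- c(0L, NA_integer_, 1L, 2L)  # for 2-bit values 0, 1, 2, 3
  b <- as.integer(bytes)
  # Each byte holds four genotypes, lowest-order bit pair first
  two_bit <- rbind(b %% 4L, (b %/% 4L) %% 4L,
                   (b %/% 16L) %% 4L, (b %/% 64L) %% 4L)
  # Drop the padding genotypes at the end of the last byte
  codes[as.vector(two_bit) + 1L][seq_len(n_samples)]
}

# Read only the SNPs listed in snp_index (1-based positions in the .bim).
# Returns an n_samples x length(snp_index) integer matrix.
read_bed_snps <- function(bed_file, n_samples, snp_index) {
  bytes_per_snp <- ceiling(n_samples / 4)
  con <- file(bed_file, "rb")
  on.exit(close(con))
  # Two magic bytes, then 0x01 for SNP-major mode
  header <- readBin(con, what = "raw", n = 3L)
  stopifnot(identical(header, as.raw(c(0x6c, 0x1b, 0x01))))
  cols <- lapply(snp_index, function(j) {
    seek(con, where = 3 + (j - 1) * bytes_per_snp, origin = "start")
    decode_bed_bytes(readBin(con, what = "raw", n = bytes_per_snp),
                     n_samples)
  })
  matrix(unlist(cols), nrow = n_samples)
}
```

Taking snp_index from the .bim file one chromosome at a time would keep
only that chromosome's genotypes in memory.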



Lennart.

On 11/22/2013 10:54 AM, Maarten Kooyman wrote:
> There is also a function to read bed files in the Bioconductor
> snpStats package. This might be a useful starting point.
> 
> See: search.bioconductor.jp/codes/6594
> 
> Maarten Kooyman
> 
> On 11/22/2013 10:04 AM, L.C. Karssen wrote:
>> How difficult would it be to import .bed files [1] instead of the text
>> conversion? Given the binary data of both the .bed and the GenABEL
>> format, wouldn't conversion be much quicker?
>>
>>
>> Lennart.
>>
>> [1] http://pngu.mgh.harvard.edu/~purcell/plink/binary.shtml
>>
>>
>> On 11/22/2013 09:54 AM, Yurii Aulchenko wrote:
>>> Too slow, too difficult for the user, or both? :)
>>>
>>> On Friday, November 22, 2013, Maksim Struchalin wrote:
>>>
>>>     Yes. Looks like it was a bad idea to use the plink R plugin for
>>>     converting plink files to *ABEL format.
>>>     Maksim
>>>
>>>     On 18/11/2013 18:48, Yury Aulchenko wrote:
>>>>     I would say that in principle DatABEL::text2databel is the
>>>>     "natural" way to go from text-files to DatABEL-files
>>>>
>>>>     The problem is that 'regular' text input may be allele by allele,
>>>>     not genotype by genotype... (e.g. data are in format "A G", or
>>>>     "A/G", not "0" or "1" or "2").
>>>>
>>>>     Y
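
The allele-by-allele issue above could be handled with a small recoding
step before the text is handed to DatABEL. A rough sketch, assuming one
character vector of genotypes per SNP and a known effect allele; the
function name recode_alleles is hypothetical, and treating "0" as the
missing-allele code is an assumption borrowed from the plink text format.

```r
# Hypothetical helper: recode allele-pair strings for one SNP into counts
# of a chosen effect allele. Accepts "A G" and "A/G" style input.
recode_alleles <- function(genotypes, effect_allele) {
  pairs <- strsplit(genotypes, "[ /]")
  vapply(pairs, function(a) {
    # Anything that is not two called alleles is treated as missing
    if (length(a) != 2L || any(a %in% c("0", "", NA))) {
      return(NA_integer_)
    }
    sum(a == effect_allele)
  }, integer(1))
}

# Example: count copies of "G"
recode_alleles(c("A G", "A/A", "G G", "0 0"), effect_allele = "G")
```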
>>>>
>>>>     On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org>
>>>>     wrote:
>>>>
>>>>>     Hi Maksim,
>>>>>
>>>>>     On 15-11-13 05:53, Maksim Struchalin wrote:
>>>>>>     An easy way to write a function for converting a plink format
>>>>>>     file to a GenABEL format file:
>>>>>>
>>>>>>     Use plink support of 'plug-in' functions
>>>>>
>>>>>     Nice find. I didn't know that existed.
>>>>>
>>>>>>     (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml).
>>>>>>     This allows us to write a simple R script (myscript.R) which is
>>>>>>     called by plink (plink --file mydata --R myscript.R). plink reads
>>>>>>     the file mydata (which is in plink format) and iteratively, SNP
>>>>>>     by SNP, transfers all the data to the script myscript.R. This
>>>>>>     script contains a function Rplink(PHENO,GENO,CLUSTER,COVAR) which
>>>>>>     will take every SNP (GENO variable) and store it in *flv format
>>>>>>     by calling DatABEL functions.
>>>>>>
>>>>>>     The whole process of conversion will look like this:
>>>>>>
>>>>>>     1) User asks GenA to convert a plink file to a GenA file
>>>>>>     2) GenA checks whether plink is installed. If it is not
>>>>>>     installed, then GenA goes to the plink site and downloads and
>>>>>>     installs it itself (using the R function "download.file" from
>>>>>>     the "utils" package)
>>>>>>     3) GenA runs a simple line: system('plink --file mydata --R
>>>>>>     myscript.R')
>>>>>>     4) The Rplink function (from myscript.R) gets every SNP and
>>>>>>     stores it in *flv format. This function creates an flv file and
>>>>>>     then opens and closes it for saving every single SNP.
>>>>>>     5) Work is done
>>>>>
>>>>>     I'm not sure how portable it is to download and run plink. Also,
>>>>>     the plink page says: Currently, there is only support for
>>>>>     R-plugins for Linux-based and Mac OS PLINK distributions.
>>>>>
>>>>>>
>>>>>>     The only issue is how fast the conversion will run: how much
>>>>>>     time does it take to open a filevector file, store one SNP and
>>>>>>     close it? I cannot find a DatABEL R function for adding a SNP to
>>>>>>     an flv file. Is there a C DatABEL function which can do it?
>>>>>
>>>>>     Wouldn't it be easier/possible to use plink to export to text
>>>>>     (.csv) and then use filevector's txt2fvf binary (of course this
>>>>>     could be done from R using system())?
>>>>>
>>>>>     I'm also wondering if going per SNP is really necessary. If I
>>>>>     understand it correctly, the R script (myscript.R) has to have a
>>>>>     function called:
>>>>>     Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
>>>>>     where GENO is the matrix of genotypes. So we could write that into
>>>>>     a DatABEL file at once. Of course you may want to do this per
>>>>>     chromosome to reduce memory consumption (not sure how plink/R
>>>>>     would handle large data sets).
>>>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Yurii S. Aulchenko
>>>
>>> [ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [ Twitter
>>> <http://twitter.com/YuriiAulchenko> ] [ Blog
>>> <http://yurii-aulchenko.blogspot.nl/> ]
>>>
>>>
>>>
>>> _______________________________________________
>>> genabel-devel mailing list
>>> genabel-devel at lists.r-forge.r-project.org
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>>>

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
L.C. Karssen
Utrecht
The Netherlands

lennart at karssen.org
http://blog.karssen.org
GPG key ID: A88F554A
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-
