[GenABEL-dev] function for conversion a plink format file to a GenABEL format file

Yurii Aulchenko yurii.aulchenko at gmail.com
Fri Nov 22 09:54:29 CET 2013


Too slow, too difficult for the user, or both? :)

On Friday, November 22, 2013, Maksim Struchalin wrote:

>  Yes. Looks like it was a bad idea to use the plink R plug-in for converting
> plink files to *ABEL format.
> Maksim
>
> On 18/11/2013 18:48, Yury Aulchenko wrote:
>
> I would say that in principle DatABEL::text2databel is the "natural" way
> to go from text-files to DatABEL-files
>
>  The problem is that 'regular' text input may be allele by allele, not
> genotype by genotype... (e.g. data are in format "A G", or "A/G", not "0"
> or "1" or "2").
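
To make the allele-by-allele point concrete, here is a minimal base-R sketch of recoding such calls into 0/1/2 counts. The function name `alleles_to_dosage`, the `counted` argument and the missing-allele codes ("0", "N") are assumptions for illustration, not part of DatABEL or GenABEL.

```r
# Convert allele-pair strings ("A G" or "A/G") into 0/1/2 counts of one
# chosen allele. `alleles_to_dosage` and `counted` are hypothetical names.
alleles_to_dosage <- function(calls, counted) {
  # split each call on a space or a slash
  parts <- strsplit(calls, "[ /]")
  vapply(parts, function(p) {
    # anything that is not exactly two known alleles becomes NA
    # (the missing codes "0" and "N" are an assumption)
    if (length(p) != 2 || any(p %in% c("0", "N", ""))) return(NA_integer_)
    sum(p == counted)
  }, integer(1))
}

calls <- c("A G", "G/G", "A A", "0 0")
alleles_to_dosage(calls, counted = "G")
```

Which allele gets counted has to be decided per SNP before a step like this can feed a genotype-coded text file.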
>
>  Y
>
>  On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org> wrote:
>
>  Hi Maksim,
>
> On 15-11-13 05:53, Maksim Struchalin wrote:
>
> An easy way to write a function for converting a plink format file to a
> GenABEL format file:
>
> Use plink's support for 'plug-in' functions
>
>
> Nice find. I didn't know that existed.
>
> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us
> to write a simple R script (myscript.R) which is called by plink (plink
> --file mydata --R myscript.R). plink reads the file mydata (which is in
> plink format) and iteratively, SNP by SNP, transfers all the data to the
> script myscript.R. This script contains a function
> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
> variable) and store it in a *flv format by calling DatABEL functions.
>
> The whole process of conversion will look like this:
>
> 1) User asks GenA to convert a plink file to a GenA file
> 2) GenA checks whether plink is installed. If it is not installed,
> then GenA goes to the plink site and downloads/installs it itself (using the
> R function "download.file" from the "utils" package)
> 3) GenA runs a simple line: system('plink --file mydata --R myscript.R')
> 4) The Rplink function (from myscript.R) gets every SNP and stores it in *flv
> format. This function creates an flv file and then opens and closes it to
> save every single SNP.
> 5) Work is done
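
Steps 2 and 3 above could be sketched roughly as follows. The helper name `run_plink_plugin` and its `dry_run` argument are hypothetical; the sketch only checks the PATH and does not attempt the automatic download, since that depends on the platform and on where plink is hosted.

```r
# Hypothetical helper: run plink's R plug-in on a fileset, if plink is found.
run_plink_plugin <- function(fileset, script, plink = "plink", dry_run = FALSE) {
  args <- c("--file", fileset, "--R", script)
  # dry_run only shows the command line that would be executed
  if (dry_run) return(paste(c(plink, args), collapse = " "))
  # Sys.which() returns "" when the executable is not on the PATH
  if (!nzchar(Sys.which(plink))) {
    stop("plink not found on the PATH; install it first")
  }
  # system2() returns the exit status of the command
  status <- system2(plink, args)
  if (status != 0) stop("plink exited with status ", status)
  invisible(status)
}

run_plink_plugin("mydata", "myscript.R", dry_run = TRUE)
```

Failing loudly when plink is missing is a deliberate choice here; silently installing a binary from within a package is the part of the plan that seems least portable.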
>
>
> I'm not sure how portable it is to download and run plink. Also, the
> plink page says: Currently, there is only support for R-plugins for
> Linux-based and Mac OS PLINK distributions.
>
>
> The only issue is how fast the conversion will run: how much time does
> it take to open a filevector file, store one SNP and close it? I cannot
> find a DatABEL R function for adding a SNP to an flv file. Is there a C
> DatABEL function which can do it?
>
>
> Wouldn't it be easier/possible to use plink to export to text (.csv) and
> then use filevector's txt2fvf binary (of course this could be done from
> R using system())?
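
As a rough illustration of the export-to-text route, the sketch below reads an additive-coded text export back into a genotype matrix in base R. It assumes the layout of plink's `.raw` output (six leading columns, then one 0/1/2 column per SNP, as produced by `--recodeA` in the plink 1.07 series, if I recall the flag correctly); the tiny file written here is made-up stand-in data, and `read_raw_genotypes` is a hypothetical name.

```r
# Stand-in for a plink additive-coded export: six leading columns
# (FID IID PAT MAT SEX PHENOTYPE), then one 0/1/2 column per SNP.
raw_file <- tempfile(fileext = ".raw")
writeLines(c(
  "FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G",
  "f1 i1 0 0 1 1 0 2",
  "f2 i2 0 0 2 2 1 NA"
), raw_file)

# Hypothetical reader: drop the six leading columns, keep the genotypes.
read_raw_genotypes <- function(path) {
  tab <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  geno <- as.matrix(tab[, -(1:6), drop = FALSE])
  rownames(geno) <- tab$IID
  geno
}

geno <- read_raw_genotypes(raw_file)
dim(geno)
```

A matrix like `geno` is already genotype-coded, so it sidesteps the allele-by-allele problem; for large data the same file could instead go straight to a text-to-filevector converter without passing through R memory.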
>
> I'm also wondering if going per SNP is really necessary. If I understand
> it correctly the R script (myscript.R) has to have a function called:
> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> where GENO is the matrix of genotypes. So we could write that into a
> DatABEL file at once. Of course you may want to do this per chromosome
> to reduce memory consumption (not sure how plink/R would handle large
> data sets).
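
A sketch of what writing the whole GENO matrix in one go might look like inside myscript.R. It appends to a plain text file rather than a DatABEL file, because the right DatABEL call for appending is exactly the open question in this thread. The output path is hypothetical, and the return convention (each SNP's values prefixed by their count) is my reading of the PLINK plug-in page and should be checked against it.

```r
# Hypothetical output location; each call appends the SNPs it was given.
out_file <- file.path(tempdir(), "genotypes_by_snp.txt")

Rplink <- function(PHENO, GENO, CLUSTER, COVAR) {
  # one output line per SNP (column of GENO), individuals across the line
  write.table(t(GENO), file = out_file, append = TRUE,
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  # return one value per SNP (the non-missing count), each prefixed by the
  # number of values that follow (assumed plug-in return format)
  as.numeric(apply(GENO, 2, function(s) c(1, sum(!is.na(s)))))
}

# Stand-alone call with made-up data: 3 individuals, 2 SNPs.
fake_geno <- matrix(c(0, 1, 2, NA, 1, 1), nrow = 3)
Rplink(PHENO = NULL, GENO = fake_geno, CLUSTER = NULL, COVAR = NULL)
```

Because the file is opened once per call rather than once per SNP, the open/close overhead is paid per batch of SNPs, whatever batch size plink happens to pass.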
>
>

-- 
-----------------------------------------------------
Yurii S. Aulchenko

[ LinkedIn <http://nl.linkedin.com/in/yuriiaulchenko> ] [
Twitter<http://twitter.com/YuriiAulchenko>] [
Blog <http://yurii-aulchenko.blogspot.nl/> ]