[GenABEL-dev] function for conversion a plink format file to a GenABEL format file

Yury Aulchenko yurii.aulchenko at gmail.com
Mon Nov 18 12:48:55 CET 2013


I would say that, in principle, DatABEL::text2databel is the "natural" way to go from text files to DatABEL files.

The problem is that 'regular' text input may be coded allele by allele rather than genotype by genotype (e.g. the data are in the format "A G" or "A/G", not "0", "1" or "2").
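
To make the difference concrete, here is a minimal base-R sketch of the recoding step that allele-coded input would need before it could be stored as numeric genotypes. The function name alleles_to_dosage and its coded_allele argument are made up for illustration; they are not part of DatABEL or GenABEL. The sketch assumes two alleles per genotype, separated by a space or a slash, and treats plink's "0" allele code as missing.

```r
# Hypothetical helper (not a DatABEL/GenABEL function): convert allele-pair
# strings such as "A G" or "A/G" into 0/1/2 counts of a chosen coded allele.
alleles_to_dosage <- function(genotypes, coded_allele) {
  # Split each genotype string on a space or a slash
  parts <- strsplit(genotypes, "[ /]")
  vapply(parts, function(p) {
    # Anything that is not exactly two known alleles becomes NA
    # ("0" is assumed to be the missing-allele code, as in plink .ped files)
    if (length(p) != 2 || any(is.na(p)) || any(p %in% c("0", ""))) {
      return(NA_integer_)
    }
    # Count how many of the two alleles match the coded allele
    sum(p == coded_allele)
  }, integer(1))
}

snp <- c("A G", "G/G", "A A", "0 0")
alleles_to_dosage(snp, coded_allele = "G")
```

For this toy SNP the result is 1, 2, 0 and NA. A real converter would also have to decide which allele to count for each SNP, which is exactly the information a plain genotype-coded text file no longer needs.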

Y

On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org> wrote:

> Hi Maksim,
> 
> On 15-11-13 05:53, Maksim Struchalin wrote:
>> An easy way to write a function for converting a plink format file to a
>> GenABEL format file:
>> 
>> Use plink's support for 'plug-in' functions
> 
> Nice find. I didn't know that existed.
> 
>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us
>> to write a simple R script (myscript.R) which is called by plink (plink
>> --file mydata --R myscript.R). plink reads the file mydata (which is in
>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>> script myscript.R. This script contains a function
>> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
>> variable) and store it in *flv format by calling DatABEL functions.
>> 
>> The whole process of conversion will look like this:
>> 
>> 1) The user asks GenA to convert a plink file to a GenA file
>> 2) GenA checks whether plink is installed. If it is not installed,
>> then GenA goes to the plink site and downloads/installs it itself (using the R
>> function "download.file" from the "utils" package)
>> 3) GenA runs a single line: system('plink --file mydata --R myscript.R')
>> 4) The Rplink function (from myscript.R) gets every SNP and stores it in *flv
>> format. This function creates an flv file and then opens and closes it to
>> save every single SNP.
>> 5) The work is done
> 
> I'm not sure how portable it is to download and run plink. Also, the
> plink page says: Currently, there is only support for R-plugins for
> Linux-based and Mac OS PLINK distributions.
> 
>> 
>> The only issue is how fast the conversion will run: how much time does
>> it take to open a filevector file, store one SNP and close it? I cannot
>> find a DatABEL R function for adding a SNP to a flv file. Is there a C
>> DatABEL function which can do it?
> 
> Wouldn't it be easier/possible to use plink to export to text (.csv) and
> then use filevector's txt2fvf binary (of course this could be done from
> R using system())?
> 
> I'm also wondering if going per SNP is really necessary. If I understand
> it correctly the R script (myscript.R) has to have a function called:
> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> where GENO is the matrix of genotypes. So we could write that into a
> DatABEL file at once. Of course you may want to do this per chromosome
> to reduce memory consumption (not sure how plink/R would handle large
> data sets).
> 
> I agree completely with Maarten that opening a filevector file for each
> SNP will be an I/O killer.
> 
> 
> Lennart.
> 
>> 
>> best,
>> Maksim
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
> 
> -- 
> -----------------------------------------------------------------
> L.C. Karssen
> Utrecht
> The Netherlands
> 
> lennart at karssen.org
> http://blog.karssen.org
> 
> Please do not send me Word or PowerPoint files!
> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
> ------------------------------------------------------------------
> 
