[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
Yury Aulchenko
yurii.aulchenko at gmail.com
Mon Nov 18 12:48:55 CET 2013
In principle, I would say that DatABEL::text2databel is the "natural" way to go from text files to DatABEL files.
The problem is that 'regular' text input may be coded allele by allele rather than genotype by genotype (e.g. the data are in the format "A G" or "A/G", not "0", "1" or "2").
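To illustrate the recoding step that such allele-by-allele input would need before it could be read as numeric data, here is a minimal sketch in base R. The function name alleles_to_dosage and its coded_allele argument are hypothetical (they are not part of DatABEL or GenABEL), and treating "0" and "N" as missing alleles is an assumption about the input:

```r
# Hypothetical helper, not part of DatABEL/GenABEL: recode allele-pair
# calls such as "A G" or "A/G" into counts (0, 1 or 2) of one coded allele.
alleles_to_dosage <- function(calls, coded_allele) {
  # Split each call on a single space or slash, e.g. "A/G" -> c("A", "G")
  parts <- strsplit(calls, "[ /]")
  vapply(parts, function(p) {
    # Assumption: anything that is not exactly two known alleles is missing
    if (length(p) != 2 || any(p %in% c("0", "N", ""))) {
      return(NA_integer_)
    }
    # Number of copies of the coded allele in this genotype
    sum(p == coded_allele)
  }, integer(1))
}

calls <- c("A G", "G/G", "A A", "0 0")
dosage <- alleles_to_dosage(calls, coded_allele = "G")
print(dosage)
```

Which allele is counted is a per-SNP choice, so a real converter would have to take the coded allele from the map information rather than from a single fixed argument as in this sketch.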
Y
On Nov 15, 2013, at 17:48 PM, L.C. Karssen <lennart at karssen.org> wrote:
> Hi Maksim,
>
> On 15-11-13 05:53, Maksim Struchalin wrote:
>> An easy way to write a function for converting a plink format file to a
>> GenABEL format file:
>>
>> Use plink support of 'plug-in' functions
>
> Nice find. I didn't know that existed.
>
>> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us
>> to write a simple R script (myscript.R) which is called by plink (plink
>> --file mydata --R myscript.R). plink reads the file mydata (which is in
>> plink format) and iteratively, SNP by SNP, transfers all the data to the
>> script myscript.R. This script contains a function
>> Rplink(PHENO,GENO,CLUSTER,COVAR) which takes every SNP (the GENO
>> variable) and stores it in *flv format by calling DatABEL functions.
>>
>> The whole process of conversion will look like this:
>>
>> 1) The user asks GenA to convert a plink file to a GenA file
>> 2) GenA checks whether plink is installed. If it is not installed,
>> then GenA goes to the plink site and downloads/installs it itself (using
>> the R function "download.file" from the "utils" package)
>> 3) GenA runs a single line: system('plink --file mydata --R myscript.R')
>> 4) The Rplink function (from myscript.R) gets every SNP and stores it in
>> *flv format. This function creates an flv file and then opens and closes
>> it to save every single SNP.
>> 5) The work is done
>
> I'm not sure how portable it is to download and run plink. Also, the
> plink page says: Currently, there is only support for R-plugins for
> Linux-based and Mac OS PLINK distributions.
>
>>
>> The only issue is how fast the conversion will run: how much time does
>> it take to open a filevector file, store one SNP and close it? I cannot
>> find a DatABEL R function for adding a SNP to an flv file. Is there a C
>> DatABEL function which can do it?
>
> Wouldn't it be easier/possible to use plink to export to text (.csv) and
> then use filevector's txt2fvf binary (of course this could be done from
> R using system())?
>
> I'm also wondering if going per SNP is really necessary. If I understand
> it correctly the R script (myscript.R) has to have a function called:
> Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
> where GENO is the matrix of genotypes. So we could write that into a
> DatABEL file at once. Of course you may want to do this per chromosome
> to reduce memory consumption (not sure how plink/R would handle large
> data sets).
>
> I agree completely with Maarten that opening a filevector file for each
> SNP will be an I/O killer.
>
>
> Lennart.
>
>>
>> best,
>> Maksim
>> _______________________________________________
>> genabel-devel mailing list
>> genabel-devel at lists.r-forge.r-project.org
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/genabel-devel
>
> --
> -----------------------------------------------------------------
> L.C. Karssen
> Utrecht
> The Netherlands
>
> lennart at karssen.org
> http://blog.karssen.org
>
> Please do not send me Word or PowerPoint files!
> See http://www.gnu.org/philosophy/no-word-attachments.nl.html
> ------------------------------------------------------------------
>