[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
Maarten Kooyman
kooyman at gmail.com
Fri Nov 15 10:17:58 CET 2013
Hi Maksim,
This sounds like a user-friendly way to convert Plink files to DatABEL
format. I think opening the same file for each SNP is not the way to go. If
you have 1M-SNP array data and want to convert it, there will be 1 million
open and close operations sent to the machine. I do not have any experience
with this number of open and close calls, but it sounds to me like a good
way to thrash your system. I could not find a lot of information about
the load it puts on the system, but here is a small piece of text about
it:
http://en.wikibooks.org/wiki/Optimizing_C%2B%2B/General_optimization_techniques/Input/Output#Open_files
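
To get a rough feel for the cost, a small timing sketch like the one below
could be used. It is plain Python writing dummy byte records to a temporary
file, not DatABEL or filevector code, and all function names are made up for
illustration. It only compares "open and close once per SNP" against "open
once, write every SNP".

```python
import os
import tempfile
import time


def write_reopening(path, records):
    """Append each record with its own open/close pair (one pair per SNP)."""
    open(path, "wb").close()  # start from an empty file
    for rec in records:
        with open(path, "ab") as fh:
            fh.write(rec)


def write_once(path, records):
    """Open the file a single time and write all records through one handle."""
    with open(path, "wb") as fh:
        for rec in records:
            fh.write(rec)


def time_call(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def compare(n_snps=20000, n_samples=500):
    """Time both strategies on dummy records, one byte per genotype."""
    records = [bytes([i % 3]) * n_samples for i in range(n_snps)]
    with tempfile.TemporaryDirectory() as tmp:
        t_reopen = time_call(write_reopening, os.path.join(tmp, "a.bin"), records)
        t_once = time_call(write_once, os.path.join(tmp, "b.bin"), records)
    return t_reopen, t_once


if __name__ == "__main__":
    t_reopen, t_once = compare()
    print(f"open/close per SNP: {t_reopen:.3f} s, single open: {t_once:.3f} s")
```

The absolute numbers will depend on the file system and operating system, so
this is only meant to show the order of magnitude before deciding on a design.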
A small warning about checking whether plink exists: there is also a utility
from the PuTTY suite (used for ssh under Windows) called plink. On Debian
systems the genetics plink executable is called "p-link" by default.
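
A check for "the right plink" could look roughly like the Python sketch
below. It tries both executable names and looks at what the program prints
when started without arguments. The test on the banner is an assumption on
my side: I expect the genetics PLINK banner to mention its author, Shaun
Purcell, and PuTTY's plink not to. The function names are hypothetical.

```python
import shutil
import subprocess


def probe_banner(path, timeout=10):
    """Run the executable without arguments and return whatever it prints."""
    try:
        proc = subprocess.run(
            [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout.decode(errors="replace")


def is_genetics_plink(banner):
    """Heuristic (assumption): the genetics PLINK banner credits Shaun Purcell."""
    return "purcell" in banner.lower()


def find_genetics_plink(names=("p-link", "plink"), which=shutil.which,
                        probe=probe_banner):
    """Return the path of the first candidate that looks like genetics PLINK."""
    for name in names:
        path = which(name)
        if path and is_genetics_plink(probe(path)):
            return path
    return None  # nothing suitable found on the PATH


if __name__ == "__main__":
    print(find_genetics_plink())
```

The `which` and `probe` arguments are only there so the logic can be tried
out without either program installed.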
Kind regards,
Maarten
On Fri, Nov 15, 2013 at 5:53 AM, Maksim Struchalin
<m.v.struchalin at mail.ru> wrote:
> An easy way to write a function for converting a plink format file to a
> GenABEL format file:
>
> Use plink's support for 'plug-in' functions
> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us
> to write a simple R script (myscript.R) which is called by plink
> (plink --file mydata --R myscript.R). plink reads the file mydata (which
> is in plink format) and iteratively, SNP by SNP, transfers all the data
> to the script myscript.R. This script contains a function
> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
> variable) and store it in *flv format by calling DatABEL functions.
>
> The whole process of conversion will look like this:
>
> 1) The user asks GenABEL to convert a plink file to a GenABEL file
> 2) GenABEL checks whether plink is installed. If it is not installed, then
> GenABEL goes to the plink site and downloads/installs it itself (using the
> R function "download.file" from the "utils" package)
> 3) GenABEL runs a single line: system('plink --file mydata --R myscript.R')
> 4) The Rplink function (from myscript.R) gets every SNP and stores it in
> *flv format. This function creates an flv file and then opens and closes
> it to save every single SNP.
> 5) Work is done
>
> The only issue is how fast the conversion will run: how much time does it
> take to open a filevector file, store one SNP and close it? I cannot find a
> DatABEL R function for adding a SNP to an flv file. Is there a C DatABEL
> function which can do it?
>
> best,
> Maksim