[GenABEL-dev] function for conversion a plink format file to a GenABEL format file
L.C. Karssen
lennart at karssen.org
Fri Nov 15 17:48:24 CET 2013
Hi Maksim,
On 15-11-13 05:53, Maksim Struchalin wrote:
> An easy way to write a function for converting a plink format file to a
> GenABEL format file:
>
> Use plink's support for 'plug-in' functions
Nice find. I didn't know that existed.
> (http://pngu.mgh.harvard.edu/~purcell/plink/rfunc.shtml). This allows us
> to write a simple R script (myscript.R) which is called by plink (plink
> --file mydata --R myscript.R). plink reads the file mydata (which is in
> plink format) and iteratively, SNP by SNP, transfers all the data to the
> script myscript.R. This script contains a function
> Rplink(PHENO,GENO,CLUSTER,COVAR) which will take every SNP (GENO
> variable) and store it in *flv format by calling DatABEL functions.
>
> The whole process of conversion will look like this:
>
> 1) The user asks GenA to convert a plink file to a GenA file
> 2) GenA checks whether plink is installed. If it is not installed,
> then GenA goes to the plink site and downloads/installs it itself (using
> the R function "download.file" from the "utils" package)
> 3) GenA runs a simple line: system('plink --file mydata --R myscript.R')
> 4) The Rplink function (from myscript.R) gets every SNP and stores it in
> *flv format. This function creates an flv file and then opens and closes
> it for saving every single SNP.
> 5) Work is done
I'm not sure how portable it is to download and run plink. Also, the
plink page says: Currently, there is only support for R-plugins for
Linux-based and Mac OS PLINK distributions.
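
A minimal sketch of a pre-flight check for the portability concern, in
base R. The function name plink_plugin_usable is hypothetical, and the
list of supported systems only encodes the statement from the plink page
quoted above; it does not query plink itself.

```r
# Hypothetical helper: TRUE when the plink R plug-in route looks usable.
plink_plugin_usable <- function(plink = "plink",
                                sysname = Sys.info()[["sysname"]]) {
  # Per the plink page, R plug-ins are supported on Linux and Mac OS only.
  supported_os <- sysname %in% c("Linux", "Darwin")
  # Sys.which() returns "" when the executable is not found on the PATH.
  on_path <- nzchar(Sys.which(plink)[[1]])
  supported_os && on_path
}

# Example: fall back to another conversion route when the check fails.
if (!plink_plugin_usable()) {
  message("plink R plug-in route not available on this machine")
}
```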
>
> The only issue is how fast the conversion will run: how much time does
> it take to open a filevector file, store one SNP and close it? I cannot
> find a DatABEL R function for adding a SNP to an flv file. Is there a C
> DatABEL function which can do it?
Wouldn't it be easier/possible to use plink to export to text (.csv) and
then use filevector's txt2fvf binary (of course this could be done from
R using system())?
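
A rough sketch of the text-export route, in base R. It assumes plink's
--recodeA option, which as far as I know writes an additive 0/1/2 coding
to <out>.raw with six leading ID/phenotype columns. The helper names
plink_export_cmd and read_plink_raw are hypothetical. The filevector
conversion itself is left as a comment, because the exact txt2fvf command
line is not given in this thread.

```r
# Build (but do not run) the plink call that writes <out>.raw.
plink_export_cmd <- function(file, out) {
  paste("plink --file", shQuote(file), "--recodeA --out", shQuote(out))
}

# Read a .raw file into a numeric matrix of allele counts (ids x SNPs).
read_plink_raw <- function(path) {
  raw <- read.table(path, header = TRUE, stringsAsFactors = FALSE,
                    check.names = FALSE)
  # Assumption: the first six columns are FID IID PAT MAT SEX PHENOTYPE.
  geno <- as.matrix(raw[, -(1:6), drop = FALSE])
  storage.mode(geno) <- "double"
  rownames(geno) <- raw$IID
  geno
}

# Intended use (not run here, needs plink on the PATH):
#   system(plink_export_cmd("mydata", "mydata"))
#   geno <- read_plink_raw("mydata.raw")
#   ... then hand mydata.raw to txt2fvf, or geno to a DatABEL writer.
```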
I'm also wondering if going per SNP is really necessary. If I understand
it correctly the R script (myscript.R) has to have a function called:
Rplink <- function(PHENO,GENO,CLUSTER,COVAR)
where GENO is the matrix of genotypes. So we could write that into a
DatABEL file at once. Of course you may want to do this per chromosome
to reduce memory consumption (not sure how plink/R would handle large
data sets).
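
A minimal sketch of writing GENO in one go instead of per SNP. The
factory make_Rplink and its store argument are hypothetical; store stands
in for whatever persists the matrix (for example a wrapper around a
DatABEL writer, which is an assumption and not run here). The return
value follows my reading of the plug-in page, where each SNP contributes
a count followed by that many values.

```r
# Hypothetical factory: 'store' is any function(matrix) that saves GENO.
make_Rplink <- function(store) {
  function(PHENO, GENO, CLUSTER, COVAR) {
    # One write for the whole genotype matrix (individuals x SNPs).
    store(GENO)
    # plink expects a numeric vector back. Assumed layout per SNP:
    # c(number_of_values, values...). Nothing is computed here, so each
    # SNP reports a single dummy value of 0.
    n_snps <- ncol(GENO)
    as.numeric(rbind(rep(1, n_snps), rep(0, n_snps)))
  }
}

# Example with a store that just keeps the matrix in an environment.
acc <- new.env()
Rplink <- make_Rplink(function(m) assign("geno", m, envir = acc))
```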
I agree completely with Maarten that opening a filevector file for each
SNP will be an I/O killer.
Lennart.
>
> best,
> Maksim
--
-----------------------------------------------------------------
L.C. Karssen
Utrecht
The Netherlands
lennart at karssen.org
http://blog.karssen.org
Please do not send me Word or PowerPoint files!
See http://www.gnu.org/philosophy/no-word-attachments.nl.html
------------------------------------------------------------------