[Genabel-commits] r1265 - tutorials/OmicABEL
noreply at r-forge.r-project.org
Mon Jul 1 14:33:29 CEST 2013
Author: yurii
Date: 2013-07-01 14:33:29 +0200 (Mon, 01 Jul 2013)
New Revision: 1265
Modified:
tutorials/OmicABEL/exampleOfUse.org
Log:
update of the OmicABEL example/tutorial (stressing the use of DOUBLE)
Modified: tutorials/OmicABEL/exampleOfUse.org
===================================================================
--- tutorials/OmicABEL/exampleOfUse.org 2013-07-01 08:56:30 UTC (rev 1264)
+++ tutorials/OmicABEL/exampleOfUse.org 2013-07-01 12:33:29 UTC (rev 1265)
@@ -4,17 +4,69 @@
#+property: exports both
#+property: eval never
-* Outline
+* Important note on data format for OmicABEL
+
+The example of use provided below makes use of the data provided
+together with the GenABEL-package. Note that when working with your
+data you do NOT need to have the data in the GenABEL-package format,
+and you do not need to follow this procedure to get data usable
+for OmicABEL; this is a simple example explaining the input and
+output files and usage of OmicABEL.
+
+The major fact you need to remember is that OmicABEL makes use of
+the 'filevector' (aka 'DatABEL') "DOUBLE" format for input.
+Hence, again, you do NOT have to have your data in GenABEL format,
+but you do need to get files into filevector/DatABEL "DOUBLE".
+
+The major issue is getting the (usually vast amounts of) genotypic
+data in the right format. To get to the right format, in real life we
+recommend that you use one of the GenABEL conversion procedures to
+convert your data from IMPUTE, MACH, or MiniMac to DatABEL/filevector.
+The corresponding GenABEL-package functions are =impute2databel=,
+=mach2databel=, and =minimac2databel=. Make sure, however, that you
+use the dataOutType = "DOUBLE" argument when doing
+the genotype data conversion! For example (in R),
+
+#+begin_src R :eval never
+mach2databel(imputedgenofile = "f1.mldose", mlinfofile = "f1.mlinfo",
+ outfile="f1", dataOutType = "DOUBLE")
+#+end_src
+
+This argument has been available in the GenABEL-package since
+version 1.7-7.
+
+If you already have genotypic data in filevector/DatABEL format,
+but these are in the "FLOAT" format (default option for the =xxx2databel=
+procedures), you can use the =float2double= utility provided together
+with OmicABEL to convert to "DOUBLE". For example, if you have your
+genotypic data in (filevector-FLOAT) files =myData.fvi= and =myData.fvd=,
+you can convert them to DOUBLE by using (from the command line):
+
+#+begin_src sh :eval never
+float2double myData myDataDouble
+#+end_src
+
+After this you will get the files =myDataDouble.fvi= and
+=myDataDouble.fvd= (note that these files are roughly double the
+size of the FLOAT files - make sure you have enough disk space).
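As a rough pre-flight check before running =float2double=, the space needed for the DOUBLE copy can be estimated from the size of the FLOAT data file. The sketch below is not part of OmicABEL: the helper names =estimate_double_bytes= and =has_room_for= are hypothetical, and it assumes the =.fvd= file is dominated by the data matrix, so that going from 4-byte floats to 8-byte doubles roughly doubles its size.

```shell
#!/bin/sh
# Hypothetical helper: print the approximate size in bytes of the DOUBLE
# copy of a FLOAT .fvd file (assumed to be about twice the FLOAT size).
estimate_double_bytes() {
    float_bytes=$(wc -c < "$1") || return 1
    echo $((float_bytes * 2))
}

# Hypothetical helper: succeed if the file system holding directory $2
# has at least $1 bytes available (df -Pk reports 1024-byte blocks).
has_room_for() {
    needed_kb=$(( ($1 + 1023) / 1024 ))
    avail_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example use with the file names from the text above
if [ -f myData.fvd ]; then
    need=$(estimate_double_bytes myData.fvd)
    if has_room_for "$need" .; then
        echo "OK: about $need bytes needed for myDataDouble.fvd"
    else
        echo "Not enough free space: about $need bytes needed" >&2
    fi
fi
```

The estimate ignores the (much smaller) =.fvi= index file, so treat it as a lower bound rather than an exact figure.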
+
+
+* Outline of the example
In this example, we will use the data set distributed with GenABEL to
-show the use of OmicABEL. Hence you need to have [[http://www.genabel.org/packages/GenABEL][GenABEL package]]
-installed on your system. You will also need the [[http://www.genabel.org/packages/DatABEL][DatABEL package]] for
+show the use of OmicABEL. Hence you need to have
+[[http://www.genabel.org/packages/GenABEL][GenABEL package]]
+installed on your system. You will also need the
+[[http://www.genabel.org/packages/DatABEL][DatABEL package]] for
data manipulations and 'mvtnorm' package (simulation of traits)
installed.
-For conversion of files to the FaST-LMM format, you will need [[http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml][PLINK]]
-installed, you will also need [[http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/][FaST-LMM]] if you'd like to run the
-comparison.
+For conversion of files to the FaST-LMM format, you will need
+[[http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml][PLINK]]
+installed; you will also need
+[[http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/][FaST-LMM]]
+if you'd like to run the comparison.
+
* Prepare the data for analysis
:PROPERTIES:
:session: generateData
@@ -188,19 +240,6 @@
: Ncells 364098 19.5 667722 35.7 467875 25.0
: Vcells 3105774 23.7 17089802 130.4 20039530 152.9
-*IMPORTANT* Note that in real life you are most likley to use one
-of the GenABEL conversion procedures to convert your data from IMPUTE,
-MACH, or MiniMac to DatABEL/filevector. The functions are
-impute2databel, mach2databel, and minimac2databel.
-
-Make sure you use the dataOutType = "DOUBLE"
-argument when doing genotype data conversion! - OmicABEL currently
-accepts only DOUBLE format for all inputs.
-
-If you have already converted the data using the "FLOAT" type, you
-will need to convert that to "DOUBLE" (send us an email, we are planning
-to write a simple float2double converter).
-
** Export the data in format for FaST-LMM
We are going to compare the OmicABEL results with FaST-LMM, and
therefore will export the results in a format usable for FaST-LMM as
@@ -217,7 +256,8 @@
#+begin_src R
falmmCov <- cbind(1:nids(df),idnames(df),phdata(df)[,c("sex","age")])
falmmCov[1:3,]
-write.table(falmmCov,file="plink.cov",col.names=FALSE,row.names=FALSE,quote=FALSE)
+write.table(falmmCov,file="plink.cov",col.names=FALSE,row.names=FALSE,
+ quote=FALSE)
#+end_src
#+RESULTS:
@@ -229,7 +269,8 @@
Export phenotypes
#+begin_src R
falmmPhe <- cbind(1:nids(df),idnames(df),myPhenos)
-write.table(falmmPhe,file="plink.phe",col.names=FALSE,row.names=FALSE,quote=FALSE)
+write.table(falmmPhe,file="plink.phe",col.names=FALSE,row.names=FALSE,
+ quote=FALSE)
#+end_src
#+RESULTS:
@@ -240,7 +281,8 @@
nms <- paste(1:nids(df),idnames(df))
colnames(falmmRel) <- nms
falmmRel <- cbind(var=nms,falmmRel)
-write.table(falmmRel,file="plink.sim",col.names=TRUE,row.names=FALSE,quote=FALSE,sep="\t")
+write.table(falmmRel,file="plink.sim",col.names=TRUE,row.names=FALSE,
+ quote=FALSE,sep="\t")
#+end_src
#+RESULTS:
@@ -385,7 +427,8 @@
(this takes about 20 seconds)
-Easy, ergh?! Note that time does not add up - doing $N$ phenotypes is much faster then doing $N$ time one phenotype!
+Easy, eh?! Note that time does not add up - doing $N$ phenotypes
+is much faster than doing $N$ times one phenotype!
** Extract the data in text format