[Genabel-commits] r1265 - tutorials/OmicABEL

Mon Jul 1 14:33:29 CEST 2013

Author: yurii
Date: 2013-07-01 14:33:29 +0200 (Mon, 01 Jul 2013)
New Revision: 1265

Modified:
   tutorials/OmicABEL/exampleOfUse.org
Log:
update of the OmicABEL example/tutorial (stressing the use of DOUBLE)

Modified: tutorials/OmicABEL/exampleOfUse.org
===================================================================

--- tutorials/OmicABEL/exampleOfUse.org	2013-07-01 08:56:30 UTC (rev 1264)
+++ tutorials/OmicABEL/exampleOfUse.org	2013-07-01 12:33:29 UTC (rev 1265)
@@ -4,17 +4,69 @@
 #+property: exports both
 #+property: eval never
 
-* Outline
+* Important note on data format for OmicABEL
+
+The example of use provided below makes use of the data provided 
+together with the GenABEL-package. Note that when working with your 
+data you do NOT need to have the data in the GenABEL-package format, 
+and you do not need to follow this procedure to get data usable
+for OmicABEL; this is a simple example explaining the input and 
+output files and usage of OmicABEL. 
+
+The major fact you need to remember is that OmicABEL makes use of 
+the 'filevector' (aka 'DatABEL') "DOUBLE" format for input.  
+Hence, again, you do NOT have to have your data in GenABEL format, 
+but you do need to get files into filevector/DatABEL "DOUBLE". 
+ 
+The major issue is getting the (usually vast amounts of) genotypic 
+data in the right format. To get to the right format, in real life we 
+recommend that you use one of the GenABEL conversion procedures to 
+convert your data from IMPUTE, MACH, or MiniMac to DatABEL/filevector. 
+The corresponding GenABEL-package functions are =impute2databel=, 
+=mach2databel=, and =minimac2databel=. Make sure, however, that you 
+use the dataOutType = "DOUBLE" argument when doing 
+the genotype data conversion! For example (in R), 
+
+#+begin_src R :eval never
+mach2databel(imputedgenofile = "f1.mldose", mlinfofile = "f1.mlinfo", 
+	outfile="f1", dataOutType = "DOUBLE")
+#+end_src
+
+This argument has been available in the GenABEL-package since 
+version 1.7-7. 
+
+If you already have genotypic data in filevector/DatABEL format, 
+but these are in the "FLOAT" format (default option for the =xxx2databel= 
+procedures), you can use the =float2double= utility provided together 
+with OmicABEL to convert to "DOUBLE". For example, if you have your 
+genotypic data in (filevector-FLOAT) files =myData.fvi= and =myData.fvd=, 
+you can convert them to DOUBLE by using (from command line): 
+
+#+begin_src sh :eval never
+float2double myData myDataDouble 
+#+end_src
+
+After this you will get the files =myDataDouble.fvi= and 
+=myDataDouble.fvd= (note that these files are roughly double the 
+size of the FLOAT files, so make sure you have enough disk space). 
+
+
+* Outline of the example
 In this example, we will use the data set distributed with GenABEL to
-show the use of OmicABEL. Hence you need to have [[http://www.genabel.org/packages/GenABEL][GenABEL package]]
-installed on your system. You will also need the [[http://www.genabel.org/packages/DatABEL][DatABEL package]] for
+show the use of OmicABEL. Hence you need to have 
+[[http://www.genabel.org/packages/GenABEL][GenABEL package]]
+installed on your system. You will also need the 
+[[http://www.genabel.org/packages/DatABEL][DatABEL package]] for
 data manipulations and 'mvtnorm' package (simulation of traits) 
 installed. 
 
-For conversion of files to the FaST-LMM format, you will need [[http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml][PLINK]]
-installed, you will also need [[http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/][FaST-LMM]] if you'd like to run the
-comparison. 
+For conversion of files to the FaST-LMM format, you will need 
+[[http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml][PLINK]]
+installed; you will also need 
+[[http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/][FaST-LMM]] 
+if you'd like to run the comparison. 
 
+
 * Prepare the data for analysis
   :PROPERTIES:
   :session: generateData
@@ -188,19 +240,6 @@
 : Ncells  364098 19.5     667722  35.7   467875  25.0
 : Vcells 3105774 23.7   17089802 130.4 20039530 152.9
 
-*IMPORTANT* Note that in real life you are most likley to use one 
-of the GenABEL conversion procedures to convert your data from IMPUTE, 
-MACH, or MiniMac to DatABEL/filevector. The functions are 
-impute2databel, mach2databel, and minimac2databel. 
-
-Make sure you use the dataOutType = "DOUBLE"  
-argument when doing genotype data conversion! - OmicABEL currently 
-accepts only DOUBLE format for all inputs.
-
-If you have already converted the data using the "FLOAT" type, you 
-will need to convert that to "DOUBLE" (send us an email, we are planning  
-to write a simple float2double converter).
-
 ** Export the data in format for FaST-LMM
 We are going to compare the OmicABEL results with FaST-LMM, and
 therefore will export the results in a format usable for FaST-LMM as
@@ -217,7 +256,8 @@
 #+begin_src R
 falmmCov <- cbind(1:nids(df),idnames(df),phdata(df)[,c("sex","age")])
 falmmCov[1:3,]
-write.table(falmmCov,file="plink.cov",col.names=FALSE,row.names=FALSE,quote=FALSE)
+write.table(falmmCov,file="plink.cov",col.names=FALSE,row.names=FALSE,
+	quote=FALSE)
 #+end_src
 
 #+RESULTS:
@@ -229,7 +269,8 @@
 Export phenotypes
 #+begin_src R
 falmmPhe <- cbind(1:nids(df),idnames(df),myPhenos)
-write.table(falmmPhe,file="plink.phe",col.names=FALSE,row.names=FALSE,quote=FALSE)
+write.table(falmmPhe,file="plink.phe",col.names=FALSE,row.names=FALSE,
+	quote=FALSE)
 #+end_src
 
 #+RESULTS:
@@ -240,7 +281,8 @@
 nms <- paste(1:nids(df),idnames(df))
 colnames(falmmRel) <- nms
 falmmRel <- cbind(var=nms,falmmRel)
-write.table(falmmRel,file="plink.sim",col.names=TRUE,row.names=FALSE,quote=FALSE,sep="\t")
+write.table(falmmRel,file="plink.sim",col.names=TRUE,row.names=FALSE,
+	quote=FALSE,sep="\t")
 #+end_src
 
 #+RESULTS:
@@ -385,7 +427,8 @@
 
 (this takes about 20 seconds)
 
-Easy, ergh?! Note that time does not add up - doing $N$ phenotypes is much faster then doing $N$ time one phenotype!
+Easy, ergh?! Note that time does not add up - doing $N$ phenotypes 
+is much faster than doing $N$ times one phenotype!
 
 ** Extract the data in text format