[Genabel-commits] r1213 - pkg/ProbABEL/doc

Mon May 13 13:21:32 CEST 2013

Author: lckarssen
Date: 2013-05-13 13:21:31 +0200 (Mon, 13 May 2013)
New Revision: 1213

Modified:
   pkg/ProbABEL/doc/ProbABEL_manual.tex
Log:
Small fixes to the ProbABEL documentation. 
(Copyright for this change lies with the ErasmusMC.)


Modified: pkg/ProbABEL/doc/ProbABEL_manual.tex
===================================================================

--- pkg/ProbABEL/doc/ProbABEL_manual.tex	2013-05-13 07:41:14 UTC (rev 1212)
+++ pkg/ProbABEL/doc/ProbABEL_manual.tex	2013-05-13 11:21:31 UTC (rev 1213)
@@ -13,6 +13,7 @@
 \usepackage{titleref}
 \usepackage{amsmath}
 \usepackage{makeidx}
+\usepackage[dvipsnames]{xcolor}
 \usepackage[pdftex,hyperfootnotes=false,pdfpagelabels]{hyperref}
 \hypersetup{%
   linktocpage=false, % If true the page numbers in the toc are links
@@ -23,7 +24,8 @@
   bookmarksopen=true, bookmarksopenlevel=1, hypertexnames=true, %
   pdfhighlight=/O, %hyperfootnotes=true,%nesting=true,%frenchlinks,%
   pdfauthor={\textcopyright\ Y.~Aulchenko, M.~Struchalin, L.C.~Karssen},
-  pdfsubject={ProbABEL manual}
+  pdfsubject={ProbABEL manual},
+  colorlinks=true, urlcolor=MidnightBlue, linkcolor=blue %
 }
 % get the links to the figures and tables right:
 \usepackage[all]{hypcap} % to be loaded after hyperref package
@@ -137,19 +139,19 @@
 
 \verbatiminput{test.mlinfo}
 
-Note that header line is present in the file. The file describes
+Note that a header line is present in the file. The file describes
 five SNPs.
 
 \subsection{Genomic predictor file}
 \label{ssec:dosein}
 
 Again, in the simplest scenario this is an MLDOSE or MLPROB file
-generated by MaCH and \texttt{minimac}.  Such file starts with two special
-columns plus, for each of the SNPs under consideration, a column
-containing the estimated allele 1 dose (MLDOSE).  In an MLPROB file,
-two columns for each SNP correspond to posterior probability that
-person has two ($P_{A_1A_1}$) or one ($P_{A_1A_2}$) copies of allele
-1.  The first ``special'' column is made of the sequential id,
+generated by MaCH and/or \texttt{minimac}.  Such file starts with two
+special columns plus, for each of the SNPs under consideration, a
+column containing the estimated allele 1 dose (MLDOSE).  In an MLPROB
+file, two columns for each SNP correspond to posterior probability
+that person has two ($P_{A_1A_1}$) or one ($P_{A_1A_2}$) copies of
+allele 1.  The first ``special'' column is made of the sequential id,
 followed by an arrow followed by study ID (the one specified in the
 MaCH input files). The second column contains the method keyword
 (e.g.~``MLDOSE'').
@@ -293,27 +295,27 @@
 
 Usage: /tmp/PAInst/bin/palogist options
 Options:
-	 --pheno   : phenotype file name
-	 --info    : information (e.g. MLINFO) file name
-	 --dose    : predictor (e.g. MLDOSE/MLPROB) file name
-	 --map     : [optional] map file name
-	 --nids    : [optional] number of people to analyse
-	 --chrom   : [optional] chromosome (to be passed to output)
-	 --out     : [optional] output file name (default is regression.out.txt)
-	 --skipd   : [optional] how many columns to skip in the predictor
-		      (dose/prob) file (default 2)
-	 --ntraits : [optional] how many traits are analysed (default 1)
-	 --ngpreds : [optional] how many predictor columns per marker
-		      (default 1 = MLDOSE; else use 2 for MLPROB)
-	 --separat : [optional] character to separate fields (default is space)
-	 --score   : use score test
-	 --no-head : do not report header line
-	 --allcov  : report estimates for all covariates (large outputs!)
-	 --interaction: Which covariate to use for interaction with SNP analysis (default is no interaction, 0)
-	 --interaction_only: like previous but without covariate acting in interaction with SNP (default is no interaction, 0)
-	 --mmscore : score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
-	 --robust  : report robust (aka sandwich, aka Hubert-White) standard errors
-	 --help    : print help
+         --pheno   : phenotype file name
+         --info    : information (e.g. MLINFO) file name
+         --dose    : predictor (e.g. MLDOSE/MLPROB) file name
+         --map     : [optional] map file name
+         --nids    : [optional] number of people to analyse
+         --chrom   : [optional] chromosome (to be passed to output)
+         --out     : [optional] output file name (default is regression.out.txt)
+         --skipd   : [optional] how many columns to skip in the predictor
+                      (dose/prob) file (default 2)
+         --ntraits : [optional] how many traits are analysed (default 1)
+         --ngpreds : [optional] how many predictor columns per marker
+                      (default 1 = MLDOSE; else use 2 for MLPROB)
+         --separat : [optional] character to separate fields (default is space)
+         --score   : use score test
+         --no-head : do not report header line
+         --allcov  : report estimates for all covariates (large outputs!)
+         --interaction: Which covariate to use for interaction with SNP analysis (default is no interaction, 0)
+         --interaction_only: like previous but without covariate acting in interaction with SNP (default is no interaction, 0)
+         --mmscore : score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
+         --robust  : report robust (aka sandwich, aka Hubert-White) standard errors
+         --help    : print help
 \end{verbatim}
 %%		--interaction_only: like previos but without covariate acting in
 %%                    interaction with SNP
@@ -336,7 +338,7 @@
 an analysis of height by running
 \begin{verbatim}
 user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
-				 -d test.mldose -i test.mlinfo
+                                 -d test.mldose -i test.mlinfo
 \end{verbatim}
 Output from the analysis will be directed to the
 \texttt{regression.out.csv} file.
@@ -344,7 +346,7 @@
 The analysis of a binary trait (e.g.~chd) can be run with
 \begin{verbatim}
 user@server:~/ProbABEL/examples/$ ../bin/palogist -p logist_data.txt \
-				 -d test.mldose -i test.mlinfo
+                                 -d test.mldose -i test.mlinfo
 \end{verbatim}
 
 To run a Cox proportional hazards model\footnote{Please note that in
@@ -355,7 +357,7 @@
   directory of the installation package.}, try
 \begin{verbatim}
 user@server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_data.txt \
-				 -d test.mldose -i test.mlinfo
+                                 -d test.mldose -i test.mlinfo
 \end{verbatim}
 
 Please have a look at the shell script files \texttt{example\_qt.sh},
@@ -367,8 +369,8 @@
 genetic predictors per SNP, e.g.~you can run linear model with
 \begin{verbatim}
 user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
-				 -d test.mlprob -i test.mlinfo \
-				 --ngpreds=2
+                                 -d test.mlprob -i test.mlinfo \
+                                 --ngpreds=2
 \end{verbatim}
 
 \subsection{Advanced analysis options}
@@ -529,7 +531,7 @@
 expectation
 \begin{equation}
   E[\mathbf{Y}] = \mathbf{X}\, \boldsymbol{\beta}
-\label{expectation}
+\label{eq:expectation}
 \end{equation}
 and variance-covariance matrix
 $$
@@ -652,15 +654,15 @@
 logarithm of every value contained in the vector $\pi$.
 
 \subsubsection{Robust variance-covariance matrix of parameter estimates}
-For a linear model, these are computed using formula
+For a linear model, these are computed using the equation
 $$
 \var_r = (\mathbf{X}^T\mathbf{X})^{-1} (\mathbf{X}^T\mathbf{R}\mathbf{X})
 (\mathbf{X}^T\mathbf{X})^{-1},
 $$
 where $\mathbf{R}$ is a diagonal matrix containing squares of residuals
 of $\mathbf{Y}$. The
-same formula may be used for ``standard'' analysis, in which case
-the elements of the $\mathbf{R}$ matrix are constant, namely mean
+same equation may be used for ``standard'' analysis, in which case
+the elements of the $\mathbf{R}$ matrix are constant, namely the mean
 residual sum of squares (the estimate of $\sigma^2$).
 
 Similar to that, the robust matrix is computed for logistic regression with
@@ -692,7 +694,7 @@
 E[\mathbf{Y}] = \mathbf{X} \mathbf{\beta},
 $$
 identical to that defined for a linear model
-(cf.~section~\ref{expectation}). To account for correlations between
+(cf.~Eq.~\ref{eq:expectation}). To account for correlations between
 the phenotypes of relatives which may be induced by family relations
 the variance-covariance matrix is defined to be proportional to the
 linear combination of the identity matrix $\mathbf{I}$ and the
@@ -714,17 +716,18 @@
 previously (Aulchenko \emph{et al}., 2007).
 
 \subsubsection{Two-step score test for association}
-A two-step score test approach is therefore used to decrease the computational
-burden. Let us first re-define the expectation of the trait by splitting the
-design matrix in two parts, the ''base'' part $\mathbf{X}_x$, which includes all
-terms not changing across all SNP models fit in GWAS (e.g. effects of sex, age, etc.),
-and the part including SNP information, $\mathbf{X_g}$:
+A two-step score test approach is therefore used to decrease the
+computational burden. Let us first re-define the expectation of the
+trait by splitting the design matrix in two parts, the ``base'' part
+$\mathbf{X}_x$, which includes all terms not changing across all SNP
+models fit in GWAS (e.g.\ effects of sex, age, etc.), and the part
+including SNP information, $\mathbf{X_g}$:
 $$
 E[\mathbf{Y}] = \mathbf{X}_x \mathbf{\beta}_x +
 \mathbf{X}_g \mathbf{\beta}_g.
 $$
-Note that the latter design matrix may include not only the main SNP effect, but
-e.g.\ SNP by environment interaction terms.
+Note that the latter design matrix may include not only the main SNP
+effect, but e.g.\ SNP by environment interaction terms.
 
 In the first step, a linear mixed model not including SNP effects
 $$