[Genabel-commits] r1295 - pkg/ProbABEL/doc
noreply at r-forge.r-project.org
Thu Aug 15 10:03:11 CEST 2013
Author: lckarssen
Date: 2013-08-15 10:03:11 +0200 (Thu, 15 Aug 2013)
New Revision: 1295
Modified:
pkg/ProbABEL/doc/ProbABEL_manual.tex
Log:
Updates to the ProbABEL documentation. Mostly small corrections, and a bit more text about the \chi^2 column in the output.
The ErasmusMC holds the copyright for this change.
Modified: pkg/ProbABEL/doc/ProbABEL_manual.tex
===================================================================
--- pkg/ProbABEL/doc/ProbABEL_manual.tex 2013-08-14 15:56:11 UTC (rev 1294)
+++ pkg/ProbABEL/doc/ProbABEL_manual.tex 2013-08-15 08:03:11 UTC (rev 1295)
@@ -60,13 +60,13 @@
SNP or microsatellite typing, we would normally know the genotype of
a particular person at a particular locus with very high degree of
confidence, and, in case of biallelic marker, can state whether
-genotype is $AA$, $AB$ or $BB$.
+the genotype is $AA$, $AB$ or $BB$.
-On the contrary, when dealing with imputed or high-throughput
-sequencing data, for many of the genomic loci we are quite uncertain
-about the genotypic status of the person. Instead of dealing with
+On the other hand, when dealing with imputed or high-throughput
+sequencing data, the genotypic status of the person is known with a
+much lower confidence. Instead of dealing with
known genotypes we work with a probability distribution that is based
-on observed information, and we have estimates that true underlying
+on observed information, and we have estimates that the true underlying
genotype is either $AA$, $AB$ or $BB$. The degree of confidence about
the real status is measured with the probability distribution
$\{P(AA), P(AB), P(BB)\}$.
@@ -90,9 +90,9 @@
outcome of interest onto estimated genotypic probabilities.
The \PA{} package was designed to perform such regression
-in a fast, memory-efficient and consequently genome-wide feasible manner.
-Currently, \PA{} implements linear, logistic regression,
-and Cox proportional hazards models. The corresponding analysis
+in a fast, memory-efficient and, consequently, genome-wide feasible manner.
+Currently, \PA{} implements linear and logistic regression,
+as well as the Cox proportional hazards model. The corresponding analysis
programs are called \texttt{palinear}, \texttt{palogist},
and \texttt{pacoxph}.
@@ -109,8 +109,8 @@
The dose/probability file may be supplied in filevector format
in which case \PA{} will operate much faster, and
-in low-RAM mode (approx. $\approx$ 128 MB). See the R libraries \GA{} and
-\DA{} on how to convert MaCH and IMPUTE files to
+in low-RAM mode (approx.~128 MB). See the R libraries \GA{} and
+\DA{} on how to convert MaCH and IMPUTE2 files to
filevector format (functions: \texttt{mach2databel()} and
\texttt{impute2databel()}, respectively).
@@ -137,7 +137,6 @@
to be found in \texttt{ProbABEL/examples/test.mlinfo})
\verbatiminput{test.mlinfo}
-
Note that a header line is present in the file. The file describes
five SNPs.
@@ -166,18 +165,20 @@
\textbf{The order of SNPs in the SNP information file and DOSE or PROB
file must be the same}. This should be the case if you just used
MaCH/\texttt{minimac} outputs.
+Consequently, the number of columns in the genomic predictor file
+must be the same as the number of lines in the SNP information file
+plus one in the case of a DOSE file. Similarly, for a PROB file the
+number of columns must be equal to two times the number of SNPs plus
+1.
-Therefore, by all means, the number of columns in the genomic predictor file
-must be the same as the number of lines in the SNP information file plus one.
-
The dose/probability file may be supplied in filevector format
(\texttt{.fvi} and \texttt{.fvd} files) in which case
\texttt{ProbABEL} will operate much faster, and in low-RAM mode
(approx.~128 MB). On the command line simply specify the \texttt{.fvi}
file as argument for the \texttt{--dose} option
(cf.~section~\ref{sec:runanalysis} for more information on the options
-accepted by \texttt{ProbABEL}). See the R libraries GenABEL and
-DatABEL on how to convert MaCH and IMPUTE files to
+accepted by \texttt{ProbABEL}). See the R libraries \GA{} and
+\DA{} on how to convert MaCH and IMPUTE files to
filevector format (functions: \texttt{mach2databel()} and
\texttt{impute2databel()}, respectively).
@@ -199,17 +200,16 @@
analysis! E.g.~coding missing as '-999.9' will result in an analysis which
will consider -999.9 as indeed a true measurements of the trait/covariates.
-In the case of linear or logistic regression (programs \texttt{palinear} and
-\texttt{palogist}, respectively), the second column specifies the trait
-under analysis, while the third, fourth, etc.~provide information on
-covariates to be included into analysis.
-An example few lines of phenotypic information file designed for
-linear regression analysis follow here (also
-to be found in \texttt{examples/height.txt})
+In the case of linear or logistic regression (programs
+\texttt{palinear} and \texttt{palogist}, respectively), the second
+column specifies the trait under analysis, while the third, fourth,
+etc.~provide information on covariates to be included into analysis.
+As an example, a few lines of a phenotypic information file designed
+for linear regression analysis follow here (also to be found in
+\texttt{examples/height.txt})
\verbatiminput{short_height.txt}
-
-Note again that the order of IDs is the same between the MLDOSE file
+Note again that the order of IDs is the same in the MLDOSE file
and the phenotypic data file. The model specified by this file is
\begin{equation*}
\textrm{height} \sim \mu + \textrm{sex} + \textrm{age},
@@ -245,7 +245,6 @@
\texttt{examples/coxph\_data.txt})
\verbatiminput{short_coxph_data.txt}
-
You can see that for the first ten people, the event occurs for three of
them, while for the other seven there is no event during the follow-up
time, as indicated by the ``chd'' column. Follow-up time is specified in the preceding
@@ -422,7 +421,6 @@
%model and only the interaction term in the \PA{} analysis.
\subsection{Running multiple analyses at once: \texttt{probabel.pl}}
-
The Perl script \texttt{bin/probabel.pl} represents a handy wraper for
\PA{} functions. To start using it the configuration file
\texttt{etc/probabel\_config.cfg.example} needs to be edited and
@@ -486,8 +484,12 @@
find the value specified by this option. If \texttt{--map} option was
used, in the subsequent column you will find map location taken from
the map-file. The subsequent columns provide coefficients of
-regression of the phenotype onto genotype, corresponding standard
-errors, and Wald $\chi^2$ test value.
+regression of the phenotype onto the genotype ($\beta$), corresponding
+standard errors ($\text{SE}_\beta$), and the $\chi^2$ test value based
+on the likelihood ratio test. Note that for the additive, recessive,
+dominant and overdominant genetic models this is a $\chi^2$ of 1
+degree of freedom, whereas for the genotypic model this is a $\chi^2$
+of 2df.
\section{Preparing input files}
@@ -498,7 +500,7 @@
\section{Memory use and performance}
Maximum likelihood regression is implemented in
-\PA{}. With 6,000 people and 2.5 millions SNPs, a
+\PA{}. With 6,000 people and 2.5 million SNPs, a
genome-wide scan is completed in less that an hour for a linear model
with 1-2 covariates and overnight for logistic regression or the Cox
proportional hazards model (figures for a PC bought back in 2007).
@@ -507,15 +509,15 @@
text dose/probability files, e.g. for large chromosomes, such as
chromosome one consumed up to 5 GB of RAM with 6,000 people.
-We suggest that dose/probability file is to be supplied in filevector format
-in which case \PA{} will operate about 2-3 times faster, and
-in low-RAM mode (approx.~128 MB). See the R libraries \GA{} and
-\DA{} on how to convert MaCH and IMPUTE files to
-filevector format (functions: \texttt{mach2databel()} and
+We suggest that the genotype dosage/probability file is to be supplied
+in filevector format in which case \PA{} will operate about 2-3 times
+faster, and in low-RAM mode (approx.~128 MB). See the R libraries
+\GA{} and \DA{} on how to convert MaCH and IMPUTE files to filevector
+format (functions: \texttt{mach2databel()} and
\texttt{impute2databel()}, respectively).
-When the \texttt{--mmscore} option is used, the analysis may take
-quite some time.
+When the \texttt{--mmscore} option is used, the analysis takes
+more time.
\section{Methodology}
\label{sec:methodology}
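
Illustration (not part of the commit): the hunk at @@ -486,8 +484,12 @@ states that the reported $\chi^2$ value has 1 degree of freedom for the additive, recessive, dominant and overdominant models and 2 degrees of freedom for the genotypic model. The Python sketch below shows one way a reader might turn such a value into a p-value using only the standard library. The function name `chi2_pvalue` and the table `DF_BY_MODEL` are hypothetical helpers introduced here; they are not part of ProbABEL, and the sketch assumes the $\chi^2$ value has already been read from the output file.

```python
import math

# Degrees of freedom per genetic model, as stated in the revised manual text.
# The dictionary itself is an illustrative helper, not a ProbABEL structure.
DF_BY_MODEL = {
    "additive": 1,
    "recessive": 1,
    "dominant": 1,
    "overdominant": 1,
    "genotypic": 2,
}


def chi2_pvalue(chi2, model):
    """Upper-tail p-value of a chi-square statistic for the given model.

    Closed forms are used so that no third-party package is needed:
      1 df: P(X > x) = erfc(sqrt(x / 2))
      2 df: P(X > x) = exp(-x / 2)
    """
    if chi2 < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    df = DF_BY_MODEL[model]
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    # The same statistic gives a different p-value depending on the model.
    for model in ("additive", "genotypic"):
        print(model, chi2_pvalue(10.0, model))
```

The point of the sketch is only that the same $\chi^2$ value must be compared against a different reference distribution depending on the genetic model, which is the distinction the new manual text draws.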