[Genabel-commits] r846 - pkg/ProbABEL/doc
noreply at r-forge.r-project.org
Wed Feb 22 11:01:52 CET 2012
Author: lckarssen
Date: 2012-02-22 11:01:52 +0100 (Wed, 22 Feb 2012)
New Revision: 846
Added:
pkg/ProbABEL/doc/pacoxph.1
pkg/ProbABEL/doc/palogist.1
Modified:
pkg/ProbABEL/doc/Makefile.am
pkg/ProbABEL/doc/ProbABEL_manual.tex
pkg/ProbABEL/doc/palinear.1
Log:
- Added man-pages for palogist and pacoxph
- Updated the ProbABEL manual: some layout changes of equations, corrected some paths to files.
Modified: pkg/ProbABEL/doc/Makefile.am
===================================================================
--- pkg/ProbABEL/doc/Makefile.am 2012-02-22 01:14:22 UTC (rev 845)
+++ pkg/ProbABEL/doc/Makefile.am 2012-02-22 10:01:52 UTC (rev 846)
@@ -6,7 +6,7 @@
test.mlinfo test_regression.R COPYING LICENSE INSTALL CHANGES.LOG \
TODO
-man_MANS = palinear.1
+man_MANS = palinear.1 palogist.1 pacoxph.1
EXTRA_DIST = $(man_MANS)
if HAVE_PDFLATEX
Modified: pkg/ProbABEL/doc/ProbABEL_manual.tex
===================================================================
--- pkg/ProbABEL/doc/ProbABEL_manual.tex 2012-02-22 01:14:22 UTC (rev 845)
+++ pkg/ProbABEL/doc/ProbABEL_manual.tex 2012-02-22 10:01:52 UTC (rev 846)
@@ -1,9 +1,9 @@
\title{ProbABEL manual}
\author{
-Maksim Struchalin$^{1}$, Lennart Karssen$^{1}$, Yurii Aulchenko$^{1,2}$ \\
+Maksim Struchalin$^{1}$, Lennart Karssen$^{1}$, Yurii Aulchenko$^{1,2}$ \\
\\
-$^{1}$Erasmus MC Rotterdam \\
-$^{2}$Institute of Cytology and Genetics SD RAS
+$^{1}$Erasmus MC Rotterdam \\
+$^{2}$Institute of Cytology and Genetics SD RAS
}
\date{\today}
@@ -179,7 +179,7 @@
that \textbf{both the total number and the order of these IDs are
exactly the same as in the genomic predictor (MLDOSE) file described in
previous section}. This is not difficult to arrange using e.g.~\texttt{R};
-an example is given in the \texttt{ProbABEL/examples} directory.
+an example is given in the \texttt{examples} directory.
\textbf{Missing data should be coded with 'NA', 'N' or 'NaN' codes.} Any
other coding will be converted to some number which will be used in
@@ -192,13 +192,16 @@
covariates to be included into analysis.
An example few lines of phenotypic information file designed for
linear regression analysis follow here (also
-to be found in \texttt{ProbABEL/examples/height.txt})
+to be found in \texttt{examples/height.txt})
\verbatiminput{short_height.txt}
Note again that the order of IDs is the same between the MLDOSE file
and the phenotypic data file. The model specified by this file is
-$height \sim \mu + \textrm{sex} + \textrm{age}$, where $\mu$ is the intercept.
+\begin{equation*}
+\textrm{height} \sim \mu + \textrm{sex} + \textrm{age},
+\end{equation*}
+where $\mu$ is the intercept.
Clearly, you can for example include \texttt{sex $\times$ age} interaction terms by
specifying another column having a product of sex and age here.
@@ -206,13 +209,17 @@
For logistic regression, it is assumed that in the second column cases are
coded as ``1'' and controls as ``0''. A couple of example lines of a phenotypic
information file designed for logistic regression analysis follow here (also
-to be found in \texttt{ProbABEL/examples/logist\_data.txt})
+to be found in \texttt{examples/logist\_data.txt})
\verbatiminput{short_logist_data.txt}
You can see that in the first 10 people, there are three cases, as indicated
by ``chd'' equal to one. The model specified by this file
-is $\textrm{chd} \sim \mu + \textrm{sex} + \textrm{age} + \textrm{othercov}$.
+is
+\begin{equation*}
+\textrm{chd} \sim \mu + \textrm{sex} + \textrm{age} +
+\textrm{othercov}.
+\end{equation*}
In case of the Cox proportional hazards model, the composition of the
phenotypic input file is a bit different. In the second column and
@@ -222,7 +229,7 @@
covariates to be included into the analysis. An example few lines of
a phenotypic information file designed for the Cox proportional hazards model
analysis follow here (also to be found in
-\texttt{ProbABEL/examples/coxph\_data.txt})
+\texttt{examples/coxph\_data.txt})
\verbatiminput{short_coxph_data.txt}
@@ -231,15 +238,18 @@
time, as indicated by the ``chd'' column. Follow-up time is specified in the preceding
column. The covariates included into the model are age (presumably
at baseline), sex and ``othercov''; thus the model, in terms of
-\texttt{R/survival} is \\ $\textrm{Surv}(\textrm{fuptime\_chd},
-\textrm{chd}) \sim \textrm{sex} + \textrm{age} + \textrm{othercov}$.
+\texttt{R/survival} is
+\begin{equation*}
+\textrm{Surv}(\textrm{fuptime\_chd}, \textrm{chd})
+\sim \textrm{sex} + \textrm{age} + \textrm{othercov}.
+\end{equation*}
\subsection{Optional map file}
If you would like map information (e.g.~base pair position) to
be included in your outputs, you can supply a map file. These follow
HapMap "legend" file format. For example, for the five SNPs we considered
the map-file may look like (example can be found in
-\texttt{ProbABEL/examples/test.map})
+\texttt{examples/test.map})
\verbatiminput{test.map}
@@ -252,42 +262,41 @@
To run linear regression, you should use the program called
\texttt{palinear}; for logistic analysis use \texttt{palogist}, and
for the Cox proportional hazards model use \texttt{pacoxph} (all are
-found in the \texttt{ProbABEL/bin/} directory after you have compiled
+found in the \texttt{bin/} directory after you have compiled
the program).
There are in total 11 command line options you can specify to the
-\PA{} analysis functions \texttt{palinear} or
-\texttt{palogist}. If you run either program without any argument, you
-will get a short explanation to command line options:
+\PA{} analysis functions \texttt{palinear} or \texttt{palogist}. If
+you run either program without any argument or with the
+\texttt{--help} option, you will get a short explanation of the command
+line options:
\begin{verbatim}
user@server:~$ palogist
+ProbABEL v. 0.2.0-beta (C) Yurii Aulchenko, Lennart C. Karssen, Maksim Struchalin, EMCR
-Usage: ../bin/palogist options
-
+Usage: /tmp/PAInst/bin/palogist options
Options:
- --pheno : phenotype file name
- --info : information (e.g. MLINFO) file name
- --dose : predictor (e.g. MLDOSE/MLPROB) file name
- --map : [optional] map file name
- --nids : [optional] number of people to analyse
- --chrom : [optional] chromosome (to be passed to output)
- --out : [optional] output file name (default is regression.out.txt)
- --skipd : [optional] how many columns to skip in predictor
- (dose/prob) file (default 2)
- --ntraits : [optional] how many traits are analysed (default 1)
- --ngpreds : [optional] how many predictor columns per marker
- (default 1 = MLDOSE; else use 2 for MLPROB)
- --separat : [optional] character to separate fields (default is space)
- --score : use score test
- --no-head : do not report header line
- --allcov : report estimates for all covariates (large outputs!)
- --interaction : which covariate to use for interaction with SNP
- (default is no ineraction, 0)
- --mmscore : score test for association between a trait and genetic
- polymorphism, in samples of related individuals
- --robust : report robust (aka sandwich, aka Hubert-White) standard
- errors
- --help : print help
+ --pheno : phenotype file name
+ --info : information (e.g. MLINFO) file name
+ --dose : predictor (e.g. MLDOSE/MLPROB) file name
+ --map : [optional] map file name
+ --nids : [optional] number of people to analyse
+ --chrom : [optional] chromosome (to be passed to output)
+ --out : [optional] output file name (default is regression.out.txt)
+ --skipd : [optional] how many columns to skip in the predictor
+ (dose/prob) file (default 2)
+ --ntraits : [optional] how many traits are analysed (default 1)
+ --ngpreds : [optional] how many predictor columns per marker
+ (default 1 = MLDOSE; else use 2 for MLPROB)
+ --separat : [optional] character to separate fields (default is space)
+ --score : use score test
+ --no-head : do not report header line
+ --allcov : report estimates for all covariates (large outputs!)
+ --interaction: Which covariate to use for interaction with SNP analysis (default is no interaction, 0)
+ --interaction_only: like previous but without covariate acting in interaction with SNP (default is no interaction, 0)
+ --mmscore : score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
+ --robust : report robust (aka sandwich, aka Hubert-White) standard errors
+ --help : print help
\end{verbatim}
%% --interaction_only: like previos but without covariate acting in
%% interaction with SNP
@@ -306,24 +315,24 @@
\texttt{--info} (or \texttt{-i}),
specifying the SNP information file described in sub-section \ref{ssec:infoin}.
-If you change to the \texttt{ProbABEL/examples} directory you can run
+If you change to the \texttt{examples} directory you can run
an analysis of height by running
\begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt
--d test.mldose -i test.mlinfo
+user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
+ -d test.mldose -i test.mlinfo
\end{verbatim}
Output from the analysis will be directed to the
\texttt{regression.out.csv} file.
The analysis of a binary trait (e.g.~chd) can be run with
\begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palogist -p logist_data.txt
+user@server:~/ProbABEL/examples/$ ../bin/palogist -p logist_data.txt \
-d test.mldose -i test.mlinfo
\end{verbatim}
To run a Cox proportional hazards model, try
\begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_data.txt
+user@server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_data.txt \
-d test.mldose -i test.mlinfo
\end{verbatim}
@@ -335,8 +344,8 @@
with the \texttt{-d} option and also specify that there are two
genetic predictors per SNP, e.g.~you can run linear model with
\begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt
- -d test.mlprob -i test.mlinfo
+user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
+ -d test.mlprob -i test.mlinfo \
--ngpreds=2
\end{verbatim}
@@ -368,7 +377,7 @@
covariates.
An example of how a polygenic object estimated by \GA{} can be used
-with ProbABEL is provided in \texttt{ProbABEL/examples/mmscore.R}
+with ProbABEL is provided in \texttt{examples/mmscore.R}
Though technically \texttt{--mmscore} allows for inclusion of multiple
covariates, these should be kept to minimum as this is a score test. We
@@ -379,28 +388,30 @@
%Option \texttt{--interaction\_only} is like \texttt{--interaction} but does not
%include in the model the main effect of the covariate, which is acting in
%interaction with SNP. This option is useful when running \texttt{--mmscore},
-%in whch case the main effect should normally estimated in the polygenic
+%in which case the main effect should normally estimated in the polygenic
%model and only the interaction term in the \PA{} analysis.
\subsection{Running multiple analyses at once: \texttt{probabel.pl}}
-The Perl script \texttt{bin/probabel.pl\_example} represents a handy
-wraper for \PA{} functions. To start using it the
-configuration file \texttt{bin/probabel\_config.cfg\_example} needs to
-be edited. The configuration file consists of five columns. Each
-column except the first is a pattern for files produced by
-\texttt{MACH} (imputation software). The column named ``cohort'' is an
-identifying name of a population (``ERGO'' in this example), the
-column ``mlinfo\_path'' is the full path to mlinfo files, including a
-pattern where the chromosome number has been replaced by
-\texttt{\_.\_chr\_.\_}. The columns ``mldose\_path'',
-``mlprobe\_path'' and ``legend\_path'' are paths and patterns for
-``mldose'', ``mlprob'' and ``legend'' files, respectively. These also
-need to include the pattern for the chromosome as used in the column
-for the ``mlinfo'' files. Probably you also have to change the variable
-\texttt{\$config} in the script to point to the full path of the
-configuration file and the variable \texttt{@anprog} to point full
-path to the \PA{} scripts.
+The Perl script \texttt{bin/probabel.pl} is a handy wrapper for
+\PA{} functions. To start using it the configuration file
+\texttt{etc/probabel\_config.cfg.example} needs to be edited. The
+configuration file consists of five columns. Each column except the
+first is a pattern for files produced by \texttt{MACH} (imputation
+software). The column named ``cohort'' is an identifying name of a
+population (``ERGO'' in this example), the column ``mlinfo\_path'' is
+the full path to mlinfo files, including a pattern where the
+chromosome number has been replaced by \texttt{\_.\_chr\_.\_}. The
+columns ``mldose\_path'', ``mlprobe\_path'' and ``legend\_path'' are
+paths and patterns for ``mldose'', ``mlprob'' and ``legend'' files,
+respectively. These also need to include the pattern for the
+chromosome as used in the column for the ``mlinfo'' files. The
+\texttt{make install} installation procedure should have set all paths
+in the script correctly. If that is not the case you will have to
+change the variable \texttt{\$config} in the script to point to the
+full path of the configuration file and the variables
+\texttt{\$base\_path} and \texttt{@anprog} to point to the full path of
+the \PA{} scripts.
\section{Output file format}
@@ -444,7 +455,7 @@
\section{Preparing input files}
-In the \texttt{ProbABEL/bin} directory you can find the
+In the \texttt{bin} directory you can find the
\texttt{prepare\_data.R} file -- an R script that arranges phenotypic
data in right format. Please read this script for details.
@@ -455,7 +466,7 @@
with 1-2 covariates and overnight for logistic regression or the Cox
proportional hazards model (figures for a PC bought back in 2007).
-Memory may be an issue with \PA{} if you use
+Memory may be an issue with \PA{} if you use
MACH text dose/probability files, e.g. for large chromosomes,
such as chromosome one consumed up to 5 GB of RAM with 6,000 people.
@@ -466,7 +477,8 @@
filevector format (functions: \texttt{mach2databel()} and
\texttt{impute2databel()}, respectively).
-When '--mmscore' option is used, the analysis may take quite some time.
+When the \texttt{--mmscore} option is used, the analysis may take
+quite some time.
\section{Methodology}
\label{sec:methodology}
@@ -769,10 +781,8 @@
\section{How to cite}
-If you used \PA{} for
-your analysis please give a link to the \texttt{GenABEL project} home
-page
-
+If you used \PA{} for your analysis please give a link to the
+\texttt{GenABEL project} home page
\begin{quote}
\url{http://www.genabel.org/}
\end{quote}
@@ -784,13 +794,13 @@
\end{quote}
A proper reference may look like
\begin{quote}
-For the analysis of imputed data, we used the \PA{}
+For the analysis of imputed data, we used the \PA{}
from the \texttt{GenABEL} suite of programs (Aulchenko \emph{et al.}, 2010).
\end{quote}
If you have used the Cox proportional hazard model, please mention the
\texttt{R} package \texttt{survival} by Thomas Lumley. Additionally
-to the above citation, please tell that
+to the above citation, please include
\begin{quote}
The Cox proportional hazards model implemented in \PA{}
makes use of the source code of the \texttt{R} package ''\texttt{survival}''
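The phenotype-file section of the manual changed above requires that the number and order of IDs match the MLDOSE file. A minimal shell sketch of such a check follows. It is not part of ProbABEL: the function name `check_id_order` is hypothetical, and the sketch assumes that the ID is the first whitespace-separated field in both files, that only the phenotype file has a header line, and that a MaCH-style `FAM->ID` prefix, if present, can be ignored.

```shell
# check_id_order PHENO_FILE DOSE_FILE
# Hypothetical helper, not shipped with ProbABEL.
# Assumptions: ID in field 1 of both files, one header line in the
# phenotype file only, optional "FAM->" prefix stripped before comparing.
check_id_order() {
    # IDs from the phenotype file, skipping the header line
    pheno_ids=$(awk 'NR > 1 { sub(/^.*->/, "", $1); print $1 }' "$1")
    # IDs from the dose file, which is assumed to have no header
    dose_ids=$(awk '{ sub(/^.*->/, "", $1); print $1 }' "$2")
    # Identical strings mean same number of IDs in the same order
    if [ "$pheno_ids" = "$dose_ids" ]; then
        echo "IDs match"
    else
        echo "IDs differ"
        return 1
    fi
}
```

If the assumptions hold for your files, `check_id_order height.txt test.mldose` run in the examples directory reports whether the two ID columns agree in both length and order.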
Added: pkg/ProbABEL/doc/pacoxph.1
===================================================================
--- pkg/ProbABEL/doc/pacoxph.1 (rev 0)
+++ pkg/ProbABEL/doc/pacoxph.1 2012-02-22 10:01:52 UTC (rev 846)
@@ -0,0 +1,85 @@
+.TH pacoxph 1 "23 February 2012"
+.SH NAME
+pacoxph \- Perform Genome-Wide Association Analysis using the Cox proportional hazards model
+.SH SYNOPSIS
+.B pacoxph
+.RI "[ " "command-line options" " ]"
+.SH DESCRIPTION
+.I pacoxph
+runs a Cox proportional hazards regression on large imputed data sets in an efficient way.
+.SH Options
+.SS Required command line options
+.TP
+.BI "\-p, \-\^\-pheno" " FILE"
+Read phenotype data from
+.I FILE
+.TP
+.BI "\-i, \-\^\-info" " FILE"
+Read SNP information from
+.I FILE
+(e.g. MLINFO file)
+.TP
+.BI "\-d, \-\^\-dose" " FILE"
+SNP predictor (e.g. MLDOSE/MLPROB) file name
+.SS Optional command line options
+.TP
+.BI "\-\^\-map" " FILE"
+Map file name, containing base pair positions for each SNP.
+.TP
+.BI "\-\^\-nids" " NUMBER"
+Number of people to analyse
+.TP
+.BI "\-\^\-chrom" " FILE"
+Chromosome (to be passed to output)
+.TP
+.BI "\-\^\-out" " FILE"
+Output file name (default is
+.B regression.out.txt
+)
+.TP
+.BI "\-\^\-skipd" " NUMBER"
+How many columns to skip in predictor (dose/prob) file (default is 2)
+.TP
+.BI "\-\^\-ntraits" " NUMBER"
+How many traits are analysed (default is 2)
+.TP
+.BI "\-\^\-ngpreds" " NUMBER"
+How many predictor columns per marker (default 1 = MLDOSE; else use 2 for MLPROB)
+.TP
+.B "\-\^\-separat" " FILE"
+Character to separate fields (default is space)
+.TP
+.B \-\^\-score
+Use the score test
+.TP
+.B \-\^\-no-head
+Do not report header line in the output
+.TP
+.B \-\^\-allcov
+Report estimates for all covariates (large outputs!)
+.TP
+.B \-\^\-interaction
+Which covariate to use for interaction with SNP analysis (default is no interaction, 0)
+.TP
+.B \-\^\-interaction_only
+Like
+.B \-\^\-interaction
+but without covariate acting in interaction with SNP (default is no interaction, 0)
+.TP
+.BI "\-\^\-mmscore" " FILE"
+Score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
+.TP
+.B \-\^\-robust
+Report robust (a.k.a. sandwich, a.k.a. Huber-White) standard errors
+.TP
+.B \-\^\-help
+Print help
+
+.SH "SEE ALSO"
+palinear(1), palogist(1)
+.SH BUGS
+Unfortunately
+.B pacoxph
+is in a buggy state at the moment. It cannot use files in DatABEL format.
+.SH AUTHORS
+Lennart C. Karssen
Modified: pkg/ProbABEL/doc/palinear.1
===================================================================
--- pkg/ProbABEL/doc/palinear.1 2012-02-22 01:14:22 UTC (rev 845)
+++ pkg/ProbABEL/doc/palinear.1 2012-02-22 10:01:52 UTC (rev 846)
@@ -1,4 +1,4 @@
-.TH palinear 1 "22 February 2012"
+.TH palinear 1 "23 February 2012"
.SH NAME
palinear \- Perform Genome-Wide Association Analysis using a linear model
.SH SYNOPSIS
@@ -6,20 +6,20 @@
.RI "[ " "command-line options" " ]"
.SH DESCRIPTION
.I palinear
-runs a linear regression in an efficient way.
+runs a linear regression on large imputed data sets in an efficient way.
.SH Options
.SS Required command line options
.TP
-.BI "\-\^\-pheno" " FILE"
+.BI "\-p, \-\^\-pheno" " FILE"
Read phenotype data from
.I FILE
.TP
-.BI "\-\^\-info" " FILE"
+.BI "\-i, \-\^\-info" " FILE"
Read SNP information from
.I FILE
(e.g. MLINFO file)
.TP
-.BI "\-\^\-dose" " FILE"
+.BI "\-d, \-\^\-dose" " FILE"
SNP predictor (e.g. MLDOSE/MLPROB) file name
.SS Optional command line options
.TP
@@ -70,7 +70,7 @@
Score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
.TP
.B \-\^\-robust
-Report robust (aka sandwich, aka Hubert-White) standard errors
+Report robust (a.k.a. sandwich, a.k.a. Huber-White) standard errors
.TP
.B \-\^\-help
Print help
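The options documented in the man pages above can be combined to analyse one chromosome per run. The following is a dry-run sketch only: it prints one `palinear` command line per autosome instead of executing anything. The function name `print_palinear_cmds` and the per-chromosome file names (`chrN.mldose`, `chrN.mlinfo`, `chrN.map`) are hypothetical placeholders; only the option names come from the help text.

```shell
# print_palinear_cmds PHENO_FILE
# Dry run: print (do not execute) one palinear command per autosome.
# File names of the form chrN.* are hypothetical placeholders.
print_palinear_cmds() {
    pheno=$1
    chr=1
    while [ "$chr" -le 22 ]; do
        printf 'palinear --pheno %s --dose chr%s.mldose --info chr%s.mlinfo --map chr%s.map --chrom %s\n' \
            "$pheno" "$chr" "$chr" "$chr" "$chr"
        chr=$((chr + 1))
    done
}

print_palinear_cmds height.txt
```

Printing the commands first makes it easy to inspect them before piping the output to `sh` or to a job scheduler.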
Added: pkg/ProbABEL/doc/palogist.1
===================================================================
--- pkg/ProbABEL/doc/palogist.1 (rev 0)
+++ pkg/ProbABEL/doc/palogist.1 2012-02-22 10:01:52 UTC (rev 846)
@@ -0,0 +1,82 @@
+.TH palogist 1 "23 February 2012"
+.SH NAME
+palogist \- Perform Genome-Wide Association Analysis using a logistic model
+.SH SYNOPSIS
+.B palogist
+.RI "[ " "command-line options" " ]"
+.SH DESCRIPTION
+.I palogist
+runs a logistic regression on large imputed data sets in an efficient way.
+.SH Options
+.SS Required command line options
+.TP
+.BI "\-p, \-\^\-pheno" " FILE"
+Read phenotype data from
+.I FILE
+.TP
+.BI "\-i, \-\^\-info" " FILE"
+Read SNP information from
+.I FILE
+(e.g. MLINFO file)
+.TP
+.BI "\-d, \-\^\-dose" " FILE"
+SNP predictor (e.g. MLDOSE/MLPROB) file name
+.SS Optional command line options
+.TP
+.BI "\-\^\-map" " FILE"
+Map file name, containing base pair positions for each SNP.
+.TP
+.BI "\-\^\-nids" " NUMBER"
+Number of people to analyse
+.TP
+.BI "\-\^\-chrom" " FILE"
+Chromosome (to be passed to output)
+.TP
+.BI "\-\^\-out" " FILE"
+Output file name (default is
+.B regression.out.txt
+)
+.TP
+.BI "\-\^\-skipd" " NUMBER"
+How many columns to skip in predictor (dose/prob) file (default is 2)
+.TP
+.BI "\-\^\-ntraits" " NUMBER"
+How many traits are analysed (default is 1)
+.TP
+.BI "\-\^\-ngpreds" " NUMBER"
+How many predictor columns per marker (default 1 = MLDOSE; else use 2 for MLPROB)
+.TP
+.B "\-\^\-separat" " FILE"
+Character to separate fields (default is space)
+.TP
+.B \-\^\-score
+Use the score test
+.TP
+.B \-\^\-no-head
+Do not report header line in the output
+.TP
+.B \-\^\-allcov
+Report estimates for all covariates (large outputs!)
+.TP
+.B \-\^\-interaction
+Which covariate to use for interaction with SNP analysis (default is no interaction, 0)
+.TP
+.B \-\^\-interaction_only
+Like
+.B \-\^\-interaction
+but without covariate acting in interaction with SNP (default is no interaction, 0)
+.TP
+.BI "\-\^\-mmscore" " FILE"
+Score test in samples of related individuals. File with inverse of variance-covariance matrix (for palinear) or inverse correlation (for palogist) as input parameter
+.TP
+.B \-\^\-robust
+Report robust (a.k.a. sandwich, a.k.a. Huber-White) standard errors
+.TP
+.B \-\^\-help
+Print help
+
+.SH "SEE ALSO"
+palinear(1), pacoxph(1)
+.SH BUGS
+.SH AUTHORS
+Lennart C. Karssen
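The `probabel.pl` section of the manual changes above describes configuration paths in which the chromosome number is written as the placeholder `_._chr_._`. As a rough illustration of what that substitution amounts to, here is a small shell sketch. It is not taken from `probabel.pl`; the helper name `expand_chr_pattern` and the example path are hypothetical.

```shell
# expand_chr_pattern PATTERN CHROMOSOME
# Replace every "_._chr_._" placeholder in PATTERN with CHROMOSOME.
# Illustration only; helper name and example path are hypothetical.
expand_chr_pattern() {
    # The dots are escaped so that sed matches them literally
    printf '%s\n' "$1" | sed "s/_\._chr_\._/$2/g"
}

expand_chr_pattern '/data/ERGO/chr_._chr_._.mlinfo' 7
# prints /data/ERGO/chr7.mlinfo
```

The same placeholder has to appear in the mldose, mlprob and legend columns, so one chromosome value expands all four paths consistently.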