[Genabel-commits] r1681 - in branches/ProbABEL-pvals/ProbABEL: checks checks/R-tests doc src

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Mon Apr 7 17:44:07 CEST 2014


Author: lckarssen
Date: 2014-04-07 17:44:06 +0200 (Mon, 07 Apr 2014)
New Revision: 1681

Modified:
   branches/ProbABEL-pvals/ProbABEL/checks/R-tests/Makefile.am
   branches/ProbABEL-pvals/ProbABEL/checks/R-tests/run_models_in_R_pacox.R
   branches/ProbABEL-pvals/ProbABEL/checks/run_diff.sh
   branches/ProbABEL-pvals/ProbABEL/doc/ChangeLog
   branches/ProbABEL-pvals/ProbABEL/doc/INSTALL
   branches/ProbABEL-pvals/ProbABEL/doc/ProbABEL_manual.tex
   branches/ProbABEL-pvals/ProbABEL/doc/pacoxph.1
   branches/ProbABEL-pvals/ProbABEL/doc/palinear.1
   branches/ProbABEL-pvals/ProbABEL/doc/palogist.1
   branches/ProbABEL-pvals/ProbABEL/doc/probabel.1
   branches/ProbABEL-pvals/ProbABEL/src/eigen_mematrix.cpp
   branches/ProbABEL-pvals/ProbABEL/src/reg1.cpp
Log:
Merged changes from trunk (r1680) into the ProbABEL p-values branch.


Modified: branches/ProbABEL-pvals/ProbABEL/checks/R-tests/Makefile.am
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/checks/R-tests/Makefile.am	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/checks/R-tests/Makefile.am	2014-04-07 15:44:06 UTC (rev 1681)
@@ -30,6 +30,11 @@
 ## The palogist R test still doesn't run correctly.
 XFAIL_TESTS = run_R_test_palogist.sh
 
+## The pacoxph R test fails on SNP 6 when EIGEN is not enabled.
+if !WITH_EIGEN
+XFAIL_TESTS += run_R_test_pacox.sh
+endif
+
 EXTRA_DIST = $(check_SCRIPTS) $(R_test_files)
 
 

Modified: branches/ProbABEL-pvals/ProbABEL/checks/R-tests/run_models_in_R_pacox.R
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/checks/R-tests/run_models_in_R_pacox.R	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/checks/R-tests/run_models_in_R_pacox.R	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,5 +1,8 @@
 cat("Checking Cox PH regression...\n")
-library(survival)
+if (!require(survival)) {
+    cat("The R package 'survival' is not installed. Skipping Cox PH checks\n")
+    q()
+}
 
 args <- commandArgs(TRUE)
 srcdir <- args[1]

Modified: branches/ProbABEL-pvals/ProbABEL/checks/run_diff.sh
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/checks/run_diff.sh	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/checks/run_diff.sh	2014-04-07 15:44:06 UTC (rev 1681)
@@ -20,7 +20,7 @@
 
     blanks="                                                                      "
 
-    if diff "$file1" "$file2" $args; then
+    if diff $args "$file1" "$file2"; then
         echo -e "${name}${blanks:${#name}} OK"
     else
         echo -e "${name}${blanks:${#name}} FAILED"

Modified: branches/ProbABEL-pvals/ProbABEL/doc/ChangeLog
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/ChangeLog	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/ChangeLog	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,8 +1,13 @@
-***** v.0.4.3
-* Speed up of a factor of X after simplifying the way filevector data is
-  read in.
+***** v.0.4.3 (2014.04.01)
+* Speed-up of a factor of ~ 2 for linear, logistic and Cox regression when
+  using filevector input files.
+* Fixed bug #5404: "ProbABEL's R check for Cox regression doesn't check if
+  the survival package is installed".
+* Fixed bug #5403: "The ProbABEL manual doesn't contain any information on
+  how to install ProbABEL".
 
-***** v.0.4.2
+
+***** v.0.4.2 (2014.01.02)
 * The 'probabel.pl' script is now simply renamed to 'probabel' (a user
   shouldn't care what scripting language we use). For at least several
   releases to come, the old script name will still exist (as a link to the
@@ -30,6 +35,7 @@
 * For developers: a start has been made on documenting the internal
   functions using Doxygen.
 
+
 ***** v.0.4.1 (2013.08.29)
 * Fix bug #4854: When using mmscore, there is one (nan) column missing in
   the output for low-frequency SNPs. Also includes a simplification of the

Modified: branches/ProbABEL-pvals/ProbABEL/doc/INSTALL
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/INSTALL	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/INSTALL	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,5 +1,8 @@
 These instructions show how to build ProbABEL.
 
+The ProbABEL manual (in .tex or .pdf format) also contains detailed
+(complementary) instructions on how to obtain and install ProbABEL.
+
 * Dependencies
   ProbABEL can be compiled without depending on other
   libraries. However, when the Eigen library is present

Modified: branches/ProbABEL-pvals/ProbABEL/doc/ProbABEL_manual.tex
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/ProbABEL_manual.tex	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/ProbABEL_manual.tex	2014-04-07 15:44:06 UTC (rev 1681)
@@ -11,13 +11,46 @@
   $^{2}${\small Erasmus MC, Rotterdam}\\
   $^{3}${\small Institute of Cytology and Genetics SD RAS, Novosibirsk}
 }
-\date{January 30, 2014}
+\date{April 1, 2014}
 
+
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{textcomp}
+
+\usepackage[svgnames]{xcolor}
+\definecolor{webgreen}{rgb}{0,.5,0}
+
 \usepackage{verbatim}
+
+\usepackage{listings}
+\lstloadlanguages{Bash}
+\definecolor{lstbgcolor}{rgb}{0.9,0.9,0.9}
+\lstset{
+  tabsize=4,
+  rulecolor=,
+  basicstyle=\ttfamily,
+  upquote=true,
+  columns=fixed,
+  showstringspaces=false,
+  extendedchars=true,
+  breaklines=true,
+  breakatwhitespace,
+  prebreak = \raisebox{0ex}[0ex][0ex]{\ensuremath{\hookleftarrow}},
+  frame=single,
+  showtabs=false,
+  showspaces=false,
+  showstringspaces=false,
+  keywordstyle=\color[rgb]{0,0,1},
+  commentstyle=\color[rgb]{0,0.4,0},
+  stringstyle=\color[rgb]{0.5,0,1},
+  basicstyle=\footnotesize\ttfamily,
+  backgroundcolor=\color{lstbgcolor},
+}
+
 \usepackage{titleref}
 \usepackage{amsmath}
 \usepackage{makeidx}
-\usepackage[dvipsnames]{xcolor}
 \usepackage[pdftex,hyperfootnotes=false,pdfpagelabels]{hyperref}
 \hypersetup{%
   linktocpage=false, % If true the page numbers in the toc are links
@@ -29,7 +62,7 @@
   pdfhighlight=/O, %hyperfootnotes=true,%nesting=true,%frenchlinks,%
   pdfauthor={\textcopyright\ Y.~Aulchenko, M.~Struchalin, L.C.~Karssen},
   pdfsubject={ProbABEL manual},
-  colorlinks=true, urlcolor=MidnightBlue, linkcolor=blue %
+  colorlinks=true, urlcolor=blue, linkcolor=blue, citecolor=webgreen %
 }
 % get the links to the figures and tables right:
 \usepackage[all]{hypcap} % to be loaded after hyperref package
@@ -109,6 +142,141 @@
 GenABEL project bug tracker at
 \url{https://r-forge.r-project.org/tracker/index.php?group_id=505&atid=2058}.
 
+\section{Obtaining and installing \PA}
+\label{sec:obtaininstall}
+\PA{} is a tool that is mostly used on computers running the Linux
+operating system. We try to publish binary packages for Windows as
+well, but these aren't tested. We strongly suggest using \PA{} on
+Linux.
+
+\subsection{Precompiled packages}
+\PA{} can be obtained in several ways:
+\begin{itemize}
+\item If you are using Ubuntu Linux and have administrative rights on
+  the machine you can add the GenABEL PPA to your APT configuration
+  and install it from there. The PPA can be found at
+  \url{https://launchpad.net/~l.c.karssen/+archive/genabel-ppa}. Instructions
+  on how to add the PPA can also be found there.
+\item If your computer runs Debian Linux\footnote{At the moment \PA{}
+    is only available in Debian testing and unstable.} (and you have
+  administrative rights on it), you can install ProbABEL like this:
+  \begin{lstlisting}
+user@server:~$ apt-get install probabel
+  \end{lstlisting}
+\item Zip files with pre-compiled binaries (if available) can be found
+  on the ProbABEL web page
+  (\url{http://www.genabel.org/packages/ProbABEL}).
+\item If you don't fall in any of the aforementioned
+  categories\footnote{We know that many people use Red Hat Linux,
+    CentOS, Scientific Linux or any other Red Hat
+    derivative. Unfortunately we haven't got \texttt{rpm} files
+    yet. Any help in creating those will be highly appreciated.}, you
+  can install \PA{} manually by downloading the source code of the
+  latest version from the website and compiling it yourself. This will
+  be explained in section~\ref{sec:obtain}.
+\end{itemize}
+
+
+\subsection{Obtaining the source code and compiling it yourself}
+\label{sec:obtain}
+If you can't use any of the aforementioned pre-compiled packages, you
+can download the source code of \PA{} yourself, compile it and run it
+from your own home directory. This section details the steps you need
+to take. More information can be found in the \texttt{doc/INSTALL} file.
+
+On the \href{http://www.genabel.org/packages/probabel}{\PA{}} website
+you can find the link to the latest version of the source code of \PA{}
+in a \texttt{tar.gz} file\footnote{The \texttt{tar.gz} file archive
+  format is the most commonly used format for distributing source code
+  on Linux/UNIX systems. These are compressed files, similar to
+  \texttt{zip} files.}. A \texttt{.asc} file with the same base name
+as the source code archive is also provided. This file contains a
+so-called GPG signature of the \texttt{tar.gz} file. Using this file
+and the \texttt{gpg} tool you can verify the authenticity of the
+source code by typing this command on the command line of a Linux
+shell\footnote{The \$ sign indicates the end of the command line
+  prompt. You don't need to type it.}:
+\begin{lstlisting}[]
+user@server:~$ gpg --verify probabel-0.4.3.tar.gz.asc
+gpg: Signature made Thu Jan  2 02:38:25 2014 CET using DSA key ID DA9CD509
+gpg: Good signature from "L.C. Karssen (GPG key for personal stuff) <lennart at karssen.org>"
+gpg:                 aka "L.C. Karssen (My GMail address) <l.c.karssen at gmail.com>"
+\end{lstlisting}
+Notice the ``Good signature'' message and the fact that the package was
+signed by Lennart Karssen, the ProbABEL maintainer. If a malicious
+hacker had replaced the source code file (for example with one
+including a virus), they would not be able to sign the package using
+the same key (with key ID DA9CD509). If, for some reason, the
+\texttt{tar.gz} file has changed (e.g.~by such a hacker or because
+the file didn't get downloaded correctly) you will see output like
+this (notice the ``BAD signature'' message):
+\begin{lstlisting}[]
+user@server:~$ gpg --verify probabel-0.4.2.tar.gz.asc
+gpg: Signature made Thu Jan  2 02:38:25 2014 CET using DSA key ID DA9CD509
+gpg: BAD signature from "L.C. Karssen (GPG key for personal stuff) <lennart at karssen.org>"
+user@server:~$
+\end{lstlisting}
+
+Before continuing, it is important to mention that \PA{} can make use
+of the EIGEN library\footnote{EIGEN is a library for fast matrix
+  multiplication.}. We strongly recommend compiling \PA{} with EIGEN as
+it will speed up your analyses considerably. Moreover, we plan to
+remove the non-EIGEN part of the code in a future release. So, go to
+\url{http://eigen.tuxfamily.org} and download the \texttt{tar.gz} file
+of the latest version of EIGEN (3.2.1 at the time of writing). Extract
+the files:
+\begin{lstlisting}
+user@server:~$ tar -xzf 3.2.1.tar.gz
+\end{lstlisting}
+This will create a directory called \texttt{eigen-eigen} followed by a
+series of letters and digits. For simplicity we rename it to EIGEN:
+\begin{lstlisting}
+user@server:~$ mv eigen-eigen-6b38706d90a9 EIGEN
+\end{lstlisting}
+
+Now it's time to extract the \PA{} source code and move into the
+directory that is created:
+\begin{lstlisting}
+user@server:~$ tar -xzf probabel-0.4.3.tar.gz
+user@server:~$ cd probabel-0.4.3
+\end{lstlisting}
+With the following command we will indicate where the EIGEN files can
+be found and where we want to install \PA{}. Let's install in a
+subdirectory of your home directory,
+e.g.~\texttt{/home/yourusername/ProbABEL}:
+\begin{lstlisting}
+user@server:~$ ./configure \
+   --prefix=/home/yourusername/ProbABEL/ \
+   --with-eigen-include-path=/home/yourusername/EIGEN
+\end{lstlisting}
+This will be followed by a series of checks to see if all tools
+required for compilation and installation are present on your
+system. If you don't see any warnings you can continue to
+compile\footnote{Compilation is the process of converting the source
+  files containing human readable program code to files with machine
+  readable instructions.} the code using the \texttt{make}
+command\footnote{If you work on a machine with multiple processors (or
+  processor cores), which should be the case on modern servers, but
+  also on most PCs, you can speed up the process by adding this number
+  to the \texttt{-j} option. For example for four cores run
+  \texttt{make -j4}.}. The next step will check the compiled code,
+after which you install the program, documentation and examples to the
+directory you specified previously with the \texttt{--prefix} argument
+to the \texttt{./configure} command.
+\begin{lstlisting}
+user@server:~$ make
+user@server:~$ make check
+user@server:~$ make install
+\end{lstlisting}
+Note that each of these steps will scroll a lot of output on the
+screen. Please watch it for any warnings or errors. Please ask any
+questions on \href{http://forum.genabel.org/}{our support forum}.
+
+If all went well you will find the executable programs
+(\texttt{palinear}, \texttt{palogist}, and \texttt{pacoxph}) in the
+directory \texttt{/home/yourusername/ProbABEL/bin/}. You are now ready
+to analyse your data!
+
 \section{Input files}
 \PA{} takes three files as input: a file containing SNP
 information (e.g.~the MLINFO file of MaCH), a file with genome- or
@@ -337,33 +505,29 @@
 However, for a simple run only three options are mandatory, which
 specify the necessary files needed to run the regression analysis.
 
-These options are
-\texttt{--dose} (or \texttt{-d}),
-specifying the genomic predictor/MLDOSE file described in section \ref{ssec:dosein};
-\texttt{--pheno} (or \texttt{-p}),
-specifying the phenotypic data file described in section \ref{ssec:phenoin}; and
-\texttt{--info} (or \texttt{-i}),
-specifying the SNP information file described in section \ref{ssec:infoin}.
+These options are \texttt{--dose} (or \texttt{-d}), specifying the
+genomic predictor/MLDOSE file described in section \ref{ssec:dosein};
+\texttt{--pheno} (or \texttt{-p}), specifying the phenotypic data file
+described in section \ref{ssec:phenoin}; and \texttt{--info} (or
+\texttt{-i}), specifying the SNP information file described in section
+\ref{ssec:infoin}.
 
 If you change to the \texttt{examples} directory you can run
 an analysis of height by running
 \begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
-                                 -d test.mldose -i test.mlinfo
+palinear -p height.txt -d gtdata/test.mldose -i gtdata/test.mlinfo
 \end{verbatim}
-Output from the analysis will be directed to the
+Output from the analysis will be stored in the
 \texttt{regression.out.csv} file.
-
 The analysis of a binary trait (e.g.~chd) can be run with
 \begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palogist -p logist_data.txt \
-                                 -d test.mldose -i test.mlinfo
+palogist -p logist_data.txt -d gtdata/test.mldose \
+    -i gtdata/test.mlinfo
 \end{verbatim}
-
 To run a Cox proportional hazards model, try
 \begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/pacoxph -p coxph_data.txt \
-                                 -d test.mldose -i test.mlinfo
+pacoxph -p coxph_data.txt -d gtdata/test.mldose \
+    -i gtdata/test.mlinfo
 \end{verbatim}
 
 Please have a look at the shell script files \texttt{example\_qt.sh},
@@ -372,13 +536,22 @@
 
 To run an analysis with MLPROB files, you need specify the MLPROB file
 with the \texttt{-d} option and also specify that there are two
-genetic predictors per SNP, e.g.~you can run linear model with
+genetic predictors per SNP, e.g.~you can run a linear model with
 \begin{verbatim}
-user@server:~/ProbABEL/examples/$ ../bin/palinear -p height.txt \
-                                 -d test.mlprob -i test.mlinfo \
-                                 --ngpreds=2
+palinear -p height.txt -d gtdata/test.mlprob -i gtdata/test.mlinfo \
+    --ngpreds=2
 \end{verbatim}
 
+When using genomic predictor files (dosages or probabilities) stored
+in filevector (a.k.a.~DatABEL) format (i.e.~a combination of
+\texttt{.fvi} and \texttt{.fvd} files) you can specify these like you
+would with ordinary text files. This is how the previous example would
+change:
+\begin{verbatim}
+palinear -p height.txt -d gtdata/test.mlprob.fvi -i gtdata/test.mlinfo \
+    --ngpreds=2
+\end{verbatim}
+
 \subsection{Advanced analysis options}
 The option \texttt{--interaction} allows you to include interaction
 between SNPs and any covariate. If for example your model is

Modified: branches/ProbABEL-pvals/ProbABEL/doc/pacoxph.1
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/pacoxph.1	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/pacoxph.1	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,4 +1,4 @@
-.TH pacoxph 1 "2 January 2014" "ProbABEL 0.4.3"
+.TH pacoxph 1 "01 April 2014" "ProbABEL 0.4.3"
 .SH NAME
 pacoxph \- Perform Genome-Wide Association Analysis using a linear model
 .SH SYNOPSIS

Modified: branches/ProbABEL-pvals/ProbABEL/doc/palinear.1
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/palinear.1	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/palinear.1	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,4 +1,4 @@
-.TH palinear 1 "2 January 2014" "ProbABEL 0.4.3"
+.TH palinear 1 "01 April 2014" "ProbABEL 0.4.3"
 .SH NAME
 palinear \- Perform Genome-Wide Association Analysis using a linear model
 .SH SYNOPSIS

Modified: branches/ProbABEL-pvals/ProbABEL/doc/palogist.1
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/palogist.1	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/palogist.1	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,4 +1,4 @@
-.TH palogist 1 "2 January 2014" "ProbABEL 0.4.3"
+.TH palogist 1 "01 April 2014" "ProbABEL 0.4.3"
 .SH NAME
 palogist \- Perform Genome-Wide Association Analysis using a linear model
 .SH SYNOPSIS

Modified: branches/ProbABEL-pvals/ProbABEL/doc/probabel.1
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/doc/probabel.1	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/doc/probabel.1	2014-04-07 15:44:06 UTC (rev 1681)
@@ -1,4 +1,4 @@
-.TH ProbABEL 1 "2 January 2014" "ProbABEL 0.4.3"
+.TH ProbABEL 1 "01 April 2014" "ProbABEL 0.4.3"
 .SH NAME
 probabel \- Wrapper around the three ProbABEL binaries, simplifying their use
 .SH SYNOPSIS

Modified: branches/ProbABEL-pvals/ProbABEL/src/eigen_mematrix.cpp
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/eigen_mematrix.cpp	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/src/eigen_mematrix.cpp	2014-04-07 15:44:06 UTC (rev 1681)
@@ -208,12 +208,12 @@
 //        delete[] data;
     if (nr <= 0)
     {
-        std::cerr << "mematrix(): number of rows smaller then 1\n";
+        std::cerr << "mematrix(): number of rows less than 1\n";
         exit(1);
     }
     if (nc <= 0)
     {
-        std::cerr << "mematrix(): number of columns smaller then 1\n";
+        std::cerr << "mematrix(): number of columns less than 1\n";
         exit(1);
     }
     nrow = nr;

Modified: branches/ProbABEL-pvals/ProbABEL/src/reg1.cpp
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/reg1.cpp	2014-04-07 15:17:31 UTC (rev 1680)
+++ branches/ProbABEL-pvals/ProbABEL/src/reg1.cpp	2014-04-07 15:44:06 UTC (rev 1681)
@@ -719,7 +719,8 @@
             double emu = eMu.get(i, 0);
             double value = emu;
             double zval;
-            value = exp(value) / (1. + exp(value));
+            double expval = exp(value);
+            value = expval / (1. + expval);
             residuals[i] = (rdata.Y).get(i, 0) - value;
             eMu.put(value, i, 0);
             W.put(value * (1. - value), i, 0);
@@ -778,11 +779,13 @@
             beta.print();
         }
         // std::cout << "beta:\n"; beta.print();
-        // compute likelihood
+
+        // Compute the likelihood.
         prevlik = loglik;
-        loglik = 0.;
-        for (int i = 0; i < eMu.nrow; i++)
+        loglik = 0;
+        for (int i = 0; i < eMu.nrow; i++) {
             loglik += rdata.Y[i] * eMu_us[i] - log(1. + exp(eMu_us[i]));
+        }
 
         delta = fabs(1. - (prevlik / loglik));
         niter++;
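
Note on the reg1.cpp hunks above: the first one evaluates exp() once per
observation instead of twice, and the second one only tidies the
log-likelihood loop. The standalone sketch below illustrates the same
arithmetic outside ProbABEL. It is not the committed code: the function
names (logistic_single_exp, logistic_stable, log1p_exp, bernoulli_loglik,
relative_change) are hypothetical, plain std::vector stands in for the
mematrix class, and the overflow-safe variants are an assumption about what
one might want for large linear predictors, not something r1681 introduces.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic function written as in the patched loop: exp() is called once
// and reused. For very large x, exp(x) overflows to infinity and the
// result is NaN; the diff does not address that case.
double logistic_single_exp(double x)
{
    const double expval = std::exp(x);
    return expval / (1. + expval);
}

// Hypothetical overflow-safe variant (NOT part of r1681): for x >= 0 use
// the algebraically equivalent form 1 / (1 + exp(-x)).
double logistic_stable(double x)
{
    if (x >= 0.) {
        return 1. / (1. + std::exp(-x));
    }
    const double expval = std::exp(x);
    return expval / (1. + expval);
}

// log(1 + exp(x)), the term subtracted in the log-likelihood loop,
// rearranged so that exp() never receives a large positive argument.
double log1p_exp(double x)
{
    return (x > 0.) ? x + std::log1p(std::exp(-x))
                    : std::log1p(std::exp(x));
}

// Bernoulli log-likelihood: sum_i y[i] * eta[i] - log(1 + exp(eta[i])),
// where eta is the untransformed linear predictor (eMu_us in the diff).
double bernoulli_loglik(const std::vector<double>& y,
                        const std::vector<double>& eta)
{
    assert(y.size() == eta.size());
    double loglik = 0.;
    for (std::size_t i = 0; i < y.size(); i++) {
        loglik += y[i] * eta[i] - log1p_exp(eta[i]);
    }
    return loglik;
}

// Relative change between two successive log-likelihoods, mirroring
// delta = fabs(1. - (prevlik / loglik)) in the diff.
double relative_change(double prevlik, double loglik)
{
    return std::fabs(1. - (prevlik / loglik));
}
```

For moderate arguments logistic_single_exp and logistic_stable agree to
rounding error, so the single-exp form in the commit is a pure speed-up.
The stable forms only matter when the linear predictor becomes large.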


