[Genabel-commits] r1725 - in pkg/ProbABEL: doc src

Sun May 11 14:11:19 CEST 2014

Author: lckarssen
Date: 2014-05-11 14:11:19 +0200 (Sun, 11 May 2014)
New Revision: 1725

Modified:
   pkg/ProbABEL/doc/ProbABEL_manual.tex
   pkg/ProbABEL/src/reg1.cpp
Log:
src/reg1.cpp: Added Doxygen documentation for the mmscore_regression() function, as well as a description of what the function does. Includes minor rewrite of the actual code, renaming variables and removing one call to .transpose().
Also changed and added some (partial) Doxygen documentation to other functions.

doc/ProbABEL_manual.tex: Added a number to an equation I reference in the Doxygen documentation



Modified: pkg/ProbABEL/doc/ProbABEL_manual.tex
===================================================================

--- pkg/ProbABEL/doc/ProbABEL_manual.tex	2014-05-09 13:06:44 UTC (rev 1724)
+++ pkg/ProbABEL/doc/ProbABEL_manual.tex	2014-05-11 12:11:19 UTC (rev 1725)
@@ -936,13 +936,13 @@
 
 In the second step, the unbiased estimates of the fixed effects of the
 terms involving SNP are obtained with
-$$
+\begin{equation}
 \hat{\beta}_g = (\mathbf{X}^T_g
 \mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}
 \mathbf{X}_g)^{-1}
 \mathbf{X}^T_g \mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}
 \mathbf{R}_{\hat{\beta}_x},
-$$
+\end{equation}
 where $\mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}$ is the inverse
 variance-covariance matrix at the point of the MLE estimates of
 $\hat{h}^2_x$ and $\hat{\sigma}^2_x$, and $\mathbf{R}_{\hat{\beta}_x}

Modified: pkg/ProbABEL/src/reg1.cpp
===================================================================
--- pkg/ProbABEL/src/reg1.cpp	2014-05-09 13:06:44 UTC (rev 1724)
+++ pkg/ProbABEL/src/reg1.cpp	2014-05-11 12:11:19 UTC (rev 1725)
@@ -1,3 +1,12 @@
+/**
+ * \file   reg1.cpp
+ * \author The ProbABEL team
+ *
+ * \brief File containing the parts of the code for linear and
+ * logistic regression.
+ *
+ * For CoxPH regression look in the file coxph_data.h.
+ */
 /*
  *
  * Copyright (C) 2009--2014 Various members of the GenABEL team. See
@@ -28,13 +37,14 @@
  *
  *
  * if ngpreds==1 (dose data):
- * model 0 = additive 1 df
+ * \li model 0 = additive (1 df)
+ *
  * if ngpreds==2 (prob data):
- * model 0 = 2 df
- * model 1 = additive 1 df
- * model 2 = dominant 1 df
- * model 3 = recessive 1 df
- * model 4 = over-dominant 1 df
+ * \li model 0 = 2 df
+ * \li model 1 = additive (1 df)
+ * \li model 2 = dominant (1 df)
+ * \li model 3 = recessive (1 df)
+ * \li model 4 = over-dominant (1 df)
  * @param X Design matrix, including SNP column
  * @param model
  * @param interaction
@@ -347,6 +357,45 @@
 }
 
 
+/**
+ * \brief Solve the linear system in case the --mmscore option was
+ * specified.
+ *
+ * Specifying the --mmscore command line option requires a file name
+ * argument as well. This file should contain the inverse
+ * variance-covariance matrix file. This function is run when Linear
+ * regression is done in combination with the mmscore option. It
+ * solves the 'mmscore' equation as specified in Eq. (5) in section
+ * 8.2.1 of the ProbABEL manual:
+ * \f[
+ *    \hat{\beta}_g = (\mathbf{X}^T_g
+ *    \mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}
+ *    \mathbf{X}_g)^{-1}
+ *    \mathbf{X}^T_g \mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}
+ *    \mathbf{R}_{\hat{\beta}_x},
+ * \f]
+ * where \f$\mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}\f$ is the
+ * inverse variance-covariance matrix, and
+ * \f$\mathbf{R}_{\hat{\beta}_x}\f$ is the vector containing the
+ * residuals obtained from the base regression model i.e. the
+ * phenotype. In this function, the phenotype is stored in the
+ * variable \c Y.
+ *
+ * @param X The design matrix \f$X_g\f$. \c X should only contain the
+ * parts involving genotype data (including any interactions involving
+ * a genetic term), all other covariates should have been regressed out
+ * before running ProbABEL.
+ * @param W_masked The inverse variance-covariance matrix
+ * \f$\mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}\f$.
+ * @param Ch Reference to the LDLT Cholesky decomposition of the
+ * matrix to be inverted to get \f$\hat\beta_g\f$:
+ * \f[
+ * \mathbf{X}^T_g
+ *    \mathbf{V}^{-1}_{\hat{\sigma}^2,\hat{h}^2}
+ *    \mathbf{X}_g.
+ * \f]
+ * On return this variable contains said matrix.
+ */
 void linear_reg::mmscore_regression(const mematrix<double>& X,
                                     const masked_matrix& W_masked,
                                     LDLT<MatrixXd>& Ch) {
@@ -357,12 +406,18 @@
      side has more rows: this introduces an additional transpose, but can be
      neglected compared to the speedup this brings(about a factor 2 for the
      palinear with 1 predictor)
+
+     This function solves the system
+        (X^T W X) beta = X^T W Y.
+     Since W is symmetric (WX)^T = X^T W, so this can be rewritten as
+        (WX)^T X beta = (WX)^T Y,
+     which is solved using LDLT Cholesky decomposition.
      */
-    MatrixXd tXW = W_masked.masked_data->data * X.data;
-    MatrixXd xWx = tXW.transpose() * X.data;
-    Ch = LDLT<MatrixXd>(xWx);
-    VectorXd beta_vec = Ch.solve(tXW.transpose() * Y);
-    sigma2 = (Y - tXW * beta_vec).squaredNorm();
+    MatrixXd WX = W_masked.masked_data->data * X.data;
+    MatrixXd XWT = WX.transpose();
+    Ch = LDLT<MatrixXd>(XWT * X.data);
+    VectorXd beta_vec = Ch.solve(XWT * Y);
+    sigma2 = (Y - WX * beta_vec).squaredNorm();
     beta.data = beta_vec;
 }
 
@@ -438,6 +493,18 @@
 }
 
 
+/**
+ * \brief Estimate the parameters for linear regression.
+ *
+ * @param verbose
+ * @param tol_chol
+ * @param model
+ * @param interaction
+ * @param ngpreds
+ * @param invvarmatrixin
+ * @param robust
+ * @param nullmodel
+ */
 void linear_reg::estimate(const int verbose, const double tol_chol,
                           const int model, const int interaction,
                           const int ngpreds,