[Genabel-commits] r1546 - in branches/ProbABEL-pvals/ProbABEL: . src
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Mon Jan 13 15:07:05 CET 2014
Author: lckarssen
Date: 2014-01-13 15:07:04 +0100 (Mon, 13 Jan 2014)
New Revision: 1546
Modified:
branches/ProbABEL-pvals/ProbABEL/configure.ac
branches/ProbABEL-pvals/ProbABEL/src/Makefile.am
branches/ProbABEL-pvals/ProbABEL/src/main.cpp
branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.cpp
branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.h
Log:
First commit for adding calculation of p-values to ProbABEL.
This uses the statistical distributions from Boost::Math library (http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html). By default this is enabled. To disable this run: ./configure --without-boost. If the header file is not in the default include search path, use ./configure --with-boost-include-path "/some/path/to/boost_headers".
At the moment both the columns for the chi^2 values and the p-values are printed. We may want to disable printing the chi^2 values in the future, possibly enabling them with a command line switch.
The actual calculation is performed by the function pchisq() named like its R counterpart in the main_functions_dump.cpp file. In main.cpp an extra vector is created to contain the chi^2 values in double format (next to the ostringstream that was already there). This vector of doubles (one per model) is then used when calculating the p-values.
Compilation is succesfull, but the two checks that compare output against "known-good" output files fail (because I haven't updated those files yet). Compilation and checks when running ./configure --without-boost work as expected (this should be the old (v0.4.x) behaviour).
TODO: Like before (v0.4.x series) the chi^2 is not calculated when using mmscore and the 2DF model, because that requires the 2DF Wald test, which we haven't implemented yet (main.cpp lines 440--457.)
Modified: branches/ProbABEL-pvals/ProbABEL/configure.ac
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/configure.ac 2014-01-12 20:10:41 UTC (rev 1545)
+++ branches/ProbABEL-pvals/ProbABEL/configure.ac 2014-01-13 14:07:04 UTC (rev 1546)
@@ -84,6 +84,39 @@
AM_CONDITIONAL([WITH_EIGEN], test "x$with_eigen" != "xno")
+# See if we can use the Boost libraries for calculating p-values
+AC_ARG_WITH([boost],
+ AS_HELP_STRING([--with-boost], [Use the Boost Math library for
+ calculating p-values. This is enabled by default]
+ )
+)
+
+if test "x$with_boost" != "xno"; then
+ AC_MSG_NOTICE([building using the Boost Math library enabled])
+
+ AC_ARG_WITH([boost-include-path],
+ [AS_HELP_STRING([--with-boost-include-path],
+ [location of the Boost Math headers, defaults to /usr/include/boost])],
+ [CXXFLAGS+=" -I${withval}"
+ CPPFLAGS+=" -I${withval}"],
+ [CXXFLAGS+=' -I/usr/include/boost'
+ CPPFLAGS+=' -I/usr/include/boost'])
+
+ # Check for the Boost Math header files
+ AC_CHECK_HEADERS([boost/math/distributions.hpp])
+
+ if test x$ac_cv_header_boost_math_distributions_hpp = xno; then
+ AC_MSG_ERROR([Could not find the Boost Math header files. Did \
+you specify --with-boost-include-path correctly? Or use --without-boost \
+to disable the calculation of p-values.])
+ fi
+else
+ AC_MSG_NOTICE([not using the Boost libraries, so no p-values in the \
+output])
+fi
+AM_CONDITIONAL([WITH_BOOST], test "x$with_boost" != "xno")
+
+
# Checks for typedefs, structures, and compiler characteristics.
AC_HEADER_STDBOOL
AC_C_INLINE
Modified: branches/ProbABEL-pvals/ProbABEL/src/Makefile.am
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/Makefile.am 2014-01-12 20:10:41 UTC (rev 1545)
+++ branches/ProbABEL-pvals/ProbABEL/src/Makefile.am 2014-01-13 14:07:04 UTC (rev 1546)
@@ -1,13 +1,14 @@
## Process this file with automake to produce Makefile.in
-## Using wildcards in these lists doesn't work. Also GNU make's ($wildcard,) doesn't
-## work. It gives warning message about portability, but in the end doesn't work,
-## I tried :-).
-REGFILES = data.h data.cpp gendata.h gendata.cpp mematrix.h mematri1.h \
- command_line_settings.h command_line_settings.cpp reg1.h usage.h \
- usage.cpp main.cpp utilities.h utilities.cpp phedata.h phedata.cpp \
- cholesky.h cholesky.cpp regdata.h regdata.cpp \
- maskedmatrix.cpp maskedmatrix.h reg1.cpp main_functions_dump.h main_functions_dump.cpp
+## Using wildcards in these lists doesn't work. Also GNU make's
+## ($wildcard,) doesn't work. It gives warning message about
+## portability, but in the end doesn't work, I tried :-).
+REGFILES = data.h data.cpp gendata.h gendata.cpp mematrix.h \
+ mematri1.h command_line_settings.h command_line_settings.cpp reg1.h \
+ usage.h usage.cpp main.cpp utilities.h utilities.cpp phedata.h \
+ phedata.cpp cholesky.h cholesky.cpp regdata.h regdata.cpp \
+ maskedmatrix.cpp maskedmatrix.h reg1.cpp main_functions_dump.h \
+ main_functions_dump.cpp
EIGENFILES = eigen_mematrix.h eigen_mematrix.cpp
@@ -77,6 +78,12 @@
pacoxph_CXXFLAGS += -DEIGEN
endif
+if WITH_BOOST
+palinear_CXXFLAGS += -DWITH_BOOST
+palogist_CXXFLAGS += -DWITH_BOOST
+pacoxph_CXXFLAGS += -DWITH_BOOST
+endif
+
extract_snp_SOURCES = extract-snp.cpp $(FVSRC) $(FVHEADERS)
## Install these scripts in the bin directory as well:
Modified: branches/ProbABEL-pvals/ProbABEL/src/main.cpp
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/main.cpp 2014-01-12 20:10:41 UTC (rev 1545)
+++ branches/ProbABEL-pvals/ProbABEL/src/main.cpp 2014-01-13 14:07:04 UTC (rev 1546)
@@ -202,11 +202,16 @@
int start_pos, end_pos;
+
+ // Output streams for the regression results we want to print in
+ // the outpute file(s).
std::vector<std::ostringstream *> beta_sebeta;
- // Han Chen
std::vector<std::ostringstream *> covvalue;
- // Oct 26, 2009
+ std::vector<double*> chi2val; // vector that contains the chi2 values
std::vector<std::ostringstream *> chi2;
+#if WITH_BOOST
+ std::vector<std::ostringstream *> pval;
+#endif
// Create string streams for betas, SEs, etc. These are used to
// later store the various output values that will be written to
@@ -224,6 +229,11 @@
chi2.push_back(new std::ostringstream());
chi2[i]->precision(6);
// *chi2[i] << scientific;
+ chi2val.push_back(new double);
+#if WITH_BOOST
+ pval.push_back(new std::ostringstream());
+ pval[i]->precision(6);
+#endif
}
@@ -408,17 +418,21 @@
input_var.getNgpreds(),
true, 1, mli, csnp);
#endif
- *chi2[model] << 2. * (loglik - new_null_rd.loglik);
+ *chi2val[model] = 2. *
+ (loglik - new_null_rd.loglik);
+ *chi2[model] << *chi2val[model];
}
else
{
// No missing SNP data, we can compute the LRT
- *chi2[model] << 2. * (loglik - null_loglik);
+ *chi2val[model] = 2. * (loglik - null_loglik);
+ *chi2[model] << *chi2val[model];
}
} else
{
// We want score test output
- *chi2[model] << rd.chi2_score;
+ *chi2val[model] = rd.chi2_score;
+ *chi2[model] << *chi2val[model];
}
} // END if( inv.sigma == NULL )
else if (input_var.getInverseFilename() != NULL)
@@ -432,12 +446,14 @@
* equation just below Eq.(4) in the ProbABEL
* paper. TODO LCK
*/
+ *chi2val[model] = NAN;
*chi2[model] << "nan";
}
else
{
double Z = rd.beta[start_pos] / rd.sebeta[start_pos];
- *chi2[model] << Z * Z;
+ *chi2val[model] = Z * Z;
+ *chi2[model] << *chi2val[model];
}
}
} // END first part of if(poly); allele not too rare
@@ -493,6 +509,7 @@
}
#endif
// Oct 26, 2009
+ *chi2val[model] = NAN;
*chi2[model] << "nan";
} else
{ // ngpreds==1 (and SNP is rare)
@@ -508,9 +525,16 @@
#endif
// Oct 26, 2009
} // END if getInverseFilename == NULL
+ *chi2val[model] = NAN;
*chi2[model] << "nan";
} // END ngpreds == 1 (and SNP is rare)
} // END else: SNP is rare
+
+#if WITH_BOOST
+ // Calulate p-values based on the chi^2 values
+ int chi2df = 1;
+ *pval[model] << pchisq(*chi2val[model], chi2df);
+#endif
} // END of model cycle
@@ -528,8 +552,13 @@
<< input_var.getSep();
}
#endif
- *outfile[model] << chi2[model]->str()
- << "\n";
+ *outfile[model] << chi2[model]->str();
+
+#if WITH_BOOST
+ *outfile[model] << input_var.getSep() << pval[model]->str();
+#endif
+
+ *outfile[model] << endl;
} // END for loop over all models
}
else // Dose data: only additive model. Only one output file
@@ -541,10 +570,15 @@
*outfile[0] << covvalue[0]->str() << input_var.getSep();
}
#endif
- *outfile[0] << chi2[0]->str() << "\n";
+ *outfile[0] << chi2[0]->str();
+
+#if WITH_BOOST
+ *outfile[0] << input_var.getSep() << pval[0]->str();
+#endif
+
+ *outfile[0] << endl;
} // End ngpreds == 1 when writing output files
-
// Clean chi2 and other streams
for (int model = 0; model < maxmod; model++)
{
@@ -553,6 +587,9 @@
covvalue[model]->str("");
// Oct 26, 2009
chi2[model]->str("");
+#if WITH_BOOST
+ pval[model]->str("");
+#endif
}
update_progress_to_cmd_line(csnp, nsnps);
@@ -582,12 +619,14 @@
delete *it;
++it;
}
+
it = covvalue.begin();
while (it != covvalue.end())
{
delete *it;
++it;
}
+
it = chi2.begin();
while (it != chi2.end())
{
@@ -595,5 +634,14 @@
++it;
}
+#if WITH_BOOST
+ it = pval.begin();
+ while (it != pval.end())
+ {
+ delete *it;
+ ++it;
+ }
+#endif
+
return (0);
}
Modified: branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.cpp
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.cpp 2014-01-12 20:10:41 UTC (rev 1545)
+++ branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.cpp 2014-01-13 14:07:04 UTC (rev 1546)
@@ -15,6 +15,10 @@
#include <iomanip>
#include <vector>
+#if WITH_BOOST
+#include <boost/math/distributions.hpp>
+#endif
+
#include "maskedmatrix.h"
#include "phedata.h"
#include "data.h"
@@ -55,6 +59,7 @@
std::cout << std::setprecision(6);
}
+
/**
* Open an output file for each model when using probability data
* (npgreds == 2). This function creates the _2df.out.txt etc. files.
@@ -238,6 +243,9 @@
#endif
}
*outfile[0] << input_var.getSep() << "chi2_SNP";
+#if WITH_BOOST
+ *outfile[0] << input_var.getSep() << "pval_SNP";
+#endif
*outfile[0] << "\n";
} // ngpreds == 1
else if (input_var.getNgpreds() == 2) // prob data: all models
@@ -315,11 +323,25 @@
//Oct 26, 2009
}
}
- *outfile[0] << input_var.getSep() << "chi2_SNP_2df\n";
- *outfile[1] << input_var.getSep() << "chi2_SNP_A1\n";
- *outfile[2] << input_var.getSep() << "chi2_SNP_domA1\n";
- *outfile[3] << input_var.getSep() << "chi2_SNP_recA1\n";
- *outfile[4] << input_var.getSep() << "chi2_SNP_odomA1\n";
+ *outfile[0] << input_var.getSep() << "chi2_SNP_2df";
+ *outfile[1] << input_var.getSep() << "chi2_SNP_A1";
+ *outfile[2] << input_var.getSep() << "chi2_SNP_domA1";
+ *outfile[3] << input_var.getSep() << "chi2_SNP_recA1";
+ *outfile[4] << input_var.getSep() << "chi2_SNP_odomA1";
+
+#ifdef WITH_BOOST
+ *outfile[0] << input_var.getSep() << "pval_SNP_2df";
+ *outfile[1] << input_var.getSep() << "pval_SNP_A1";
+ *outfile[2] << input_var.getSep() << "pval_SNP_domA1";
+ *outfile[3] << input_var.getSep() << "pval_SNP_recA1";
+ *outfile[4] << input_var.getSep() << "pval_SNP_odomA1";
+#endif
+
+ *outfile[0] << endl;
+ *outfile[1] << endl;
+ *outfile[2] << endl;
+ *outfile[3] << endl;
+ *outfile[4] << endl;
} // End: ngpreds == 2
else
{
@@ -426,3 +448,34 @@
return start_pos;
}
+
+
+#ifdef WITH_BOOST
+/**
+ * Calculate the p-value based on the chi^2 distribution.
+ *
+ * \param chi2 chi^2 value for which the p-value should be
+ * calculated.
+ * \param df degrees of freedom of the chi^2 distribution
+ */
+double pchisq(const double chi2, const int df)
+{
+ double pval;
+
+ if (!std::isnan(chi2))
+ {
+ // Initialise the distribution
+ boost::math::chi_squared chi2dist(df);
+
+ /* Use the complement here (in R we would also set
+ * lower.tail=FALSE)
+ */
+ pval = boost::math::cdf(complement(chi2dist, chi2));
+ }
+ else
+ {
+ pval = NAN;
+ }
+ return pval;
+}
+#endif
Modified: branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.h
===================================================================
--- branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.h 2014-01-12 20:10:41 UTC (rev 1545)
+++ branches/ProbABEL-pvals/ProbABEL/src/main_functions_dump.h 2014-01-13 14:07:04 UTC (rev 1546)
@@ -35,4 +35,6 @@
int get_start_position(const cmdvars& input_var, const int model,
const int number_of_rows_or_columns);
+
+double pchisq(const double chi2, const int df);
#endif /* MAIN_FUNCTIONS_DUMP_H_ */
More information about the Genabel-commits
mailing list