[Genabel-commits] r990 - in pkg/ProbABEL: . doc src tests tests/known_good_results
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Mon Oct 29 13:11:27 CET 2012
Author: lckarssen
Date: 2012-10-29 13:11:27 +0100 (Mon, 29 Oct 2012)
New Revision: 990
Added:
pkg/ProbABEL/tests/
pkg/ProbABEL/tests/Makefile.am
pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
pkg/ProbABEL/tests/known_good_results/
pkg/ProbABEL/tests/known_good_results/height_add.out.txt
Modified:
pkg/ProbABEL/Makefile.am
pkg/ProbABEL/configure.ac
pkg/ProbABEL/doc/CHANGES.LOG
pkg/ProbABEL/src/probabel.pl
pkg/ProbABEL/src/probabel_config.cfg.example
Log:
- probabel.pl and probabel_config.cfg now accept _._chunk_._ as well as _._chr_._ as variables. This allows the user to have his imputed data (dose, prob, info and map files) split up into pieces.
- Added tests directory that will contain tests that are too complicated to be called examples. This also contains a directory 'know_good_results' so we can check for regressions.
- Preparing for release of v.0.2.1 (changed version number in configure.ac)
- Minor code layout fixes in configure.ac
Copyright for these changes lies with the Erasmus MC, Rotterdam.
Modified: pkg/ProbABEL/Makefile.am
===================================================================
--- pkg/ProbABEL/Makefile.am 2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/Makefile.am 2012-10-29 12:11:27 UTC (rev 990)
@@ -1,4 +1,4 @@
## Process this file with automake to produce Makefile.in
AUTOMAKE_OPTIONS = foreign
-SUBDIRS = src doc examples
+SUBDIRS = src doc examples tests
Modified: pkg/ProbABEL/configure.ac
===================================================================
--- pkg/ProbABEL/configure.ac 2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/configure.ac 2012-10-29 12:11:27 UTC (rev 990)
@@ -2,8 +2,8 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.67])
-AC_INIT(ProbABEL, 0.2.0, genabel-devel at r-forge.wu-wien.ac.at)
-AM_INIT_AUTOMAKE(ProbABEL, 0.2.0)
+AC_INIT(ProbABEL, 0.2.1-beta, genabel-devel at r-forge.wu-wien.ac.at)
+AM_INIT_AUTOMAKE(ProbABEL, 0.2.1-beta)
AC_CONFIG_SRCDIR([src/data.h])
AC_CONFIG_HEADERS([src/config.h])
@@ -28,7 +28,8 @@
# Checks for header files.
AC_FUNC_ALLOCA
-AC_CHECK_HEADERS([float.h inttypes.h libintl.h limits.h stddef.h stdint.h stdlib.h string.h sys/param.h wchar.h wctype.h])
+AC_CHECK_HEADERS([float.h inttypes.h libintl.h limits.h stddef.h stdint.h \
+ stdlib.h string.h sys/param.h wchar.h wctype.h])
# Checks for typedefs, structures, and compiler characteristics.
AC_HEADER_STDBOOL
@@ -52,6 +53,12 @@
AC_PROG_SED
# Check for presence of pdfLaTeX
+AC_CHECK_PROG(CUT, cut, cut)
+if test -z "$CUT"; then
+ AC_MSG_WARN([Can't find the 'cut' binary. Some tests won't run correctly])
+fi
+
+# Check for presence of pdfLaTeX
AC_CHECK_PROG(PDFLATEX, pdflatex, pdflatex)
if test -z "$PDFLATEX"; then
AC_MSG_WARN([Unable to create PDF version of the user manual])
@@ -61,7 +68,8 @@
# Since pacoxph is buggy it needs to be enabled explixitly
AC_ARG_ENABLE([pacoxph],
- [AS_HELP_STRING([--enable-pacoxph], [build the pacoxph binary (still contains lots of bugs)])],
+ [AS_HELP_STRING([--enable-pacoxph], [build the pacoxph binary (still \
+contains lots of bugs)])],
[pacoxph=${enableval}],
[pacoxph=no])
@@ -77,6 +85,7 @@
src/Makefile
doc/Makefile
examples/Makefile
+ tests/Makefile
])
# Create output files
AC_OUTPUT
Modified: pkg/ProbABEL/doc/CHANGES.LOG
===================================================================
--- pkg/ProbABEL/doc/CHANGES.LOG 2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/doc/CHANGES.LOG 2012-10-29 12:11:27 UTC (rev 990)
@@ -1,9 +1,10 @@
-*****
-* Fixed bug #2295: the inverse variance-covariance matrix (used with the
- --mmscore option) was incorrectly subsetted when NAs are present for
- one or more SNP dosages. As a result the invvarmatrix that was
- actually used in the regression contained rows and columns of
- zeroes. Thanks to Maarten Kooyman for reporting this bug.
+***** v.0.2.1 (2012)
+* Fixed bug #2295: the inverse variance-covariance matrix (used with
+ the --mmscore option) was incorrectly subsetted when NAs are present
+ for one or more SNP dosages (so this is not an issue for people using
+ imputed data). As a result the invvarmatrix that was actually used in
+ the regression contained rows and columns of zeroes. Thanks to Maarten
+ Kooyman for reporting this bug.
* Fixed bug #1186: When .map file is missing (but --map option was
given), the wrong error message was displayed. Thanks to Nicola
@@ -18,7 +19,16 @@
probabel.pl can now also run Y chromosome analysis and the help
message has been updated.
+* probabel.pl and probabel_config.cfg now also accept chunks, where
+ dose, prob, info and map files are split into multiple chunks. This
+ is now the default for people following the 1000 genomes imputation
+ cookbook for MaCH/minimac (the recipe uses the chunkchromosome tool
+ to split the data into smaller pieces, speeding up imputation on
+ computer clusters). See probabel_config.cfg.example for an
+ example. (Lennart)
+
+
***** v.0.2.0 (2012.06.10)
* The v.0.1-9e fix for working with prob files in pacoxph has been
forward-ported to this branch as well (Lennart and Yurii).
Modified: pkg/ProbABEL/src/probabel.pl
===================================================================
--- pkg/ProbABEL/src/probabel.pl 2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/src/probabel.pl 2012-10-29 12:11:27 UTC (rev 990)
@@ -23,6 +23,7 @@
# Separators in the config file
my $separator_cfg = ",";
my $chr_replacement = "_._chr_._";
+my $chunk_replacement = "_._chunk_._";
# Set file locations
my $base_path = "./";
@@ -43,7 +44,7 @@
#==========================================================================
# Read config file
-open(CFG, "$config") or die "Reading configuration file failed: $!" .
+open(CFG, "$config") or die "Reading configuration file $config failed: $!" .
"\nDid you forget to edit and rename the probabel_config.cfg.example file?\n";
<CFG>; #skip the first line (header)
@@ -214,24 +215,62 @@
# Commands for the autosomes
for($chr=$startchr; $chr<=$endchr; $chr++) {
- $mlinfo_arg = $mlinfo;
- $mlinfo_arg =~ s/$chr_replacement/$chr/g;
- $mldose_arg = $mldose;
- $mldose_arg =~ s/$chr_replacement/$chr/g;
+ my $nrchunks = 0;
+ # Find out the number of chunks for the current chromosome
+ my $infofiles = $mlinfo;
+ $infofiles =~ s/$chr_replacement/$chr/g;
+ $infofiles =~ s/$chunk_replacement/*/g;
+ $nrchunks = `ls $infofiles | wc -l`;
+ print "Nr. of chunks: $nrchunks";
- $legend_arg = $legend;
- $legend_arg =~ s/$chr_replacement/$chr/g;
+ # Loop over all chunks
+ for (my $chunk=1; $chunk <= $nrchunks; $chunk++)
+ {
+ if($hadhead==0) {
+ $head="";
+ $hadhead=1;
+ } else {
+ $head="--no-head";
+ }
+ $mlinfo_arg = $mlinfo;
+ $mlinfo_arg =~ s/$chr_replacement/$chr/g;
+ $mlinfo_arg =~ s/$chunk_replacement/$chunk/g;
- if($hadhead==0) {
- $head="";
- $hadhead=1;
- } else {
- $head="--no-head";
+ $mldose_arg = $mldose;
+ $mldose_arg =~ s/$chr_replacement/$chr/g;
+ $mldose_arg =~ s/$chunk_replacement/$chunk/g;
+
+ $legend_arg = $legend;
+ $legend_arg =~ s/$chr_replacement/$chr/g;
+ $legend_arg =~ s/$chunk_replacement/$chunk/g;
+
+ system "$prog -p $phename.PHE --ngpreds $model_option_num -i $mlinfo_arg -d $mldose_arg -m $legend_arg --chrom $chr -o $phename.chunk$chunk.chr$chr $head $keys";
+
+ if($model_option_num==2)
+ {
+ `cat $phename.chunk${chunk}.chr${chr}$_2df_file_postfix >> ${phename}.${chr}${_2df_file_postfix}`;
+ `rm $phename.chunk${chunk}.chr${chr}$_2df_file_postfix`;
+
+ `cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> ${phename}.${chr}${_add_file_postfix}`;
+ `rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix`;
+
+ `cat $phename.chunk${chunk}.chr${chr}$_domin_file_postfix >> ${phename}.${chr}${_domin_file_postfix}`;
+ `rm $phename.chunk${chunk}.chr${chr}$_domin_file_postfix`;
+
+ `cat $phename.chunk${chunk}.chr${chr}$_recess_file_postfix >> ${phename}.${chr}${_recess_file_postfix}`;
+ `rm $phename.chunk${chunk}.chr${chr}$_recess_file_postfix`;
+
+ `cat $phename.chunk${chunk}.chr${chr}$_over_domin_file_postfix >> ${phename}.${chr}${_over_domin_file_postfix}`;
+ `rm $phename.chunk${chunk}.chr${chr}$_over_domin_file_postfix`;
+ } else {
+ `cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> $phename.${chr}${_add_file_postfix}`;
+ print "cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> $phename.chr${chr}${_add_file_postfix}\n";
+ `rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix`;
+ print "rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix\n";
+ }
}
- print `$prog -p $phename.PHE --ngpreds $model_option_num -i $mlinfo_arg -d $mldose_arg -m $legend_arg --chrom $chr -o $phename.$chr $head $keys`;
-
if($model_option_num==2)
{
`cat $phename.${chr}$_2df_file_postfix >> ${phename}${_2df_file_postfix}`;
Modified: pkg/ProbABEL/src/probabel_config.cfg.example
===================================================================
--- pkg/ProbABEL/src/probabel_config.cfg.example 2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/src/probabel_config.cfg.example 2012-10-29 12:11:27 UTC (rev 990)
@@ -17,9 +17,10 @@
#
# mldose_path: /data/Cohort1/MachFiles/_._chr_._/mach_._chr_._-out.mldose
#
-# For chromosome 1 this will be replaced with: /data/Cohort1/MachFiles/1/1-out.mldose
+# For chromosome 1 this will be replaced with:
+# /data/Cohort1/MachFiles/1/mach1-out.mldose
#
#
-ERGO,script_test/merlin._._chr_._.collected.ped.out.mlinfo,script_test/merlin._._chr_._.collected.ped.out.mldose,script_test/merlin._._chr_._.collected.ped.out.mlprob,script_test/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
-ERGOPLUS,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mlinfo,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mldose,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mlprob,/mnt/impute/impute_b36/collected/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
+STUDY_1,script_test/merlin._._chr_._.collected.ped.out.mlinfo,script_test/merlin._._chr_._.collected.ped.out.mldose,script_test/merlin._._chr_._.collected.ped.out.mlprob,script_test/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
+STUDY_2,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mlinfo,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mldose,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mlprob,/tmp/PAtest/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
Property changes on: pkg/ProbABEL/tests
___________________________________________________________________
Added: svn:ignore
+ Makefile.in
Added: pkg/ProbABEL/tests/Makefile.am
===================================================================
--- pkg/ProbABEL/tests/Makefile.am (rev 0)
+++ pkg/ProbABEL/tests/Makefile.am 2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,23 @@
+## Process this file with automake to produce Makefile.in
+
+AUTOMAKE_OPTIONS = foreign color-tests
+
+testsfiles = check_probabel.pl_chunk.sh
+known_results=known_good_results/height_add.out.txt
+testsdir = $(pkgdatadir)/tests
+dist_tests_DATA = $(testsfiles) $(known_results)
+
+TESTS_ENVIRONMENT = sh
+check_SCRIPTS = check_probabel.pl_chunk.sh
+
+TESTS = $(check_SCRIPTS)
+
+chunk_files=chunk1.chr1.dose chunk2.chr1.dose chunk1.chr2.dose \
+ chunk2.chr2.dose chunk1.chr1.map chunk2.chr1.map chunk1.chr2.map \
+ chunk2.chr2.map chunk1.chr1.info chunk2.chr1.info chunk1.chr2.info \
+ chunk2.chr2.info chr1.dose chr2.dose chr1.info chr2.info chr1.map \
+ chr2.map probabel.pl probabel_config.cfg height.PHE \
+ height_add.out.txt
+
+
+CLEANFILES = $(chunk_files)
Added: pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
===================================================================
--- pkg/ProbABEL/tests/check_probabel.pl_chunk.sh (rev 0)
+++ pkg/ProbABEL/tests/check_probabel.pl_chunk.sh 2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,97 @@
+#!/bin/bash
+# L.C. Karssen
+# This script is used to test whether probabel.pl works correctly when
+# input is cut up in chunks
+
+echo "Testing probabel.pl..."
+
+if [ -z ${srcdir} ]; then
+ srcdir="."
+fi
+exampledir="${srcdir}/../examples/"
+padir="${srcdir}/../src/"
+results="${srcdir}/known_good_results/"
+
+dosefile="$exampledir/test.mldose"
+infofile="$exampledir/test.mlinfo"
+mapfile="$exampledir/test.map"
+phenofile="$exampledir/height.txt"
+outfile="height_add.out.txt"
+
+probabel="${padir}/probabel.pl"
+probabelcfg="${padir}/probabel_config.cfg.example"
+chunksep="_._chunk_._"
+chrsep="_._chr_._"
+
+# Prepare probabel.pl and the config file
+sed 's;"./";"../src/";g' $probabel > probabel.pl
+chmod a+x probabel.pl
+cp $probabelcfg probabel_config.cfg
+chmod +w probabel_config.cfg # Need this for make distcheck to work
+cp $phenofile height.PHE
+
+base="chr${chrsep}"
+echo "TestCohortNoChunk,$base.info,$base.dose,$base.prob,$base.map" \
+ >> probabel_config.cfg
+
+base="chunk${chunksep}.chr${chrsep}"
+echo "TestCohortChunk,$base.info,$base.dose,$base.prob,$base.map" \
+ >> probabel_config.cfg
+
+
+# ------------------ No chunks test -------------------
+rm -f $outfile
+
+# Split the dose and info files up into two chromosomes with some
+# chunks
+cut -d" " -f1,2,3,4 $dosefile > chr1.dose
+cut -d" " -f1,2,5-7 $dosefile > chr2.dose
+
+sed -n '1,3p' $infofile > chr1.info
+sed -n '1p' $infofile > chr2.info
+sed -n '4,6p' $infofile >> chr2.info
+
+sed -n '1,3p' $mapfile > chr1.map
+sed -n '1p' $mapfile > chr2.map
+sed -n '4,6p' $mapfile >> chr2.map
+
+# Run an analysis
+./probabel.pl 1 2 linear TestCohortNoChunk --additive height
+
+# Final check:
+echo "Checking output without chunks:"
+diff $outfile $results/$outfile
+
+
+# ------------------ Chunks test ----------------------
+rm -f $outfile
+
+# Split the dose and info files up into two chromosomes with some
+# chunks
+cut -d" " -f1,2,3 $dosefile > chunk1.chr1.dose
+cut -d" " -f1,2,4 $dosefile > chunk2.chr1.dose
+cut -d" " -f1,2,5,6 $dosefile > chunk1.chr2.dose
+cut -d" " -f1,2,7 $dosefile > chunk2.chr2.dose
+
+sed -n '1,2p' $infofile > chunk1.chr1.info
+sed -n '1p' $infofile > chunk2.chr1.info
+sed -n '3p' $infofile >> chunk2.chr1.info
+sed -n '1p' $infofile > chunk1.chr2.info
+sed -n '4,5p' $infofile >> chunk1.chr2.info
+sed -n '1p' $infofile > chunk2.chr2.info
+sed -n '6p' $infofile >> chunk2.chr2.info
+
+sed -n '1,2p' $mapfile > chunk1.chr1.map
+sed -n '1p' $mapfile > chunk2.chr1.map
+sed -n '3p' $mapfile >> chunk2.chr1.map
+sed -n '1p' $mapfile > chunk1.chr2.map
+sed -n '4,5p' $mapfile >> chunk1.chr2.map
+sed -n '1p' $mapfile > chunk2.chr2.map
+sed -n '6p' $mapfile >> chunk2.chr2.map
+
+# Run an analysis
+./probabel.pl 1 2 linear TestCohortChunk --additive height
+
+# Final check:
+echo "Checking output with chunks:"
+diff $outfile $results/$outfile
Property changes on: pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
___________________________________________________________________
Added: svn:executable
+ *
Added: pkg/ProbABEL/tests/known_good_results/height_add.out.txt
===================================================================
--- pkg/ProbABEL/tests/known_good_results/height_add.out.txt (rev 0)
+++ pkg/ProbABEL/tests/known_good_results/height_add.out.txt 2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,6 @@
+name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add loglik
+rs7247199 G A 0.5847 0.415 0.9299 0.8666 182 0.56444 1 204938 -0.218693 0.734966 -428.971
+rs8102643 C T 0.5847 0.415 0.9308 0.8685 182 0.564412 1 207859 -0.218352 0.734214 -428.971
+rs8102615 T A 0.5006 0.4702 0.9375 0.8932 181 0.498997 2 211970 0.280289 0.728449 -427.081
+rs8105536 G A 0.5783 0.4213 0.9353 0.8832 182 0.561132 2 212033 -0.216918 0.7259 -428.971
+rs2312724 T C 0.9122 0.0877 0.9841 0.9232 182 0.929684 2 217034 2.23335 1.30142 -427.523
More information about the Genabel-commits
mailing list