[Genabel-commits] r990 - in pkg/ProbABEL: . doc src tests tests/known_good_results

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Mon Oct 29 13:11:27 CET 2012


Author: lckarssen
Date: 2012-10-29 13:11:27 +0100 (Mon, 29 Oct 2012)
New Revision: 990

Added:
   pkg/ProbABEL/tests/
   pkg/ProbABEL/tests/Makefile.am
   pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
   pkg/ProbABEL/tests/known_good_results/
   pkg/ProbABEL/tests/known_good_results/height_add.out.txt
Modified:
   pkg/ProbABEL/Makefile.am
   pkg/ProbABEL/configure.ac
   pkg/ProbABEL/doc/CHANGES.LOG
   pkg/ProbABEL/src/probabel.pl
   pkg/ProbABEL/src/probabel_config.cfg.example
Log:
- probabel.pl and probabel_config.cfg now accept _._chunk_._ as well as _._chr_._ as variables. This allows the user to have his imputed data (dose, prob, info and map files) split up into pieces. 

- Added tests directory that will contain tests that are too complicated to be called examples. This also contains a directory 'know_good_results' so we can check for regressions.  

- Preparing for release of v.0.2.1 (changed version number in configure.ac)

- Minor code layout fixes in configure.ac


Copyright for these changes lies with the Erasmus MC, Rotterdam.



Modified: pkg/ProbABEL/Makefile.am
===================================================================
--- pkg/ProbABEL/Makefile.am	2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/Makefile.am	2012-10-29 12:11:27 UTC (rev 990)
@@ -1,4 +1,4 @@
 ## Process this file with automake to produce Makefile.in
 
 AUTOMAKE_OPTIONS = foreign
-SUBDIRS = src doc examples
+SUBDIRS = src doc examples tests

Modified: pkg/ProbABEL/configure.ac
===================================================================
--- pkg/ProbABEL/configure.ac	2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/configure.ac	2012-10-29 12:11:27 UTC (rev 990)
@@ -2,8 +2,8 @@
 # Process this file with autoconf to produce a configure script.
 
 AC_PREREQ([2.67])
-AC_INIT(ProbABEL, 0.2.0, genabel-devel at r-forge.wu-wien.ac.at)
-AM_INIT_AUTOMAKE(ProbABEL, 0.2.0)
+AC_INIT(ProbABEL, 0.2.1-beta, genabel-devel at r-forge.wu-wien.ac.at)
+AM_INIT_AUTOMAKE(ProbABEL, 0.2.1-beta)
 AC_CONFIG_SRCDIR([src/data.h])
 AC_CONFIG_HEADERS([src/config.h])
 
@@ -28,7 +28,8 @@
 
 # Checks for header files.
 AC_FUNC_ALLOCA
-AC_CHECK_HEADERS([float.h inttypes.h libintl.h limits.h stddef.h stdint.h stdlib.h string.h sys/param.h wchar.h wctype.h])
+AC_CHECK_HEADERS([float.h inttypes.h libintl.h limits.h stddef.h stdint.h \
+	 		  stdlib.h string.h sys/param.h wchar.h wctype.h])
 
 # Checks for typedefs, structures, and compiler characteristics.
 AC_HEADER_STDBOOL
@@ -52,6 +53,12 @@
 AC_PROG_SED
 
 # Check for presence of pdfLaTeX
+AC_CHECK_PROG(CUT, cut, cut)
+if test -z "$CUT"; then
+  AC_MSG_WARN([Can't find the 'cut' binary. Some tests won't run correctly])
+fi
+
+# Check for presence of pdfLaTeX
 AC_CHECK_PROG(PDFLATEX, pdflatex, pdflatex)
 if test -z "$PDFLATEX"; then
   AC_MSG_WARN([Unable to create PDF version of the user manual])
@@ -61,7 +68,8 @@
 
 # Since pacoxph is buggy it needs to be enabled explixitly
 AC_ARG_ENABLE([pacoxph],
-    [AS_HELP_STRING([--enable-pacoxph], [build the pacoxph binary (still contains lots of bugs)])],
+    [AS_HELP_STRING([--enable-pacoxph], [build the pacoxph binary (still \
+contains lots of bugs)])],
     [pacoxph=${enableval}],
     [pacoxph=no])
 
@@ -77,6 +85,7 @@
 	src/Makefile
 	doc/Makefile
 	examples/Makefile
+	tests/Makefile
 ])
 # Create output files
 AC_OUTPUT

Modified: pkg/ProbABEL/doc/CHANGES.LOG
===================================================================
--- pkg/ProbABEL/doc/CHANGES.LOG	2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/doc/CHANGES.LOG	2012-10-29 12:11:27 UTC (rev 990)
@@ -1,9 +1,10 @@
-*****
-* Fixed bug #2295: the inverse variance-covariance matrix (used with the
-  --mmscore option) was incorrectly subsetted when NAs are present for
-  one or more SNP dosages. As a result the invvarmatrix that was
-  actually used in the regression contained rows and columns of
-  zeroes. Thanks to Maarten Kooyman for reporting this bug.
+***** v.0.2.1 (2012)
+* Fixed bug #2295: the inverse variance-covariance matrix (used with
+  the --mmscore option) was incorrectly subsetted when NAs are present
+  for one or more SNP dosages (so this is not an issue for people using
+  imputed data). As a result the invvarmatrix that was actually used in
+  the regression contained rows and columns of zeroes. Thanks to Maarten
+  Kooyman for reporting this bug.
 
 * Fixed bug #1186: When .map file is missing (but --map option was
   given), the wrong error message was displayed. Thanks to Nicola
@@ -18,7 +19,16 @@
   probabel.pl can now also run Y chromosome analysis and the help
   message has been updated.
 
+* probabel.pl and probabel_config.cfg now also accept chunks, where
+  dose, prob, info and map files are split into multiple chunks. This
+  is now the default for people following the 1000 genomes imputation
+  cookbook for MaCH/minimac (the recipe uses the chunkchromosome tool
+  to split the data into smaller pieces, speeding up imputation on
+  computer clusters). See probabel_config.cfg.example for an
+  example. (Lennart)
 
+
+
 ***** v.0.2.0 (2012.06.10)
 * The v.0.1-9e fix for working with prob files in pacoxph has been
   forward-ported to this branch as well (Lennart and Yurii).

Modified: pkg/ProbABEL/src/probabel.pl
===================================================================
--- pkg/ProbABEL/src/probabel.pl	2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/src/probabel.pl	2012-10-29 12:11:27 UTC (rev 990)
@@ -23,6 +23,7 @@
 # Separators in the config file
 my $separator_cfg = ",";
 my $chr_replacement = "_._chr_._";
+my $chunk_replacement = "_._chunk_._";
 
 # Set file locations
 my $base_path = "./";
@@ -43,7 +44,7 @@
 
 #==========================================================================
 # Read config file
-open(CFG, "$config") or die "Reading configuration file failed: $!" .
+open(CFG, "$config") or die "Reading configuration file $config failed: $!" .
     "\nDid you forget to edit and rename the probabel_config.cfg.example file?\n";
 
 <CFG>; #skip the first line (header)
@@ -214,24 +215,62 @@
 
 # Commands for the autosomes
 for($chr=$startchr; $chr<=$endchr; $chr++) {
-    $mlinfo_arg = $mlinfo;
-    $mlinfo_arg =~ s/$chr_replacement/$chr/g;
 
-    $mldose_arg = $mldose;
-    $mldose_arg =~ s/$chr_replacement/$chr/g;
+    my $nrchunks = 0;
+    # Find out the number of chunks for the current chromosome
+    my $infofiles = $mlinfo;
+    $infofiles =~ s/$chr_replacement/$chr/g;
+    $infofiles =~ s/$chunk_replacement/*/g;
+    $nrchunks = `ls $infofiles | wc -l`;
+    print "Nr. of chunks: $nrchunks";
 
-    $legend_arg = $legend;
-    $legend_arg =~ s/$chr_replacement/$chr/g;
+    # Loop over all chunks
+    for (my $chunk=1; $chunk <= $nrchunks; $chunk++)
+    {
+	if($hadhead==0) {
+	    $head="";
+	    $hadhead=1;
+	} else {
+	    $head="--no-head";
+	}
+	$mlinfo_arg = $mlinfo;
+	$mlinfo_arg =~ s/$chr_replacement/$chr/g;
+	$mlinfo_arg =~ s/$chunk_replacement/$chunk/g;
 
-    if($hadhead==0) {
-	$head="";
-	$hadhead=1;
-    } else {
-	$head="--no-head";
+	$mldose_arg = $mldose;
+	$mldose_arg =~ s/$chr_replacement/$chr/g;
+	$mldose_arg =~ s/$chunk_replacement/$chunk/g;
+
+	$legend_arg = $legend;
+	$legend_arg =~ s/$chr_replacement/$chr/g;
+	$legend_arg =~ s/$chunk_replacement/$chunk/g;
+
+	system "$prog -p $phename.PHE --ngpreds $model_option_num -i $mlinfo_arg -d $mldose_arg -m $legend_arg --chrom $chr -o $phename.chunk$chunk.chr$chr $head $keys";
+
+	if($model_option_num==2)
+	{
+	    `cat $phename.chunk${chunk}.chr${chr}$_2df_file_postfix >> ${phename}.${chr}${_2df_file_postfix}`;
+	    `rm $phename.chunk${chunk}.chr${chr}$_2df_file_postfix`;
+
+	    `cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> ${phename}.${chr}${_add_file_postfix}`;
+	    `rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix`;
+
+	    `cat $phename.chunk${chunk}.chr${chr}$_domin_file_postfix >> ${phename}.${chr}${_domin_file_postfix}`;
+	    `rm $phename.chunk${chunk}.chr${chr}$_domin_file_postfix`;
+
+	    `cat $phename.chunk${chunk}.chr${chr}$_recess_file_postfix >> ${phename}.${chr}${_recess_file_postfix}`;
+	    `rm $phename.chunk${chunk}.chr${chr}$_recess_file_postfix`;
+
+	    `cat $phename.chunk${chunk}.chr${chr}$_over_domin_file_postfix >> ${phename}.${chr}${_over_domin_file_postfix}`;
+	    `rm $phename.chunk${chunk}.chr${chr}$_over_domin_file_postfix`;
+	} else {
+	    `cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> $phename.${chr}${_add_file_postfix}`;
+	    print "cat $phename.chunk${chunk}.chr${chr}$_add_file_postfix >> $phename.chr${chr}${_add_file_postfix}\n";
+	    `rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix`;
+	    print "rm $phename.chunk${chunk}.chr${chr}$_add_file_postfix\n";
+	}
     }
 
-    print `$prog -p $phename.PHE --ngpreds $model_option_num -i $mlinfo_arg -d $mldose_arg -m $legend_arg --chrom $chr -o $phename.$chr $head $keys`;
-
     if($model_option_num==2)
     {
 	`cat $phename.${chr}$_2df_file_postfix >> ${phename}${_2df_file_postfix}`;

Modified: pkg/ProbABEL/src/probabel_config.cfg.example
===================================================================
--- pkg/ProbABEL/src/probabel_config.cfg.example	2012-10-28 11:14:21 UTC (rev 989)
+++ pkg/ProbABEL/src/probabel_config.cfg.example	2012-10-29 12:11:27 UTC (rev 990)
@@ -17,9 +17,10 @@
 #
 # mldose_path: /data/Cohort1/MachFiles/_._chr_._/mach_._chr_._-out.mldose
 #
-# For chromosome 1 this will be replaced with: /data/Cohort1/MachFiles/1/1-out.mldose
+# For chromosome 1 this will be replaced with:
+#             /data/Cohort1/MachFiles/1/mach1-out.mldose
 #
 #
 
-ERGO,script_test/merlin._._chr_._.collected.ped.out.mlinfo,script_test/merlin._._chr_._.collected.ped.out.mldose,script_test/merlin._._chr_._.collected.ped.out.mlprob,script_test/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
-ERGOPLUS,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mlinfo,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mldose,/mnt/space/ergoplus/ergoplus.chr_._chr_._.mach.mlprob,/mnt/impute/impute_b36/collected/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
+STUDY_1,script_test/merlin._._chr_._.collected.ped.out.mlinfo,script_test/merlin._._chr_._.collected.ped.out.mldose,script_test/merlin._._chr_._.collected.ped.out.mlprob,script_test/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt
+STUDY_2,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mlinfo,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mldose,/tmp/PAtest/chunk_._chunk_._-chr_._chr_._.mlprob,/tmp/PAtest/genotypes_chr_._chr_.__CEU_r22_nr.b36_fwd_legend.txt


Property changes on: pkg/ProbABEL/tests
___________________________________________________________________
Added: svn:ignore
   + Makefile.in


Added: pkg/ProbABEL/tests/Makefile.am
===================================================================
--- pkg/ProbABEL/tests/Makefile.am	                        (rev 0)
+++ pkg/ProbABEL/tests/Makefile.am	2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,23 @@
+## Process this file with automake to produce Makefile.in
+
+AUTOMAKE_OPTIONS = foreign color-tests
+
+testsfiles = check_probabel.pl_chunk.sh
+known_results=known_good_results/height_add.out.txt
+testsdir = $(pkgdatadir)/tests
+dist_tests_DATA = $(testsfiles) $(known_results)
+
+TESTS_ENVIRONMENT = sh
+check_SCRIPTS = check_probabel.pl_chunk.sh
+
+TESTS = $(check_SCRIPTS)
+
+chunk_files=chunk1.chr1.dose chunk2.chr1.dose chunk1.chr2.dose		\
+ chunk2.chr2.dose chunk1.chr1.map chunk2.chr1.map chunk1.chr2.map	\
+ chunk2.chr2.map chunk1.chr1.info chunk2.chr1.info chunk1.chr2.info	\
+ chunk2.chr2.info chr1.dose chr2.dose chr1.info chr2.info chr1.map	\
+ chr2.map probabel.pl probabel_config.cfg height.PHE			\
+ height_add.out.txt
+
+
+CLEANFILES = $(chunk_files)

Added: pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
===================================================================
--- pkg/ProbABEL/tests/check_probabel.pl_chunk.sh	                        (rev 0)
+++ pkg/ProbABEL/tests/check_probabel.pl_chunk.sh	2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,97 @@
+#!/bin/bash
+# L.C. Karssen
+# This script is used to test whether probabel.pl works correctly when
+# input is cut up in chunks
+
+echo "Testing probabel.pl..."
+
+if [ -z ${srcdir} ]; then
+    srcdir="."
+fi
+exampledir="${srcdir}/../examples/"
+padir="${srcdir}/../src/"
+results="${srcdir}/known_good_results/"
+
+dosefile="$exampledir/test.mldose"
+infofile="$exampledir/test.mlinfo"
+mapfile="$exampledir/test.map"
+phenofile="$exampledir/height.txt"
+outfile="height_add.out.txt"
+
+probabel="${padir}/probabel.pl"
+probabelcfg="${padir}/probabel_config.cfg.example"
+chunksep="_._chunk_._"
+chrsep="_._chr_._"
+
+# Prepare probabel.pl and the config file
+sed 's;"./";"../src/";g' $probabel > probabel.pl
+chmod a+x probabel.pl
+cp $probabelcfg probabel_config.cfg
+chmod +w probabel_config.cfg # Need this for make distcheck to work
+cp $phenofile height.PHE
+
+base="chr${chrsep}"
+echo "TestCohortNoChunk,$base.info,$base.dose,$base.prob,$base.map" \
+    >> probabel_config.cfg
+
+base="chunk${chunksep}.chr${chrsep}"
+echo "TestCohortChunk,$base.info,$base.dose,$base.prob,$base.map" \
+    >> probabel_config.cfg
+
+
+# ------------------ No chunks test -------------------
+rm -f $outfile
+
+# Split the dose and info files up into two chromosomes with some
+# chunks
+cut -d" " -f1,2,3,4 $dosefile > chr1.dose
+cut -d" " -f1,2,5-7 $dosefile > chr2.dose
+
+sed -n '1,3p' $infofile >  chr1.info
+sed -n '1p'   $infofile >  chr2.info
+sed -n '4,6p' $infofile >> chr2.info
+
+sed -n '1,3p' $mapfile > chr1.map
+sed -n '1p'   $mapfile >  chr2.map
+sed -n '4,6p' $mapfile >> chr2.map
+
+# Run an analysis
+./probabel.pl 1 2 linear TestCohortNoChunk --additive height
+
+# Final check:
+echo "Checking output without chunks:"
+diff $outfile $results/$outfile
+
+
+# ------------------ Chunks test ----------------------
+rm -f $outfile
+
+# Split the dose and info files up into two chromosomes with some
+# chunks
+cut -d" " -f1,2,3   $dosefile > chunk1.chr1.dose
+cut -d" " -f1,2,4   $dosefile > chunk2.chr1.dose
+cut -d" " -f1,2,5,6 $dosefile > chunk1.chr2.dose
+cut -d" " -f1,2,7   $dosefile > chunk2.chr2.dose
+
+sed -n '1,2p' $infofile >  chunk1.chr1.info
+sed -n '1p'   $infofile >  chunk2.chr1.info
+sed -n '3p'   $infofile >> chunk2.chr1.info
+sed -n '1p'   $infofile >  chunk1.chr2.info
+sed -n '4,5p' $infofile >> chunk1.chr2.info
+sed -n '1p'   $infofile >  chunk2.chr2.info
+sed -n '6p'   $infofile >> chunk2.chr2.info
+
+sed -n '1,2p' $mapfile >  chunk1.chr1.map
+sed -n '1p'   $mapfile >  chunk2.chr1.map
+sed -n '3p'   $mapfile >> chunk2.chr1.map
+sed -n '1p'   $mapfile >  chunk1.chr2.map
+sed -n '4,5p' $mapfile >> chunk1.chr2.map
+sed -n '1p'   $mapfile >  chunk2.chr2.map
+sed -n '6p'   $mapfile >> chunk2.chr2.map
+
+# Run an analysis
+./probabel.pl 1 2 linear TestCohortChunk --additive height
+
+# Final check:
+echo "Checking output with chunks:"
+diff $outfile $results/$outfile


Property changes on: pkg/ProbABEL/tests/check_probabel.pl_chunk.sh
___________________________________________________________________
Added: svn:executable
   + *

Added: pkg/ProbABEL/tests/known_good_results/height_add.out.txt
===================================================================
--- pkg/ProbABEL/tests/known_good_results/height_add.out.txt	                        (rev 0)
+++ pkg/ProbABEL/tests/known_good_results/height_add.out.txt	2012-10-29 12:11:27 UTC (rev 990)
@@ -0,0 +1,6 @@
+name A1 A2 Freq1 MAF Quality Rsq n Mean_predictor_allele chrom position beta_SNP_add sebeta_SNP_add loglik
+rs7247199 G A 0.5847 0.415 0.9299 0.8666 182 0.56444 1 204938 -0.218693 0.734966 -428.971
+rs8102643 C T 0.5847 0.415 0.9308 0.8685 182 0.564412 1 207859 -0.218352 0.734214 -428.971
+rs8102615 T A 0.5006 0.4702 0.9375 0.8932 181 0.498997 2 211970 0.280289 0.728449 -427.081
+rs8105536 G A 0.5783 0.4213 0.9353 0.8832 182 0.561132 2 212033 -0.216918 0.7259 -428.971
+rs2312724 T C 0.9122 0.0877 0.9841 0.9232 182 0.929684 2 217034 2.23335 1.30142 -427.523



More information about the Genabel-commits mailing list