[Genabel-commits] r2047 - tutorials/GenABEL_general

Wed Feb 3 14:24:03 CET 2016

Author: lckarssen
Date: 2016-02-03 14:24:02 +0100 (Wed, 03 Feb 2016)
New Revision: 2047

Modified:
   tutorials/GenABEL_general/ImputedDataAnalysis.Rnw
Log:
Summary: Remove superfluous white space from the Imputed Data Analysis chapter in the GenABEL tutorial


No functional changes.


Modified: tutorials/GenABEL_general/ImputedDataAnalysis.Rnw
===================================================================

--- tutorials/GenABEL_general/ImputedDataAnalysis.Rnw	2016-01-25 10:00:15 UTC (rev 2046)
+++ tutorials/GenABEL_general/ImputedDataAnalysis.Rnw	2016-02-03 13:24:02 UTC (rev 2047)
@@ -1,15 +1,15 @@
 \chapter{Analysis of imputed data: an example}
 
-In this chapter, you will perform an analysis of imputed data 
-set. In this set of 120 individuals, 4500 SNPs are imputed 
-based on information on 500 directly typed SNPs. 
-You will first analyse 500 directly typed SNPs and then proceed 
-to the analysis of imputed data. Finally, you will have a 
-possibility to compare your results to the results of analysis 
-in case all 5000 SNPs were directly typed. 
+In this chapter, you will perform an analysis of imputed data
+set. In this set of 120 individuals, 4500 SNPs are imputed
+based on information on 500 directly typed SNPs.
+You will first analyse 500 directly typed SNPs and then proceed
+to the analysis of imputed data. Finally, you will have a
+possibility to compare your results to the results of analysis
+in case all 5000 SNPs were directly typed.
 
 \section{Analysis of 500 directly typed SNPs}
- 
+
 Load the \GA{} library and load the data set we will use:
 <<>>=
 library(GenABEL)
@@ -17,8 +17,8 @@
 ls()
 @
 
-Here, \texttt{df500} contains the data including 500 directly typed 
-SNPs, and \texttt{rcT} is the vector containing the value of the trait 
+Here, \texttt{df500} contains the data including 500 directly typed
+SNPs, and \texttt{rcT} is the vector containing the value of the trait
 of interest:
 <<>>=
 nids(df500)
@@ -27,32 +27,32 @@
 rcT[1:10]
 @
 
-In all analysis that follow, do disregard the Genomic Control and 
-GC-corrected results: as we will analyse a small region with 
-strong association, the GC can not be applied. 
+In all analysis that follow, do disregard the Genomic Control and
+GC-corrected results: as we will analyse a small region with
+strong association, the GC can not be applied.
 
-Let us start with analysis of directly typed SNPs. For that, 
-we will use \texttt{mlreg} function of \GA{}. This function 
+Let us start with analysis of directly typed SNPs. For that,
+we will use \texttt{mlreg} function of \GA{}. This function
 implements ML-regression and Wald test of significance\footnote{
-In general the score test may be preferred because it is faster and 
+In general the score test may be preferred because it is faster and
 more robust.
-}. 
-This will later on allow us direct comparison with the results of \PA{}, 
+}.
+This will later on allow us direct comparison with the results of \PA{},
 which implements the same testing procedure.
 
-We can run regression of \texttt{rcT} on SNPs region-wise using 
+We can run regression of \texttt{rcT} on SNPs region-wise using
 <<>>=
 qts500 <- mlreg(rcT~1,df500)
 qts500[1:5,]
 @
 
-The summary for the SNPs, which show most significant association 
-can be produced with 
+The summary for the SNPs, which show most significant association
+can be produced with
 <<>>=
 bestHits500 <- descriptives.scan(qts500,top=10)
 bestHits500
 @
-and the plot of the results with 
+and the plot of the results with
 <<>>=
 plot(qts500)
 abline(h=-log10(5e-8))
@@ -63,14 +63,14 @@
 plot(qts500)
 abline(h=-log10(5e-8))
 @
-\caption{ 
+\caption{
 Manhattan plot for 500 directly typed SNPs
 }
 \label{fig:gwa500}
 \end{figure}
 (see figure \ref{fig:gwa500}).
 
-Finally, we can produce descriptive statistics for the 
+Finally, we can produce descriptive statistics for the
 SNPs, which demonstrated genome-wide significance:
 <<>>=
 gwsSnps500 <- rownames(bestHits500)[bestHits500$P1df<=6e-8]
@@ -79,61 +79,61 @@
 @
 
 \begin{Exercise}
-It is known that rare variation in the presence of 
-outliers can generate spurious associations. 
-Do you believe this is a true association in this 
-particular case? What you can do to check whether 
+It is known that rare variation in the presence of
+outliers can generate spurious associations.
+Do you believe this is a true association in this
+particular case? What you can do to check whether
 this is a true association or not?
 \end{Exercise}
 \begin{Answer}
-Firstly, you can check (by producing a cross-plot 
-of genotype vs.~phenotype) if association is indeed 
-due to extreme phenotypic outliers. A related question is 
-whether the distribution is skewed. Additionally, 
-a permutation-based test can help establishing 
-correct $p$-value, taking into account the nature 
-of the data in question.  
- 
-However, to give an ultimate answer, a replication 
-study is needed, in which these rare SNPs are to be 
-typed in a large independent sample. 
+Firstly, you can check (by producing a cross-plot
+of genotype vs.~phenotype) if association is indeed
+due to extreme phenotypic outliers. A related question is
+whether the distribution is skewed. Additionally,
+a permutation-based test can help establishing
+correct $p$-value, taking into account the nature
+of the data in question.
+
+However, to give an ultimate answer, a replication
+study is needed, in which these rare SNPs are to be
+typed in a large independent sample.
 \end{Answer}
 
 
 \section{Analysis of imputed data with \PA{}}
 
-Here, you will analyse imputed data. In the \texttt{RData} 
-directory, you will find the necessary files: 
-\texttt{mach1.mldose.fvi} and \texttt{mach1.mldose.fvd} 
-(these files represent \texttt{mldose} data produced by 
+Here, you will analyse imputed data. In the \texttt{RData}
+directory, you will find the necessary files:
+\texttt{mach1.mldose.fvi} and \texttt{mach1.mldose.fvd}
+(these files represent \texttt{mldose} data produced by
 \texttt{mach1}, converted into \DA{} format\footnote{using \texttt{mach2databel}})
-and \texttt{mach1.out.mlinfo}, which contains information for the imputed SNPs 
+and \texttt{mach1.out.mlinfo}, which contains information for the imputed SNPs
 generated in the process of imputations (such as 'Rsq', etc.).
 
-We will start with producing a phenotypic data file for the 
+We will start with producing a phenotypic data file for the
 use with \PA{}:
 <<>>=
 write.table(data.frame(id=idnames(df500), rcT=rcT),
        file="rcT.PHE", quote=FALSE, row.names=FALSE)
 @
-next, try the command '\texttt{system("head rcT.PHE")}' to check the few first 
+next, try the command '\texttt{system("head rcT.PHE")}' to check the few first
 lines of the file.
 
-At this moment, leave \texttt{R} (or, rather, start new console!), 
-copy the mach-files to the working directory with  
+At this moment, leave \texttt{R} (or, rather, start new console!),
+copy the mach-files to the working directory with
 \begin{verbatim}
 yourname at server> cp RData/mach1* .
 \end{verbatim}
-and run \PA{} analysis with 
+and run \PA{} analysis with
 \begin{verbatim}
 yourname at server> palinear --pheno rcT.PHE --info mach1.out.mlinfo /
                  --dose mach1.mldose.fvi
 \end{verbatim}
-Do not forget to check that you start the analysis in right directory, i.e. 
-all files (\texttt{rcT.PHE}, \texttt{mach1.out.mlinfo}, 
-\texttt{mach1.mldose.fvi}, \texttt{mach1.mldose.fvd}) 
-are present in the working directory (use the '\texttt{ls}' command 
-from the console). 
+Do not forget to check that you start the analysis in right directory, i.e.
+all files (\texttt{rcT.PHE}, \texttt{mach1.out.mlinfo},
+\texttt{mach1.mldose.fvi}, \texttt{mach1.mldose.fvd})
+are present in the working directory (use the '\texttt{ls}' command
+from the console).
 
 <<echo=false,hide=true>>=
 system("cp RData/mach1* .")
@@ -147,7 +147,7 @@
 qtsPal <- read.table("regression_add.out.txt", head=T, strings=F)
 qtsPal[1:5,]
 @
-As you see, there is not $P$-value produced in the \PA{} output, and 
+As you see, there is not $P$-value produced in the \PA{} output, and
 we need to compute it:
 <<>>=
 qtsPal$Chisq <- (qtsPal$beta/qtsPal$sebeta)^2
@@ -166,7 +166,7 @@
 @
 (here, \texttt{map5k} contains map information for all 5000 SNPs).
 
-To compare with the results obtained using 500 directly typed 
+To compare with the results obtained using 500 directly typed
 SNPs only, we can add the points with
 <<>>=
 points(map(qts500),-log10(qts500[,"P1df"]),col="red",pch=19,cex=0.5)
@@ -178,14 +178,14 @@
 points(map(qts500),-log10(qts500[,"P1df"]),col="red",pch=19,cex=0.5)
 abline(h=-log10(5e-8))
 @
-\caption{ 
+\caption{
 Manhattan plot for imputed (black empty circles) and 500 directly typed (red solid circles) SNPs
 }
 \label{fig:gwa500andImp}
 \end{figure}
 (see figure \ref{fig:gwa500andImp}).
 
-Now, we can also check different characteristics of 
+Now, we can also check different characteristics of
 the SNPs, which are claimed to be significant in our analysis:
 <<>>=
 gwsSnpsImp <- qtsPal$name[which(qtsPal[,"P-value"]<=5e-8)]
@@ -195,13 +195,13 @@
 @
 
 \begin{Exercise}
-Please classify the associations obtained into 'true' and 'false' 
-(meaning 'I believe it' and 'I do not believe'). 
-Of cause ultimate answer is replications. Still, the object \texttt{df5k} 
-provides genotypes for all 5000 SNPs, directly typed! So, you can run analysis, 
+Please classify the associations obtained into 'true' and 'false'
+(meaning 'I believe it' and 'I do not believe').
+Of cause ultimate answer is replications. Still, the object \texttt{df5k}
+provides genotypes for all 5000 SNPs, directly typed! So, you can run analysis,
 and cross-check which SNPs are confirmed as significant.
-Make a cross table: SNPs you thought were truly associated vs. 
-SNPs indeed associated in directly typed data set. 
+Make a cross table: SNPs you thought were truly associated vs.
+SNPs indeed associated in directly typed data set.
 \end{Exercise}
 \begin{Answer}
 Here is the sequence of commands leading you to the answer:
@@ -242,7 +242,7 @@
 abline(h=-log10(5e-8))
 abline(v=-log10(5e-8))
 @
-\caption{ 
+\caption{
 Cross-plot of the results from analysis of imputed and directly typed data
 (see answers to exercises).
 }
@@ -253,29 +253,29 @@
 \section{Analysis of imputed data with \MxA{}}
 
 In case you are interested in quantitative traits, \MxA{}
-provides much faster and more flexible analysis tools, 
-compared to the \PA{}\footnote{However, at the moment \MxA{} 
-misses functionality to analyse binary and time-till-event 
-traits}. 
+provides much faster and more flexible analysis tools,
+compared to the \PA{}\footnote{However, at the moment \MxA{}
+misses functionality to analyse binary and time-till-event
+traits}.
 
-Here the above analysis is repeated using \MxA{}. 
+Here the above analysis is repeated using \MxA{}.
 
 Load necessary libraries:
 <<>>=
 library(DatABEL)
 library(MixABEL)
-@ 
+@
 
 Load the genotypic data:
 <<>>=
 imp5k <- databel("mach1.mldose")
-@ 
+@
 
 Run analysis and look up 10 top results:
 <<>>=
 qtsImp <- GWFGLS(rcT~SNP,geno=imp5k,data=NULL)
 qtsImp[order(qtsImp[,"Chisq"],decreasing=T)[1:10],]
-@ 
+@
 
 Look up the information for SNPs with $P$-value less than $6e-8$:
 <<>>=
@@ -291,5 +291,3 @@
 \shipoutAnswer
 
 \setcounter{Exercise}{0}
-
-