[Genabel-commits] r2047 - tutorials/GenABEL_general
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Wed Feb 3 14:24:03 CET 2016
Author: lckarssen
Date: 2016-02-03 14:24:02 +0100 (Wed, 03 Feb 2016)
New Revision: 2047
Modified:
tutorials/GenABEL_general/ImputedDataAnalysis.Rnw
Log:
Summary: Remove superfluous white space from the Imputed Data Analysis chapter in the GenABEL tutorial
No functional changes.
Modified: tutorials/GenABEL_general/ImputedDataAnalysis.Rnw
===================================================================
--- tutorials/GenABEL_general/ImputedDataAnalysis.Rnw 2016-01-25 10:00:15 UTC (rev 2046)
+++ tutorials/GenABEL_general/ImputedDataAnalysis.Rnw 2016-02-03 13:24:02 UTC (rev 2047)
@@ -1,15 +1,15 @@
\chapter{Analysis of imputed data: an example}
-In this chapter, you will perform an analysis of imputed data
-set. In this set of 120 individuals, 4500 SNPs are imputed
-based on information on 500 directly typed SNPs.
-You will first analyse 500 directly typed SNPs and then proceed
-to the analysis of imputed data. Finally, you will have a
-possibility to compare your results to the results of analysis
-in case all 5000 SNPs were directly typed.
+In this chapter, you will perform an analysis of imputed data
+set. In this set of 120 individuals, 4500 SNPs are imputed
+based on information on 500 directly typed SNPs.
+You will first analyse 500 directly typed SNPs and then proceed
+to the analysis of imputed data. Finally, you will have a
+possibility to compare your results to the results of analysis
+in case all 5000 SNPs were directly typed.
\section{Analysis of 500 directly typed SNPs}
-
+
Load the \GA{} library and load the data set we will use:
<<>>=
library(GenABEL)
@@ -17,8 +17,8 @@
ls()
@
-Here, \texttt{df500} contains the data including 500 directly typed
-SNPs, and \texttt{rcT} is the vector containing the value of the trait
+Here, \texttt{df500} contains the data including 500 directly typed
+SNPs, and \texttt{rcT} is the vector containing the value of the trait
of interest:
<<>>=
nids(df500)
@@ -27,32 +27,32 @@
rcT[1:10]
@
-In all analysis that follow, do disregard the Genomic Control and
-GC-corrected results: as we will analyse a small region with
-strong association, the GC can not be applied.
+In all analysis that follow, do disregard the Genomic Control and
+GC-corrected results: as we will analyse a small region with
+strong association, the GC can not be applied.
-Let us start with analysis of directly typed SNPs. For that,
-we will use \texttt{mlreg} function of \GA{}. This function
+Let us start with analysis of directly typed SNPs. For that,
+we will use \texttt{mlreg} function of \GA{}. This function
implements ML-regression and Wald test of significance\footnote{
-In general the score test may be preferred because it is faster and
+In general the score test may be preferred because it is faster and
more robust.
-}.
-This will later on allow us direct comparison with the results of \PA{},
+}.
+This will later on allow us direct comparison with the results of \PA{},
which implements the same testing procedure.
-We can run regression of \texttt{rcT} on SNPs region-wise using
+We can run regression of \texttt{rcT} on SNPs region-wise using
<<>>=
qts500 <- mlreg(rcT~1,df500)
qts500[1:5,]
@
-The summary for the SNPs, which show most significant association
-can be produced with
+The summary for the SNPs, which show most significant association
+can be produced with
<<>>=
bestHits500 <- descriptives.scan(qts500,top=10)
bestHits500
@
-and the plot of the results with
+and the plot of the results with
<<>>=
plot(qts500)
abline(h=-log10(5e-8))
@@ -63,14 +63,14 @@
plot(qts500)
abline(h=-log10(5e-8))
@
-\caption{
+\caption{
Manhattan plot for 500 directly typed SNPs
}
\label{fig:gwa500}
\end{figure}
(see figure \ref{fig:gwa500}).
-Finally, we can produce descriptive statistics for the
+Finally, we can produce descriptive statistics for the
SNPs, which demonstrated genome-wide significance:
<<>>=
gwsSnps500 <- rownames(bestHits500)[bestHits500$P1df<=6e-8]
@@ -79,61 +79,61 @@
@
\begin{Exercise}
-It is known that rare variation in the presence of
-outliers can generate spurious associations.
-Do you believe this is a true association in this
-particular case? What you can do to check whether
+It is known that rare variation in the presence of
+outliers can generate spurious associations.
+Do you believe this is a true association in this
+particular case? What you can do to check whether
this is a true association or not?
\end{Exercise}
\begin{Answer}
-Firstly, you can check (by producing a cross-plot
-of genotype vs.~phenotype) if association is indeed
-due to extreme phenotypic outliers. A related question is
-whether the distribution is skewed. Additionally,
-a permutation-based test can help establishing
-correct $p$-value, taking into account the nature
-of the data in question.
-
-However, to give an ultimate answer, a replication
-study is needed, in which these rare SNPs are to be
-typed in a large independent sample.
+Firstly, you can check (by producing a cross-plot
+of genotype vs.~phenotype) if association is indeed
+due to extreme phenotypic outliers. A related question is
+whether the distribution is skewed. Additionally,
+a permutation-based test can help establishing
+correct $p$-value, taking into account the nature
+of the data in question.
+
+However, to give an ultimate answer, a replication
+study is needed, in which these rare SNPs are to be
+typed in a large independent sample.
\end{Answer}
\section{Analysis of imputed data with \PA{}}
-Here, you will analyse imputed data. In the \texttt{RData}
-directory, you will find the necessary files:
-\texttt{mach1.mldose.fvi} and \texttt{mach1.mldose.fvd}
-(these files represent \texttt{mldose} data produced by
+Here, you will analyse imputed data. In the \texttt{RData}
+directory, you will find the necessary files:
+\texttt{mach1.mldose.fvi} and \texttt{mach1.mldose.fvd}
+(these files represent \texttt{mldose} data produced by
\texttt{mach1}, converted into \DA{} format\footnote{using \texttt{mach2databel}})
-and \texttt{mach1.out.mlinfo}, which contains information for the imputed SNPs
+and \texttt{mach1.out.mlinfo}, which contains information for the imputed SNPs
generated in the process of imputations (such as 'Rsq', etc.).
-We will start with producing a phenotypic data file for the
+We will start with producing a phenotypic data file for the
use with \PA{}:
<<>>=
write.table(data.frame(id=idnames(df500), rcT=rcT),
file="rcT.PHE", quote=FALSE, row.names=FALSE)
@
-next, try the command '\texttt{system("head rcT.PHE")}' to check the few first
+next, try the command '\texttt{system("head rcT.PHE")}' to check the few first
lines of the file.
-At this moment, leave \texttt{R} (or, rather, start new console!),
-copy the mach-files to the working directory with
+At this moment, leave \texttt{R} (or, rather, start new console!),
+copy the mach-files to the working directory with
\begin{verbatim}
yourname at server> cp RData/mach1* .
\end{verbatim}
-and run \PA{} analysis with
+and run \PA{} analysis with
\begin{verbatim}
yourname at server> palinear --pheno rcT.PHE --info mach1.out.mlinfo /
--dose mach1.mldose.fvi
\end{verbatim}
-Do not forget to check that you start the analysis in right directory, i.e.
-all files (\texttt{rcT.PHE}, \texttt{mach1.out.mlinfo},
-\texttt{mach1.mldose.fvi}, \texttt{mach1.mldose.fvd})
-are present in the working directory (use the '\texttt{ls}' command
-from the console).
+Do not forget to check that you start the analysis in right directory, i.e.
+all files (\texttt{rcT.PHE}, \texttt{mach1.out.mlinfo},
+\texttt{mach1.mldose.fvi}, \texttt{mach1.mldose.fvd})
+are present in the working directory (use the '\texttt{ls}' command
+from the console).
<<echo=false,hide=true>>=
system("cp RData/mach1* .")
@@ -147,7 +147,7 @@
qtsPal <- read.table("regression_add.out.txt", head=T, strings=F)
qtsPal[1:5,]
@
-As you see, there is not $P$-value produced in the \PA{} output, and
+As you see, there is not $P$-value produced in the \PA{} output, and
we need to compute it:
<<>>=
qtsPal$Chisq <- (qtsPal$beta/qtsPal$sebeta)^2
@@ -166,7 +166,7 @@
@
(here, \texttt{map5k} contains map information for all 5000 SNPs).
-To compare with the results obtained using 500 directly typed
+To compare with the results obtained using 500 directly typed
SNPs only, we can add the points with
<<>>=
points(map(qts500),-log10(qts500[,"P1df"]),col="red",pch=19,cex=0.5)
@@ -178,14 +178,14 @@
points(map(qts500),-log10(qts500[,"P1df"]),col="red",pch=19,cex=0.5)
abline(h=-log10(5e-8))
@
-\caption{
+\caption{
Manhattan plot for imputed (black empty circles) and 500 directly typed (red solid circles) SNPs
}
\label{fig:gwa500andImp}
\end{figure}
(see figure \ref{fig:gwa500andImp}).
-Now, we can also check different characteristics of
+Now, we can also check different characteristics of
the SNPs, which are claimed to be significant in our analysis:
<<>>=
gwsSnpsImp <- qtsPal$name[which(qtsPal[,"P-value"]<=5e-8)]
@@ -195,13 +195,13 @@
@
\begin{Exercise}
-Please classify the associations obtained into 'true' and 'false'
-(meaning 'I believe it' and 'I do not believe').
-Of cause ultimate answer is replications. Still, the object \texttt{df5k}
-provides genotypes for all 5000 SNPs, directly typed! So, you can run analysis,
+Please classify the associations obtained into 'true' and 'false'
+(meaning 'I believe it' and 'I do not believe').
+Of cause ultimate answer is replications. Still, the object \texttt{df5k}
+provides genotypes for all 5000 SNPs, directly typed! So, you can run analysis,
and cross-check which SNPs are confirmed as significant.
-Make a cross table: SNPs you thought were truly associated vs.
-SNPs indeed associated in directly typed data set.
+Make a cross table: SNPs you thought were truly associated vs.
+SNPs indeed associated in directly typed data set.
\end{Exercise}
\begin{Answer}
Here is the sequence of commands leading you to the answer:
@@ -242,7 +242,7 @@
abline(h=-log10(5e-8))
abline(v=-log10(5e-8))
@
-\caption{
+\caption{
Cross-plot of the results from analysis of imputed and directly typed data
(see answers to exercises).
}
@@ -253,29 +253,29 @@
\section{Analysis of imputed data with \MxA{}}
In case you are interested in quantitative traits, \MxA{}
-provides much faster and more flexible analysis tools,
-compared to the \PA{}\footnote{However, at the moment \MxA{}
-misses functionality to analyse binary and time-till-event
-traits}.
+provides much faster and more flexible analysis tools,
+compared to the \PA{}\footnote{However, at the moment \MxA{}
+misses functionality to analyse binary and time-till-event
+traits}.
-Here the above analysis is repeated using \MxA{}.
+Here the above analysis is repeated using \MxA{}.
Load necessary libraries:
<<>>=
library(DatABEL)
library(MixABEL)
-@
+@
Load the genotypic data:
<<>>=
imp5k <- databel("mach1.mldose")
-@
+@
Run analysis and look up 10 top results:
<<>>=
qtsImp <- GWFGLS(rcT~SNP,geno=imp5k,data=NULL)
qtsImp[order(qtsImp[,"Chisq"],decreasing=T)[1:10],]
-@
+@
Look up the information for SNPs with $P$-value less than $6e-8$:
<<>>=
@@ -291,5 +291,3 @@
\shipoutAnswer
\setcounter{Exercise}{0}
-
-
More information about the Genabel-commits
mailing list