[Genabel-commits] r2004 - tutorials/GenABEL_general
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Wed Jul 8 22:12:46 CEST 2015
Author: lckarssen
Date: 2015-07-08 22:12:45 +0200 (Wed, 08 Jul 2015)
New Revision: 2004
Modified:
tutorials/GenABEL_general/workgwaaclass.Rnw
Log:
Summary: Removing unnecessary white space from the Introduction to the GenABEL-package chapter of the GenABEL tutorial.
Modified: tutorials/GenABEL_general/workgwaaclass.Rnw
===================================================================
--- tutorials/GenABEL_general/workgwaaclass.Rnw 2015-07-08 20:05:22 UTC (rev 2003)
+++ tutorials/GenABEL_general/workgwaaclass.Rnw 2015-07-08 20:12:45 UTC (rev 2004)
@@ -1,12 +1,12 @@
\chapter{Introduction to the \GA{}}
\label{sec:workgwaaclass}
-In this section, you will become familiar with the
+In this section, you will become familiar with the
\GA{} library, designed for GWA analysis. Compared to the
-\texttt{genetics} package, it provides specific
+\texttt{genetics} package, it provides specific
facilities for storage and manipulation of large amounts
-of data, very fast tests for GWA analysis, and special
-functions to analyse and graphically present the
+of data, very fast tests for GWA analysis, and special
+functions to analyse and graphically present the
results of GWA analysis (thus "analysis of analysis").
Start R and load the \GA{} library using the command
@@ -21,90 +21,90 @@
\section{General description of \texttt{gwaa.data-class}}
\label{subs:gwaaclass}
-The object you have loaded, \texttt{srdta}, belongs to the
+The object you have loaded, \texttt{srdta}, belongs to the
\texttt{gwaa.data} class. This is a special R class developed to facilitate
-GWA analysis.
+GWA analysis.
-In GWA analysis, different types of data are used. These
-include the phenotypic and genotypic data on the study
-participants and chromosome and location of every SNP.
-For every SNP, it is desirable to know the details of
+In GWA analysis, different types of data are used. These
+include the phenotypic and genotypic data on the study
+participants and chromosome and location of every SNP.
+For every SNP, it is desirable to know the details of
coding (how are the alleles coded? -- A, T, G, C? -- as well as
the strand -- '+' or '-', 'top' or 'bot'? -- this coding is for).
-One could attempt to store all phenotypes and genotypes
-together in a single table, using, \eg one row per study subject;
-than the columns will correspond to study phenotypes and SNPs.
+One could attempt to store all phenotypes and genotypes
+together in a single table, using, \eg one row per study subject;
+than the columns will correspond to study phenotypes and SNPs.
For a typical GWA data set, this would lead to a table of few thousands
-rows and few hundreds of thousands to millions of columns. Such a format is generated
-when one downloads HapMap data for a region.
-To store GWA data in such tables internally, within R,
-proves to be inefficient. In \GA{}, a special data class,
+rows and few hundreds of thousands to millions of columns. Such a format is generated
+when one downloads HapMap data for a region.
+To store GWA data in such tables internally, within R,
+proves to be inefficient. In \GA{}, a special data class,
\texttt{gwaa.data-class} is used to store GWA data.
-You may consider an object of \texttt{gwaa.data-class} as
-a 'black box' from which you can get specific data using
-specific functions. If you are interested in the internal
-structure of the \texttt{gwaa.data-class}, you can find
+You may consider an object of \texttt{gwaa.data-class} as
+a 'black box' from which you can get specific data using
+specific functions. If you are interested in the internal
+structure of the \texttt{gwaa.data-class}, you can find
the description in section \ref{subs:gwaaclass_internal}
-(\nameref{subs:gwaaclass_internal}).
+(\nameref{subs:gwaaclass_internal}).
-The data frame, which contains all phenotypic data in the
-study may be accessed using the \texttt{phdata} function.
-Let us have a look at the first few rows of the phenotypic data
+The data frame, which contains all phenotypic data in the
+study may be accessed using the \texttt{phdata} function.
+Let us have a look at the first few rows of the phenotypic data
frame of \texttt{srdta}:
<<>>=
phdata(srdta)[1:5,]
@
-The rows of this data frame correspond to th estudy subjects, and
-the columns correspond to the variables. There are two default
-variables, which are always present in \texttt{phdata}.
-The first of these is ``id'', which contains study subject
-identification code. This identification code can be arbitrary
-character, numer, or alphanumeric combination,
+The rows of this data frame correspond to th estudy subjects, and
+the columns correspond to the variables. There are two default
+variables, which are always present in \texttt{phdata}.
+The first of these is ``id'', which contains study subject
+identification code. This identification code can be arbitrary
+character, numer, or alphanumeric combination,
but every person must be coded with an unique ID.
-The second default variable is ``sex'', where males are coded
+The second default variable is ``sex'', where males are coded
with ones (``1'') and females are coded with zero (``0'').
-It is important to understand that this data frame is not
-supposed to be directly modified by the user, as its
+It is important to understand that this data frame is not
+supposed to be directly modified by the user, as its
structure is coupled to the structure of genotypic data.
If at some point you need to manipulate (add/delete) the phenotypes
-included in \texttt{phdata}, you need to use such \GA{} functions
-as \texttt{add.phdata} and \texttt{del.phdata}
-(see section \ref{subsec:modify_phdata}).
+included in \texttt{phdata}, you need to use such \GA{} functions
+as \texttt{add.phdata} and \texttt{del.phdata}
+(see section \ref{subsec:modify_phdata}).
-%In particular,
-%it is extremely important to remember that one should not
-%directly add subjects to the table, change the values of
-%''id'' and ''sex'', and change the order of subjects in
-%\texttt{phdata} unless this one is really understands
-%the way \GA{} works. One also should not run such data
-%manipulation functions as \texttt{merge}, \texttt{cbind}
-%and \texttt{rbind} -- exactly because they may change the
-%number of subjects or interfere with the order. On the other
-%hand, it is OK to add more variables to the data frame
-%through direct computations, for example, if one wishes to
-%add computed body mass index, it is OK to run the command
-%like
+%In particular,
+%it is extremely important to remember that one should not
+%directly add subjects to the table, change the values of
+%''id'' and ''sex'', and change the order of subjects in
+%\texttt{phdata} unless this one is really understands
+%the way \GA{} works. One also should not run such data
+%manipulation functions as \texttt{merge}, \texttt{cbind}
+%and \texttt{rbind} -- exactly because they may change the
+%number of subjects or interfere with the order. On the other
+%hand, it is OK to add more variables to the data frame
+%through direct computations, for example, if one wishes to
+%add computed body mass index, it is OK to run the command
+%like
The other part of an object of \texttt{gwaa.data-class} is
-\texttt{gtdata}, which contains
-all GWA genetic information in an object of class
-\texttt{snp.data} class. It is not supposed to be
-modified directly by user. The genotypic data can
-be accessed through the \texttt{gtdata} function, \eg
+\texttt{gtdata}, which contains
+all GWA genetic information in an object of class
+\texttt{snp.data} class. It is not supposed to be
+modified directly by user. The genotypic data can
+be accessed through the \texttt{gtdata} function, \eg
<<>>=
gtdata(srdta[1:10, 1:10])
@
-As you can see, these data are of little direct use
-as these are stored in an internal format --
-you need to coerce that to another data type if you
-want to manipulate/analyse these data using
+As you can see, these data are of little direct use
+as these are stored in an internal format --
+you need to coerce that to another data type if you
+want to manipulate/analyse these data using
non-\GA{} functions (see section \ref{subs:gwaa}).
-The number of individuals described in an
-object of \texttt{gwaa.data-class} can be
+The number of individuals described in an
+object of \texttt{gwaa.data-class} can be
accessed through \texttt{nids} function, \eg
<<>>=
nids(srdta)
@@ -115,31 +115,31 @@
nsnps(srdta)
@
-The IDs of the individuals included in the study can be
-accessed via the \texttt{idnames} function, for example
+The IDs of the individuals included in the study can be
+accessed via the \texttt{idnames} function, for example
the IDs of the first 7 individuals in the study are
<<>>=
idnames(srdta)[1:7]
@
-The sex of the individuals can be accessed using the \texttt{male}
+The sex of the individuals can be accessed using the \texttt{male}
function:
<<>>=
male(srdta)[1:7]
@
-where males (heterogametic sex) are assigned with ``1'' and
+where males (heterogametic sex) are assigned with ``1'' and
a homogametic sex (females) are assigned the value ``0''.
-Names of SNPs can be
-accessed using the \texttt{snpnames} function; for
-example the names of the first 10 SNPs in the
+Names of SNPs can be
+accessed using the \texttt{snpnames} function; for
+example the names of the first 10 SNPs in the
\texttt{srdta} are
<<>>=
snpnames(srdta)[1:10]
@
SNP annotation includes (shown for the first 10 SNPs only):
-\begin{itemize}
+\begin{itemize}
\item Chromosome:
<<>>=
chromosome(srdta)[1:10]
@@ -152,18 +152,18 @@
<<>>=
coding(srdta)[1:10]
@
-For every SNP, the coding is represented with a pair of characters,
-for example ``AG''. For an ``AG'' polymorphism, you may expect
-``AA'', ``AG'' and ``GG'' genotypes to be found in your population.
-The order (that is ``AG'' vs.~``GA'') is important --
-the first allele reported is the one
-which will be used as a reference in association analysis, and
-thus the effects are reported for the second allele. You can
+For every SNP, the coding is represented with a pair of characters,
+for example ``AG''. For an ``AG'' polymorphism, you may expect
+``AA'', ``AG'' and ``GG'' genotypes to be found in your population.
+The order (that is ``AG'' vs.~``GA'') is important --
+the first allele reported is the one
+which will be used as a reference in association analysis, and
+thus the effects are reported for the second allele. You can
also access the reference allele with the method \texttt{refallele}
<<>>=
refallele(srdta)[1:10]
@
-and the effective (or 'coded') allelel with
+and the effective (or 'coded') allelel with
<<>>=
effallele(srdta)[1:10]
@
@@ -177,15 +177,15 @@
\begin{summary}
\item \GA{} uses a special data class, \texttt{gwaa.data-class}, to
store GWA data.
-\item To access the content of an object of \texttt{gwaa.data-class},
+\item To access the content of an object of \texttt{gwaa.data-class},
a number of functions is used
\end{summary}
\begin{Exercise}[title=Exploring IDs in \texttt{srdta}]
-Explore \texttt{srdta}.
-\Question How many people are included in the study?
-\Question How many of these are males?
-\Question How many are females?
+Explore \texttt{srdta}.
+\Question How many people are included in the study?
+\Question How many of these are males?
+\Question How many are females?
\Question What is male proportion?
\end{Exercise}
\begin{Answer}
@@ -206,7 +206,7 @@
nids(srdta) - sum(male(srdta))
@
\ldots you could get the same answer like this\footnote{
-This is something covered later in the section \ref{subs:gwaa_subs}
+This is something covered later in the section \ref{subs:gwaa_subs}
("\nameref{subs:gwaa_subs}")
}:
<<>>=
@@ -224,9 +224,9 @@
%\noindent\begin{ex}\label{ex2}
-%In this exercise, you will explore the vector representing
-%the sex of study subjects in \texttt{srdta} example data
-%set supplied together with \GA{}. First, you need to load
+%In this exercise, you will explore the vector representing
+%the sex of study subjects in \texttt{srdta} example data
+%set supplied together with \GA{}. First, you need to load
%\GA{} by typing
%<<results=hide>>=
%library(GenABEL)
@@ -235,26 +235,26 @@
%<<>=
%data(srdta)
%@
-%The vector containing study subjects sex can be accessed
-%through \texttt{srdta at gtdata@male}; this vector's value
-%is one when the corresponding person is male and zero
-%otherwise.
-%Explore \texttt{srdta}.
+%The vector containing study subjects sex can be accessed
+%through \texttt{srdta at gtdata@male}; this vector's value
+%is one when the corresponding person is male and zero
+%otherwise.
+%Explore \texttt{srdta}.
%\begin{enumerate}
-%\item What is the ID and sex of the first person in the data set?
-%\item Of the 22nd person?
-%\item How many males are observed among first hundred subjects?
+%\item What is the ID and sex of the first person in the data set?
+%\item Of the 22nd person?
+%\item How many males are observed among first hundred subjects?
%\item How many FEMALES are among 4th hundred?
%\item What is the male proportion in first 1000 people?
%\item What is the FEMALE proportion in second 1000 (1001:2000) people?
-%\item What is name, chromosome and map position of 33rd maker?
+%\item What is name, chromosome and map position of 33rd maker?
%\item What is distance between markers 25 and 26?
%\end{enumerate}
%\end{ex}
\begin{Exercise}[title=Exploring SNPs in \texttt{srdta}]%\label{ex2.1}
-Explore the SNPs contained in \texttt{srdta} using the functions to
-access SNP names (\texttt{snpnames}) and map (\texttt{map})
+Explore the SNPs contained in \texttt{srdta} using the functions to
+access SNP names (\texttt{snpnames}) and map (\texttt{map})
location
\Question What are names of markers located after 2,490,000 b.p.?
\Question Between 1,100,000 and 1,105,000 b.p.?
@@ -265,7 +265,7 @@
vec <- (map(srdta) > 2490000)
snpnames(srdta)[vec]
@
-The names of markers located
+The names of markers located
between 1,100,000 and 1,105,000 b.p.~are:
<<>>=
vec <- (map(srdta) > 1100000 & map(srdta) < 1105000)
@@ -279,13 +279,13 @@
\section{Accessing and modifying phenotypic data}
\label{subsec:modify_phdata}
-As was already mentioned, the object returned by \texttt{phdata}
-contains phenotypic data and is
-a conventional data frame, wich obligatory includes
-'id' and 'sex' variables, and ordered an a way that it
-couples to the genotypic data.
+As was already mentioned, the object returned by \texttt{phdata}
+contains phenotypic data and is
+a conventional data frame, wich obligatory includes
+'id' and 'sex' variables, and ordered an a way that it
+couples to the genotypic data.
-Being a data frame, \texttt{phdata} may be accessed using
+Being a data frame, \texttt{phdata} may be accessed using
the corresponding methods:
<<>>=
phdata(srdta)[1:5, ]
@@ -295,14 +295,14 @@
phdata(srdta)$sex[1:5]
@
-The modification of the phenotypic data is
-performed using special methods, because
-of specific restrictions on phenotypic
-data frames. There are two main functions which
-allow you to add (\texttt{add.phdata}) and
-delete (\texttt{del.phdata}) phenotypes
-from \texttt{phdata} part of an object of
-\texttt{gwaa.data-class}.
+The modification of the phenotypic data is
+performed using special methods, because
+of specific restrictions on phenotypic
+data frames. There are two main functions which
+allow you to add (\texttt{add.phdata}) and
+delete (\texttt{del.phdata}) phenotypes
+from \texttt{phdata} part of an object of
+\texttt{gwaa.data-class}.
For example, if you want to add a variable (say, the square of age)
computed from the ``age'' variable of \texttt{srdta}
@@ -316,16 +316,16 @@
phdata(srdta)[1:5, ]
@
-You can add more than one variable at once using the
-same function, however, in this case the second (``newph'')
-argument of the function should be a data frame,
+You can add more than one variable at once using the
+same function, however, in this case the second (``newph'')
+argument of the function should be a data frame,
which contains an 'id' variable specifing the IDs of the
-individuals. Imagine we have the data for individuals
-'p1', 'p2' and 'p7' (we will generate random data
+individuals. Imagine we have the data for individuals
+'p1', 'p2' and 'p7' (we will generate random data
for them; pay attention only to the result):
<<>>=
newvalues <- matrix( rnorm(3*5), 3, 5 )
-newdata <- data.frame(id=c("p1", "p2", "p7"),
+newdata <- data.frame(id=c("p1", "p2", "p7"),
ph1=1, ph2=1, ph3=1, ph4=1, ph5=1)
newdata[, c(2:6)] <- newvalues
newdata
@@ -337,21 +337,21 @@
phdata(srdta)[1:10, ]
@
-Finally, if you need, you can delete some phenotypes
-from the \texttt{phdata} using \texttt{del.phdata}
-function. Let us delete the phenotypes we have just
+Finally, if you need, you can delete some phenotypes
+from the \texttt{phdata} using \texttt{del.phdata}
+function. Let us delete the phenotypes we have just
added:
<<>>=
-srdta <- del.phdata(srdta,
+srdta <- del.phdata(srdta,
c("age_squared", "ph1", "ph2", "ph3", "ph4", "ph5"))
phdata(srdta)[1:10, ]
@
\begin{summary}
-\item Phenotypic data contained in an object of \texttt{gwaa.data-class}
+\item Phenotypic data contained in an object of \texttt{gwaa.data-class}
can be accessed using the \texttt{phdata} functions
-\item You can add phenotypes using the \texttt{add.phdata} function
-\item You can delete phenotypes using the \texttt{del.phdata} function
+\item You can add phenotypes using the \texttt{add.phdata} function
+\item You can delete phenotypes using the \texttt{del.phdata} function
\end{summary}
@@ -359,38 +359,38 @@
\section{Sub-setting and coercing gwaa.data}
\label{subs:gwaa_subs}
-It is possible to sub-set the object, which stores the GWA data
-in the manner similar to that used for conventional \texttt{R}
-matrices and data frames. Very primitively,
-you may think of
-an object of class \texttt{gwaa.data} as a matrix whose rows
-correspond to study subjects and columns correspond to SNPs studied
-(though the actual object is more complicated). For example,
-if we would like to investigate the
-content of \texttt{srdta} for the first 5 people and 3 SNPs, we can
+It is possible to sub-set the object, which stores the GWA data
+in the manner similar to that used for conventional \texttt{R}
+matrices and data frames. Very primitively,
+you may think of
+an object of class \texttt{gwaa.data} as a matrix whose rows
+correspond to study subjects and columns correspond to SNPs studied
+(though the actual object is more complicated). For example,
+if we would like to investigate the
+content of \texttt{srdta} for the first 5 people and 3 SNPs, we can
run
<<>>=
ssubs <- srdta[1:5, 1:3]
class(ssubs)
@
-As you can see, by sub-setting we obtained a smaller object of
-\texttt{gwaa.data-class}. The two major parts it contains are
-phenotypic data, which can be accessed through \texttt{phdata}
+As you can see, by sub-setting we obtained a smaller object of
+\texttt{gwaa.data-class}. The two major parts it contains are
+phenotypic data, which can be accessed through \texttt{phdata}
(discussed in section \ref{subsec:modify_phdata}):
<<>>=
phdata(ssubs)
@
-and genotypc data, which can be accessed via the \texttt{gtdata}
+and genotypc data, which can be accessed via the \texttt{gtdata}
function:
<<>>=
gtdata(ssubs)
@
whose content is not quite straightforward to read.
-To get human-readable information, a genotypic object
-should be coerced to a regular \texttt{R} data type,
-\eg character, using the
+To get human-readable information, a genotypic object
+should be coerced to a regular \texttt{R} data type,
+\eg character, using the
\texttt{as.character()} function:
<<>>=
as.character(gtdata(ssubs))
@@ -400,67 +400,67 @@
<<>>=
as.numeric(gtdata(ssubs))
@
-Note that conversion to numeric happened according to the
+Note that conversion to numeric happened according to the
underlying genotype and the rules specified by SNP coding:
<<>>=
coding(ssubs)
@
--- the genotype, which is made of the 'first'
-allele of the 'code' is converted to ``0'',
-the heterozygote to ``1'' and a homozygote
+-- the genotype, which is made of the 'first'
+allele of the 'code' is converted to ``0'',
+the heterozygote to ``1'' and a homozygote
for the second allele is converted to ``2''.
-For example, when the coding is ``GA'', as it is for the \texttt{rs18}
+For example, when the coding is ``GA'', as it is for the \texttt{rs18}
(the second SNP), homozygotes for the first allele, as
specified by coding (``G'')
-are converted to zeros (``0''), heterozygotes are converted to
-ones (``1''), and homozygotes for the second allele (``A'') are
-converted to twos (``2''). Clearly, when numerically converted
-data are used for association analysis, the effects will be
-estimated for the second allele, while first will be used as
+are converted to zeros (``0''), heterozygotes are converted to
+ones (``1''), and homozygotes for the second allele (``A'') are
+converted to twos (``2''). Clearly, when numerically converted
+data are used for association analysis, the effects will be
+estimated for the second allele, while first will be used as
a reference.
-Genotypic data converted to standard \texttt{R} format
-can be used in any further analysis.
+Genotypic data converted to standard \texttt{R} format
+can be used in any further analysis.
-Several useful genetic analysis libraries were developed for R.
-These include \texttt{genetics} (analysis of linkage disequilibrium and
-many other useful functions) and \texttt{haplo.stats} (analysis of
-association between traits and haplotypes). These use their own
-genetic data formats.
+Several useful genetic analysis libraries were developed for R.
+These include \texttt{genetics} (analysis of linkage disequilibrium and
+many other useful functions) and \texttt{haplo.stats} (analysis of
+association between traits and haplotypes). These use their own
+genetic data formats.
-One can translate \GA{} genetic data
+One can translate \GA{} genetic data
to the format used by "genetics" library using \texttt{as.genotype()}:
<<>>=
as.genotype(gtdata(ssubs))
@
-To translate \GA{} data to the format used by "haplo.stats" you
+To translate \GA{} data to the format used by "haplo.stats" you
can use function \texttt{as.hsgeno()}
<<>>=
as.hsgeno(gtdata(ssubs))
@
Actually, most users will not need the latter function, as \GA{}
-provides a functional interface to "haplo.stats" (such \GA{} functions
+provides a functional interface to "haplo.stats" (such \GA{} functions
as \texttt{scan.haplo()} and \texttt{scan.haplo.2D()}).
-It is possible to select sub-sets of \texttt{gwaa.data-class} based not
-only on index (\eg first 10 people and SNP number 33), but also based
-on names.
+It is possible to select sub-sets of \texttt{gwaa.data-class} based not
+only on index (\eg first 10 people and SNP number 33), but also based
+on names.
-For example, if we would like to retrieve phenotypic data on people
+For example, if we would like to retrieve phenotypic data on people
with IDs "p141", "p147" and "p2000", we can use
<<>>=
phdata( srdta[c("p141", "p147", "p2000"), ] )
@
here, the first part of expression sub-sets \texttt{srdta} on
-selected IDs, and the second tells which part of the retrieved
-sub-set we want to see. You can try
-\texttt{srdta[c("p141", "p147", "p2000"),]}, but be prepared to
-see long output, as all information will be reported.
+selected IDs, and the second tells which part of the retrieved
+sub-set we want to see. You can try
+\texttt{srdta[c("p141", "p147", "p2000"),]}, but be prepared to
+see long output, as all information will be reported.
-In a similar manner, we can also select on SNP name. For example,
+In a similar manner, we can also select on SNP name. For example,
if we are interested to see information on SNPs "rs10" and "rs29"
for above people, we can run
<<>>=
@@ -468,12 +468,12 @@
gtdata(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
@
-To see the actual genotypes for the above three people and
+To see the actual genotypes for the above three people and
two SNPs, use
<<>>=
as.character(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
@
-or
+or
<<>>=
as.numeric(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
@
@@ -484,36 +484,36 @@
\Question What is the frequency of \textbf{non-reference} (``effect'') allele in total sample?
\Question What is the frequency of the effect allele in males?
\Question What is the frequency of the effect allele in females?
-\Question What is the frequency of the \textbf{reference} allele
+\Question What is the frequency of the \textbf{reference} allele
in the total sample, males and females?
\end{Exercise}
\begin{Answer}
-To learn what allele of ``rs114'' is the reference
+To learn what allele of ``rs114'' is the reference
you need to run
<<>>=
coding(srdta)["rs114"]
@
-Here, the first (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][1]}'')
-allele is the reference and thus
-the second (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][2]}'')
-is the effect allele. Remember that when using the \texttt{as.numeric}
-function to convert the genotypes to human-readable and \texttt{R}-operatable
-format, the homozygotes for reference will be coded as ``0'', heterozygotes
+Here, the first (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][1]}'')
+allele is the reference and thus
+the second (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][2]}'')
+is the effect allele. Remember that when using the \texttt{as.numeric}
+function to convert the genotypes to human-readable and \texttt{R}-operatable
+format, the homozygotes for reference will be coded as ``0'', heterozygotes
as ``1'' and the non-reference (``effect'') homozygotes will be coded as ``2'':
<<>>=
-table(as.character( gtdata(srdta[, "rs114"] )),
+table(as.character( gtdata(srdta[, "rs114"] )),
as.numeric( gtdata(srdta[,"rs114"] )))
@
-To compute the frequency of the effect allele of SNP ``rs114'' in the total
-sample, you can go two ways. First, we can try to take a sum
-of all ``rs114'' genotypes and divide it by twice the number of
+To compute the frequency of the effect allele of SNP ``rs114'' in the total
+sample, you can go two ways. First, we can try to take a sum
+of all ``rs114'' genotypes and divide it by twice the number of
people:
<<>>=
a <- as.numeric(gtdata(srdta[, "rs114"]))
sum(a)
@
-This, however, returns \texttt{NA}, because some of the genotypes are
+This, however, returns \texttt{NA}, because some of the genotypes are
missing. We can deal with this problem by running \texttt{sum()}
with the option \texttt{na.rm=TRUE}:
<<>>=
@@ -521,13 +521,13 @@
@
so the number of ``effect'' alleles is \Sexpr{sum(a, na.rm=TRUE)}.
-However, now we do not know
-what was the number of people for whom the genotype was
+However, now we do not know
+what was the number of people for whom the genotype was
measured! -- \texttt{nids} would return the \emph{total}
-number of people, but not the number of those measured for
-``rs114''.
+number of people, but not the number of those measured for
+``rs114''.
-This problem can be dealt with through using the \texttt{is.na(A)}
+This problem can be dealt with through using the \texttt{is.na(A)}
function which returns true when some element of \texttt{A} is not measured.
Thus, the number of people with measured genotype for ``rs114'' is
<<>>=
@@ -535,20 +535,20 @@
nmeasured <- sum(!is.na(a))
nmeasured
@
-(note the ``!'' before \texttt{is.na}, which means NOT, so we get
-these elements which are not \texttt{NA}).
+(note the ``!'' before \texttt{is.na}, which means NOT, so we get
+these elements which are not \texttt{NA}).
Consequently, the frequency of the ``effect'' allele is
<<>>=
sum(a, na.rm=TRUE) / (2 * nmeasured)
@
-An easier way would be to compute mean value of ``rs114'' with the
+An easier way would be to compute mean value of ``rs114'' with the
\texttt{mean( \ldots , na.rm=TRUE)} function and divide it by 2:
<<>>=
mean(a, na.rm=TRUE)/2
@
-To compute the frequency of the effect allele of ``rs114'' in males,
+To compute the frequency of the effect allele of ``rs114'' in males,
you can use
<<>>=
amale <- as.numeric( gtdata(srdta[male(srdta)==1, "rs114"]) )
@@ -561,7 +561,7 @@
mean(afemale, na.rm=TRUE)/2
@
-The frequencies of the reference allele are computed very simply as
+The frequencies of the reference allele are computed very simply as
one minus the frequency of the effective allele:
<<>>=
1 - mean(a, na.rm=TRUE)/2
@@ -573,25 +573,25 @@
\begin{summary}
-\item It is possible to obtain subsets of objects of
+\item It is possible to obtain subsets of objects of
\texttt{gwaa.data-class} and \texttt{snp.data-class}
-using the standard 2D sub-setting model \texttt{[i, j]},
-where \texttt{i} corresponds to study subjects and
+using the standard 2D sub-setting model \texttt{[i, j]},
+where \texttt{i} corresponds to study subjects and
\texttt{j} corresponds to SNPs.
-\item It is possible to provide ID and SNP names instead
-of indexes when sub-setting an object of class
+\item It is possible to provide ID and SNP names instead
+of indexes when sub-setting an object of class
\texttt{gwaa.data-class}.
-\item The function \texttt{as.numeric()} converts
+\item The function \texttt{as.numeric()} converts
genotypic data from \texttt{snp.data-class} to
-regular integer numbers, which can be used in analysis
+regular integer numbers, which can be used in analysis
with R.
-\item The function \texttt{as.character()} converts
+\item The function \texttt{as.character()} converts
genotypic data from \texttt{snp.data-class} to
character format.
-\item The function \texttt{as.genotype()} converts
+\item The function \texttt{as.genotype()} converts
genotypic data from \texttt{snp.data-class} to
the format used by the \texttt{genetics} library.
-\item The function \texttt{as.hsgeno()} converts
+\item The function \texttt{as.hsgeno()} converts
genotypic data from \texttt{snp.data-class} to
the format used by the \texttt{haplo.stats} library.
\end{summary}
@@ -602,9 +602,9 @@
Implementation of the function \texttt{summary()} to
summarize the genotypic part of a \texttt{gwaa.data-class} object
-is very useful in genetic data
-exploration and quality control (QC).
-Let us try application of this function to the
+is very useful in genetic data
+exploration and quality control (QC).
+Let us try application of this function to the
\texttt{ssubs}:
<<>>=
a <- summary(ssubs)
@@ -612,27 +612,27 @@
@
In the first section, a summary is generated for the phenotypic data.
-In the second section, a summary is generated for the genotypic data.
-In this section, \texttt{NoMeasured} refers to the number
+In the second section, a summary is generated for the genotypic data.
+In this section, \texttt{NoMeasured} refers to the number
of genotypes scores, \texttt{CallRate} to the proportion of these,
-\texttt{Q.2} is the frequency of the 'B' allele. The counts in
-three genotypic classes are provided next.
+\texttt{Q.2} is the frequency of the 'B' allele. The counts in
+three genotypic classes are provided next.
\texttt{Pexact} refers to exact $P$-value for the test of Hardy-Weinberg
equilibrium.
-As you have seen above, an object of the class \texttt{gwaa.data-class}
+As you have seen above, an object of the class \texttt{gwaa.data-class}
is sub-settable in the
-standard manner: \texttt{[i, j]}, where \texttt{i} is the index of a
-study subject and \texttt{j} is the index of a SNP. Importantly,
+standard manner: \texttt{[i, j]}, where \texttt{i} is the index of a
+study subject and \texttt{j} is the index of a SNP. Importantly,
\texttt{i} could be a list of indexes:
<<>>=
vec <- which(phdata(srdta)$age >= 65)
vec
summary( gtdata(srdta[vec, 1:3]) )
@
-This shows summary of first three genotypes for people with age
-greater then or equal to 65 y.o. The same result may be achieved
-by sub-setting using a vector of logical values:
+This shows summary of first three genotypes for people with age
+greater then or equal to 65 y.o. The same result may be achieved
+by sub-setting using a vector of logical values:
<<>>=
vec <- (phdata(srdta)$age >= 65)
table(vec)
@@ -645,24 +645,24 @@
summary( gtdata(srdta[vec1, 1:3]) )
@
-Let us explore the object returned by the \texttt{summary} function
+Let us explore the object returned by the \texttt{summary} function
when applied to \texttt{snp.data} class in more detail:
<<>>=
a <- summary( gtdata(srdta[vec1, 1:3]) )
class(a)
@
-Thus, the object returned is a \texttt{data.frame}. Therefore
+Thus, the object returned is a \texttt{data.frame}. Therefore
it should have dimensions and names:
<<>>=
dim(a)
names(a)
@
-Indeed, we derived 8 characteristics ("NoMeasured", "CallRate",
+Indeed, we derived 8 characteristics ("NoMeasured", "CallRate",
"Q.2", "P.11", "P.12", "P.22", "Pexact", "Chromosome") for the first 3 SNPs.
\begin{Exercise}[title=Testing HWE for 10 SNPs]%\label{ex:gwaasubset}
-Test if Hardy-Weinberg equilibrium holds for
-the first 10 SNPs
+Test if Hardy-Weinberg equilibrium holds for
+the first 10 SNPs
\Question Total sample
\Question In cases (\texttt{bt} is 1)
\Question In controls (\texttt{bt} is 0)
@@ -671,28 +671,28 @@
To test for HWE in first 10 SNPs in total sample
<<>>=
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/genabel -r 2004
More information about the Genabel-commits
mailing list