[Genabel-commits] r2004 - tutorials/GenABEL_general

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Wed Jul 8 22:12:46 CEST 2015


Author: lckarssen
Date: 2015-07-08 22:12:45 +0200 (Wed, 08 Jul 2015)
New Revision: 2004

Modified:
   tutorials/GenABEL_general/workgwaaclass.Rnw
Log:
Summary: Removing unnecessary white space from the Introduction to the GenABEL-package chapter of the GenABEL tutorial.


Modified: tutorials/GenABEL_general/workgwaaclass.Rnw
===================================================================
--- tutorials/GenABEL_general/workgwaaclass.Rnw	2015-07-08 20:05:22 UTC (rev 2003)
+++ tutorials/GenABEL_general/workgwaaclass.Rnw	2015-07-08 20:12:45 UTC (rev 2004)
@@ -1,12 +1,12 @@
 \chapter{Introduction to the \GA{}}
 \label{sec:workgwaaclass}
 
-In this section, you will become familiar with the 
+In this section, you will become familiar with the
 \GA{} library, designed for GWA analysis. Compared to the
-\texttt{genetics} package, it provides specific 
+\texttt{genetics} package, it provides specific
 facilities for storage and manipulation of large amounts
-of data, very fast tests for GWA analysis, and special 
-functions to analyse and graphically present the 
+of data, very fast tests for GWA analysis, and special
+functions to analyse and graphically present the
 results of GWA analysis (thus "analysis of analysis").
 
 Start R and load the \GA{} library using the command
@@ -21,90 +21,90 @@
 \section{General description of \texttt{gwaa.data-class}}
 \label{subs:gwaaclass}
 
-The object you have loaded, \texttt{srdta}, belongs to the 
+The object you have loaded, \texttt{srdta}, belongs to the
 \texttt{gwaa.data} class. This is a special R class developed to facilitate
-GWA analysis. 
+GWA analysis.
 
-In GWA analysis, different types of data are used. These 
-include the phenotypic and genotypic data on the study 
-participants and chromosome and location of every SNP. 
-For every SNP, it is desirable to know the details of 
+In GWA analysis, different types of data are used. These
+include the phenotypic and genotypic data on the study
+participants and chromosome and location of every SNP.
+For every SNP, it is desirable to know the details of
 coding (how are the alleles coded? -- A, T, G, C? -- as well as
 the strand -- '+' or '-', 'top' or 'bot'? -- this coding is for).
 
-One could attempt to store all phenotypes and genotypes 
-together in a single table, using, \eg one row per study subject; 
-than the columns will correspond to study phenotypes and SNPs. 
+One could attempt to store all phenotypes and genotypes
+together in a single table, using, \eg one row per study subject;
+than the columns will correspond to study phenotypes and SNPs.
 For a typical GWA data set, this would lead to a table of few thousands
-rows and few hundreds of thousands to millions of columns. Such a format is generated 
-when one downloads HapMap data for a region. 
-To store GWA data in such tables internally, within R, 
-proves to be inefficient. In \GA{}, a special data class, 
+rows and few hundreds of thousands to millions of columns. Such a format is generated
+when one downloads HapMap data for a region.
+To store GWA data in such tables internally, within R,
+proves to be inefficient. In \GA{}, a special data class,
 \texttt{gwaa.data-class} is used to store GWA data.
 
-You may consider an object of \texttt{gwaa.data-class} as 
-a 'black box' from which you can get specific data using 
-specific functions. If you are interested in the internal 
-structure of the \texttt{gwaa.data-class}, you can find 
+You may consider an object of \texttt{gwaa.data-class} as
+a 'black box' from which you can get specific data using
+specific functions. If you are interested in the internal
+structure of the \texttt{gwaa.data-class}, you can find
 the description in section \ref{subs:gwaaclass_internal}
-(\nameref{subs:gwaaclass_internal}). 
+(\nameref{subs:gwaaclass_internal}).
 
-The data frame, which contains all phenotypic data in the 
-study may be accessed using the \texttt{phdata} function. 
-Let us have a look at the first few rows of the phenotypic data 
+The data frame, which contains all phenotypic data in the
+study may be accessed using the \texttt{phdata} function.
+Let us have a look at the first few rows of the phenotypic data
 frame of \texttt{srdta}:
 <<>>=
 phdata(srdta)[1:5,]
 @
-The rows of this data frame correspond to th estudy subjects, and 
-the columns correspond to the variables. There are two default 
-variables, which are always present in \texttt{phdata}. 
-The first of these is ``id'', which contains study subject 
-identification code. This identification code can be arbitrary 
-character, numer, or alphanumeric combination, 
+The rows of this data frame correspond to th estudy subjects, and
+the columns correspond to the variables. There are two default
+variables, which are always present in \texttt{phdata}.
+The first of these is ``id'', which contains study subject
+identification code. This identification code can be arbitrary
+character, numer, or alphanumeric combination,
 but every person must be coded with an unique ID.
-The second default variable is ``sex'', where males are coded 
+The second default variable is ``sex'', where males are coded
 with ones (``1'') and females are coded with zero (``0'').
 
-It is important to understand that this data frame is not 
-supposed to be directly modified by the user, as its 
+It is important to understand that this data frame is not
+supposed to be directly modified by the user, as its
 structure is coupled to the structure of genotypic data.
 If at some point you need to manipulate (add/delete) the phenotypes
-included in \texttt{phdata}, you need to use such \GA{} functions 
-as \texttt{add.phdata} and \texttt{del.phdata} 
-(see section \ref{subsec:modify_phdata}). 
+included in \texttt{phdata}, you need to use such \GA{} functions
+as \texttt{add.phdata} and \texttt{del.phdata}
+(see section \ref{subsec:modify_phdata}).
 
-%In particular, 
-%it is extremely important to remember that one should not 
-%directly add subjects to the table, change the values of 
-%''id'' and ''sex'', and change the order of subjects in 
-%\texttt{phdata} unless this one is really understands 
-%the way \GA{} works. One also should not run such data 
-%manipulation functions as \texttt{merge}, \texttt{cbind} 
-%and \texttt{rbind} -- exactly because they may change the 
-%number of subjects or interfere with the order. On the other 
-%hand, it is OK to add more variables to the data frame 
-%through direct computations, for example, if one wishes to 
-%add computed body mass index, it is OK to run the command 
-%like 
+%In particular,
+%it is extremely important to remember that one should not
+%directly add subjects to the table, change the values of
+%''id'' and ''sex'', and change the order of subjects in
+%\texttt{phdata} unless this one is really understands
+%the way \GA{} works. One also should not run such data
+%manipulation functions as \texttt{merge}, \texttt{cbind}
+%and \texttt{rbind} -- exactly because they may change the
+%number of subjects or interfere with the order. On the other
+%hand, it is OK to add more variables to the data frame
+%through direct computations, for example, if one wishes to
+%add computed body mass index, it is OK to run the command
+%like
 
 The other part of an object of \texttt{gwaa.data-class} is
-\texttt{gtdata}, which contains 
-all GWA genetic information in an object of class 
-\texttt{snp.data} class. It is not supposed to be 
-modified directly by user. The genotypic data can 
-be accessed through the \texttt{gtdata} function, \eg 
+\texttt{gtdata}, which contains
+all GWA genetic information in an object of class
+\texttt{snp.data} class. It is not supposed to be
+modified directly by user. The genotypic data can
+be accessed through the \texttt{gtdata} function, \eg
 <<>>=
 gtdata(srdta[1:10, 1:10])
 @
-As you can see, these data are of little direct use 
-as these are stored in an internal format -- 
-you need to coerce that to another data type if you 
-want to manipulate/analyse these data using 
+As you can see, these data are of little direct use
+as these are stored in an internal format --
+you need to coerce that to another data type if you
+want to manipulate/analyse these data using
 non-\GA{} functions (see section \ref{subs:gwaa}).
 
-The number of individuals described in an 
-object of \texttt{gwaa.data-class} can be 
+The number of individuals described in an
+object of \texttt{gwaa.data-class} can be
 accessed through \texttt{nids} function, \eg
 <<>>=
 nids(srdta)
@@ -115,31 +115,31 @@
 nsnps(srdta)
 @
 
-The IDs of the individuals included in the study can be 
-accessed via the \texttt{idnames} function, for example 
+The IDs of the individuals included in the study can be
+accessed via the \texttt{idnames} function, for example
 the IDs of the first 7 individuals in the study are
 <<>>=
 idnames(srdta)[1:7]
 @
 
-The sex of the individuals can be accessed using the \texttt{male} 
+The sex of the individuals can be accessed using the \texttt{male}
 function:
 <<>>=
 male(srdta)[1:7]
 @
-where males (heterogametic sex) are assigned with ``1'' and 
+where males (heterogametic sex) are assigned with ``1'' and
 a homogametic sex (females) are assigned the value ``0''.
 
-Names of SNPs can be 
-accessed using the \texttt{snpnames} function; for 
-example the names of the first 10 SNPs in the 
+Names of SNPs can be
+accessed using the \texttt{snpnames} function; for
+example the names of the first 10 SNPs in the
 \texttt{srdta} are
 <<>>=
 snpnames(srdta)[1:10]
 @
 
 SNP annotation includes (shown for the first 10 SNPs only):
-\begin{itemize} 
+\begin{itemize}
 \item Chromosome:
 <<>>=
 chromosome(srdta)[1:10]
@@ -152,18 +152,18 @@
 <<>>=
 coding(srdta)[1:10]
 @
-For every SNP, the coding is represented with a pair of characters, 
-for example ``AG''. For an ``AG'' polymorphism, you may expect 
-``AA'', ``AG'' and ``GG'' genotypes to be found in your population. 
-The order (that is ``AG'' vs.~``GA'') is important -- 
-the first allele reported is the one 
-which will be used as a reference in association analysis, and 
-thus the effects are reported for the second allele. You can 
+For every SNP, the coding is represented with a pair of characters,
+for example ``AG''. For an ``AG'' polymorphism, you may expect
+``AA'', ``AG'' and ``GG'' genotypes to be found in your population.
+The order (that is ``AG'' vs.~``GA'') is important --
+the first allele reported is the one
+which will be used as a reference in association analysis, and
+thus the effects are reported for the second allele. You can
 also access the reference allele with the method \texttt{refallele}
 <<>>=
 refallele(srdta)[1:10]
 @
-and the effective (or 'coded') allelel with 
+and the effective (or 'coded') allelel with
 <<>>=
 effallele(srdta)[1:10]
 @
@@ -177,15 +177,15 @@
 \begin{summary}
 \item \GA{} uses a special data class, \texttt{gwaa.data-class}, to
 	store GWA data.
-\item To access the content of an object of \texttt{gwaa.data-class}, 
+\item To access the content of an object of \texttt{gwaa.data-class},
 	a number of functions is used
 \end{summary}
 
 \begin{Exercise}[title=Exploring IDs in \texttt{srdta}]
-Explore \texttt{srdta}. 
-\Question How many people are included in the study? 
-\Question How many of these are males? 
-\Question How many are females? 
+Explore \texttt{srdta}.
+\Question How many people are included in the study?
+\Question How many of these are males?
+\Question How many are females?
 \Question What is male proportion?
 \end{Exercise}
 \begin{Answer}
@@ -206,7 +206,7 @@
 nids(srdta) - sum(male(srdta))
 @
 \ldots you could get the same answer like this\footnote{
-This is something covered later in the section \ref{subs:gwaa_subs} 
+This is something covered later in the section \ref{subs:gwaa_subs}
 ("\nameref{subs:gwaa_subs}")
 }:
 <<>>=
@@ -224,9 +224,9 @@
 
 
 %\noindent\begin{ex}\label{ex2}
-%In this exercise, you will explore the vector representing 
-%the sex of study subjects in \texttt{srdta} example data 
-%set supplied together with \GA{}. First, you need to load 
+%In this exercise, you will explore the vector representing
+%the sex of study subjects in \texttt{srdta} example data
+%set supplied together with \GA{}. First, you need to load
 %\GA{} by typing
 %<<results=hide>>=
 %library(GenABEL)
@@ -235,26 +235,26 @@
 %<<>=
 %data(srdta)
 %@
-%The vector containing study subjects sex can be accessed 
-%through \texttt{srdta at gtdata@male}; this vector's value 
-%is one when the corresponding person is male and zero 
-%otherwise. 
-%Explore \texttt{srdta}. 
+%The vector containing study subjects sex can be accessed
+%through \texttt{srdta at gtdata@male}; this vector's value
+%is one when the corresponding person is male and zero
+%otherwise.
+%Explore \texttt{srdta}.
 %\begin{enumerate}
-%\item What is the ID and sex of the first person in the data set? 
-%\item Of the 22nd person? 
-%\item How many males are observed among first hundred subjects? 
+%\item What is the ID and sex of the first person in the data set?
+%\item Of the 22nd person?
+%\item How many males are observed among first hundred subjects?
 %\item How many FEMALES are among 4th hundred?
 %\item What is the male proportion in first 1000 people?
 %\item What is the FEMALE proportion in second 1000 (1001:2000) people?
-%\item What is name, chromosome and map position of 33rd maker? 
+%\item What is name, chromosome and map position of 33rd maker?
 %\item What is distance between markers 25 and 26?
 %\end{enumerate}
 %\end{ex}
 
 \begin{Exercise}[title=Exploring SNPs in \texttt{srdta}]%\label{ex2.1}
-Explore the SNPs contained in \texttt{srdta} using the functions to 
-access SNP names (\texttt{snpnames}) and map (\texttt{map}) 
+Explore the SNPs contained in \texttt{srdta} using the functions to
+access SNP names (\texttt{snpnames}) and map (\texttt{map})
 location
 \Question What are names of markers located after 2,490,000 b.p.?
 \Question Between 1,100,000 and 1,105,000 b.p.?
@@ -265,7 +265,7 @@
 vec <- (map(srdta) > 2490000)
 snpnames(srdta)[vec]
 @
-The names of markers located 
+The names of markers located
 between 1,100,000 and 1,105,000 b.p.~are:
 <<>>=
 vec <- (map(srdta) > 1100000 & map(srdta) < 1105000)
@@ -279,13 +279,13 @@
 \section{Accessing and modifying phenotypic data}
 \label{subsec:modify_phdata}
 
-As was already mentioned, the object returned by \texttt{phdata} 
-contains phenotypic data and is 
-a conventional data frame, wich obligatory includes 
-'id' and 'sex' variables, and ordered an a way that it 
-couples to the genotypic data. 
+As was already mentioned, the object returned by \texttt{phdata}
+contains phenotypic data and is
+a conventional data frame, wich obligatory includes
+'id' and 'sex' variables, and ordered an a way that it
+couples to the genotypic data.
 
-Being a data frame, \texttt{phdata} may be accessed using 
+Being a data frame, \texttt{phdata} may be accessed using
 the corresponding methods:
 <<>>=
 phdata(srdta)[1:5, ]
@@ -295,14 +295,14 @@
 phdata(srdta)$sex[1:5]
 @
 
-The modification of the phenotypic data is 
-performed using special methods, because 
-of specific restrictions on phenotypic 
-data frames. There are two main functions which 
-allow you to add (\texttt{add.phdata}) and 
-delete (\texttt{del.phdata}) phenotypes 
-from \texttt{phdata} part of an object of 
-\texttt{gwaa.data-class}. 
+The modification of the phenotypic data is
+performed using special methods, because
+of specific restrictions on phenotypic
+data frames. There are two main functions which
+allow you to add (\texttt{add.phdata}) and
+delete (\texttt{del.phdata}) phenotypes
+from \texttt{phdata} part of an object of
+\texttt{gwaa.data-class}.
 
 For example, if you want to add a variable (say, the square of age)
 computed from the ``age'' variable of \texttt{srdta}
@@ -316,16 +316,16 @@
 phdata(srdta)[1:5, ]
 @
 
-You can add more than one variable at once using the 
-same function, however, in this case the second (``newph'') 
-argument of the function should be a data frame, 
+You can add more than one variable at once using the
+same function, however, in this case the second (``newph'')
+argument of the function should be a data frame,
 which contains an 'id' variable specifing the IDs of the
-individuals. Imagine we have the data for individuals 
-'p1', 'p2' and 'p7' (we will generate random data 
+individuals. Imagine we have the data for individuals
+'p1', 'p2' and 'p7' (we will generate random data
 for them; pay attention only to the result):
 <<>>=
 newvalues <- matrix( rnorm(3*5), 3, 5 )
-newdata   <- data.frame(id=c("p1", "p2", "p7"), 
+newdata   <- data.frame(id=c("p1", "p2", "p7"),
                         ph1=1, ph2=1, ph3=1, ph4=1, ph5=1)
 newdata[, c(2:6)] <- newvalues
 newdata
@@ -337,21 +337,21 @@
 phdata(srdta)[1:10, ]
 @
 
-Finally, if you need, you can delete some phenotypes 
-from the \texttt{phdata} using \texttt{del.phdata} 
-function. Let us delete the phenotypes we have just 
+Finally, if you need, you can delete some phenotypes
+from the \texttt{phdata} using \texttt{del.phdata}
+function. Let us delete the phenotypes we have just
 added:
 <<>>=
-srdta <- del.phdata(srdta, 
+srdta <- del.phdata(srdta,
                     c("age_squared", "ph1", "ph2", "ph3", "ph4", "ph5"))
 phdata(srdta)[1:10, ]
 @
 
 \begin{summary}
-\item Phenotypic data contained in an object of \texttt{gwaa.data-class} 
+\item Phenotypic data contained in an object of \texttt{gwaa.data-class}
 	can be accessed using the \texttt{phdata} functions
-\item You can add phenotypes using the \texttt{add.phdata} function 
-\item You can delete phenotypes using the \texttt{del.phdata} function 
+\item You can add phenotypes using the \texttt{add.phdata} function
+\item You can delete phenotypes using the \texttt{del.phdata} function
 \end{summary}
 
 
@@ -359,38 +359,38 @@
 \section{Sub-setting and coercing gwaa.data}
 \label{subs:gwaa_subs}
 
-It is possible to sub-set the object, which stores the GWA data 
-in the manner similar to that used for conventional \texttt{R} 
-matrices and data frames. Very primitively, 
-you may think of 
-an object of class \texttt{gwaa.data} as a matrix whose rows 
-correspond to study subjects and columns correspond to SNPs studied 
-(though the actual object is more complicated). For example, 
-if we would like to investigate the 
-content of \texttt{srdta} for the first 5 people and 3 SNPs, we can 
+It is possible to sub-set the object, which stores the GWA data
+in the manner similar to that used for conventional \texttt{R}
+matrices and data frames. Very primitively,
+you may think of
+an object of class \texttt{gwaa.data} as a matrix whose rows
+correspond to study subjects and columns correspond to SNPs studied
+(though the actual object is more complicated). For example,
+if we would like to investigate the
+content of \texttt{srdta} for the first 5 people and 3 SNPs, we can
 run
 <<>>=
 ssubs <- srdta[1:5, 1:3]
 class(ssubs)
 @
 
-As you can see, by sub-setting we obtained a smaller object of 
-\texttt{gwaa.data-class}. The two major parts it contains are 
-phenotypic data, which can be accessed through \texttt{phdata} 
+As you can see, by sub-setting we obtained a smaller object of
+\texttt{gwaa.data-class}. The two major parts it contains are
+phenotypic data, which can be accessed through \texttt{phdata}
 (discussed in section \ref{subsec:modify_phdata}):
 <<>>=
 phdata(ssubs)
 @
-and genotypc data, which can be accessed via the \texttt{gtdata} 
+and genotypc data, which can be accessed via the \texttt{gtdata}
 function:
 <<>>=
 gtdata(ssubs)
 @
 whose content is not quite straightforward to read.
 
-To get human-readable information, a genotypic object  
-should be coerced to a regular \texttt{R} data type, 
-\eg character, using the 
+To get human-readable information, a genotypic object
+should be coerced to a regular \texttt{R} data type,
+\eg character, using the
 \texttt{as.character()} function:
 <<>>=
 as.character(gtdata(ssubs))
@@ -400,67 +400,67 @@
 <<>>=
 as.numeric(gtdata(ssubs))
 @
-Note that conversion to numeric happened according to the 
+Note that conversion to numeric happened according to the
 underlying genotype and the rules specified by SNP coding:
 <<>>=
 coding(ssubs)
 @
--- the genotype, which is made of the 'first' 
-allele of the 'code' is converted to ``0'', 
-the heterozygote to ``1'' and a homozygote 
+-- the genotype, which is made of the 'first'
+allele of the 'code' is converted to ``0'',
+the heterozygote to ``1'' and a homozygote
 for the second allele is converted to ``2''.
 
-For example, when the coding is ``GA'', as it is for the \texttt{rs18} 
+For example, when the coding is ``GA'', as it is for the \texttt{rs18}
 (the second SNP), homozygotes for the first allele, as
 specified by coding (``G'')
-are converted to zeros (``0''), heterozygotes are converted to 
-ones (``1''), and homozygotes for the second allele (``A'') are 
-converted to twos (``2''). Clearly, when numerically converted 
-data are used for association analysis, the effects will be 
-estimated for the second allele, while first will be used as 
+are converted to zeros (``0''), heterozygotes are converted to
+ones (``1''), and homozygotes for the second allele (``A'') are
+converted to twos (``2''). Clearly, when numerically converted
+data are used for association analysis, the effects will be
+estimated for the second allele, while first will be used as
 a reference.
 
-Genotypic data converted to standard \texttt{R} format 
-can be used in any further analysis. 
+Genotypic data converted to standard \texttt{R} format
+can be used in any further analysis.
 
-Several useful genetic analysis libraries were developed for R. 
-These include \texttt{genetics} (analysis of linkage disequilibrium and 
-many other useful functions) and \texttt{haplo.stats} (analysis of 
-association between traits and haplotypes). These use their own 
-genetic data formats. 
+Several useful genetic analysis libraries were developed for R.
+These include \texttt{genetics} (analysis of linkage disequilibrium and
+many other useful functions) and \texttt{haplo.stats} (analysis of
+association between traits and haplotypes). These use their own
+genetic data formats.
 
-One can translate \GA{} genetic data 
+One can translate \GA{} genetic data
 to the format used by "genetics" library using \texttt{as.genotype()}:
 <<>>=
 as.genotype(gtdata(ssubs))
 @
 
-To translate \GA{} data to the format used by "haplo.stats" you 
+To translate \GA{} data to the format used by "haplo.stats" you
 can use function \texttt{as.hsgeno()}
 <<>>=
 as.hsgeno(gtdata(ssubs))
 @
 
 Actually, most users will not need the latter function, as \GA{}
-provides a functional interface to "haplo.stats" (such \GA{} functions 
+provides a functional interface to "haplo.stats" (such \GA{} functions
 as \texttt{scan.haplo()} and \texttt{scan.haplo.2D()}).
 
-It is possible to select sub-sets of \texttt{gwaa.data-class} based not 
-only on index (\eg first 10 people and SNP number 33), but also based 
-on names. 
+It is possible to select sub-sets of \texttt{gwaa.data-class} based not
+only on index (\eg first 10 people and SNP number 33), but also based
+on names.
 
-For example, if we would like to retrieve phenotypic data on people 
+For example, if we would like to retrieve phenotypic data on people
 with IDs "p141", "p147" and "p2000", we can use
 <<>>=
 phdata( srdta[c("p141", "p147", "p2000"), ] )
 @
 here, the first part of expression sub-sets \texttt{srdta} on
-selected IDs, and the second tells which part of the retrieved 
-sub-set we want to see. You can try 
-\texttt{srdta[c("p141", "p147", "p2000"),]}, but be prepared to 
-see long output, as all information will be reported. 
+selected IDs, and the second tells which part of the retrieved
+sub-set we want to see. You can try
+\texttt{srdta[c("p141", "p147", "p2000"),]}, but be prepared to
+see long output, as all information will be reported.
 
-In a similar manner, we can also select on SNP name. For example, 
+In a similar manner, we can also select on SNP name. For example,
 if we are interested to see information on SNPs "rs10" and "rs29"
 for above people, we can run
 <<>>=
@@ -468,12 +468,12 @@
 gtdata(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
 @
 
-To see the actual genotypes for the above three people and 
+To see the actual genotypes for the above three people and
 two SNPs, use
 <<>>=
 as.character(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
 @
-or 
+or
 <<>>=
 as.numeric(srdta[c("p141", "p147", "p2000"), c("rs10", "rs29")])
 @
@@ -484,36 +484,36 @@
 \Question What is the frequency of \textbf{non-reference} (``effect'') allele in total sample?
 \Question What is the frequency of the effect allele in males?
 \Question What is the frequency of the effect allele in females?
-\Question What is the frequency of the \textbf{reference} allele 
+\Question What is the frequency of the \textbf{reference} allele
 in the total sample, males and females?
 \end{Exercise}
 \begin{Answer}
-To learn what allele of ``rs114'' is the reference 
+To learn what allele of ``rs114'' is the reference
 you need to run
 <<>>=
 coding(srdta)["rs114"]
 @
-Here, the first (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][1]}'') 
-allele is the reference and thus 
-the second (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][2]}'') 
-is the effect allele. Remember that when using the \texttt{as.numeric} 
-function to convert the genotypes to human-readable and \texttt{R}-operatable 
-format, the homozygotes for reference will be coded as ``0'', heterozygotes 
+Here, the first (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][1]}'')
+allele is the reference and thus
+the second (``\Sexpr{strsplit(coding(srdta)["rs114"], "")[[1]][2]}'')
+is the effect allele. Remember that when using the \texttt{as.numeric}
+function to convert the genotypes to human-readable and \texttt{R}-operatable
+format, the homozygotes for reference will be coded as ``0'', heterozygotes
 as ``1'' and the non-reference (``effect'') homozygotes will be coded as ``2'':
 <<>>=
-table(as.character( gtdata(srdta[, "rs114"] )), 
+table(as.character( gtdata(srdta[, "rs114"] )),
       as.numeric(   gtdata(srdta[,"rs114"] )))
 @
 
-To compute the frequency of the effect allele of SNP ``rs114'' in the total 
-sample, you can go two ways. First, we can try to take a sum 
-of all ``rs114'' genotypes and divide it by twice the number of 
+To compute the frequency of the effect allele of SNP ``rs114'' in the total
+sample, you can go two ways. First, we can try to take a sum
+of all ``rs114'' genotypes and divide it by twice the number of
 people:
 <<>>=
 a <- as.numeric(gtdata(srdta[, "rs114"]))
 sum(a)
 @
-This, however, returns \texttt{NA}, because some of the genotypes are 
+This, however, returns \texttt{NA}, because some of the genotypes are
 missing. We can deal with this problem by running \texttt{sum()}
 with the option \texttt{na.rm=TRUE}:
 <<>>=
@@ -521,13 +521,13 @@
 @
 so the number of ``effect'' alleles is \Sexpr{sum(a, na.rm=TRUE)}.
 
-However, now we do not know 
-what was the number of people for whom the genotype was 
+However, now we do not know
+what was the number of people for whom the genotype was
 measured! -- \texttt{nids} would return the \emph{total}
-number of people, but not the number of those measured for 
-``rs114''. 
+number of people, but not the number of those measured for
+``rs114''.
 
-This problem can be dealt with through using the \texttt{is.na(A)} 
+This problem can be dealt with through using the \texttt{is.na(A)}
 function which returns true when some element of \texttt{A} is not measured.
 Thus, the number of people with measured genotype for ``rs114'' is
 <<>>=
@@ -535,20 +535,20 @@
 nmeasured <- sum(!is.na(a))
 nmeasured
 @
-(note the ``!'' before \texttt{is.na}, which means NOT, so we get 
-these elements which are not \texttt{NA}). 
+(note the ``!'' before \texttt{is.na}, which means NOT, so we get
+these elements which are not \texttt{NA}).
 Consequently, the frequency of the ``effect'' allele is
 <<>>=
 sum(a, na.rm=TRUE) / (2 * nmeasured)
 @
 
-An easier way would be to compute mean value of ``rs114'' with the 
+An easier way would be to compute mean value of ``rs114'' with the
 \texttt{mean( \ldots , na.rm=TRUE)} function and divide it by 2:
 <<>>=
 mean(a, na.rm=TRUE)/2
 @
 
-To compute the frequency of the effect allele of ``rs114'' in males, 
+To compute the frequency of the effect allele of ``rs114'' in males,
 you can use
 <<>>=
 amale <- as.numeric( gtdata(srdta[male(srdta)==1, "rs114"]) )
@@ -561,7 +561,7 @@
 mean(afemale, na.rm=TRUE)/2
 @
 
-The frequencies of the reference allele are computed very simply as 
+The frequencies of the reference allele are computed very simply as
 one minus the frequency of the effective allele:
 <<>>=
 1 - mean(a,       na.rm=TRUE)/2
@@ -573,25 +573,25 @@
 
 
 \begin{summary}
-\item It is possible to obtain subsets of objects of 
+\item It is possible to obtain subsets of objects of
 \texttt{gwaa.data-class} and \texttt{snp.data-class}
-using the standard 2D sub-setting model \texttt{[i, j]}, 
-where \texttt{i} corresponds to study subjects and 
+using the standard 2D sub-setting model \texttt{[i, j]},
+where \texttt{i} corresponds to study subjects and
 \texttt{j} corresponds to SNPs.
-\item It is possible to provide ID and SNP names instead 
-of indexes when sub-setting an object of class 
+\item It is possible to provide ID and SNP names instead
+of indexes when sub-setting an object of class
 \texttt{gwaa.data-class}.
-\item The function \texttt{as.numeric()} converts 
+\item The function \texttt{as.numeric()} converts
 genotypic data from \texttt{snp.data-class} to
-regular integer numbers, which can be used in analysis 
+regular integer numbers, which can be used in analysis
 with R.
-\item The function \texttt{as.character()} converts 
+\item The function \texttt{as.character()} converts
 genotypic data from \texttt{snp.data-class} to
 character format.
-\item The function \texttt{as.genotype()} converts 
+\item The function \texttt{as.genotype()} converts
 genotypic data from \texttt{snp.data-class} to
 the format used by the \texttt{genetics} library.
-\item The function \texttt{as.hsgeno()} converts 
+\item The function \texttt{as.hsgeno()} converts
 genotypic data from \texttt{snp.data-class} to
 the format used by the \texttt{haplo.stats} library.
 \end{summary}
@@ -602,9 +602,9 @@
 
 Implementation of the function \texttt{summary()} to
 summarize the genotypic part of a \texttt{gwaa.data-class} object
-is very useful in genetic data 
-exploration and quality control (QC). 
-Let us try application of this function to the 
+is very useful in genetic data
+exploration and quality control (QC).
+Let us try application of this function to the
 \texttt{ssubs}:
 <<>>=
 a <- summary(ssubs)
@@ -612,27 +612,27 @@
 @
 
 In the first section, a summary is generated for the phenotypic data.
-In the second section, a summary is generated for the genotypic data. 
-In this section, \texttt{NoMeasured} refers to the number 
+In the second section, a summary is generated for the genotypic data.
+In this section, \texttt{NoMeasured} refers to the number
 of genotypes scores, \texttt{CallRate} to the proportion of these,
-\texttt{Q.2} is the frequency of the 'B' allele. The counts in 
-three genotypic classes are provided next. 
+\texttt{Q.2} is the frequency of the 'B' allele. The counts in
+three genotypic classes are provided next.
 \texttt{Pexact} refers to exact $P$-value for the test of Hardy-Weinberg
 equilibrium.
 
-As you have seen above, an object of the class \texttt{gwaa.data-class} 
+As you have seen above, an object of the class \texttt{gwaa.data-class}
 is sub-settable in the
-standard manner: \texttt{[i, j]}, where \texttt{i} is the index of a 
-study subject and \texttt{j} is the index of a SNP. Importantly, 
+standard manner: \texttt{[i, j]}, where \texttt{i} is the index of a
+study subject and \texttt{j} is the index of a SNP. Importantly,
 \texttt{i} could be a list of indexes:
 <<>>=
 vec <- which(phdata(srdta)$age >= 65)
 vec
 summary( gtdata(srdta[vec, 1:3]) )
 @
-This shows summary of first three genotypes for people with age 
-greater then or equal to 65 y.o. The same result may be achieved 
-by sub-setting using a vector of logical values: 
+This shows summary of first three genotypes for people with age
+greater then or equal to 65 y.o. The same result may be achieved
+by sub-setting using a vector of logical values:
 <<>>=
 vec <- (phdata(srdta)$age >= 65)
 table(vec)
@@ -645,24 +645,24 @@
 summary( gtdata(srdta[vec1, 1:3]) )
 @
 
-Let us explore the object returned by the \texttt{summary} function 
+Let us explore the object returned by the \texttt{summary} function
 when applied to \texttt{snp.data} class in more detail:
 <<>>=
 a <- summary( gtdata(srdta[vec1, 1:3]) )
 class(a)
 @
-Thus, the object returned is a \texttt{data.frame}. Therefore 
+Thus, the object returned is a \texttt{data.frame}. Therefore
 it should have dimensions and names:
 <<>>=
 dim(a)
 names(a)
 @
-Indeed, we derived 8 characteristics ("NoMeasured", "CallRate",   
+Indeed, we derived 8 characteristics ("NoMeasured", "CallRate",
 "Q.2", "P.11", "P.12", "P.22", "Pexact", "Chromosome") for the first 3 SNPs.
 
 \begin{Exercise}[title=Testing HWE for 10 SNPs]%\label{ex:gwaasubset}
-Test if Hardy-Weinberg equilibrium holds for 
-the first 10 SNPs 
+Test if Hardy-Weinberg equilibrium holds for
+the first 10 SNPs
 \Question Total sample
 \Question In cases (\texttt{bt} is 1)
 \Question In controls (\texttt{bt} is 0)
@@ -671,28 +671,28 @@
 To test for HWE in first 10 SNPs in total sample
 <<>>=
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/genabel -r 2004


More information about the Genabel-commits mailing list