[IPSUR-commits] r126 - pkg/IPSUR/inst/doc
noreply at r-forge.r-project.org
Wed Jan 6 07:33:32 CET 2010
Author: gkerns
Date: 2010-01-06 07:33:30 +0100 (Wed, 06 Jan 2010)
New Revision: 126
Modified:
pkg/IPSUR/inst/doc/IPSUR.Rnw
Log:
Fixed a lot of figure references, and added poly to MLR
Modified: pkg/IPSUR/inst/doc/IPSUR.Rnw
===================================================================
--- pkg/IPSUR/inst/doc/IPSUR.Rnw 2010-01-06 03:48:50 UTC (rev 125)
+++ pkg/IPSUR/inst/doc/IPSUR.Rnw 2010-01-06 06:33:30 UTC (rev 126)
@@ -172,18 +172,6 @@
morestring=[b]"
}
-
-
-%\newcommand... at figure{ \@dottedtocline{1}{1.5em}{2.3em} }
-%\usepackage{tocloft}
-%\setlength{\cftfignumwidth}{3em}
-% get the list of figures right
-%\newcounter{myfigure}[chapter]
-%\renewcommand{\thefigure}{\thechapter.\thefigure}
-%\makeatletter
-%\@addtoreset{myfigure}{chapter}
-%\makeatother
-
\@ifundefined{showcaptionsetup}{}{%
\PassOptionsToPackage{caption=false}{subfig}}
\usepackage{subfig}
@@ -2444,8 +2432,9 @@
half.
We have already encountered skewed distributions: both the discoveries
-data in Figure BLANK and the volcano data in Figure BLANK appear right-skewed.
-The UKDriverDeaths data in Example BLANK is relatively symmetric (but
+data in Figure \ref{fig:Various-stripchart-methods} and the precip
+data in Figure \ref{fig:histograms-bins} appear right-skewed. The
+UKDriverDeaths data in Example BLANK is relatively symmetric (but
note the one extreme value 2654 identified at the bottom of the stemplot).
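Commands along the following lines reproduce displays of this kind (the
precip, discoveries, and UKDriverDeaths datasets all ship with base \textsf{R};
the particular plotting choices here are only illustrative).
<<eval = FALSE>>=
hist(precip)                               # precipitation data: right-skewed
stripchart(discoveries, method = "stack")  # discoveries data: right-skewed
stem(UKDriverDeaths)                       # note the extreme value 2654
@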
@@ -5136,8 +5125,8 @@
Students are usually surprised to hear that, using the formula above,
one needs only $n=23$ students to have a greater than 50\% chance
-of at least one match. Figure BLANK shows a graph of the birthday
-probabilities:
+of at least one match. Figure \ref{fig:The-Birthday-Problem} shows
+a graph of the birthday probabilities:
\end{example}
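A quick numerical check of the $n=23$ claim, using either the direct
product formula or the built-in \texttt{pbirthday} function, is sketched
below.
<<eval = FALSE>>=
1 - prod((365:343)/365)   # P(at least one match) among n = 23 people
pbirthday(23)             # the same probability via the built-in function
@
Both calls return approximately 0.507, just over one half.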
%
\begin{figure}
@@ -6569,7 +6558,7 @@
In particular, the CDF of $X$ is defined for the entire real line,
$\R$. The CDF is right continuous and nondecreasing. A graph of the
$\mathsf{binom}(\mathtt{size}=3,\,\mathtt{prob}=1/2)$ CDF is shown
-in Figure BLANK.
+in Figure \ref{fig:binom-cdf-base}.
\end{example}
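One way (among several) to draw such a graph from the \texttt{pbinom} values
is with a step function; the sketch below is a minimal version.
<<eval = FALSE>>=
Fx <- c(0, pbinom(0:3, size = 3, prob = 0.5))   # CDF values 0, 1/8, 1/2, 7/8, 1
plot(stepfun(0:3, Fx), verticals = FALSE,
     main = "binom(size = 3, prob = 1/2) CDF")
@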
%
@@ -6585,7 +6574,7 @@
\par\end{centering}
\caption{Graph of the $\mathsf{binom}(\mathtt{size}=3,\,\mathtt{prob}=1/2)$
-CDF}
+CDF\label{fig:binom-cdf-base}}
\end{figure}
@@ -6624,7 +6613,7 @@
Random variables defined via the \inputencoding{latin9}\lstinline[showstringspaces=false]!distr!\inputencoding{utf8}
package may be \emph{plotted}, which will return graphs of the PMF,
CDF, and quantile function (introduced in Section BLANK). See Figure
-BLANK for an example.
+\ref{fig:binom-plot-distr} for an example.
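A minimal sketch of how this looks in practice, assuming the \texttt{distr}
package is installed, follows.
<<eval = FALSE>>=
library(distr)
X <- Binom(size = 3, prob = 1/2)   # a random variable object
plot(X)                            # PMF, CDF, and quantile function in one display
@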
%
\begin{figure}[H]
@@ -6635,7 +6624,7 @@
\par\end{centering}
\caption{The \textsf{binom}(\texttt{size} = 3, \texttt{prob} = 0.5) distribution
-from the \texttt{distr} package}
+from the \texttt{distr} package\label{fig:binom-plot-distr}}
\end{figure}
@@ -6930,13 +6919,13 @@
@
\par\end{centering}
-\caption{The empirical CDF}
+\caption{The empirical CDF\label{fig:empirical-CDF}}
\end{figure}
-See Figure BLANK. The graph is of a right-continuous function with
-jumps exactly at the locations stored in \inputencoding{latin9}\lstinline[showstringspaces=false]!x!\inputencoding{utf8}.
+See Figure \ref{fig:empirical-CDF}. The graph is of a right-continuous
+function with jumps exactly at the locations stored in \inputencoding{latin9}\lstinline[showstringspaces=false]!x!\inputencoding{utf8}.
There are no repeated values in \inputencoding{latin9}\lstinline[showstringspaces=false]!x!\inputencoding{utf8}
so all of the jumps are equal to $1/5=0.2$.
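For instance, with a hypothetical sample of five distinct values (any such
sample gives jumps of size $1/5$), the graph may be produced as follows.
<<eval = FALSE>>=
x <- c(1.4, 2.3, 3.1, 4.8, 5.2)   # illustrative data, five distinct values
plot(ecdf(x))                      # jumps of size 1/5 at each observation
@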
@@ -8055,15 +8044,15 @@
Here are some properties of quantile functions:
\begin{enumerate}
\item The quantile function is defined and finite for all $0<p<1$.
-\item $Q_{X}$ is left-continuous (see Appendix BLANK). For discrete random
-variables it is a step function, and for continuous random variables
-it is a continuous function.
+\item $Q_{X}$ is left-continuous (see Appendix \ref{sec:Differential-and-Integral}).
+For discrete random variables it is a step function, and for continuous
+random variables it is a continuous function.
\item In the continuous case the graph of $Q_{X}$ may be obtained by reflecting
the graph of $F_{X}$ about the line $y=x$. In the discrete case,
before reflecting one should: 1) connect the dots to get rid of the
jumps -- this will make the graph look like a set of stairs, 2) erase
the horizontal lines so that only vertical lines remain, and finally
-3) swap the open circles with the solid dots. Please see Figure BLANK
+3) swap the open circles with the solid dots. Please see Figure \ref{fig:binom-plot-distr}
for a comparison.
\item The two limits \[
\lim_{p\to0^{+}}Q_{X}(p)\quad\mbox{and}\quad\lim_{p\to1^{-}}Q_{X}(p)\]
@@ -8092,14 +8081,14 @@
to do was back substitute for $x=g^{-1}(u)$ in the PMF of $X$ (sometimes
accumulating probability mass along the way). In the continuous case,
however, we need more sophisticated tools. Now would be a good time
-to review Appendix BLANK.
+to review Appendix \ref{sec:Differential-and-Integral}.
\subsection{The PDF Method}
\begin{prop}
-Let $X$ have PDF $f_{X}$ and let $g$ be a function which is one-to-one
-with a differentiable inverse $g^{-1}$. Then the PDF of $U=g(X)$
-is given by\begin{equation}
+\label{pro:func-cont-rvs-pdf-formula}Let $X$ have PDF $f_{X}$ and
+let $g$ be a function which is one-to-one with a differentiable inverse
+$g^{-1}$. Then the PDF of $U=g(X)$ is given by\begin{equation}
f_{U}(u)=f_{X}\left[g^{-1}(u)\right]\ \left|\frac{\diff}{\diff u}g^{-1}(u)\right|.\end{equation}
\end{prop}
\begin{rem}
@@ -8118,7 +8107,7 @@
f_{Y}(y)=f_{X}(\ln y)\cdot\left|\frac{1}{y}\right|=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{ -\frac{(\ln y-\mu)^{2}}{2\sigma^{2}}\right\} \cdot\frac{1}{y},\]
where we have dropped the absolute value bars since $y>0$. The random
variable $Y$ is said to have a \emph{lognormal distribution}; see
-Section BLANK.
+Section \ref{sec:Other-Continuous-Distributions}.
\end{example}
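A quick numerical check of this density, with illustrative values $\mu=1$,
$\sigma=1/2$, and $y=2$, compares the change-of-variable formula with
\textsf{R}'s built-in lognormal density.
<<eval = FALSE>>=
mu <- 1; sigma <- 0.5; y <- 2               # illustrative values
dnorm(log(y), mean = mu, sd = sigma) / y    # f_X(ln y) * |1/y|
dlnorm(y, meanlog = mu, sdlog = sigma)      # built-in lognormal PDF; identical
@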
\begin{example}
@@ -8154,10 +8143,10 @@
fine.
\begin{rem}
In the case that $g$ is not monotone we cannot apply Proposition
-BLANK directly. However, hope is not lost. Rather, we break the support
-of $X$ into pieces such that $g$ is monotone on each one. We apply
-Proposition BLANK on each piece, and finish up by adding the results
-together.
+\ref{pro:func-cont-rvs-pdf-formula} directly. However, hope is not
+lost. Rather, we break the support of $X$ into pieces such that $g$
+is monotone on each one. We apply Proposition \ref{pro:func-cont-rvs-pdf-formula}
+on each piece, and finish up by adding the results together.
\end{rem}
\subsection{The CDF method}
@@ -8188,7 +8177,7 @@
the PDF $f_{Y}$ we need only differentiate $F_{Y}$:\[
f_{Y}(y)=\frac{\diff}{\diff y}\left(1-\me^{-y}\right)=0-\me^{-y}(-1),\]
or $f_{Y}(y)=\me^{-y}$ for $y>0$. This turns out to be a member
-of the exponential family of distributions, see Section BLANK.
+of the exponential family of distributions, see Section \ref{sec:Other-Continuous-Distributions}.
\end{example}
\begin{example}
@@ -9457,7 +9446,8 @@
should be no surprise that the correlation between $X$ and $Y$ is
exactly $\mbox{Corr}(X,Y)=\rho$.
\begin{prop}
-The conditional distribution of $Y|\, X=x$ is $\mathsf{norm}(\mathtt{mean}=\mu_{Y|x},\,\mathtt{sd}=\sigma_{Y|x})$,
+\label{pro:mvnorm-cond-dist}The conditional distribution of $Y|\, X=x$
+is $\mathsf{norm}(\mathtt{mean}=\mu_{Y|x},\,\mathtt{sd}=\sigma_{Y|x})$,
where\begin{equation}
\mu_{Y|x}=\mu_{Y}+\rho\frac{\sigma_{Y}}{\sigma_{X}}\left(x-\mu_{X}\right),\mbox{ and }\sigma_{Y|x}=\sigma_{Y}\sqrt{1-\rho^{2}}.\end{equation}
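For illustrative parameter values (not tied to any particular example) the
conditional mean and standard deviation may be computed directly.
<<eval = FALSE>>=
muX <- 0; muY <- 0; sigX <- 1; sigY <- 2; rho <- 0.7; x <- 1.5   # illustrative
muY + rho * (sigY/sigX) * (x - muX)   # conditional mean of Y given X = x
sigY * sqrt(1 - rho^2)                # conditional standard deviation
@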
@@ -10664,7 +10654,7 @@
highest likelihood?'' In other words, for all of the different possible
values of $F$, which one makes the above probability the biggest?
We can answer this question with a plot of $\P(X=x)$ versus $F$.
-See Figure BLANK.
+See Figure \ref{fig:capture-recapture}.
\end{example}
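A sketch of such a plot, with illustrative counts (say, $m=20$ tagged fish
and $x=5$ tagged among a sample of $k=15$; these are not the example's actual
numbers), uses the hypergeometric PMF.
<<eval = FALSE>>=
m <- 20; k <- 15; x <- 5            # illustrative counts
F <- (m + k - x):120                # candidate population sizes
like <- dhyper(x, m = m, n = F - m, k = k)
plot(F, like, type = "h", ylab = "P(X = x)")
F[which.max(like)]                  # the F with the highest likelihood
@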
%
\begin{figure}
@@ -10726,7 +10716,7 @@
$L(p)$:\[
L(p)=p^{\sum x_{i}}(1-p)^{n-\sum x_{i}}.\]
A graph of $L$ for values of $\sum x_{i}=3,\ 4$, and 5 when $n=7$
-is shown in Figure BLANK.
+is shown in Figure \ref{fig:fishing-part-two}.
<<eval = FALSE>>=
curve(x^5*(1-x)^2, from = 0, to = 1, xlab = "p", ylab = "L(p)")
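# the figure also overlays the likelihoods for sum(x) = 4 and 3; one way to
# add them (line types chosen here only for illustration) is:
curve(x^4*(1-x)^3, from = 0, to = 1, add = TRUE, lty = 2)
curve(x^3*(1-x)^4, from = 0, to = 1, add = TRUE, lty = 3)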
@@ -10744,7 +10734,7 @@
@
\par\end{centering}
-\caption{Assorted likelihood functions for fishing, part two}
+\caption{Assorted likelihood functions for fishing, part two\label{fig:fishing-part-two}}
{\small Three graphs are shown of $L$ when $\sum x_{i}$ equals 3,
@@ -10782,7 +10772,7 @@
Test (see BLANK) we could be certain that $\hat{p}=\xbar$ is indeed
a maximum likelihood estimator, and not a minimum likelihood estimator.
\end{rem}
-The result is shown in Figure BLANK.
+The result is shown in Figure \ref{fig:species-mle}.
\end{example}
%
\begin{figure}
@@ -12153,7 +12143,8 @@
We use these $n$ data points to estimate the parameters.
More to the point, there are \emph{three simple linear regression
-(SLR) assumptions} that will form the basis for the rest of this chapter:
+(SLR) assumptions}\index{regression assumptions} that will form the
+basis for the rest of this chapter:
\begin{assumption}
We assume that $\mu$ is a linear function of $x$, that is, \begin{equation}
\mu(x)=\beta_{0}+\beta_{1}x,\end{equation}
@@ -12175,30 +12166,31 @@
\end{assumption}
\begin{rem}
We assume both the normality of the errors $\epsilon$ and the linearity
-of the mean function $\mu$. Recall from Proposition BLANK of Chapter
-BLANK that if $(X,Y)\sim\mathsf{mvnorm}$ then the mean of $Y|x$
-is a linear function of $x$. This is not a coincidence. In more advanced
-classes we study the case that both $X$ and $Y$ are random, and
-in particular, when they are jointly normally distributed.
+of the mean function $\mu$. Recall from Proposition \ref{pro:mvnorm-cond-dist}
+of Chapter \ref{cha:Multivariable-Distributions} that if $(X,Y)\sim\mathsf{mvnorm}$
+then the mean of $Y|x$ is a linear function of $x$. This is not
+a coincidence. In more advanced classes we study the case that both
+$X$ and $Y$ are random, and in particular, when they are jointly
+normally distributed.
\end{rem}
\subsection*{What does it all mean?}
See Figure \ref{fig:philosophy}. Shown in the figure is a solid line,
-the regression line $\mu$, which in this display has slope $0.5$
-and $y$-intercept 2.5, that is, $\mu(x)=2.5+0.5x$. The intuition
-is that for each given value of $x$, we observe a random value of
-$Y$ which is normally distributed with a mean equal to the height
-of the regression line at that $x$ value. Normal densities are superimposed
-on the plot to drive this point home; in principle, the densities
-stand outside of the page, perpendicular to the plane of the paper.
-The figure shows three such values of $x$, namely, $x=1$, $x=2.5$,
-and $x=4$. Not only do we assume that the observations at the three
-locations are independent, but we also assume that their distributions
-have the same spread. In mathematical terms this means that the normal
-densities all along the line have identical standard deviations --
-there is no {}``fanning out'' or {}``scrunching in'' of the normal
-densities as $x$ increases%
+the regression line\index{regression line} $\mu$, which in this
+display has slope $0.5$ and $y$-intercept 2.5, that is, $\mu(x)=2.5+0.5x$.
+The intuition is that for each given value of $x$, we observe a random
+value of $Y$ which is normally distributed with a mean equal to the
+height of the regression line at that $x$ value. Normal densities
+are superimposed on the plot to drive this point home; in principle,
+the densities stand outside of the page, perpendicular to the plane
+of the paper. The figure shows three such values of $x$, namely,
+$x=1$, $x=2.5$, and $x=4$. Not only do we assume that the observations
+at the three locations are independent, but we also assume that their
+distributions have the same spread. In mathematical terms this means
+that the normal densities all along the line have identical standard
+deviations -- there is no {}``fanning out'' or {}``scrunching in''
+of the normal densities as $x$ increases%
\footnote{In practical terms, this constant variance assumption is often violated,
in that we often observe scatterplots that fan out from the line as
$x$ gets large or small. We say under those circumstances that the
@@ -12235,9 +12227,8 @@
\end{figure}
\end{quotation}
\begin{example}
-\label{exa:Speed-and-Stopping}Speed and Stopping Distance of Cars
-
-We will use the data frame \inputencoding{latin9}\lstinline[showstringspaces=false]!cars!\inputencoding{utf8}
+\label{exa:Speed-and-Stopping}\textbf{Speed and stopping distance
+of cars.} We will use the data frame \inputencoding{latin9}\lstinline[showstringspaces=false]!cars!\inputencoding{utf8}\index{Data sets!cars@\texttt{cars}}
from the \inputencoding{latin9}\lstinline[showstringspaces=false]!datasets!\inputencoding{utf8}
package. It has two variables: \inputencoding{latin9}\lstinline[showstringspaces=false]!speed!\inputencoding{utf8}
and \inputencoding{latin9}\lstinline[showstringspaces=false]!dist!\inputencoding{utf8}.
@@ -12267,10 +12258,19 @@
\end{figure}
\end{quotation}
-You can see the output in Figure \ref{fig:Scatter-cars}.
+You can see the output in Figure \ref{fig:Scatter-cars}, which was
+produced by the following code.
\end{example}
+<<eval = FALSE>>=
+plot(dist ~ speed, data = cars)
+@
+There is a pronounced upward trend to the data points, and the pattern
+looks approximately linear. There does not appear to be substantial
+fanning out of the points or extreme values.
+
+
\section{Estimation\label{sec:SLR-Estimation}}
@@ -12278,27 +12278,27 @@
Where is $\mu(x)$? In essence, we would like to {}``fit'' a line
to the points. But how do we determine a {}``good'' line? Is there
-a \emph{best} line? We will use maximum likelihood to find it. We
-know:\begin{equation}
+a \emph{best} line? We will use maximum likelihood\index{maximum likelihood}
+to find it. We know:\begin{equation}
Y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i},\quad i=1,\ldots,n,\end{equation}
where the $\epsilon_{i}$'s are i.i.d.~$\mathsf{norm}(\mathtt{mean}=0,\,\mathtt{sd}=\sigma)$.
Thus $Y_{i}\sim\mathsf{norm}(\mathtt{mean}=\beta_{0}+\beta_{1}x_{i},\,\mathtt{sd}=\sigma),\ i=1,\ldots,n$.
Furthermore, $Y_{1},\ldots,Y_{n}$ are independent -- but not identically
-distributed. The likelihood function is:\begin{alignat}{1}
+distributed. The likelihood function\index{likelihood function} is:\begin{alignat}{1}
L(\beta_{0},\beta_{1},\sigma)= & \prod_{i=1}^{n}f_{Y_{i}}(y_{i}),\\
= & \prod_{i=1}^{n}(2\pi\sigma^{2})^{-1/2}\exp\left\{ \frac{-(y_{i}-\beta_{0}-\beta_{1}x_{i})^{2}}{2\sigma^{2}}\right\} ,\\
= & (2\pi\sigma^{2})^{-n/2}\exp\left\{ \frac{-\sum_{i=1}^{n}(y_{i}-\beta_{0}-\beta_{1}x_{i})^{2}}{2\sigma^{2}}\right\} .\end{alignat}
We take the natural logarithm to get\begin{equation}
\ln L(\beta_{0},\beta_{1},\sigma)=-\frac{n}{2}\ln(2\pi\sigma^{2})-\frac{\sum_{i=1}^{n}(y_{i}-\beta_{0}-\beta_{1}x_{i})^{2}}{2\sigma^{2}}.\label{eq:regML-lnL}\end{equation}
We would like to maximize this function of $\beta_{0}$ and $\beta_{1}$.
-See Appendix BLANK, which tells us that we should find critical points
-by means of the partial derivatives. Let us start by differentiating
-with respect to $\beta_{0}$: \begin{equation}
+See Appendix \ref{sec:Multivariable-Calculus} which tells us that
+we should find critical points by means of the partial derivatives.
+Let us start by differentiating with respect to $\beta_{0}$: \begin{equation}
\frac{\partial}{\partial\beta_{0}}\ln L=0-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}2(y_{i}-\beta_{0}-\beta_{1}x_{i})(-1),\end{equation}
and the partial derivative equals zero when $\sum_{i=1}^{n}(y_{i}-\beta_{0}-\beta_{1}x_{i})=0$,
that is, when\begin{equation}
n\beta_{0}+\beta_{1}\sum_{i=1}^{n}x_{i}=\sum_{i=1}^{n}y_{i}.\label{eq:regML-a}\end{equation}
-Moving on, we next take the partial derivative of $\ln L$ (equation
+Moving on, we next take the partial derivative of $\ln L$ (Equation
\ref{eq:regML-lnL}) with respect to $\beta_{1}$ to get \begin{alignat}{1}
\frac{\partial}{\partial\beta_{1}}\ln L=\ & 0-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}2(y_{i}-\beta_{0}-\beta_{1}x_{i})(-x_{i}),\\
= & \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(x_{i}y_{i}-\beta_{0}x_{i}-\beta_{1}x_{i}^{2}\right),\end{alignat}
@@ -12310,7 +12310,7 @@
n\beta_{0}+\beta_{1}\sum_{i=1}^{n}x_{i} & = & \sum_{i=1}^{n}y_{i}\\
\beta_{0}\sum_{i=1}^{n}x_{i}+\beta_{1}\sum_{i=1}^{n}x_{i}^{2} & = & \sum_{i=1}^{n}x_{i}y_{i}\end{eqnarray}
for $\beta_{0}$ and $\beta_{1}$ (in Exercise BLANK) gives \begin{equation}
-\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}x_{i}y_{i}-\left.\left(\sum_{i=1}^{n}x_{i}\right)\left(\sum_{i=1}^{n}y_{i}\right)\right\slash n}{\sum_{i=1}^{n}x_{i}^{2}-\left.\left(\sum_{i=1}^{n}x_{i}\right)^{2}\right\slash n}\end{equation}
+\hat{\beta}_{1}=\frac{\sum_{i=1}^{n}x_{i}y_{i}-\left.\left(\sum_{i=1}^{n}x_{i}\right)\left(\sum_{i=1}^{n}y_{i}\right)\right\slash n}{\sum_{i=1}^{n}x_{i}^{2}-\left.\left(\sum_{i=1}^{n}x_{i}\right)^{2}\right\slash n}\label{eq:regline-slope-formula}\end{equation}
and\begin{equation}
\hat{\beta}_{0}=\ybar-\hat{\beta}_{1}\xbar.\end{equation}
The conclusion? To estimate the mean line \begin{equation}
@@ -12321,10 +12321,10 @@
For notation we will usually write $b_{0}=\hat{\beta_{0}}$ and $b_{1}=\hat{\beta_{1}}$
so that $\hat{\mu}(x)=b_{0}+b_{1}x$.
\begin{rem}
-The formula for $b_{1}$ in Equation BLANK gets the job done, but
-does not really make any sense. There are many equivalent formulas
-for $b_{1}$ that are more intuitive, or at the least are easier to
-remember. One of the author's favorites is\begin{equation}
+The formula for $b_{1}$ in Equation \ref{eq:regline-slope-formula}
+gets the job done but does not really make any sense. There are many
+equivalent formulas for $b_{1}$ that are more intuitive, or at the
+least are easier to remember. One of the author's favorites is\begin{equation}
b_{1}=r\frac{s_{y}}{s_{x}},\end{equation}
where $r$, $s_{y}$, and $s_{x}$ are the sample correlation coefficient
and the sample standard deviations of the $Y$ and $x$ data, respectively.
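It is straightforward to check this identity numerically, for instance with
the cars data.
<<eval = FALSE>>=
with(cars, cor(speed, dist) * sd(dist)/sd(speed))   # b1 = r * s_y / s_x
coef(lm(dist ~ speed, data = cars))[2]              # slope from lm; should agree
@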
@@ -12666,7 +12666,7 @@
Our point estimate of $\mu(x_{0})$ is of course $\hat{Y}=\hat{Y}(x_{0})$,
so for a confidence interval we will need to know $\hat{Y}$'s sampling
-distribution. It turns out (see Section BLANK) that $\hat{Y}=\hat{\mu}(x_{0})$
+distribution. It turns out (see Section ) that $\hat{Y}=\hat{\mu}(x_{0})$
is distributed\begin{equation}
\hat{Y}\sim\mathsf{norm}\left(\mathtt{mean}=\mu(x_{0}),\:\mathtt{sd}=\sigma\sqrt{\frac{1}{n}+\frac{(x_{0}-\xbar)^{2}}{\sum_{i=1}^{n}(x_{i}-\xbar)^{2}}}\right).\end{equation}
Since $\sigma$ is unknown we estimate it with $S$ (we should expect
@@ -12675,7 +12675,7 @@
A $100(1-\alpha)\%$ \emph{confidence interval (CI) for} $\mu(x_{0})$
is given by\begin{equation}
-\hat{Y}\pm\mathsf{t}_{\alpha/2}(\mathtt{df}=n-2)\, S\sqrt{\frac{1}{n}+\frac{(x_{0}-\xbar^{2})}{\sum_{i=1}^{n}(x_{i}-\xbar)^{2}}}.\end{equation}
+\hat{Y}\pm\mathsf{t}_{\alpha/2}(\mathtt{df}=n-2)\, S\sqrt{\frac{1}{n}+\frac{(x_{0}-\xbar)^{2}}{\sum_{i=1}^{n}(x_{i}-\xbar)^{2}}}.\label{eq:SLR-conf-int-formula}\end{equation}
It is time for prediction intervals, which are slightly different.
@@ -12685,15 +12685,16 @@
Of course $\sigma$ is unknown and we estimate it with $S$. Thus,
a $100(1-\alpha)\%$ prediction interval (PI) for a future value of
$Y$ at $x_{0}$ is given by \begin{equation}
-\hat{Y}(x_{0})\pm\mathsf{t}_{\alpha/2}(\mathtt{df}=n-1)\: S\,\sqrt{1+\frac{1}{n}+\frac{(x_{0}-\xbar)^{2}}{\sum_{i=1}^{n}(x_{i}-\xbar)^{2}}}.\end{equation}
-We notice that the CI in Equation BLANK is wider than the PI in Equation
-BLANK, just as we expected at the beginning of the section.
+\hat{Y}(x_{0})\pm\mathsf{t}_{\alpha/2}(\mathtt{df}=n-2)\: S\,\sqrt{1+\frac{1}{n}+\frac{(x_{0}-\xbar)^{2}}{\sum_{i=1}^{n}(x_{i}-\xbar)^{2}}}.\label{eq:SLR-pred-int-formula}\end{equation}
+We notice that the prediction interval in Equation \ref{eq:SLR-pred-int-formula}
+is wider than the confidence interval in Equation \ref{eq:SLR-conf-int-formula},
+as we expected at the beginning of the section.
\subsection*{How to do it with \textsf{R}}
Confidence and prediction intervals are calculated in \textsf{R} with
-the \inputencoding{latin9}\lstinline[showstringspaces=false]!predict!\inputencoding{utf8}
+the \inputencoding{latin9}\lstinline[showstringspaces=false]!predict!\inputencoding{utf8}\index{predict@\texttt{predict}}
function, which we encountered in Section BLANK. There we neglected
to take advantage of its additional \inputencoding{latin9}\lstinline[showstringspaces=false]!interval!\inputencoding{utf8}
argument. The general syntax follows.
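For instance, with a hypothetical new value $\mathtt{speed}=15$ and the model
object \texttt{cars.lm} fit earlier, the two intervals may be obtained along
the following lines.
<<eval = FALSE>>=
new <- data.frame(speed = 15)                              # hypothetical x0
predict(cars.lm, newdata = new, interval = "confidence")   # CI for the mean response
predict(cars.lm, newdata = new, interval = "prediction")   # PI for a new observation
@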
@@ -13037,19 +13038,19 @@
We can assess the normality of the residuals with graphical methods
and hypothesis tests. To check graphically whether the residuals are
normally distributed we may look at histograms or \emph{q}-\emph{q}
-plots. We first examine a histogram in Figure BLANK. There we see
-that the distribution of the residuals appears to be mound shaped,
-for the most part. We can plot the order statistics of the sample
-versus quantiles from a $\mathsf{norm}(\mathtt{mean}=0,\,\mathtt{sd}=1)$
+plots. We first examine a histogram in Figure \ref{fig:Normal-q-q-plot-cars}.
+There we see that the distribution of the residuals appears to be
+mound shaped, for the most part. We can plot the order statistics
+of the sample versus quantiles from a $\mathsf{norm}(\mathtt{mean}=0,\,\mathtt{sd}=1)$
distribution with the command \inputencoding{latin9}\lstinline[breaklines=true,showstringspaces=false]!plot(cars.lm, which = 2)!\inputencoding{utf8},
-and the results are in Figure BLANK. If the assumption of normality
-were true, then we would expect points randomly scattered about the
-dotted straight line displayed in the figure. In this case, we see
-a slight departure from normality in that the dots show systematic
-clustering on one side or the other of the line. The points on the
-upper end of the plot also appear begin to stray from the line. We
-would say there is some evidence that the residuals are not perfectly
-normal.
+and the results are in Figure \ref{fig:Normal-q-q-plot-cars}. If
+the assumption of normality were true, then we would expect points
+randomly scattered about the dotted straight line displayed in the
+figure. In this case, we see a slight departure from normality in
+that the dots show systematic clustering on one side or the other
+of the line. The points on the upper end of the plot also appear to begin
+to stray from the line. We would say there is some evidence that the
+residuals are not perfectly normal.
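Both displays can be generated with commands along these lines.
<<eval = FALSE>>=
hist(residuals(cars.lm))    # histogram of the residuals
plot(cars.lm, which = 2)    # normal q-q plot of the standardized residuals
@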
%
\begin{figure}
@@ -13135,8 +13136,9 @@
the second a decrease.
In this case, we plot the standardized residuals versus the fitted
-values. The graph may be seen in Figure BLANK. For these data there
-does appear to be somewhat of a slight fanning-out of the residuals.
+values. The graph may be seen in Figure \ref{fig:std-resids-fitted-cars}.
+For these data there does appear to be somewhat of a slight fanning-out
+of the residuals.
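A plot of this kind may be drawn directly from the fitted model, for example
as follows.
<<eval = FALSE>>=
plot(fitted(cars.lm), rstandard(cars.lm),
     xlab = "fitted values", ylab = "standardized residuals")
@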
%
\begin{figure}
@@ -13203,8 +13205,9 @@
followed by negative residuals, which are then followed by positive
residuals, \emph{etc}. Consequently, negatively correlated residuals
are often associated with an alternating pattern in the residual plots.
-We examine the residual plot in Figure BLANK. There is no obvious
-cyclical wave pattern or structure to the residual plot.
+We examine the residual plot in Figure \ref{fig:resids-fitted-cars}.
+There is no obvious cyclical wave pattern or structure to the residual
+plot.
%
\begin{figure}
@@ -13580,7 +13583,7 @@
The \inputencoding{latin9}\lstinline[showstringspaces=false]!par!\inputencoding{utf8}
command is used so that $2\times2=4$ plots will be shown on the same
display. The diagnostic plots for the \inputencoding{latin9}\lstinline[showstringspaces=false]!cars!\inputencoding{utf8}
-data are shown in Figure BLANK:
+data are shown in Figure \ref{fig:Diagnostic-plots-cars}:
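A minimal version of the commands is sketched below.
<<eval = FALSE>>=
par(mfrow = c(2, 2))   # four diagnostic plots on one display
plot(cars.lm)
par(mfrow = c(1, 1))   # restore the default layout
@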
%
\begin{figure}
@@ -13616,9 +13619,9 @@
@
The graph with the identified points is omitted (but the plain plot
-is shown in the bottom right corner of Figure BLANK). Observations
-1 and 2 fall on the far right side of the plot, near the horizontal
-axis.
+is shown in the bottom right corner of Figure \ref{fig:Diagnostic-plots-cars}).
+Observations 1 and 2 fall on the far right side of the plot, near
+the horizontal axis.
\newpage{}
@@ -13650,8 +13653,9 @@
Most of the results are stated without proof or with only a cursory
justification. Those yearning for more should consult an advanced
-text in linear regression for details, such as Applied Linear Regression
-Models or C. R. Rao.
+text in linear regression for details, such as \emph{Applied Linear
+Regression Models} \cite{Neter1996} or \emph{Linear Models: Least
+Squares and Alternatives} \cite{Rao1999}.
\paragraph*{What do I want them to know?}
@@ -13676,15 +13680,16 @@
1 & x_{12} & x_{22} & \cdots & x_{p2}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
1 & x_{1n} & x_{2n} & \cdots & x_{pn}\end{bmatrix}.\end{equation}
-The vector $\mathbf{Y}$ is called the \emph{response vector} and
-the matrix $\mathbf{X}$ is called the \emph{model matrix}. As in
-Chapter BLANK, the most general assumption that relates $\mathbf{Y}$
-to $\mathbf{X}$ is\begin{equation}
+The vector $\mathbf{Y}$ is called the \emph{response vector\index{response vector}}
+and the matrix $\mathbf{X}$ is called the \emph{model matrix}\index{model matrix}.
+As in Chapter \ref{cha:Simple-Linear-Regression}, the most general
+assumption that relates $\mathbf{Y}$ to $\mathbf{X}$ is\begin{equation}
\mathbf{Y}=\mu(\mathbf{X})+\upepsilon,\end{equation}
where $\mu$ is some function (the \emph{signal}) and $\upepsilon$
is the \emph{noise} (everything else). We usually impose some structure
on $\mu$ and $\upepsilon$. In particular, the standard multiple
-linear regression model assumes \begin{equation}
+linear regression model\index{model!multiple linear regression} assumes
+\begin{equation}
\mathbf{Y}=\mathbf{X}\upbeta+\upepsilon,\end{equation}
where the parameter vector $\upbeta$ looks like \begin{equation}
\upbeta_{(\mathrm{p}+1)\times1}=\begin{bmatrix}\beta_{0} & \beta_{1} & \cdots & \beta_{p}\end{bmatrix}^{\mathrm{T}},\end{equation}
@@ -13700,7 +13705,7 @@
Y_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}+\cdots+\beta_{p}x_{pi}+\epsilon_{i},\quad i=1,2,\ldots,n.\end{equation}
\begin{example}
-\textbf{Girth, Height, and Volume for Black Cherry trees.} Measurements
+\textbf{Girth, Height, and Volume for Black Cherry trees.}\index{Data sets!trees@\texttt{trees}} Measurements
were made of the girth, height, and volume of timber in 31 felled
black cherry trees. Note that girth is the diameter of the tree (in
inches) measured at 4\,ft 6\,in above the ground. The variables
@@ -13725,7 +13730,7 @@
is made with the \inputencoding{latin9}\lstinline[showstringspaces=false]!splom!\inputencoding{utf8}
function in the \inputencoding{latin9}\lstinline[showstringspaces=false]!lattice!\inputencoding{utf8}
package \cite{Sarkarlattice} as shown below. The plot is shown in
-Figure BLANK.
+Figure \ref{fig:splom-trees}.
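A call along these lines produces such a scatterplot matrix.
<<eval = FALSE>>=
library(lattice)
splom(trees)
@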
%
\begin{figure}
@@ -13791,7 +13796,7 @@
Another way to do it is with the \inputencoding{latin9}\lstinline[showstringspaces=false]!scatterplot3d!\inputencoding{utf8}
function in the \inputencoding{latin9}\lstinline[showstringspaces=false]!scatterplot3d!\inputencoding{utf8}
-package. The syntax follows, and the result is shown in Figure BLANK.
+package. The code follows, and the result is shown in Figure \ref{fig:3D-scatterplot-trees}.
<<eval = FALSE>>=
library(scatterplot3d)
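# the full call plots Girth, Height, and Volume; a plausible version
# (argument choices here are illustrative only) is:
with(trees, scatterplot3d(Girth, Height, Volume))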
@@ -13827,17 +13832,19 @@
\subsection{Parameter estimates}
-We will proceed exactly like we did in Section BLANK. We know \begin{equation}
+We will proceed exactly like we did in Section \ref{sec:SLR-Estimation}.
+We know \begin{equation}
\upepsilon\sim\mathsf{mvnorm}\left(\mathtt{mean}=\mathbf{0}_{\mathrm{n}\times1},\,\mathtt{sigma}=\sigma^{2}\mathbf{I}_{\mathrm{n}\times\mathrm{n}}\right),\end{equation}
which means that $\mathbf{Y}=\mathbf{X}\upbeta+\upepsilon$ has an
$\mathsf{mvnorm}\left(\mathtt{mean}=\mathbf{X}\upbeta,\,\mathtt{sigma}=\sigma^{2}\mathbf{I}_{\mathrm{n}\times\mathrm{n}}\right)$
-distribution. Therefore, the likelihood function is\begin{equation}
+distribution. Therefore, the likelihood function\index{likelihood function}
+is\begin{equation}
L(\upbeta,\sigma)=\frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\left\{ -\frac{1}{2\sigma^{2}}\left(\mathbf{Y}-\mathbf{X}\upbeta\right)^{\mathrm{T}}\left(\mathbf{Y}-\mathbf{X}\upbeta\right)\right\} .\end{equation}
-To \emph{maximize} the likelihood in $\upbeta$, we need to \emph{minimize}
-the quantity $g(\upbeta)=\left(\mathbf{Y}-\mathbf{X}\upbeta\right)^{\mathrm{T}}\left(\mathbf{Y}-\mathbf{X}\upbeta\right)$.
+To \emph{maximize} the likelihood\index{maximum likelihood} in $\upbeta$,
+we need to \emph{minimize} the quantity $g(\upbeta)=\left(\mathbf{Y}-\mathbf{X}\upbeta\right)^{\mathrm{T}}\left(\mathbf{Y}-\mathbf{X}\upbeta\right)$.
We do this by differentiating $g$ with respect to $\upbeta$. (It
-may be a good idea to brush up on the material in Appendix BLANK.)
-First we will rewrite $g$:\begin{equation}
+may be a good idea to brush up on the material in Appendices \ref{sec:Linear-Algebra}
+and \ref{sec:Multivariable-Calculus}.) First we will rewrite $g$:\begin{equation}
g(\upbeta)=\mathbf{Y}^{\mathrm{T}}\mathbf{Y}-\mathbf{Y}^{\mathrm{T}}\mathbf{X}\upbeta-\upbeta^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{Y}+\upbeta^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{X}\upbeta,\end{equation}
which can be further simplified to $g(\upbeta)=\mathbf{Y}^{\mathrm{T}}\mathbf{Y}-2\upbeta^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{Y}+\upbeta^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{X}\upbeta$
since $\upbeta^{\mathrm{T}}\mathbf{X}^{\mathrm{T}}\mathbf{Y}$ is
@@ -13846,7 +13853,7 @@
\frac{\partial g}{\partial\upbeta}=\mathbf{0}-2\mathbf{X}^{\mathrm{T}}\mathbf{Y}+2\mathbf{X}^{\mathrm{T}}\mathbf{X}\upbeta,\end{equation}
since $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is symmetric. Setting the
derivative equal to the zero vector yields the so called {}``normal
-equations''\begin{equation}
+equations''\index{normal equations}\begin{equation}
\mathbf{X}^{\mathrm{T}}\mathbf{X}\upbeta=\mathbf{X}^{\mathrm{T}}\mathbf{Y}.\end{equation}
In the case that $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is invertible%
\footnote{We can find solutions of the normal equations even when $\mathbf{X}^{\mathrm{T}}\mathbf{X}$
@@ -13855,25 +13862,26 @@
(CR.Rao)%
}, we may solve the equation for $\upbeta$ to get the maximum likelihood
estimator of $\upbeta$ which we denote by $\mathbf{b}$:\begin{equation}
-\mathbf{b}=\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{Y}.\end{equation}
+\mathbf{b}=\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{Y}.\label{eq:b-formula-matrix}\end{equation}
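A direct numerical check of this formula against \texttt{lm}, using the trees
data introduced above, looks like this.
<<eval = FALSE>>=
X <- model.matrix(Volume ~ Girth + Height, data = trees)
y <- trees$Volume
solve(t(X) %*% X, t(X) %*% y)                     # b from the normal equations
coef(lm(Volume ~ Girth + Height, data = trees))   # identical estimates via lm
@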
\begin{rem}
-The formula in Equation BLANK is convenient for mathematical study
-but is inconvenient for numerical computation. Researchers have devised
-much more efficient algorithms for the actual calculation of the parameter
-estimates, and we do not explore them here.
+The formula in Equation \ref{eq:b-formula-matrix} is convenient for
+mathematical study but is inconvenient for numerical computation.
+Researchers have devised much more efficient algorithms for the actual
+calculation of the parameter estimates, and we do not explore them
+here.
\end{rem}
\begin{rem}
We have only found a critical value, and have not actually shown that
the critical value is a minimum. We omit the details and refer the
-interested reader to BLANK.
+interested reader to \cite{Rao1999}.
\end{rem}
\subsection{How to do it with \textsf{R}}
We do all of the above just as we would in simple linear regression.
-The powerhouse is the \inputencoding{latin9}\lstinline[showstringspaces=false]!lm!\inputencoding{utf8}
+The powerhouse is the \inputencoding{latin9}\lstinline[showstringspaces=false]!lm!\inputencoding{utf8}\index{lm@\texttt{lm}}
function. Everything else is based on it. We separate explanatory
variables in the model formula by a plus sign.
@@ -13888,7 +13896,7 @@
given by \begin{alignat}{1}
\hat{\mu}(x_{1},x_{2})= & \ b_{0}+b_{1}x_{1}+b_{2}x_{2},\\
\approx & -58.0+4.7x_{1}+0.3x_{2}.\end{alignat}
-We could see the entire model matrix $\mathbf{X}$ with the \inputencoding{latin9}\lstinline[showstringspaces=false]!model.matrix!\inputencoding{utf8}
+We could see the entire model matrix $\mathbf{X}$ with the \inputencoding{latin9}\lstinline[showstringspaces=false]!model.matrix!\inputencoding{utf8}\index{model.matrix@\texttt{model.matrix}}
function, but in the interest of brevity we only show the first few
rows.
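The quantities quoted above may be reproduced with calls along these lines
(the object name \texttt{trees.lm} is only illustrative).
<<eval = FALSE>>=
trees.lm <- lm(Volume ~ Girth + Height, data = trees)
coef(trees.lm)                   # b0, b1, b2: approximately -58.0, 4.7, 0.3
head(model.matrix(trees.lm))     # first few rows of the model matrix X
@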
@@ -13900,20 +13908,21 @@
\subsection{Point Estimates of the Regression Surface}
The parameter estimates $\mathbf{b}$ make it easy to find the fitted
-values, $\hat{\mathbf{Y}}$. We write them individually as $\hat{Y}_{i}$,
-$i=1,2,\ldots,n$, and recall that they are defined by\begin{eqnarray}
+values\index{fitted values}, $\hat{\mathbf{Y}}$. We write them individually
+as $\hat{Y}_{i}$, $i=1,2,\ldots,n$, and recall that they are defined
+by\begin{eqnarray}
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/ipsur -r 126