[IPSUR-commits] r135 - pkg/IPSUR/inst/doc

Sat Jan 9 21:15:02 CET 2010

Author: gkerns
Date: 2010-01-09 21:14:59 +0100 (Sat, 09 Jan 2010)
New Revision: 135

Modified:
   pkg/IPSUR/inst/doc/IPSUR.Rnw
Log:
small changes


Modified: pkg/IPSUR/inst/doc/IPSUR.Rnw
===================================================================

--- pkg/IPSUR/inst/doc/IPSUR.Rnw	2010-01-09 19:38:29 UTC (rev 134)
+++ pkg/IPSUR/inst/doc/IPSUR.Rnw	2010-01-09 20:14:59 UTC (rev 135)
@@ -222,7 +222,8 @@
 @
 
 <<echo = FALSE>>=
-set.seed(42)
+seed <- 42
+set.seed(seed)
 options(width = 70)
 #library(random)
 #i_seed <- randomNumbers(n = 624, col = 1, min = -1e+09, max = 1e+09)
@@ -11162,25 +11163,32 @@
 \begin{figure}
 \begin{centering}
 <<echo = FALSE, fig=true, height = 6, width = 6>>=
+set.seed(seed + 1)
 library(TeachingDemos)
 ci.examp()
 @
 \par\end{centering}
 
-\caption{Confidence interval simulation}
-The graph was generated by \texttt{ci.examp()} from the \texttt{TeachingDemos}
-package. Fifty (50) samples of size twenty five (25) were generated
-from a $\mathsf{norm}(\mathtt{mean}=100,\,\mathtt{sd}=10)$ distribution,
+\caption{Simulated confidence intervals\label{fig:ci-examp}}
+
+
+~
+
+{\small The graph was generated by the }\texttt{\small ci.examp}{\small{}
+function from the }\texttt{\small TeachingDemos}{\small{} package.
+Fifty (50) samples of size twenty-five (25) were generated from a
+$\mathsf{norm}(\mathtt{mean}=100,\,\mathtt{sd}=10)$ distribution,
 and each sample was used to find a 95\% confidence interval for the
 population mean using Equation \ref{eq:z-interval}. The 50 confidence
 intervals are represented above by horizontal lines, and the respective
 sample means are denoted by vertical slashes. Confidence intervals
 that {}``cover'' the true mean value of 100 are plotted in black;
 those that fail to cover are plotted in a lighter color. In the plot
-we see that two (2) simulated intervals out of the 50 failed to cover
-$\mu=100$, which is a success rate of 96\%. As the number of generated
-samples increased from 50 to 500 to 50000, \ldots{}, we would expect
-our success rate to approach the exact value of 95\%.\label{fig:ci-examp}
+we see that only one (1) of the 50 simulated intervals failed to
+cover $\mu=100$, which is a success rate of 98\%. If the
+number of generated samples were to increase from 50 to 500 to 50000,
+\ldots{}, then we would expect our success rate to approach the exact
+value of 95\%.}
 \end{figure}
 
 
@@ -11228,15 +11236,15 @@
 research. We are going to use the one-sample $z$-interval.
 
 <<keep.source = TRUE>>=
-dim(PlantGrowth)         # sample size is first entry
+dim(PlantGrowth)   # sample size is first entry
 with(PlantGrowth, mean(weight))
 qnorm(0.975)
 @
 
 We find the sample mean of the data to be $\xbar=5.073$ and $z_{\alpha/2}=z_{0.025}\approx1.96$.
 Our interval is therefore\[
-\xbar\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}=5.073\pm1.96\cdot\frac{0.70}{\sqrt{30}}\]
-which is approximately the interval $[4.823,\,5.323]$. In conclusion,
+\xbar\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}=5.073\pm1.96\cdot\frac{0.70}{\sqrt{30}},\]
+which comes out to approximately $[4.823,\,5.323]$. In conclusion,
 we are 95\% confident that the true mean weight $\mu$ of all plants
 of this species lies somewhere between 4.823~g and 5.323~g, that
 is, we are 95\% confident that the interval $[4.823,\,5.323]$ covers
@@ -11245,34 +11253,42 @@
 %
 \begin{figure}
 \begin{centering}
-<<echo = FALSE, fig=true, height = 6, width = 6>>=
+<<echo = FALSE, fig=true, height = 6.5, width = 6.5>>=
 library(TeachingDemos)
 plot(z.test(PlantGrowth$weight, stdev = 0.70), "Conf")
 @
 \par\end{centering}
 
 \caption{Confidence interval plot for the \texttt{PlantGrowth} data\label{fig:plant-z-int-plot}}
-The graph was generated by computing a \texttt{z.test} from the \texttt{TeachingDemos}
-package, storing the resulting \texttt{htest} object, and plotting
-it with \texttt{plot.htest} from the \texttt{IPSUR} package.
+
+
+~
+
+{\small The shaded portion represents 95\% of the total area under
+the curve, and the upper and lower bounds are the limits of the one-sample
+95\% confidence interval. The graph is centered at the observed sample
+mean. It was generated by computing a }\texttt{\small z.test}{\small{}
+from the }\texttt{\small TeachingDemos}{\small{} package, storing the
+resulting }\texttt{\small htest}{\small{} object, and plotting it with
+}\texttt{\small plot.htest}{\small{} from the }\texttt{\small IPSUR}{\small{}
+package. See the remarks in the {}``How to do it with }\textsf{\small R}{\small ''
+discussion later in this section.}
 \end{figure}
 
 
 
 \begin{example}
-yieldPlantGrowth Give some data with $X_{1}$, $X_{2}$, \ldots{},
-$X_{n}$ an $SRS(n)$ from a $\mathsf{norm}(\mathtt{mean}=\mu,\,\mathtt{sd}=\sigma)$
-distribution. Maybe small sample?\end{example}
+Give some data with $X_{1}$, $X_{2}$, \ldots{}, $X_{n}$ an $SRS(n)$
+from a $\mathsf{norm}(\mathtt{mean}=\mu,\,\mathtt{sd}=\sigma)$ distribution.
+Maybe small sample?\end{example}
 \begin{enumerate}
 \item What is the parameter of interest? in the context of the problem.
 Give a point estimate for $\mu$.
 \item What are the assumptions being made in the problem? Do they meet the
 conditions of the interval?
 \item Calculate the interval.
-\item Draw the conclusion.
-\end{enumerate}
-hdf
-
+\item Draw the conclusion.\end{enumerate}
+\begin{rem}
 What if $\sigma$ is unknown? We instead use the interval\begin{equation}
 \Xbar\pm z_{\alpha/2}\frac{S}{\sqrt{n}},\end{equation}
 where $S$ is the sample standard deviation.
@@ -11285,7 +11301,7 @@
 \item If $n$ is small, then 
 
 \begin{itemize}
-\item if the underlying population is normal then we may replace $z_{\alpha/2}$
+\item If the underlying population is normal then we may replace $z_{\alpha/2}$
 with $t_{\alpha/2}(\mathtt{df}=n-1)$. The resulting $100(1-\alpha)\%$
 confidence interval is\begin{equation}
 \Xbar\pm t_{\alpha/2}(\mathtt{df}=n-1)\frac{S}{\sqrt{n}}\label{eq:one-samp-t-int}\end{equation}
@@ -11298,10 +11314,11 @@
 for advice.
 \end{itemize}
 \end{itemize}
-In general, with confidence interval problems it is useful to follow
-a similar procedure. An acronym to summarize the procedure is PANIC:
-\emph{P}arameter, \emph{A}ssumptions, \emph{N}ame, \emph{I}nterval,
-and \emph{C}onclusion.
+\end{rem}
+The author learned from AP Statistics Exam graders of a handy acronym
+that summarizes the important parts of confidence interval estimation.
+The acronym is PANIC: \emph{P}arameter, \emph{A}ssumptions, \emph{N}ame,
+\emph{I}nterval, and \emph{C}onclusion.
 \begin{description}
 \item [{Parameter:}] identify the parameter of interest with the proper
 symbols. Write down what the parameter means in the context of the
@@ -11316,15 +11333,15 @@
 that the results may not be reliable. Write down any underlying formulas
 used.
 \item [{Interval:}] calculate the interval from the sample data. This can
-be done by hand but will more often be done with the aid of the computer.
+be done by hand but will more often be done with the aid of a computer.
 Regardless of the method, all calculations or code should be shown
-to make the entire process repeatable by a subsequent reader.
+so that the entire process is repeatable by a subsequent reader.
 \item [{Conclusion:}] state the final results, using language in the context
 of the problem. Include the appropriate interpretation of the interval,
 making reference to the confidence coefficient.\end{description}
 \begin{rem}
-The intervals above are two-sided, but there are also one-sided intervals
-for $\mu$. They look like \begin{equation}
+All of the above intervals for $\mu$ were two-sided, but there are
+also one-sided intervals for $\mu$. They look like \begin{equation}
 \left[\Xbar-z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right)\quad\mbox{or}\quad\left(-\infty,\ \Xbar+z_{\alpha}\frac{\sigma}{\sqrt{n}}\right]\end{equation}
 and satisfy\begin{equation}
 \P\left(\Xbar-z_{\alpha}\frac{\sigma}{\sqrt{n}}\leq\mu\right)=1-\alpha\quad\mbox{and}\quad\P\left(\Xbar+z_{\alpha}\frac{\sigma}{\sqrt{n}}\geq\mu\right)=1-\alpha.\end{equation}
@@ -11340,18 +11357,24 @@
 
 \subsection{How to do it with \textsf{R}}
 
-We can do Example \ref{exa:plant-one-samp-z-int} with The 
+We can do Example \ref{exa:plant-one-samp-z-int} with the following
+code.
 
-library(HH)
+<<>>=
+library(TeachingDemos)
+temp <- with(PlantGrowth, z.test(weight, stdev = 0.7))
+temp
+@
 
-normal.and.t.dist(obs.mean = 56.8, std.dev = 2, n = 10, alpha.right
-= 0.025, Use.alpha.left = TRUE, hypoth.or.conf = 'Conf', polygon.density
-= 10 )
+The confidence interval bounds are shown in the sixth line down of
+the output (please disregard all of the additional output information
+for now -- we will use it in Chapter \ref{cha:Hypothesis-Testing}).
+We can make the plot for Figure \ref{fig:plant-z-int-plot} with
 
-normal.and.t.dist(obs.mean = mean(c(37.4, 48.8, 46.9, 55, 44)), std.dev
-= sd(c(37.4, 48.8, 46.9, 55, 44)), n = 5, alpha.right = 0.05, deg.freedom
-= 4, Use.alpha.left = TRUE, hypoth.or.conf = 'Conf', polygon.density
-= 10 ) 
+<<eval = FALSE>>=
+library(IPSUR)
+plot(temp, "Conf")
+@
 
 
 \section{Confidence Intervals for Differences of Means\label{sec:Conf-Interv-for-Diff-Means}}
@@ -11461,8 +11484,8 @@
 (see Exercise \ref{xca:CI-quad-form}):\begin{equation}
 \left.\left[\left(\hat{p}+\frac{z_{\alpha/2}^{2}}{2n}\right)\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z_{\alpha/2}^{2}}{(2n)^{2}}}\right]\right\slash \left(1+\frac{z_{\alpha/2}^{2}}{n}\right)\end{equation}
  This approach is called the \emph{score interval} because it is based
-on the inversion of the {}``Score test''. See Section BLANK. It
-is also known as the \emph{Wilson interval}; see reference BLANK.
+on the inversion of the {}``Score test''. See Chapter \ref{cha:Categorical-Data-Analysis}.
+It is also known as the \emph{Wilson interval}; see Agresti \cite{Agresti2002}.
 \end{enumerate}
 For two proportions $p_{1}$ and $p_{2}$, we may collect independent
 $\mathsf{binom}(\mathtt{size}=1,\,\mathtt{prob}=p)$ samples of size
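
The one-sample $z$-interval for the PlantGrowth data quoted in the hunk at new line 11236 can be cross-checked by hand. The following is a minimal sketch, assuming base R only; the commit itself relies on z.test from TeachingDemos. The names sigma, n, xbar, z, half_width, and ci are illustrative and do not appear in IPSUR.Rnw. The value sigma = 0.70 is the known standard deviation that the example assumes.

```r
# Recompute the one-sample z-interval for PlantGrowth by hand (base R only).
# All variable names here are illustrative, not taken from IPSUR.Rnw.
sigma <- 0.70                        # population sd, assumed known in the example
n     <- nrow(PlantGrowth)           # sample size (first entry of dim(PlantGrowth))
xbar  <- mean(PlantGrowth$weight)    # point estimate of mu
z     <- qnorm(0.975)                # z_{alpha/2} for a 95% interval

# xbar +/- z_{alpha/2} * sigma / sqrt(n)
half_width <- z * sigma / sqrt(n)
ci <- c(lower = xbar - half_width, upper = xbar + half_width)
print(round(ci, 3))
```

Rounded to three decimals, the bounds should agree with the interval $[4.823,\,5.323]$ stated in the revised text, and with the confidence interval reported by z.test(weight, stdev = 0.7).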