[IPSUR-commits] r106 - pkg/IPSUR/inst/doc
noreply at r-forge.r-project.org
Sun Dec 27 14:47:52 CET 2009
Author: gkerns
Date: 2009-12-27 14:47:51 +0100 (Sun, 27 Dec 2009)
New Revision: 106
Modified:
pkg/IPSUR/inst/doc/IPSUR.Rnw
Log:
too many
Modified: pkg/IPSUR/inst/doc/IPSUR.Rnw
===================================================================
--- pkg/IPSUR/inst/doc/IPSUR.Rnw 2009-12-27 04:27:34 UTC (rev 105)
+++ pkg/IPSUR/inst/doc/IPSUR.Rnw 2009-12-27 13:47:51 UTC (rev 106)
@@ -199,7 +199,7 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% %%%
%%% IPSUR.Rnw - Introduction to Probability and Statistics Using R %%%
-%%% Copyright (C) 2009 G. Jay Kerns, <gkerns at ysu.edu> %%%
+%%% Copyright (C) 2010 G. Jay Kerns, <gkerns at ysu.edu> %%%
%%% This program is free software: you can redistribute it and/or %%%
%%% modify it under the terms of the GNU General Public License as %%%
%%% published by the Free Software Foundation, either version 3 %%%
@@ -217,7 +217,7 @@
<<echo = FALSE>>=
### IPSUR.R - Introduction to Probability and Statistics Using R
-### Copyright (C) 2009 G. Jay Kerns, <gkerns at ysu.edu>
+### Copyright (C) 2010 G. Jay Kerns, <gkerns at ysu.edu>
### This program is free software: you can redistribute it and/or modify
### it under the terms of the GNU General Public License as published by
### the Free Software Foundation, either version 3 of the License, or
@@ -395,6 +395,8 @@
\noindent Copyright \textcopyright~2010 G.~Jay Kerns
+\noindent ISBN: 978-0-557-24979-4
+
\medskip{}
@@ -423,86 +425,89 @@
at Bowling Green State University. The course for which the materials
were written is 50-50 Probability and Statistics, and the attendees
include mathematics, engineering, and computer science majors (among
-others). The prerequisites for the course are a full year of Calculus.
+others). The catalog prerequisite for the course is a full year
+of calculus.
+The book can be subdivided into three basic parts. The first part
+includes the introductions and elementary \emph{descriptive statistics};
+I want the students to be knee-deep in data right out of the gate.
+The second part is the study of \emph{probability}, which begins at
+the basics of sets and the equally likely model, journeys past discrete
+and continuous random variables, and continues through to multivariate
+distributions. The chapter on Sampling Distributions paves the way
+to the third part, which is \emph{inferential statistics}. This last
+part includes point and interval estimation, hypothesis testing, and
+finishes with introductions to selected topics in applied statistics.
+
+I normally only have time in one semester to cover a small subset
+of this book. I typically cover the material in Chapter 2 in one class
+period that is supplemented by a take-home assignment for the students.
+I spend a lot of time on Data Description, Probability, Discrete,
+and Continuous Distributions. I mention selected facts from Multivariate
+Distributions in passing, and discuss the meaty parts of Sampling
+Distributions before moving right along to Estimation (which is another
+one I dwell on considerably). Hypothesis Testing goes faster after
+all of the previous work, and by that time the end of the semester
+is in sight. I normally choose one or two final chapters (sometimes
+three) from the remaining to survey, and regret at the end that I
+did not have the chance to cover more.
+
In an attempt to be correct I have included material in this book
which I would normally not mention during the course of a standard
lecture. For instance, I normally do not highlight the intricacies
of measure theory or absolute integrability when speaking to the class.
-Moreover, I typically stray from the matrix approach to multiple linear
+Moreover, I often stray from the matrix approach to multiple linear
regression because many of my students have not yet been formally
-trained in linear algebra. It is important, however, in my mind for
-the students to hold something in their hands which acknowledges the
-world of mathematics and statistics beyond, and which may be useful
-to them for many semesters to come.
+trained in linear algebra. That being said, it is important to me
+for the students to hold something in their hands which acknowledges
+the world of mathematics and statistics beyond the classroom, and
+which may be useful to them for many semesters to come.
-I normally only have time in one semester to cover a small subset
-of this book. I typically cover the material in Chapter 2 in one class
-period with a take-home assignment for the students. I spend a lot
-of time on Data Description, Probability, Discrete and Continuous
-Distributions. I mention selected facts from Multivariate Distributions
-in passing, and discuss the meaty parts of Sampling Distributions
-before moving right along to Estimation (which is another one I dwell
-on considerably). Hypothesis Testing goes faster after all of the
-previous work, and by that time the end of the semester is in sight.
-I normally choose one or two final chapters (sometimes three) from
-the remaining to survey, and regret in retrospect that I did not have
-the chance to cover more.
+This document's future goal is to be a more or less self-contained,
+essentially complete, and correct textbook. There should be plenty of
+exercises for the student, and the problems should have full solutions
+for some, and no solutions for others (so that the instructor may
+assign them for grading). Thanks to \inputencoding{latin9}\lstinline[showstringspaces=false]!Sweave!\inputencoding{utf8}'s
+dynamic nature it is possible to write randomly generated exercises,
+and I had planned to implement this idea throughout the book already.
+Alas, there are only 24 hours in a day. Look for more in the Second
+Edition.
-This document's goal is to be a more or less self contained, essentially
-complete, correct, textbook. There should be plenty of exercises for
-the student, and the problems should have full solutions for some,
-and no solutions for others (so that the instructor may assign them
-for grading). I have constructed this book to have many randomly generated
-exercises. The numbers change, but the concept of the problem remains
-the same.
+Seasoned readers will be able to detect my statistical origins: \emph{Probability
+and Statistical Inference} by Hogg and Tanis, \emph{Statistical Inference}
+by Casella and Berger, and \emph{Theory of Point Estimation/Testing
+Statistical Hypotheses} by Lehmann. I highly recommend each of those
+books to every reader of this one.
-This book was inspired by
-\begin{itemize}
-\item Categorical Data Analysis, Agresti ()
-\item Forecasting, Time Series, and Regression, 4th Ed., Bowerman, O'Connell,
-and Koehler (Duxbury)
-\item Mathematical Statistics, Vol. I, 2nd Ed., Bickel and Doksum (Prentice
-Hall)
-\item Probability and Statistical Inference, 5th Ed., Hogg and Tanis, (Prentice
-Hall)
-\item Applied Linear Regression Models, 3rd Ed., Neter, Kutner, Nachtsheim,
-and Wasserman (Irwin)
-\item Statistical Inference, 1st Ed, Casella and Berger (Duxbury)
-\item Monte Carlo Statistical Methods, 1st Ed., Robert and Casella (Springer)
-\item Introduction to Statistical Thought
-\item Using \textsf{R} for Introductory Statistics
-\item Introductory Statistics with \textsf{R}
-\item Data Analysis and Graphics using \textsf{R}
-\end{itemize}
Please bear in mind that the title of this book is {}``Introduction
to Probability and Statistics Using \textsf{R}'', and not {}``Introduction
to \textsf{R} Using Probability and Statistics'', nor even {}``Introduction
-to Probability and Statistics and \textsf{R} Using Words''. The goal
-is probability and statistics; the tool is \textsf{R}. There are consequently
+to Probability and Statistics and \textsf{R} Using Words''. The goals
+are probability and statistics; the tool is \textsf{R}. There are
several important topics about \textsf{R} which some individuals will
-feel are underdeveloped, glossed over, or omitted unnecessarily. Some
-will feel the same way about the probabilistic and/or statistical
-content. Still others will just want to learn \textsf{R} and skip
-all of the mathematics.
+feel are underdeveloped, glossed over, or wantonly omitted. Some will
+feel the same way about the probabilistic and/or statistical content.
+Still others will just want to learn \textsf{R} and skip all of the
+mathematics.
Despite any misgivings: here it is. I humbly invite said individuals
to take this book, with the GNU-FDL in hand, and make it better. In
-that spirit there are many ways in which this book could be improved:
+that spirit there are, in my view, at least a few ways in which this
+book could be improved.
\begin{description}
\item [{Better~data:}] the data analyzed in this book are almost entirely
from the \inputencoding{latin9}\lstinline[showstringspaces=false]!datasets!\inputencoding{utf8}
-package in base \textsf{R}. There are at least three reasons for this:
+package in base \textsf{R}. Here is why:
\begin{enumerate}
\item I made a conscious effort to minimize dependence on contributed packages,
-\item The data are instantly available, already in the correct format, and
-we do not need to waste time managing them, and
+\item The data are instantly available, already in the correct format, so
+we need not spend time managing them, and
\item The data are \emph{real}.
\end{enumerate}
I made no attempt to choose data sets that would be interesting to
the students; rather, data were chosen for their potential to convey
-a statistical point. Many of the datasets are decades old, or more
+a statistical point. Many of the datasets are decades old or more
(for instance, the data used to introduce simple linear regression
are the speeds and stopping distances of cars in the 1920's).
@@ -511,17 +516,18 @@
in \emph{every} example. One day I hope to stumble over that time.
In the meantime, I will add new data sets incrementally as time permits.
-\item [{More~proofs:}] for the sake of completeness. Many proofs have
-been skipped. There is no rhyme or reason to the current omissions.
-I will add more proofs as time permits.
+\item [{More~proofs:}] for the sake of completeness (and I understand
+that some people would not consider more proofs to be an \emph{improvement}).
+Many proofs have been skipped entirely, and there is no rhyme or reason
+to the current omissions. I will add more when I get a chance.
\item [{More~and~better~graphics:}] I have not used the \inputencoding{latin9}\lstinline[basicstyle={\ttfamily},breaklines=true,language=R]!ggplot2!\inputencoding{utf8}
package because I do not know how to use it yet. It is on my to-do
list.
-\item [{More~and~better~exercises:}] There are not nearly enough exercises
+\item [{More~and~better~exercises:}] There are only a few exercises
in the first edition. I have not used the \inputencoding{latin9}\lstinline[basicstyle={\ttfamily},breaklines=true,language=R]!exams!\inputencoding{utf8}
-package, but I believe that it is a right way to move forward with
-this book. As I learn more about what the package can do I would like
-to incorporate it into later editions of this book.
+package, but I believe that it is the right way to move forward. As
+I learn more about what the package can do I would like to incorporate
+it into later editions of this book.
\end{description}
\section*{About This Document}
@@ -533,7 +539,7 @@
means to modify the Document. The \emph{Package} is an \textsf{R}
package that houses the Program and the Document. Finally, the \emph{Ancillaries}
are extra materials produced by the Program to supplement use of the
-Document. We briefly describe each of them below.
+Document. We briefly describe each of them in turn.
\subsection*{The Document}
@@ -557,47 +563,45 @@
files for every graph in the Document. These are needed when typesetting
with \LaTeX{}.
\item [{\texttt{IPSUR.pdf}}] is an opaque copy of the Document. This is
-the file that instructors will likely want to distribute to students.
+the file that instructors would likely want to distribute to students.
\item [{\texttt{IPSUR.dvi}}] is another opaque copy of the Document in
a different file format.
\end{description}
\subsection*{The Program}
-The \emph{Program} includes \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.lyx!\inputencoding{utf8}
-and its nephew \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.Rnw!\inputencoding{utf8};
-the purpose of each is to give instructors a way to quickly customize
-the Document for their particular class of students by means of randomly
-regenerating the Document with brand new data, exercises, student
-and instructor solution manuals, and other ancillaries.
+The \emph{Program} includes \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.lyx!\inputencoding{utf8}
+and its nephew \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.Rnw!\inputencoding{utf8};
+the purpose of each is to give individuals a way to quickly customize
+the Document for their particular needs.
\begin{description}
\item [{\texttt{IPSUR.lyx}}] is the source \LyX{} file for the Program,
released under the GNU General Public License (GNU GPL) Version 3.
This file is opened, modified, and compiled with \LyX{}, a sophisticated
-open-source document processor, and may be used (together with Sweave)
+open-source document processor, and may be used (together with \inputencoding{latin9}\lstinline[showstringspaces=false]!Sweave!\inputencoding{utf8})
to generate a randomized, modified copy of the Document with brand
-new data sets for some of the exercises, and the solution manuals.
-Additionally, \LyX{} can easily activate/deactivate entire blocks
-of the document, \emph{e.g.~}the \textsf{proofs} of the theorems,
-the student \textsf{solutions} to the exercises, or the instructor
-\textsf{answers} to the problems, so that the new author may choose
-which sections (s)he would like to include in the final Document.
-The \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.lyx!\inputencoding{utf8}
+new data sets for some of the exercises, and the solution manuals
+(for the Second Edition). Additionally, \LyX{} can easily activate/deactivate
+entire blocks of the document, \emph{e.g.~}the \textsf{proofs} of
+the theorems, the student \textsf{solutions} to the exercises, or
+the instructor \textsf{answers} to the problems, so that the new author
+may choose which sections (s)he would like to include in the final
+Document (again, Second Edition). The \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.lyx!\inputencoding{utf8}
file is all that a person needs (in addition to a properly configured
system -- see Appendix BLANK) to generate/compile/export to all of
the other formats described above and below, which includes the ancillary
-materials \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.Rdata!\inputencoding{utf8}
-and \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.R!\inputencoding{utf8}.
+materials \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.Rdata!\inputencoding{utf8}
+and \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.R!\inputencoding{utf8}.
\item [{\texttt{IPSUR.Rnw}}] is another form of the source code for the
Program, also released under the GNU GPL Version 3. It was produced
-by exporting \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.lyx!\inputencoding{utf8}
-into\textsf{ R}/Sweave format (\inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!.Rnw!\inputencoding{utf8}).
+by exporting \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.lyx!\inputencoding{utf8}
+into\textsf{ R}/Sweave format (\inputencoding{latin9}\lstinline[showstringspaces=false]!.Rnw!\inputencoding{utf8}).
This file may be processed with Sweave to generate a randomized copy
-of \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.tex!\inputencoding{utf8}
+of \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.tex!\inputencoding{utf8}
-- a transparent copy of the Document -- together with the ancillary
-materials \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.Rdata!\inputencoding{utf8}
-and \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.R!\inputencoding{utf8}.
-Please note, however, that \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR.Rnw!\inputencoding{utf8}
+materials \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.Rdata!\inputencoding{utf8}
+and \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.R!\inputencoding{utf8}.
+Please note, however, that \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR.Rnw!\inputencoding{utf8}
is just a simple text file which does not support many of the extra
features that \LyX{} offers such as WYSIWYM editing, instantly (de)activating
branches of the manuscript, and more.
@@ -605,24 +609,35 @@
\subsection*{The Package}
-There is a contributed package on \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!CRAN!\inputencoding{utf8},
-called \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!IPSUR!\inputencoding{utf8}.
-The package has two purposes. The first is to house the Document in
-an easy-to-access medium. Indeed, a student can have the Document
-at his/her fingertips with only three commands:
+There is a contributed package on \inputencoding{latin9}\lstinline[showstringspaces=false]!CRAN!\inputencoding{utf8},
+called \inputencoding{latin9}\lstinline[showstringspaces=false]!IPSUR!\inputencoding{utf8}.
+The package affords many advantages, one being that it houses the
+Document in an easy-to-access medium. Indeed, a student can have the
+Document at his/her fingertips with only three commands:
<<eval = FALSE>>=
-install.packages(IPSUR)
+install.packages("IPSUR")
library(IPSUR)
read(IPSUR)
@
-The second purpose goes hand in hand with the Document's license;
-since \IPSUR\ is free, the source code must be freely available to
-anyone that wants it. Hosting the package on \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!CRAN!\inputencoding{utf8}
-satisfies this requirement nicely.
+Another advantage goes hand in hand with the Document's license; since
+\IPSUR\ is free, the source code must be freely available to anyone
+that wants it. A package hosted on \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!CRAN!\inputencoding{utf8}
+allows me to obey the license by default.
+Yet another advantage is that the excellent facilities at \textsf{R}-Forge
+are building and checking the package daily against patched and development
+versions of the absolute latest pre-release of \textsf{R}. If any
+problems surface then I will know about them within 24 hours.
+And finally, suppose there is some sort of problem. The package structure
+makes it \emph{incredibly} easy for me to distribute bug-fixes and
+corrected typographical errors. As an author I can make my corrections,
+upload them to the repository, and they will be reflected \emph{worldwide}
+within hours. We aren't in Kansas anymore, Dorothy.
+
+
\subsection*{Ancillary Materials}
These are extra materials that accompany \IPSUR.
@@ -679,16 +694,28 @@
\paragraph*{What do I want them to know?}
+The trouble with this chapter is that there is so much to say -- and
+so many people have already said it so much better than I could. When
+I get something I can be happy with I will put it here.
+In the meantime, there is a lot of information already available to
+a person with an Internet connection. I recommend starting at Wikipedia
+(which is not flawless but has the main ideas with links to reputable
+sources).
+
+In my lectures I usually tell stories about Fisher, Galton, Gauss,
+Laplace, Quetelet, the Chevalier de Mere, and others.
+
+
\section{Probability}
-Probability concerns the study of uncertainty. Games of chance have
-been played for millenia.
+Probability is the study of uncertainty. Games of chance have been
+played for millennia.
The common folklore is that probability has been around for years
but did not gain the attention of mathematicians until approximately
1654 when Chevalier de Mere had a problem dividing the payoff to two
-players in a game that must end prematurely.
+players in a game that had to end prematurely.
\section{Statistics}
@@ -8848,8 +8875,8 @@
Compare this answer to what we got in Example BLANK.
-To do the continuous case we probably would be wise to resort to the
-computer algebra utilities of \inputencoding{latin9}\lstinline[showstringspaces=false]!Yacas!\inputencoding{utf8}
+To do the continuous case we could use the computer algebra utilities
+of \inputencoding{latin9}\lstinline[showstringspaces=false]!Yacas!\inputencoding{utf8}
and the associated \textsf{R} package \inputencoding{latin9}\lstinline[showstringspaces=false]!Ryacas!\inputencoding{utf8}.
See Section BLANK for another example where the \inputencoding{latin9}\lstinline[showstringspaces=false]!Ryacas!\inputencoding{utf8}
package appears.
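For instance, here is a minimal sketch of the idea (a hypothetical integral
chosen purely for illustration, not the Document's Example BLANK, and it
assumes the classic string interface of \inputencoding{latin9}\lstinline[showstringspaces=false]!Ryacas!\inputencoding{utf8}):

<<eval = FALSE>>=
# hypothetical illustration: symbolically verify that 3*x^2 is a valid
# PDF on [0, 1] by integrating it with Yacas through the Ryacas package
library(Ryacas)
yacas("Integrate(x, 0, 1) 3*x^2")   # should return 1
@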
@@ -8932,10 +8959,9 @@
however, because the intuition is the same. There is a prior distribution
$\pi(\theta)$, a likelihood $f(x_{1},x_{2},\ldots,x_{n}|\theta)$,
and a posterior distribution $\pi(\theta|x_{1},x_{2},\ldots,x_{n})$.
-Bayes' Rule states that the relationship between the three may be
-conveniently written as\[
+Bayes' Rule states that the relationship between the three is\[
\pi(\theta|x_{1},x_{2},\ldots,x_{n})\propto\pi(\theta)\, f(x_{1},x_{2},\ldots,x_{n}|\theta),\]
-where, of course, the constant of proportionality is $\int\pi(u)\, f(x_{1},x_{2},\ldots,x_{n}|u)\,\diff u$.
+where the constant of proportionality is $\int\pi(u)\, f(x_{1},x_{2},\ldots,x_{n}|u)\,\diff u$.
Any good textbook on Bayesian Statistics will explain these notions
in detail; to the interested reader I recommend Gelman and this other
Bayesian book BLANK.
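As a quick illustration of this proportionality (a standard conjugate-prior
example, not one drawn from the surrounding text): if $X_{1},\ldots,X_{n}$
are conditionally i.i.d.~$\mathsf{binom}(\mathtt{size}=1,\,\mathtt{prob}=\theta)$
and the prior is $\pi(\theta)\propto\theta^{a-1}(1-\theta)^{b-1}$, that is,
$\mathsf{beta}(\mathtt{shape1}=a,\,\mathtt{shape2}=b)$, then\[
\pi(\theta|x_{1},\ldots,x_{n})\propto\theta^{a-1}(1-\theta)^{b-1}\,\theta^{\sum x_{i}}(1-\theta)^{n-\sum x_{i}},\]
which we recognize as a $\mathsf{beta}(\mathtt{shape1}=a+\sum x_{i},\,\mathtt{shape2}=b+n-\sum x_{i})$
distribution; the constant of proportionality never needs to be computed
explicitly.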
@@ -8947,7 +8973,7 @@
\subsection{Independent Random Variables\label{sub:Independent-Random-Variables}}
We recall from Chapter BLANK that the events $A$ and $B$ are said
-to be independent if\begin{equation}
+to be independent when\begin{equation}
\P(A\cap B)=\P(A)\P(B).\end{equation}
If it happens that\begin{equation}
\P(X=x,Y=y)=\P(X=x)\P(Y=y),\quad\mbox{for every }x\in S_{X},\ y\in S_{Y},\end{equation}
@@ -8956,10 +8982,10 @@
PMF notation from above, we see that independent discrete random variables
satisfy \begin{equation}
f_{X,Y}(x,y)=f_{X}(x)f_{Y}(y)\quad\mbox{for every }x\in S_{X},\ y\in S_{Y}.\end{equation}
-Now continuing the reasoning, given two continuous random variables
-$X$ and $Y$ with joint PDF $f_{X,Y}$ and respective marginal PDFs
-$f_{X}$ and $f_{Y}$ that are supported on the sets $S_{X}$ and
-$S_{Y}$, if it happens that \begin{equation}
+Continuing the reasoning, given two continuous random variables $X$
+and $Y$ with joint PDF $f_{X,Y}$ and respective marginal PDFs $f_{X}$
+and $f_{Y}$ that are supported on the sets $S_{X}$ and $S_{Y}$,
+if it happens that \begin{equation}
f_{X,Y}(x,y)=f_{X}(x)f_{Y}(y)\quad\mbox{for every }x\in S_{X},\ y\in S_{Y},\end{equation}
then we say that $X$ and $Y$ are independent.
\begin{example}
@@ -8987,8 +9013,8 @@
more important ones.
\begin{prop}
If $X$ and $Y$ are independent, then for any functions $u$ and
-$v$, \[
-\E\left(u(X)v(Y)\right)=\left(\E u(X)\right)\left(\E v(Y)\right).\]
+$v$, \begin{equation}
+\E\left(u(X)v(Y)\right)=\left(\E u(X)\right)\left(\E v(Y)\right).\end{equation}
\end{prop}
\begin{proof}
@@ -9060,9 +9086,8 @@
Thus\begin{eqnarray*}
\E Y^{2} & = & a_{1}^{2}(\sigma_{1}^{2}+\mu_{1}^{2})+a_{2}^{2}(\sigma_{2}^{2}+\mu_{2}^{2})+2a_{1}a_{2}\mu_{1}\mu_{2},\\
& = & a_{1}^{2}\sigma_{1}^{2}+a_{2}^{2}\sigma_{2}^{2}+\left(a_{1}^{2}\mu_{1}^{2}+a_{2}^{2}\mu_{2}^{2}+2a_{1}a_{2}\mu_{1}\mu_{2}\right).\end{eqnarray*}
-But notice that the expression in the parentheses is exactly \[
-\left(a_{1}\mu_{1}+a_{2}\mu_{2}\right)^{2}=\left(\E Y\right)^{2},\]
-and the proof is complete.
+But notice that the expression in the parentheses is exactly $\left(a_{1}\mu_{1}+a_{2}\mu_{2}\right)^{2}=\left(\E Y\right)^{2}$,
+so the proof is complete.
\end{proof}
@@ -9085,7 +9110,7 @@
\end{example}
\begin{example}
-Here is another one, somewhat more complicated than the one above.\begin{multline}
+Here is another one, more complicated than the one above.\begin{multline}
f_{X,Y}(x,y)=(1+\alpha)\lambda^{2}\me^{-\lambda(x+y)}+\alpha(2\lambda)^{2}\me^{-2\lambda(x+y)}-2\alpha\lambda^{2}\left(\me^{-\lambda(2x+y)}+\me^{-\lambda(x+2y)}\right).\end{multline}
It is straightforward and tedious to check that $\iint f=1$. We may
see immediately that $f_{X,Y}(x,y)=f_{X,Y}(y,x)$ for all $(x,y)$,
@@ -9093,13 +9118,15 @@
is said to be an association parameter. This particular example is
one from the Farlie-Gumbel-Morgenstern family of distributions; see
BLANK.
-
-There seems to be a common misconception that exchangeability is somehow
-a weaker condition than independence, but in fact, the two notions
-are incommensurable. One direct connection between the two is made
-clear by DeFinetti's Thereom. See Section BLANK for details.
\end{example}
+\begin{rem}
+If $X$ and $Y$ are i.i.d.~(with common marginal distribution $F$)
+then $X$ and $Y$ are exchangeable because\[
+F_{X,Y}(x,y)=F(x)F(y)=F(y)F(x)=F_{X,Y}(y,x).\]
+
+\end{rem}
+
\section{The Bivariate Normal Distribution\label{sec:The-Bivariate-Normal}}
The bivariate normal PDF is given by the unwieldly formula\begin{multline}
@@ -9155,8 +9182,8 @@
Use package \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!mvtnorm!\inputencoding{utf8}
or \inputencoding{latin9}\lstinline[basicstyle={\ttfamily}]!mnormt!\inputencoding{utf8}%
-\footnote{Another way to do this is with the function curve3d in the emdbook
-package. It looks like this:
+\footnote{Another way to do this is with the \texttt{curve3d} function in the
+\texttt{emdbook} package. It looks like this:
\begin{lyxcode}
library(emdbook);~library(mvtnorm)~~~\#~note:~the~order~matters
@@ -9166,13 +9193,14 @@
curve3d(f(x,y),~from~=~c(-3,-3),~to~=~c(3,3),~theta~=~-30,~phi~=~30)
\end{lyxcode}
-The code above is slightly shorter than that using persp and is easier
-to understand. One must be careful, however. If the library calls
-are swapped then the code will not work because both packages emdbook
-and mvtnorm have a function called {}``dmvnorm''; one must load
-them to the search path in the correct order or \textsf{R} will use
-the wrong one (the arguments are named differently and the underlying
-algorithms are different). %
+The code above is slightly shorter than that using \texttt{persp}
+and is easier to understand. One must be careful, however. If the
+\texttt{library} calls are swapped then the code will not work because
+both packages \texttt{emdbook} and \texttt{mvtnorm} have a function
+called {}``\texttt{dmvnorm}''; one must load them to the search
+path in the correct order or \textsf{R} will use the wrong one (the
+arguments are named differently and the underlying algorithms are
+different). %
}
%
@@ -9187,7 +9215,7 @@
@
\par\end{centering}
-\caption{Capture-recapture experiment\label{fig:mvnorm-pdf}}
+\caption{Graph of a bivariate normal PDF\label{fig:mvnorm-pdf}}
\end{figure}
@@ -9275,8 +9303,8 @@
We studied in Section BLANK how to find the PDF of $Y=g(X)$ given
the PDF of $X$. But now we have two random variables $X$ and Y,
with joint PDF $f_{X,Y}$, and we would like to consider the joint
-PDF of two new random variables\[
-U=g(X,Y)\quad\mbox{and}\quad V=h(X,Y),\]
+PDF of two new random variables\begin{equation}
+U=g(X,Y)\quad\mbox{and}\quad V=h(X,Y),\end{equation}
where $g$ and $h$ are two given functions, typically {}``nice''
in the sense of Appendix BLANK.
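For orientation, here is a loose sketch of the standard recipe (assuming
the transformation is one-to-one with inverse $x=x(u,v)$, $y=y(u,v)$):\begin{equation}
f_{U,V}(u,v)=f_{X,Y}\left(x(u,v),\, y(u,v)\right)\left|\frac{\partial(x,y)}{\partial(u,v)}\right|,\end{equation}
where $\partial(x,y)/\partial(u,v)$ denotes the Jacobian determinant of
the inverse transformation.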
@@ -9385,7 +9413,7 @@
f_{\mathbf{X}}(\mathbf{x^{\ast}})=f_{\mathbf{X}}(\mathbf{x}),\end{equation}
for any reordering $\mathbf{x^{\ast}}$ of the elements of $\mathbf{x}=(x_{1},x_{2},\ldots,x_{n})$
in the joint support.
-\begin{thm}
+\begin{prop}
Let $X_{1}$, $X_{2}$, \ldots{}, $X_{n}$ be independent with respective
population means $\mu_{1}$, $\mu_{2}$, \ldots{}, $\mu_{n}$ and
standard deviations $\sigma_{1}$, $\sigma_{2}$, \ldots{}, $\sigma_{n}$.
@@ -9394,7 +9422,7 @@
of $Y$ are given by the formulas\begin{equation}
\mu_{Y}=\sum_{i=1}^{n}a_{i}\mu_{i},\quad\sigma_{Y}=\left(\sum_{i=1}^{n}a_{i}^{2}\sigma_{i}^{2}\right)^{1/2}.\end{equation}
-\end{thm}
+\end{prop}
\begin{proof}
The mean is easy:\[
\E Y=\E\left(\sum_{i=1}^{n}a_{i}X_{i}\right)=\sum_{i=1}^{n}a_{i}\E X_{i}=\sum_{i=1}^{n}a_{i}\mu_{i}.\]
@@ -9431,24 +9459,43 @@
of $\mathsf{binom}(\mathtt{size}=1,\,\mathtt{prob}=p)$ random variables
such that $(X_{1},\ldots,X_{k})$ are exchangeable for every $k$.
Then there exists a random variable $\Theta$ with support $[0,1]$
-and PDF $f_{\Theta}(\theta)$ such that\[
-\P(X_{1}=x_{1},\ldots X_{k}=x_{k})=\int_{0}^{1}\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\, f_{\Theta}(\theta)\diff\theta,\]
+and PDF $f_{\Theta}(\theta)$ such that\begin{equation}
+\P(X_{1}=x_{1},\ldots,\, X_{k}=x_{k})=\int_{0}^{1}\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\, f_{\Theta}(\theta)\,\diff\theta,\end{equation}
for all $x_{i}=0,\,1$, $i=1,\,2,\ldots,k$.
\end{thm}
-The intuitive meaning of de Finetti's theorem
+To get a handle on the intuitive content of de Finetti's theorem, imagine
+that we have a \emph{bunch} of coins in our pocket with each having
+its own unique value of $\theta=\P(\mbox{Heads})$. We reach into
+our pocket and select a coin at random according to some probability
+density -- say, $f_{\Theta}(\theta)$. We take the randomly selected coin
+and flip it $k$ times.
-If we flip a coin repeatedly then the sequence of Heads and Tails
-is a set of Bernoulli trials, which are independent. Now imagine that
-we have a bunch of coins in our pocket which have potentially different
-values of $\P(\mbox{Heads})$. We reach into our pocket and select
-a coin at random. We take the randomly selected coin flip it $k$
-times. The sequence of Heads and Tails are not independent anymore
-because the outcome of the experiment depends on the coin chosen.
+Think carefully: the conditional probability of observing a sequence
+$X_{1}=x_{1},\ldots,\, X_{k}=x_{k}$, given a specific coin $\theta$
+would just be $\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}$, because
+the coin flips are an independent sequence of Bernoulli trials. But
+the coin is random, so the Theorem of Total Probability says we can
+get the \emph{unconditional} probability $\P(X_{1}=x_{1},\ldots,\, X_{k}=x_{k})$
+by adding up terms that look like\begin{equation}
+\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}\, f_{\Theta}(\theta),\end{equation}
+where we sum over all possible coins. The right-hand side of Equation
+BLANK is a sophisticated way to denote this process.
+Of course, the integral's value does not change if we jumble the $x_{i}$'s,
+so $(X_{1},\ldots,X_{k})$ are clearly exchangeable. The power of
+de Finetti's Theorem is that \emph{every} infinite binary exchangeable
+sequence can be written in the above form.
+The connection to subjective probability is: our prior information
+about $\theta$ corresponds to $f_{\Theta}(\theta)$ and the likelihood
+of the sequence $X_{1}=x_{1},\ldots,\, X_{k}=x_{k}$ (conditional
+on $\theta$) corresponds to $\theta^{\sum x_{i}}(1-\theta)^{k-\sum x_{i}}$.
+Compare Equation BLANK to Section BLANK and Section BLANK.
+
+
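A minimal simulation sketch of this coin story (my own illustration, not
part of the Document; the beta distribution for the pocket of coins is an
arbitrary choice):

<<eval = FALSE>>=
# draw a coin at random from a beta "pocket" distribution, then flip it
# k times; repeating the whole experiment yields an exchangeable (but not
# independent) binary sequence
set.seed(42)
k <- 10
theta <- rbeta(1, shape1 = 2, shape2 = 2)    # the randomly selected coin
flips <- rbinom(k, size = 1, prob = theta)   # conditionally iid Bernoulli trials
flips
@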
The multivariate normal distribution immediately generalizes from
the bivariate case. If the matrix $\Sigma$ is nonsingular then the
joint PDF of $\mathbf{X}\sim\mathsf{mvnorm}(\mathtt{mean}=\upmu,\,\mathtt{sigma}=\Sigma)$
@@ -9483,9 +9530,7 @@
\begin{xca}
-Prove that $\mbox{Cov}(X,Y)=\E(XY)-(\E X)(\E Y).$
-
-type here
+Prove that $\mbox{Cov}(X,Y)=\E(XY)-(\E X)(\E Y).$
\end{xca}
@@ -9593,7 +9638,7 @@
And because $X_{1}$, $X_{2}$, \ldots{}, $X_{n}$ are independent,
Proposition BLANK allows us to distribute the expectation among each
term in the product, which is\[
-\E\,\me^{tX_{1}/n}\,\E\me^{tX_{2}/n}\cdots\E\me^{tX_{n}/n}.\]
+\E\me^{tX_{1}/n}\,\E\me^{tX_{2}/n}\cdots\E\me^{tX_{n}/n}.\]
The last step is to recognize that each term in the last product above
is exactly $M(t/n)$.
\end{proof}
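A quick numerical sanity check of this identity (my own sketch, not from
the Document; it uses the fact that an $\mathsf{exp}(\mathtt{rate}=1)$
variable has MGF $M(t)=1/(1-t)$ for $t<1$):

<<eval = FALSE>>=
# compare the empirical MGF of the sample mean of iid Exponential(1) data
# to the theoretical value M(t/n)^n = (1/(1 - t/n))^n
set.seed(2)
n <- 5
t <- 0.3
xbar <- replicate(20000, mean(rexp(n, rate = 1)))
mean(exp(t * xbar))    # empirical E[exp(t * Xbar)]
(1/(1 - t/n))^n        # theoretical value
@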
@@ -9751,12 +9796,12 @@
approaches a $\mathsf{norm}(\mathtt{mean}=0,\,\mathtt{sd}=1)$ distribution
as $n\to\infty$. \end{thm}
\begin{rem}
-Since we suppose that $X_{1}$, $X_{2}$, \ldots{}, $X_{n}$ are
-iid, we already know from Section \ref{sub:Simple-Random-Samples}
-that $\Xbar$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$,
-so that $Z$ has mean 0 and standard deviation 1. The beauty of the
-CLT is that it addresses the \emph{shape} of $Z$'s distribution when
-the sample size is large.
+We suppose that $X_{1}$, $X_{2}$, \ldots{}, $X_{n}$ are i.i.d.,
+and we learned in Section \ref{sub:Simple-Random-Samples} that $\Xbar$
+has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, so we already
+knew that $Z$ has mean 0 and standard deviation 1. The beauty of
+the CLT is that it addresses the \emph{shape} of $Z$'s distribution
+when the sample size is large.
\end{rem}
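A quick way to see this remark in action (my own sketch, not part of the
Document) is to simulate standardized sample means from a decidedly
non-normal population and inspect their histogram:

<<eval = FALSE>>=
# standardized means of exp(rate = 1) samples (mu = 1, sigma = 1) should
# look approximately standard normal when n is reasonably large
set.seed(1)
n <- 30
z <- replicate(1000, (mean(rexp(n, rate = 1)) - 1)/(1/sqrt(n)))
hist(z, freq = FALSE, breaks = 30)
curve(dnorm(x), add = TRUE)
@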
\begin{rem}
@@ -9765,26 +9810,43 @@
any population that is well-behaved enough to have a finite standard
deviation. In particular, if the population is normally distributed
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/ipsur -r 106