[Prob-commits] r42 - pkg/vignettes
noreply at r-forge.r-project.org
Sun Oct 27 23:08:18 CET 2013
Author: gkerns
Date: 2013-10-27 23:08:18 +0100 (Sun, 27 Oct 2013)
New Revision: 42
Modified:
pkg/vignettes/prob.Rnw
Log:
fixed the vignette
Modified: pkg/vignettes/prob.Rnw
===================================================================
--- pkg/vignettes/prob.Rnw 2013-10-27 21:27:03 UTC (rev 41)
+++ pkg/vignettes/prob.Rnw 2013-10-27 22:08:18 UTC (rev 42)
@@ -51,101 +51,216 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Preliminaries}
-This document is designed to get a person up and running doing elementary probability in \texttt{R} using the \texttt{prob} package. In addition to \texttt{prob}, you will want to install the \texttt{combinat} package in order to use a couple of functions, but other than that a base installation of \texttt{R} should be more than enough.
+This document is designed to get a person up and running doing
+elementary probability in \texttt{R} using the \texttt{prob} package.
+In addition to \texttt{prob}, you will want to install the
+\texttt{combinat} package in order to use a couple of functions, but
+other than that a base installation of \texttt{R} should be more than
+enough.
-The prerequisites are minimal. The term ``data frame'' should ring a bell, along with some common tricks like using \texttt{1:6} to represent the integers from 1 to 6. That being said, please note that this document is not a treatise on probability. The reader should have a passing familiarity with the basic tenets of probability and some of the topics that are customarily studied to be able to follow along.
+The prerequisites are minimal. The term ``data frame'' should ring a
+bell, along with some common tricks like using \texttt{1:6} to
+represent the integers from 1 to 6. That being said, please note that
+this document is not a treatise on probability. The reader should
+have a passing familiarity with the basic tenets of probability and
+some of the topics that are customarily studied to be able to follow
+along.
-In the interest of space, most of the examples described below are uncharacteristically small and do not fully showcase the true computational power of \texttt{R}. Nevertheless, the goal is for these small examples to serve as a starting point from which users can investigate more complicated problems at their leisure.
+In the interest of space, most of the examples described below are
+uncharacteristically small and do not fully showcase the true
+computational power of \texttt{R}. Nevertheless, the goal is for
+these small examples to serve as a starting point from which users can
+investigate more complicated problems at their leisure.
\section{Sample Spaces}
-The first step is to set up a \emph{sample space}, or set of possible outcomes of an experiment. In the \texttt{prob} package, the most simple form of a sample space is represented by a \emph{data frame}, that is, a rectangular collection of variables. Each row of the data frame corresponds to an outcome of the experiment.
+The first step is to set up a \emph{sample space}, or set of possible
+outcomes of an experiment.  In the \texttt{prob} package, the
+simplest form of a sample space is represented by a \emph{data frame},
+that is, a rectangular collection of variables. Each row of the data
+frame corresponds to an outcome of the experiment.
-This is the primary way we will represent a sample space, both due to its simplicity and to maximize compatibility with the R Commander by John Fox. However, this structure is not rich enough to describe some of the more interesting probabilistic applications we will encounter. To handle these we will need to consider the more general \emph{list} data structure. See the last section for some remarks in this direction.
+This is the primary way we will represent a sample space, both due to
+its simplicity and to maximize compatibility with the R Commander by
+John Fox. However, this structure is not rich enough to describe some
+of the more interesting probabilistic applications we will encounter.
+To handle these we will need to consider the more general \emph{list}
+data structure. See the last section for some remarks in this
+direction.
-Note that the machinery doing the work to generate most of the sample spaces below is the \texttt{expand.grid()} function in the \texttt{base} package, also \texttt{combn()} in \texttt{combinat} and new but related \texttt{permsn()}.
+Note that the machinery doing the work to generate most of the sample
+spaces below is the \texttt{expand.grid()} function in the
+\texttt{base} package, along with \texttt{combn()} in \texttt{combinat}
+and the new but related \texttt{permsn()}.
\subsection{Some Standard Sample Spaces}
-The \texttt{prob} package has some functions to get one started. For example, consider the experiment of tossing a coin. The outcomes are $H$ and $T$. We can set up the sample space quicky with the \texttt{tosscoin()} function:
+The \texttt{prob} package has some functions to get one started. For
+example, consider the experiment of tossing a coin. The outcomes are
+$H$ and $T$.  We can set up the sample space quickly with the
+\texttt{tosscoin()} function:
<<echo=TRUE,print=TRUE>>=
tosscoin(1)
@
-The number 1 tells \texttt{tosscoin()} that we only want to toss the coin once. We could toss it three times:
+The number 1 tells \texttt{tosscoin()} that we only want to toss the
+coin once. We could toss it three times:
<<echo=TRUE,print=TRUE>>=
tosscoin(3)
@
%
-As an alternative, we could consider the experiment of rolling a fair die:
+As an alternative, we could consider the experiment of rolling a fair
+die:
%
<<echo=TRUE,print=TRUE>>=
rolldie(1)
@
-The \texttt{rolldie()} function defaults to a 6-sided die, but we can change it with the \texttt{nsides} argument. Typing \texttt{rolldie(3, nsides = 4)} would be for rolling a 4-sided die three times.
+The \texttt{rolldie()} function defaults to a 6-sided die, but we can
+change it with the \texttt{nsides} argument. Typing
+\texttt{rolldie(3, nsides = 4)} would be for rolling a 4-sided die
+three times.
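As a quick size check (a sketch, assuming each toss or roll contributes one column and every combination of results appears as exactly one row), the sample spaces mentioned so far should have the following numbers of outcomes:

```latex
\[
  \underbrace{2 \cdot 2 \cdot 2}_{\texttt{tosscoin(3)}} = 2^{3} = 8,
  \qquad
  \underbrace{4 \cdot 4 \cdot 4}_{\texttt{rolldie(3, nsides = 4)}} = 4^{3} = 64 .
\]
```

The growth is exponential in the number of repetitions, which is why the examples here are kept small.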
-Perhaps we would like to draw one card from a standard set of playing cards (it is a long data frame):
+Perhaps we would like to draw one card from a standard set of playing
+cards (it is a long data frame):
<<echo=TRUE,print=TRUE>>=
cards()
@
-The \texttt{cards()} function that we just used has arguments \texttt{jokers} (if you would like Jokers to be in the deck) and \texttt{makespace} which we will discuss later.
+The \texttt{cards()} function that we just used has arguments
+\texttt{jokers} (if you would like Jokers to be in the deck) and
+\texttt{makespace} which we will discuss later.
-Additionally, the \texttt{roulette()} function gives the standard sample space for one spin on a roulette wheel. There are EU and USA versions available. I would appreciate hearing about any other game or sample spaces that may be of general interest.
+Additionally, the \texttt{roulette()} function gives the standard
+sample space for one spin on a roulette wheel. There are EU and USA
+versions available.  I would appreciate hearing about any other games
+or sample spaces that may be of general interest.
\subsection{Sampling from Urns}
-Perhaps the most fundamental of statistical experiments consists of drawing distinguishable objects from an urn. The \texttt{prob} package addresses this topic with the \texttt{urnsamples(x, size, replace, ordered)} function. The argument \texttt{x} represents the urn from which sampling is to be done. The \texttt{size} argument tells how large the sample will be. The \texttt{ordered} and \texttt{replace} arguments are logical and specify how sampling will be performed. We will discuss each in turn. In the interest of saving space, for this example let our urn simply contain three balls, labeled 1, 2, and 3, respectively. We are going to take a sample of size 2.
+Perhaps the most fundamental of statistical experiments consists of
+drawing distinguishable objects from an urn. The \texttt{prob}
+package addresses this topic with the \texttt{urnsamples(x, size,
+ replace, ordered)} function. The argument \texttt{x} represents the
+urn from which sampling is to be done. The \texttt{size} argument
+tells how large the sample will be. The \texttt{ordered} and
+\texttt{replace} arguments are logical and specify how sampling will
+be performed. We will discuss each in turn. In the interest of saving
+space, for this example let our urn simply contain three balls,
+labeled 1, 2, and 3, respectively. We are going to take a sample of
+size 2.
\subsubsection*{Ordered, With Replacement}
-If sampling is with replacement, then we can get any outcome $1,2,3$ on any draw. Further, by ``ordered'' we mean that we shall keep track of the order of the draws that we observe. We can accomplish this in \texttt{R} with
+If sampling is with replacement, then we can get any outcome $1,2,3$
+on any draw. Further, by ``ordered'' we mean that we shall keep track
+of the order of the draws that we observe. We can accomplish this in
+\texttt{R} with
<<echo=TRUE,print=TRUE>>=
urnsamples(1:3, size = 2, replace = TRUE, ordered = TRUE)
-@
-Notice that rows 2 and 4 are identical, save for the order in which the numbers are shown. Further, note that every possible pair of the numbers 1 through 3 are listed. This experiment is equivalent to rolling a 3-sided die twice, which we could have accomplished with \texttt{rolldie(2, nsides = 3)}.
+@
+Notice that rows 2 and 4 are identical, save for the order in which
+the numbers are shown. Further, note that every possible pair of the
+numbers 1 through 3 is listed.  This experiment is equivalent to
+rolling a 3-sided die twice, which we could have accomplished with
+\texttt{rolldie(2, nsides = 3)}.
+
\subsubsection*{Ordered, Without Replacement}
-Here sampling is without replacement, so we may not observe the same number twice in any row. Order is still important, however, so we expect to see the outcomes \texttt{1}, \texttt{2} and \texttt{2}, \texttt{1} somewhere in our data frame as before.
+Here sampling is without replacement, so we may not observe the same
+number twice in any row. Order is still important, however, so we
+expect to see the outcomes \texttt{1}, \texttt{2} and \texttt{2},
+\texttt{1} somewhere in our data frame as before.
<<echo=TRUE,print=TRUE>>=
urnsamples(1:3, size = 2, replace = FALSE, ordered = TRUE)
-@
-This is just as we expected. Notice that there are less rows in this answer, due to the restricted sampling procedure. If the numbers 1, 2, and 3 represented ``Fred'', ``Mary'', and ``Sue'', respectively, then this experiment would be equivalent to selecting two people of the three to serve as president and vice-president of a company, respectively, and the sample space lists all possible ways that this could be done.
+@
+This is just as we expected.  Notice that there are fewer rows in
+this answer, due to the restricted sampling procedure. If the numbers
+1, 2, and 3 represented ``Fred'', ``Mary'', and ``Sue'', respectively,
+then this experiment would be equivalent to selecting two people of
+the three to serve as president and vice-president of a company,
+respectively, and the sample space lists all possible ways that this
+could be done.
\subsubsection*{Unordered, Without Replacement}
-Again, we may not observe the same outcome twice, but in this case, we will only keep those outcomes which (when jumbled) would not duplicate earlier ones.
+Again, we may not observe the same outcome twice, but in this case, we
+will only keep those outcomes which (when jumbled) would not duplicate
+earlier ones.
<<echo=TRUE,print=TRUE>>=
urnsamples(1:3, size = 2, replace = FALSE, ordered = FALSE)
@
-This experiment is equivalent to reaching in the urn, picking a pair, and looking to see what they are. This is the default setting of \texttt{urnsamples()}, so we would have received the same output by simply typing \texttt{urnsamples(1:3,2)}.
+This experiment is equivalent to reaching in the urn, picking a pair,
+and looking to see what they are. This is the default setting of
+\texttt{urnsamples()}, so we would have received the same output by
+simply typing \texttt{urnsamples(1:3,2)}.
+
\subsubsection*{Unordered, With Replacement}
-The last possibility is perhaps the most interesting. We replace the balls after every draw, but we do not remember the order in which the draws come.
+The last possibility is perhaps the most interesting. We replace the
+balls after every draw, but we do not remember the order in which the
+draws come.
<<echo=TRUE,print=TRUE>>=
urnsamples(1:3, size = 2, replace = TRUE, ordered = FALSE)
@
-We may interpret this experiment in a number of alternative ways. One way is to consider this as simply putting two 3-sided dice in a cup, shaking the cup, and looking inside as in a game of Liar's Dice, for instance. Each row of the sample space is a potential pair we could observe. Another equivalent view is to consider each outcome a separate way to distribute two identical golf balls into three boxes labeled 1, 2, and 3. Regardless of the interpretation, \texttt{urnsamples()} lists every possible way that the experiment can conclude.
-Note that the urn does not need to contain numbers; we could have just as easily taken our urn to be \texttt{x = c("Red", "Blue", "Green")}. But, there is an \textbf{important} point to mention before proceeding. Astute readers will notice that in our example, the balls in the urn were \textit{distinguishable} in the sense that each had a unique label to distinguish it from the others in the urn. A natural question would be, ``What happens if your urn has indistinguishable elements, for example, what if \texttt{x = c("Red", "Red", "Blue")}?'' The answer is that \texttt{urnsamples()} behaves as if each ball in the urn is distinguishable, regardless of its actual contents. We may thus imagine that while there are two red balls in the urn, the balls are such that we can tell them apart (in principle) by looking closely enough at the imperfections on their surface.
+We may interpret this experiment in a number of alternative ways. One
+way is to consider this as simply putting two 3-sided dice in a cup,
+shaking the cup, and looking inside as in a game of Liar's Dice, for
+instance. Each row of the sample space is a potential pair we could
+observe. Another equivalent view is to consider each outcome a
+separate way to distribute two identical golf balls into three boxes
+labeled 1, 2, and 3. Regardless of the interpretation,
+\texttt{urnsamples()} lists every possible way that the experiment can
+conclude.
-In this way, when the \texttt{x} argument of \texttt{urnsamples()} has repeated elements, the resulting sample space may appear to be \texttt{ordered = TRUE} even when, in fact, the call to the function was \texttt{urnsamples(..., ordered = FALSE)}. Similar remarks apply for the \texttt{replace} argument. We investigate this issue further in the last section.
+Note that the urn does not need to contain numbers; we could have just
+as easily taken our urn to be \texttt{x <- c("Red", "Blue", "Green")}.
+But, there is an \textbf{important} point to mention before
+proceeding. Astute readers will notice that in our example, the balls
+in the urn were \textit{distinguishable} in the sense that each had a
+unique label to distinguish it from the others in the urn. A natural
+question would be, ``What happens if your urn has indistinguishable
+elements, for example, what if \texttt{x <- c("Red", "Red", "Blue")}?''
+The answer is that \texttt{urnsamples()} behaves as if each ball in
+the urn is distinguishable, regardless of its actual contents. We may
+thus imagine that while there are two red balls in the urn, the balls
+are such that we can tell them apart (in principle) by looking closely
+enough at the imperfections on their surface.
+In this way, when the \texttt{x} argument of \texttt{urnsamples()} has
+repeated elements, the resulting sample space may appear to be
+\texttt{ordered = TRUE} even when, in fact, the call to the function
+was \texttt{urnsamples(..., ordered = FALSE)}. Similar remarks apply
+for the \texttt{replace} argument. We investigate this issue further
+in the last section.
-
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Counting Tools}
-The sample spaces we have seen so far have been relatively small, and we can visually study them without much trouble. However, it is VERY easy to generate sample spaces that are prohibitively large. And while \texttt{R} is wonderful and powerful and does almost everything except wash windows, even \texttt{R} has limits of which we should be mindful.
+The sample spaces we have seen so far have been relatively small, and
+we can visually study them without much trouble. However, it is VERY
+easy to generate sample spaces that are prohibitively large. And
+while \texttt{R} is wonderful and powerful and does almost everything
+except wash windows, even \texttt{R} has limits of which we should be
+mindful.
-In many cases, we do not need to actually generate the sample spaces of interest; it suffices merely to count the number of outcomes. The \texttt{nsamp()} function will calculate the number of rows in a sample space made by \texttt{urnsamples()}, without actually devoting the memory resources necessary to generate the space. The arguments are: \texttt{n}, the number of (distinguishable) objects in the urn, \texttt{k}, the sample size, and \texttt{replace}, \texttt{ordered} as above.
+In many cases, we do not need to actually generate the sample spaces
+of interest; it suffices merely to count the number of outcomes. The
+\texttt{nsamp()} function will calculate the number of rows in a
+sample space made by \texttt{urnsamples()}, without actually devoting
+the memory resources necessary to generate the space. The arguments
+are: \texttt{n}, the number of (distinguishable) objects in the urn,
+\texttt{k}, the sample size, and \texttt{replace}, \texttt{ordered} as
+above.
-In a probability course, one derives the formulas used in the respective scenarios. For our purposes, it is sufficient to merely list them in the following table. Note that $x!=x(x-1)(x-2)\cdots3\cdot2\cdot1$ and ${n \choose k}=n!/[k!(n-k)!]$.
+In a probability course, one derives the formulas used in the
+respective scenarios. For our purposes, it is sufficient to merely
+list them in the following table. Note that
+$x!=x(x-1)(x-2)\cdots3\cdot2\cdot1$ and ${n \choose k}=n!/[k!(n-k)!]$.
\begin{center}
\textbf{Values of \texttt{nsamp(n, k, replace, ordered)}}
@@ -165,128 +280,272 @@
\subsubsection*{Examples}
-We will compute the number of outcomes for each of the four \texttt{urnsamples()} examples that we saw in the last section. Recall that we took a sample of size two from an urn with three distinguishable elements.
+We will compute the number of outcomes for each of the four
+\texttt{urnsamples()} examples that we saw in the last section.
+Recall that we took a sample of size two from an urn with three
+distinguishable elements.
<<echo=TRUE,print=TRUE>>=
nsamp(n=3, k=2, replace = TRUE, ordered = TRUE)
nsamp(n=3, k=2, replace = FALSE, ordered = TRUE)
nsamp(n=3, k=2, replace = FALSE, ordered = FALSE)
nsamp(n=3, k=2, replace = TRUE, ordered = FALSE)
@
-Compare these answers with the length of the data frames generated above.
+Compare these answers with the number of rows of the data frames
+generated above.
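The four counts can also be checked by hand. The block below is a sketch that assumes the table elided from this diff lists the standard counting formulas, and that \texttt{amsmath} is loaded for \texttt{align*}; it substitutes $n=3$ and $k=2$ in the same order as the four \texttt{nsamp()} calls:

```latex
\begin{align*}
\text{ordered, with replacement:} \quad
  & n^{k} = 3^{2} = 9, \\
\text{ordered, without replacement:} \quad
  & \frac{n!}{(n-k)!} = \frac{3!}{1!} = 6, \\
\text{unordered, without replacement:} \quad
  & {n \choose k} = {3 \choose 2} = 3, \\
\text{unordered, with replacement:} \quad
  & {n+k-1 \choose k} = {4 \choose 2} = 6 .
\end{align*}
```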
+
\subsection{The Multiplication Principle}
-A benefit of \texttt{nsamp()} is that it is \emph{vectorized}, so that entering vectors instead of numbers for \texttt{n}, \texttt{k}, \texttt{replace}, and \texttt{ordered} results in a vector of corresponding answers. This becomes particularly convenient when trying to demonstrate the Multiplication Principle for solving combinatorics problems.
+A benefit of \texttt{nsamp()} is that it is \emph{vectorized}, so that
+entering vectors instead of numbers for \texttt{n}, \texttt{k},
+\texttt{replace}, and \texttt{ordered} results in a vector of
+corresponding answers. This becomes particularly convenient when
+trying to demonstrate the Multiplication Principle for solving
+combinatorics problems.
\subsubsection*{Example}
-Question: There are 11 artists who each submit a portfolio containing 7 paintings for competition in an art exhibition. Unfortunately, the gallery director only has space in the winners' section to accomodate 12 paintings in a row equally spread over three consecutive walls. The director decides to give the first, second, and third place winners each a wall to display the work of their choice. The walls boast 31 separate lighting options apiece. How many displays are possible?
+Question: There are 11 artists who each submit a portfolio containing
+7 paintings for competition in an art exhibition. Unfortunately, the
+gallery director only has space in the winners' section to accommodate
+12 paintings in a row equally spread over three consecutive walls.
+The director decides to give the first, second, and third place
+winners each a wall to display the work of their choice. The walls
+boast 31 separate lighting options apiece. How many displays are
+possible?
-Answer: The judges will pick 3 (ranked) winners out of 11 (with \texttt{rep=FALSE}, \texttt{ord=TRUE}). Each artist will select 4 of his/her paintings from 7 for display in a row (\texttt{rep=FALSE}, \texttt{ord=TRUE}), and lastly, each of the 3 walls has 31 lighting possibilities (\texttt{rep=TRUE}, \texttt{ord=TRUE}). These three numbers can be calculated quickly with
+Answer: The judges will pick 3 (ranked) winners out of 11 (with
+\texttt{rep=FALSE}, \texttt{ord=TRUE}). Each artist will select 4 of
+his/her paintings from 7 for display in a row (\texttt{rep=FALSE},
+\texttt{ord=TRUE}), and lastly, each of the 3 walls has 31 lighting
+possibilities (\texttt{rep=TRUE}, \texttt{ord=TRUE}). These three
+numbers can be calculated quickly with
<<echo=TRUE,print=FALSE>>=
-n = c(11,7,31)
-k = c(3,4,3)
-r = c(FALSE,FALSE,TRUE)
+n <- c(11, 7, 31)
+k <- c(3, 4, 3)
+r <- c(FALSE,FALSE,TRUE)
@
+
<<echo=TRUE,print=TRUE>>=
-x = nsamp(n, k, rep = r, ord = TRUE)
+x <- nsamp(n, k, rep = r, ord = TRUE)
@
-(Notice that \texttt{ordered} is always \texttt{TRUE}; \texttt{nsamp()} will recycle \texttt{ordered} and \texttt{replace} to the appropriate length.) By the Multiplication Principle, the number of ways to complete the experiment is the product of the entries of \texttt{x}:
+
+(Notice that \texttt{ordered} is always \texttt{TRUE};
+\texttt{nsamp()} will recycle \texttt{ordered} and \texttt{replace} to
+the appropriate length.) By the Multiplication Principle, the number
+of ways to complete the experiment is the product of the entries of
+\texttt{x}:
<<echo=TRUE,print=TRUE>>=
prod(x)
@
+
Compare this with some standard ways to compute this in \texttt{R}:
<<echo=TRUE,print=TRUE>>=
(11*10*9)*(7*6*5*4)*31^3
@
+
or alternatively
<<echo=TRUE,print=TRUE>>=
prod(9:11)*prod(4:7)*31^3
@
+
or even
<<echo=TRUE,print=TRUE>>=
prod(factorial(c(11,7))/factorial(c(8,3)))*31^3
@
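For reference, the same product can be worked out by hand. This is plain arithmetic on the three factors described in the answer (ranked winners, ordered paintings per wall as written in the expressions above, and lighting options), not output copied from \texttt{R}:

```latex
\begin{align*}
11 \cdot 10 \cdot 9 &= 990, \\
7 \cdot 6 \cdot 5 \cdot 4 &= 840, \\
31^{3} &= 29\,791, \\
990 \cdot 840 \cdot 29\,791 &= 831\,600 \cdot 29\,791 = 24\,774\,195\,600 .
\end{align*}
```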
-As one can guess, in many of the standard counting problems there aren't much savings in the amount of typing; it is about the same using \texttt{nsamp()} versus \texttt{factorial()} and \texttt{choose()}. But the virtue of \texttt{nsamp()} lies in its collecting the relevant counting formulas in a one-stop shop. Ultimately, it is up to the user to choose the method that works best for him/herself.
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+As one can guess, in many of the standard counting problems there
+isn't much savings in the amount of typing; it is about the same
+using \texttt{nsamp()} versus \texttt{factorial()} and
+\texttt{choose()}. But the virtue of \texttt{nsamp()} lies in its
+collecting the relevant counting formulas in a one-stop shop.
+Ultimately, it is up to the user to choose the method that works best
+for him/herself.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Defining a Probability Space}
-Once a sample space is defined, the next step is to associate a probability model with it in order to be able to answer probabilistic questions. Formally speaking, a \textit{probability space} is a triple $(S,\mathscr{B},\P)$, where $S$ is a sample space, $\mathscr{B}$ is a sigma-algebra of subsets of $S$, and $\P$ is a probability measure defined on $\mathscr{B}$. However, for our purposes all of the sample spaces are finite, so we may take $\mathscr{B}$ to be the power set (the set of all subsets of $S$) and it suffices to specify $\P$ on the elements of $S$, the outcomes. The only requirement for $\P$ is that its values should be nonnegative and sum to 1.
+Once a sample space is defined, the next step is to associate a
+probability model with it in order to be able to answer probabilistic
+questions. Formally speaking, a \textit{probability space} is a
+triple $(S,\mathscr{B},\P)$, where $S$ is a sample space,
+$\mathscr{B}$ is a sigma-algebra of subsets of $S$, and $\P$ is a
+probability measure defined on $\mathscr{B}$. However, for our
+purposes all of the sample spaces are finite, so we may take
+$\mathscr{B}$ to be the power set (the set of all subsets of $S$) and
+it suffices to specify $\P$ on the elements of $S$, the outcomes. The
+only requirement for $\P$ is that its values should be nonnegative and
+sum to 1.
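Spelled out for a finite sample space, the requirement reads as follows. This is a sketch in the vignette's notation; the labels $s_1,\ldots,s_n$ for the outcomes and $A$ for an event are introduced here only for illustration, and $\P$ is assumed to be the vignette's own probability macro:

```latex
\[
  S = \{s_1, s_2, \ldots, s_n\}, \qquad
  \P(s_i) \ge 0 \ \text{ for each } i, \qquad
  \sum_{i=1}^{n} \P(s_i) = 1,
\]
\[
  \P(A) = \sum_{s \in A} \P(s) \quad \text{for every event } A \subseteq S .
\]
```

The second line is why it suffices to specify $\P$ on the outcomes alone: the probability of any event is recovered by summing over the outcomes it contains.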
-The end result is that in the \texttt{prob} package, a probability space is an object of outcomes \texttt{S} and a vector of probabilities (called ``\texttt{probs}'') with entries that correspond to each outcome in \texttt{S}. When \texttt{S} is a data frame, we may simply add a column called \texttt{probs} to \texttt{S} and we will be finished; the probability space will simply be a data frame which we may call \texttt{space}. In the case that \text{S} is a list, we may combine the \texttt{outcomes} and \texttt{probs} into a larger list, \texttt{space}; it will have two components: \texttt{outcomes} and \texttt{probs}. The only requirement we place is that the entries of \texttt{probs} be nonnegative and \texttt{sum(probs)} is one.
+The end result is that in the \texttt{prob} package, a probability
+space is an object of outcomes \texttt{S} and a vector of
+probabilities (called ``\texttt{probs}'') with entries that correspond
+to each outcome in \texttt{S}. When \texttt{S} is a data frame, we
+may simply add a column called \texttt{probs} to \texttt{S} and we
+will be finished; the probability space will simply be a data frame
+which we may call \texttt{space}.  In the case that \texttt{S} is a
+list, we may combine the \texttt{outcomes} and \texttt{probs} into a
+larger list, \texttt{space}; it will have two components:
+\texttt{outcomes} and \texttt{probs}. The only requirement we place
+is that the entries of \texttt{probs} be nonnegative and that
+\texttt{sum(probs)} be one.
-To accomplish this in \texttt{R}, we may use the \texttt{probspace()} function. The general syntax is \texttt{probspace(x, probs)}, where \texttt{x} is a sample space of outcomes and \texttt{probs} is a vector (of the same length as the number of outcomes in \texttt{x}). The specific choice of \texttt{probs} depends on the context of the problem, and some examples follow to demonstrate some of the more common choices.
+To accomplish this in \texttt{R}, we may use the \texttt{probspace()}
+function. The general syntax is \texttt{probspace(x, probs)}, where
+\texttt{x} is a sample space of outcomes and \texttt{probs} is a
+vector (of the same length as the number of outcomes in \texttt{x}).
+The specific choice of \texttt{probs} depends on the context of the
+problem, and some examples follow to demonstrate some of the more
+common choices.
\subsection{Examples}
\subsubsection*{The Equally Likely Model}
-The equally likely model asserts that every outcome of the sample space has the same probability, thus, if a sample space has $n$ outcomes, then \texttt{probs} would be a vector of length $n$ with identical entries $1/n$. The quickest way to generate \texttt{probs} is with the \texttt{rep()} function. We will start with the experiment of rolling a die, so that $n=6$. We will construct the sample space, generate the \texttt{probs} vector, and put them together with \texttt{probspace()}.
+The equally likely model asserts that every outcome of the sample
+space has the same probability; thus, if a sample space has $n$
+outcomes, then \texttt{probs} would be a vector of length $n$ with
+identical entries $1/n$. The quickest way to generate \texttt{probs}
+is with the \texttt{rep()} function. We will start with the
+experiment of rolling a die, so that $n=6$. We will construct the
+sample space, generate the \texttt{probs} vector, and put them
+together with \texttt{probspace()}.
<<echo=TRUE,print=TRUE>>=
-outcomes = rolldie(1)
+outcomes <- rolldie(1)
p <- rep(1/6, times = 6)
probspace(outcomes, probs = p)
@
-The \texttt{probspace()} function is designed to save us some time in many of the most common situations. For example, due to the especial simplicity of the sample space in this case, we could have achieved the same result with simply (note the name change for the first column)
+
+The \texttt{probspace()} function is designed to save us some time in
+many of the most common situations. For example, due to the especial
+simplicity of the sample space in this case, we could have achieved
+the same result with simply (note the name change for the first
+column)
<<echo=TRUE,print=TRUE>>=
probspace(1:6, probs = p)
-@
- Further, since the equally likely model plays such a fundamental role in the study of probability, the \texttt{probspace()} function will assume that the equally model is desired if no \texttt{probs} are specified. Thus, we get the same answer with only
+@
+
+Further, since the equally likely model plays such a fundamental
+role in the study of probability, the \texttt{probspace()} function
+will assume that the equally likely model is desired if no \texttt{probs} are
+specified. Thus, we get the same answer with only
<<echo=TRUE,print=TRUE>>=
probspace(1:6)
@
-And finally, since rolling dice is such a common experiment in probability classes, the \texttt{rolldie()} function has an additional logical argument \texttt{makespace} that will add a column of equally likely \texttt{probs} to the generated sample space:
+
+And finally, since rolling dice is such a common experiment in
+probability classes, the \texttt{rolldie()} function has an additional
+logical argument \texttt{makespace} that will add a column of equally
+likely \texttt{probs} to the generated sample space:
<<echo=TRUE,print=TRUE>>=
rolldie(1, makespace = TRUE)
@
-or just \texttt{rolldie(1:6,TRUE)}. Many of the other sample space functions (\texttt{tosscoin()}, \texttt{cards()}, \texttt{roulette()}, \textit{etc}.) have similar \texttt{makespace} arguments. Check the documentation for details.
-One sample space function that does NOT have a \texttt{makespace} option is the \texttt{urnsamples()} function. This was intentional. The reason is that under the varied sampling assumptions the outcomes in the respective sample spaces are NOT, in general, equally likely. It is important for the user to carefully consider the experiment to decide whether or not the outcomes are equally likely, and then use \texttt{probspace()} to assign the model.
+or just \texttt{rolldie(1, 6, TRUE)}.  Many of the other sample space
+functions (\texttt{tosscoin()}, \texttt{cards()}, \texttt{roulette()},
+\textit{etc}.) have similar \texttt{makespace} arguments. Check the
+documentation for details.
+One sample space function that does NOT have a \texttt{makespace}
+option is the \texttt{urnsamples()} function. This was intentional.
+The reason is that under the varied sampling assumptions the outcomes
+in the respective sample spaces are NOT, in general, equally likely.
+It is important for the user to carefully consider the experiment to
+decide whether or not the outcomes are equally likely, and then use
+\texttt{probspace()} to assign the model.
+
\subsubsection*{An unbalanced coin}
-While the \texttt{makespace} argument to \texttt{tosscoin()} is useful to represent the tossing of a \emph{fair} coin, it is not always appropriate. For example, suppose our coin is not perfectly balanced, for instance, maybe the ``$H$'' side is somewhat heavier such that the chances of a $H$ appearing in a single toss is 0.70 instead of 0.5. We may set up the probability space with
+While the \texttt{makespace} argument to \texttt{tosscoin()} is useful
+to represent the tossing of a \emph{fair} coin, it is not always
+appropriate. For example, suppose our coin is not perfectly balanced,
+for instance, maybe the ``$H$'' side is somewhat heavier such that the
+chance of an $H$ appearing in a single toss is 0.70 instead of 0.5.  We
+may set up the probability space with
<<echo=TRUE,print=TRUE>>=
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/prob -r 42