[Sleuth2-commits] r63 - in pkg/Sleuth3: . inst/doc vignettes

noreply at r-forge.r-project.org
Sat Jan 2 10:13:13 CET 2016


Author: berwin
Date: 2016-01-02 10:13:13 +0100 (Sat, 02 Jan 2016)
New Revision: 63

Added:
   pkg/Sleuth3/vignettes/
   pkg/Sleuth3/vignettes/chapter01-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter02-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter03-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter04-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter05-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter06-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter07-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter08-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter09-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter10-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter11-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter12-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/chapter13-HortonMosaic.Rnw
   pkg/Sleuth3/vignettes/language.sty
Modified:
   pkg/Sleuth3/DESCRIPTION
   pkg/Sleuth3/inst/doc/Sleuth3-manual.pdf
Log:
Sleuth3:
Created vignettes/ directory and downloaded Nick Horton's knitr
vignettes from his web-site to it.  Renamed the files.
Updated inst/doc/Sleuth3-manual.pdf
Appropriate changes to DESCRIPTION to reflect that we now have
vignettes with knitr as builder.
Bumped version number.


Modified: pkg/Sleuth3/DESCRIPTION
===================================================================
--- pkg/Sleuth3/DESCRIPTION	2016-01-02 09:09:20 UTC (rev 62)
+++ pkg/Sleuth3/DESCRIPTION	2016-01-02 09:13:13 UTC (rev 63)
@@ -1,15 +1,18 @@
 Package: Sleuth3
 Title: Data Sets from Ramsey and Schafer's "Statistical Sleuth (3rd Ed)"
-Version: 0.1-8
-Date: 2015-02-16
+Version: 1.0-1
+Date: 2016-01-02
 Author:  Original by F.L. Ramsey and D.W. Schafer, 
-    modifications by Daniel W. Schafer, Jeannie Sifneos and Berwin A. Turlach
+    modifications by Daniel W. Schafer, Jeannie Sifneos and Berwin
+    A. Turlach, vignettes contributed by Nicholas Horton, Linda Loi,
+    Kate Aloisio and Ruobing Zhang 
 Description: Data sets from Ramsey, F.L. and Schafer, D.W. (2013), "The
     Statistical Sleuth: A Course in Methods of Data Analysis (3rd
     ed)", Cengage Learning. 
 Maintainer: Berwin A Turlach <Berwin.Turlach at gmail.com>
 LazyData: yes
-Depends: R (>= 3.0.0)
-Suggests: MASS, lattice, multcomp, car, leaps, CCA, Hmisc
+Depends: R (>= 3.1.0)
+Suggests: CCA, Hmisc, MASS, agricolae, car, gmodels, knitr, lattice, leaps, mosaic, multcomp
+VignetteBuilder: knitr
 License: GPL (>= 2)
 URL: http://r-forge.r-project.org/projects/sleuth2/
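
The DESCRIPTION change above pairs `VignetteBuilder: knitr` with `knitr` in `Suggests`. A minimal sketch of how that pairing could be checked with base R's `read.dcf()` follows. It assumes the fields are supplied as an in-memory excerpt (the `desc_text` vector is an abridged copy of the new DESCRIPTION, and the variable names are illustrative only), not read from the package source tree.

```r
# Abridged copy of the rev 63 DESCRIPTION fields (illustrative excerpt only)
desc_text <- c(
  "Package: Sleuth3",
  "Version: 1.0-1",
  "Depends: R (>= 3.1.0)",
  "Suggests: CCA, Hmisc, MASS, agricolae, car, gmodels, knitr, lattice, leaps, mosaic, multcomp",
  "VignetteBuilder: knitr"
)

# read.dcf() accepts a connection, so the excerpt can be parsed without a file
con <- textConnection(desc_text)
desc <- read.dcf(con)
close(con)

# Split the comma-separated Suggests field into package names
suggests <- trimws(strsplit(desc[1, "Suggests"], ",")[[1]])
builder <- unname(desc[1, "VignetteBuilder"])

# The vignette builder should itself be listed among the suggested packages
builder_declared <- builder %in% suggests
builder_declared
```

For a real checkout, the same `read.dcf()` call could be pointed at the DESCRIPTION file path instead of a text connection.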

Modified: pkg/Sleuth3/inst/doc/Sleuth3-manual.pdf
===================================================================
(Binary files differ)

Added: pkg/Sleuth3/vignettes/chapter01-HortonMosaic.Rnw
===================================================================
--- pkg/Sleuth3/vignettes/chapter01-HortonMosaic.Rnw	                        (rev 0)
+++ pkg/Sleuth3/vignettes/chapter01-HortonMosaic.Rnw	2016-01-02 09:13:13 UTC (rev 63)
@@ -0,0 +1,317 @@
+%\VignetteEngine{knitr}
+%\VignetteIndexEntry{Chapter 01, Horton et al. using mosaic}
+%\VignettePackage{Sleuth3}
+\documentclass[11pt]{article}
+
+\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
+\usepackage{hyperref}
+\usepackage{language}
+\usepackage{alltt}
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+
+%% Now begin customising things. See the fancyhdr docs for more info.
+
+\chead{}
+\lhead[\sf \thepage]{\sf \leftmark}
+\rhead[\sf \leftmark]{\sf \thepage}
+\lfoot{}
+\cfoot{Statistical Sleuth in R: Chapter 1}
+\rfoot{}
+
+\newcounter{myenumi}
+\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
+\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}
+
+\pagestyle{fancy}
+
+\def\R{{\sf R}}
+\def\Rstudio{{\sf RStudio}}
+\def\RStudio{{\sf RStudio}}
+\def\term#1{\textbf{#1}}
+\def\tab#1{{\sf #1}}
+
+
+\usepackage{relsize}
+
+\newlength{\tempfmlength}
+\newsavebox{\fmbox}
+\newenvironment{fmpage}[1]
+     {
+   \medskip
+   \setlength{\tempfmlength}{#1}
+         \begin{lrbox}{\fmbox}
+           \begin{minipage}{#1}
+                 \vspace*{.02\tempfmlength}
+                 \hfill
+           \begin{minipage}{.95 \tempfmlength}}
+                 {\end{minipage}\hfill
+                 \vspace*{.015\tempfmlength}
+                 \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
+         \medskip
+         }
+
+
+\newenvironment{boxedText}[1][.98\textwidth]%
+{%
+\begin{center}
+\begin{fmpage}{#1}
+}%
+{%
+\end{fmpage}
+\end{center}
+}
+
+\newenvironment{boxedTable}[2][tbp]%
+{%
+\begin{table}[#1]
+  \refstepcounter{table}
+  \begin{center}
+\begin{fmpage}{.98\textwidth}
+  \begin{center}
+        \sf \large Box~\expandafter\thetable. #2
+\end{center}
+\medskip
+}%
+{%
+\end{fmpage}
+\end{center}
+\end{table}             % need to do something about exercises that follow boxedTable
+}
+
+
+\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}
+
+\title{The Statistical Sleuth in R: \\
+Chapter 1}
+
+\author{
+Linda Loi \and Ruobing Zhang \and Kate Aloisio \and Nicholas J. Horton\thanks{Department of Mathematics and Statistics, Smith College, nhorton at smith.edu}
+} 
+
+\date{\today}
+
+\begin{document}
+
+
+\maketitle
+\tableofcontents
+
+%\parindent=0pt
+
+
+<<setup, include=FALSE, cache=FALSE>>=
+require(knitr)
+opts_chunk$set(
+  dev="pdf",
+  fig.path="figures/",
+        fig.height=3,
+        fig.width=4,
+        out.width=".47\\textwidth",
+        fig.keep="high",
+        fig.show="hold",
+        fig.align="center",
+        prompt=TRUE,  # show the prompts; but perhaps we should not do this 
+        comment=NA    # turn off commenting of output (but perhaps we should not do this either)
+  )
+@
+
+<<pvalues, echo=FALSE, message=FALSE>>=
+print.pval = function(pval) {
+  threshold = 0.0001
+    return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
+                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
+                       paste("p=", round(pval, 3), sep=""))))
+}
+@
+
+<<setup2,echo=FALSE,message=FALSE>>=
+require(mosaic)
+require(Sleuth3)
+trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
+set.seed(123)
+# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for example.
+knit_hooks$set(inline = function(x) {
+if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
+x = as.character(x)
+h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
+h = gsub("([_#$%&])", "\\\\\\1", h)
+h = gsub('(["\'])', '\\1{}', h)
+gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
+})
+showOriginal=FALSE
+showNew=TRUE
+@ 
+
+\section{Introduction}
+
+This document is intended to help describe how to undertake analyses introduced as examples in the Third Edition of the \emph{Statistical Sleuth} (2013) by Fred Ramsey and Dan Schafer.
+More information about the book can be found at \url{http://www.proaxis.com/~panorama/home.htm}.  This
+file as well as the associated \pkg{knitr} reproducible analysis source file can be found at
+\url{http://www.math.smith.edu/~nhorton/sleuth3}.
+
+This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the 
+\pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}).
+
+To use a package within R, it must be installed (one time), and loaded (each session). The package can be installed using the following command:
+<<install_mosaic,eval=FALSE>>=
+install.packages('mosaic')               # note the quotation marks
+@
+Once this is installed, it can be loaded by running the command:
+<<load_mosaic,eval=FALSE>>=
+require(mosaic)
+@
+This
+needs to be done once per session.
+
+In addition, the data files for the \emph{Sleuth} case studies can be accessed by installing the \pkg{Sleuth3} package.
+<<install_Sleuth3,eval=FALSE>>=
+install.packages('Sleuth3')               # note the quotation marks
+@
+<<load_Sleuth3,eval=FALSE>>=
+require(Sleuth3)
+@
+
+We also set some options to improve legibility of graphs and output.
+<<eval=TRUE>>=
+trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
+options(digits=3)
+@
+
+The specific goal of this document is to demonstrate how to calculate the quantities described in Chapter 1: Drawing Statistical Conclusions using R.
+
+\section{Motivation and Creativity}
+
+For Case Study 1: Motivation and Creativity, the following questions are posed: Do grading systems promote creativity in students? Do ranking systems and incentive awards increase productivity among employees? Do rewards and praise stimulate children to learn? 
+
+The data for Case Study 1 was collected by psychologist Teresa Amabile in an experiment concerning the effects of intrinsic and extrinsic motivation on creativity (page 2 of the \emph{Sleuth}).
+
+\subsection{Statistical summary and graphical display}
+
+We begin by reading the data and summarizing the variables.
+<<>>=
+summary(case0101)
+@
+A total of \Sexpr{nrow(case0101)} subjects with considerable experience in creative writing were randomly assigned to one of two treatment groups: \Sexpr{nrow(subset(case0101, Treatment=="Extrinsic"))} were placed into the ``extrinsic'' treatment group and \Sexpr{nrow(subset(case0101, Treatment=="Intrinsic"))} were placed into the ``intrinsic'' treatment group, as summarized in Display 1.1 (\emph{Sleuth}, page 2).
+
+<<eval=TRUE>>=
+favstats(Score ~ Treatment, data=case0101)
+histogram(~ Score | Treatment, data=case0101)
+@
+<<>>=
+with(subset(case0101, Treatment=="Extrinsic"), stem(Score, scale=5))
+with(subset(case0101, Treatment=="Intrinsic"), stem(Score, scale=5))
+@
+
+Similar output can be generated using the following code:
+<<eval=FALSE>>=
+maggregate(Score ~ Treatment, data=case0101, FUN=stem)
+@
+
+The extrinsic group (n=\Sexpr{nrow(subset(case0101, Treatment=="Extrinsic"))}) has an average creativity score that is \Sexpr{round(diff(mean(Score ~ Treatment, data=case0101)), 1)} points lower than that of the intrinsic group (n=\Sexpr{nrow(subset(case0101, Treatment=="Intrinsic"))}). The extrinsic group has a somewhat larger spread than the intrinsic group (sd=\Sexpr{round(sd(~ Score, data=subset(case0101, Treatment=="Extrinsic")), 2)} for the extrinsic group and sd=\Sexpr{round(sd(~ Score, data=subset(case0101, Treatment=="Intrinsic")), 2)} for the intrinsic group). Both distributions are approximately normal.
+
+\subsection{Inferential procedures (two-sample t-test)}
+ 
+<<eval=TRUE>>=
+t.test(Score ~ Treatment, alternative="two.sided", data=case0101)
+@
+The two-sample $t$-test shows strong evidence that a subject would receive a lower creativity score for a poem written after the extrinsic motivation questionnaire than for one written after the intrinsic motivation questionnaire. The two-sided $p$-value is \Sexpr{pval(t.test(Score~Treatment, alternative="two.sided", data=case0101), digits=4)}, which is small enough to reject the null hypothesis. 
+
+Thus, we can conclude that there is a difference between the population mean in the extrinsic group and the population mean in the intrinsic group; the estimated difference between these two scores is \Sexpr{round(diff(mean(Score ~ Treatment, data=case0101)), 1)} points on the 0-40 point scale. A 95\% confidence interval for the decrease in score due to having extrinsic motivation rather than intrinsic motivation is between \Sexpr{round(t.test(Score~Treatment, alternative="two.sided", data=case0101)$conf.int[2], 2)} and  \Sexpr{round(t.test(Score~Treatment, alternative="two.sided", data=case0101)$conf.int[1], 2)} points (\emph{Sleuth}, page 3).
+
+
+<<eval=TRUE>>=
+summary(lm(Score ~ Treatment, data=case0101))
+@
+
+In the creativity study, the question of whether there is a treatment effect becomes a question of whether the parameter has a nonzero value. The value of the test statistic for the creativity scores is \Sexpr{round(diff(mean(Score ~ Treatment, data=case0101)), 2)}.
+
+\subsection{Permutation test}
+
+<<eval=TRUE>>=
+diffmeans = diff(mean(Score ~ Treatment, data=case0101))
+diffmeans     # observed difference
+numsim = 1000     # set to a sufficient number
+nulldist = do(numsim)*diff(mean(Score~shuffle(Treatment), data=case0101))
+confint(nulldist)
+# Display 1.8 Sleuth
+histogram(~ Intrinsic, nint=50, data=nulldist, v=c(-4.14,4.14)) 
+@
+
+As described in the \emph{Sleuth} on page 12, if the group assignment changes, we will get different results. First, the test statistics are just as likely to be negative as positive. Second, the majority of values fall in the range from -3.0 to +3.0. Third, only a few of the 1,000 randomizations produced test statistics as large as 4.14. This last point indicates that 4.14 corresponds to an unusually uneven randomization outcome, if the null hypothesis is correct.
+
+\section{Gender Discrimination}
+
+For Case Study 2: Gender Discrimination, the following question is posed: Did a bank discriminatorily pay higher starting salaries to men than to women?  Display 1.3 (page 4 of the \emph{Sleuth}) displays the beginning salaries for male and female skilled entry-level clerical employees hired between 1969 and 1977.
+
+\subsection{Statistical summary and graphical display}
+
+We begin by reading the data and summarizing the variables.
+
+<<eval=TRUE>>=
+summary(case0102) # Display 1.3 Sleuth p4
+@
+
+<<eval=TRUE>>=
+favstats(Salary ~ Sex, data=case0102)
+bwplot(Salary ~ Sex, data=case0102)
+densityplot(~ Salary, groups=Sex, auto.key=TRUE, data=case0102)
+@
+
+The \Sexpr{nrow(subset(case0102, Sex=="Male"))} men have an average starting salary that is \$\Sexpr{round(diff(mean(Salary ~ Sex, data=case0102)), 1)} more than the \Sexpr{nrow(subset(case0102, Sex=="Female"))} women (\$\Sexpr{round(mean(~ Salary, data=subset(case0102, Sex=="Male")), 0)} vs \$\Sexpr{round(mean(~ Salary, data=subset(case0102, Sex=="Female")), 0)}).  Both distributions have similar spread (sd=\$\Sexpr{round(sd(~ Salary, data=subset(case0102, Sex=="Female")), 2)} for women and sd=\$\Sexpr{round(sd(~ Salary, data=subset(case0102, Sex=="Male")), 2)} for men) and are approximately normal (see the density plot). The key difference between the groups is the shift (as indicated by the parallel boxplots).
+
+To show Display 1.13
+<<>>=
+histogram(rnorm(1000))  # Normal
+histogram(rexp(1000))   # Long-tailed 
+histogram(runif(1000))  # Short-tailed
+histogram(rchisq(1000, df=15)) # Skewed
+@
+
+\subsection{Inferential procedures (two-sample t-test)}
+
+The $t$-test on page 4 of the \emph{Sleuth} can be replicated using the following commands (note that the equal-variance $t$-test is specified by {\tt var.equal=TRUE}, which is not the default).
+
+<<eval=TRUE>>=
+t.test(Salary ~ Sex, var.equal=TRUE, data=case0102)
+@
+
+\subsection{Permutation test}
+
+We undertake a permutation test to assess whether the differences in the center of these samples that we are observing are due to chance, if the distributions are actually equivalent back in the populations of male and female possible clerical hires.  We start by calculating our test statistic (the difference in means) for the observed data, then simulate from the null distribution (where the labels can be interchanged) and calculate our $p$-value. 
+
+<<obsdiff,eval=TRUE>>=
+obsdiff = diff(mean(Salary ~ Sex, data=case0102)); obsdiff
+@
+
+The labeling for the difference in means isn't ideal (but will be given 
+as ``Male'' by R).
+<<permutetest>>=
+numsim = 1000
+res = do(numsim) * diff(mean(Salary~shuffle(Sex), data=case0102))
+densityplot(~ Male, data=res)
+confint(res)
+@
+
+<<eval=FALSE>>=
+larger = sum(with(res, abs(Male) >= abs(obsdiff)))
+larger
+pval = larger/numsim
+pval
+@
+
+Through the permutation test, we observe that the mean starting salary for males is significantly larger than the mean starting salary for females, as we never see a permuted difference in means close to our observed value. Therefore, we reject the null hypothesis ($p<0.001$) and conclude that the salaries of the men are higher than those of the women back in the population.
+
+<<eval=TRUE>>=
+t.test(Salary ~ Sex, alternative="less", data=case0102) 
+@
+
+The $p$-value ($<0.001$) from the two-sample t-test shows that the large difference between estimated salaries for males and females is unlikely to be due to chance.
+
+
+\end{document}
+
+
+
+
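
The permutation-test chunks in chapter01-HortonMosaic.Rnw rely on `do()` and `shuffle()` from the mosaic package. As a rough sketch of the same idea in base R only, the block below re-randomizes group labels with `sample()` and `replicate()`. The scores are simulated (hypothetical values, not the `case0101` data), and the names `mean_diff`, `obs_diff`, and `null_dist` are illustrative assumptions rather than anything defined in the vignette.

```r
set.seed(123)

# Hypothetical data: simulated scores for two groups (NOT the Sleuth3 case0101 data)
score <- c(rnorm(23, mean = 16, sd = 5), rnorm(24, mean = 20, sd = 5))
treatment <- factor(rep(c("Extrinsic", "Intrinsic"), times = c(23, 24)))

# Difference in group means: second factor level minus first
mean_diff <- function(y, g) {
  m <- tapply(y, g, mean)
  as.numeric(m[2] - m[1])
}

# Observed test statistic
obs_diff <- mean_diff(score, treatment)

# Null distribution: shuffle the labels and recompute the statistic each time
numsim <- 1000
null_dist <- replicate(numsim, mean_diff(score, sample(treatment)))

# Two-sided permutation p-value: share of shuffles at least as extreme as observed
p_value <- mean(abs(null_dist) >= abs(obs_diff))
p_value
```

Because the labels are shuffled under the null hypothesis of no treatment effect, the simulated differences should be centred near zero, which is the first property noted in the vignette's discussion of Display 1.8.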

Added: pkg/Sleuth3/vignettes/chapter02-HortonMosaic.Rnw
===================================================================
--- pkg/Sleuth3/vignettes/chapter02-HortonMosaic.Rnw	                        (rev 0)
+++ pkg/Sleuth3/vignettes/chapter02-HortonMosaic.Rnw	2016-01-02 09:13:13 UTC (rev 63)
@@ -0,0 +1,312 @@
+%\VignetteEngine{knitr}
+%\VignetteIndexEntry{Chapter 02, Horton et al. using mosaic}
+%\VignettePackage{Sleuth3}
+\documentclass[11pt]{article}
+
+\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
+\usepackage{hyperref}
+\usepackage{language}
+\usepackage{alltt}
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+
+%% Now begin customising things. See the fancyhdr docs for more info.
+
+\chead{}
+\lhead[\sf \thepage]{\sf \leftmark}
+\rhead[\sf \leftmark]{\sf \thepage}
+\lfoot{}
+\cfoot{Statistical Sleuth in R: Chapter 2}
+\rfoot{}
+
+\newcounter{myenumi}
+\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
+\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}
+
+\pagestyle{fancy}
+
+\def\R{{\sf R}}
+\def\Rstudio{{\sf RStudio}}
+\def\RStudio{{\sf RStudio}}
+\def\term#1{\textbf{#1}}
+\def\tab#1{{\sf #1}}
+
+
+\usepackage{relsize}
+
+\newlength{\tempfmlength}
+\newsavebox{\fmbox}
+\newenvironment{fmpage}[1]
+     {
+   \medskip
+   \setlength{\tempfmlength}{#1}
+   \begin{lrbox}{\fmbox}
+     \begin{minipage}{#1}
+     \vspace*{.02\tempfmlength}
+		 \hfill
+	   \begin{minipage}{.95 \tempfmlength}}
+		 {\end{minipage}\hfill
+		 \vspace*{.015\tempfmlength}
+		 \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
+	 \medskip
+	 }
+
+
+\newenvironment{boxedText}[1][.98\textwidth]%
+{%
+\begin{center}
+\begin{fmpage}{#1}
+}%
+{%
+\end{fmpage}
+\end{center}
+}
+
+\newenvironment{boxedTable}[2][tbp]%
+{%
+\begin{table}[#1]
+  \refstepcounter{table}
+  \begin{center}
+\begin{fmpage}{.98\textwidth}
+  \begin{center}
+	\sf \large Box~\expandafter\thetable. #2
+\end{center}
+\medskip
+}%
+{%
+\end{fmpage}
+\end{center}
+\end{table}		% need to do something about exercises that follow boxedTable
+}
+
+
+\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}
+
+\title{The Statistical Sleuth in R: \\
+Chapter 2}
+
+\author{
+Linda Loi \and Ruobing Zhang\and Kate Aloisio \and Nicholas J. Horton\thanks{Department of Mathematics and Statistics, Smith College, nhorton at smith.edu}
+} 
+
+\date{\today}
+
+\begin{document}
+
+
+\maketitle
+\tableofcontents
+
+%\parindent=0pt
+
+
+<<setup, include=FALSE, cache=FALSE>>=
+require(knitr)
+opts_chunk$set(
+  dev="pdf",
+  fig.path="figures/",
+        fig.height=3,
+        fig.width=4,
+        out.width=".47\\textwidth",
+        fig.keep="high",
+        fig.show="hold",
+        fig.align="center",
+        prompt=TRUE,  # show the prompts; but perhaps we should not do this
+        comment=NA    # turn off commenting of output (but perhaps we should not do this either)
+  )
+@
+
+
+<<pvalues, echo=FALSE, message=FALSE>>=
+print.pval = function(pval) {
+  threshold = 0.0001
+    return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
+                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
+                       paste("p=", round(pval, 3), sep=""))))
+}
+@
+
+<<setup2,echo=FALSE,message=FALSE>>=
+require(Sleuth3)
+require(mosaic)
+trellis.par.set(theme=col.mosaic())  # get a better color scheme 
+set.seed(123)
+# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for example.
+knit_hooks$set(inline = function(x) {
+if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
+x = as.character(x)
+h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
+h = gsub("([_#$%&])", "\\\\\\1", h)
+h = gsub('(["\'])', '\\1{}', h)
+gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
+})
+showOriginal=FALSE
+showNew=TRUE
+@ 
+
+\section{Introduction}
+
+This document is intended to help describe how to undertake analyses introduced as examples in the Third Edition of the \emph{Statistical Sleuth} (2013) by Fred Ramsey and Dan Schafer.
+More information about the book can be found at \url{http://www.proaxis.com/~panorama/home.htm}.
+This
+file as well as the associated \pkg{knitr} reproducible analysis source file can be found at
+\url{http://www.math.smith.edu/~nhorton/sleuth3}.
+
+
+This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the 
+\pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}). 
+
+To use a package within R, it must be installed (one time), and loaded (each session). The package can be installed using the following command:
+<<install_mosaic,eval=FALSE>>=
+install.packages('mosaic')               # note the quotation marks
+@
+Once this is installed, it can be loaded by running the command:
+<<load_mosaic,eval=FALSE>>=
+require(mosaic)
+@
+This
+needs to be done once per session.
+
+In addition, the data files for the \emph{Sleuth} case studies can be accessed by installing the \pkg{Sleuth3} package.
+<<install_Sleuth3,eval=FALSE>>=
+install.packages('Sleuth3')               # note the quotation marks
+@
+<<load_Sleuth3,eval=FALSE>>=
+require(Sleuth3)
+@
+
+We also set some options to improve legibility of graphs and output.
+<<eval=TRUE>>=
+trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
+options(digits=3, show.signif.stars=FALSE)
+@
+
+The specific goal of this document is to demonstrate how to calculate the quantities described in Chapter 2: Inference Using \emph{t}-Distributions using R.
+
+\section{Evidence Supporting Darwin's Theory of Natural Selection}
+
+Do birds evolve to adapt to their environments? That's the question being addressed by Case Study 2.1 in the \emph{Sleuth}.
+
+\subsection{Statistical summary and graphical display}
+We begin by reading the data and summarizing the variables.
+
+<<>>=
+summary(case0201)
+fav = favstats(Depth ~ Year, data=case0201); fav
+@
+
+A total of \Sexpr{nrow(case0201)} subjects are included in the data: \Sexpr{nrow(subset(case0201, Year=="1976"))} are finches that were caught in 1976 and \Sexpr{nrow(subset(case0201, Year=="1978"))} are finches that were caught in 1978. 
+The following figure replicates Display 2.1 on page 30.
+
+<<>>=
+bwplot(Year ~ Depth, data=case0201)
+@
+
+<<>>=
+densityplot(~ Depth, groups=Year, auto.key=TRUE, data=case0201)
+@
+
+The distributions are approximately normally distributed, with some evidence for a long left tail.
+
+\subsection{Inferential procedures (two-sample t-test)}
+
+First, we calculate the pooled SD and the standard error of the difference between the two sample averages (page 41, Display 2.8).
+<<>>=
+# Calculate Pooled SD
+n1 = fav["1976", "n"]; n1
+n2 = fav["1978", "n"]; n2
+s1 = fav["1976", "sd"]; s1
+s2 = fav["1978", "sd"]; s2
+Sp = sqrt(((n1-1)*(s1)^2+(n2-1)*(s2)^2)/(n1+n2-2)); Sp
+# Calculate standard error
+SE = Sp*sqrt(1/n1+1/n2); SE
+@
+
+So the pooled SD is \Sexpr{round(Sp, 2)} and the standard error is \Sexpr{round(SE, 2)}.
+
+Based on this information, we can construct a 95\% confidence interval (page 43, Display 2.9).
+
+<<>>=
+Y1 = fav["1976", "mean"]; Y1
+Y2 = fav["1978", "mean"]; Y2
+Yd = Y2-Y1; Yd
+df = n1+n2-2; df
+tcrit = qt(0.975, df); tcrit    # critical value (named tcrit to avoid masking qt())
+hw = tcrit*SE; hw
+lower = Yd-hw; lower
+upper = Yd+hw; upper
+@
+
+So the 95\% confidence interval for the difference between means is (\Sexpr{round(lower, 1)}, \Sexpr{round(upper, 1)}).
+
+Now we want to calculate the $t$-statistic and $p$-value (as shown on page 46, Display 2.10).
+<<>>=
+tstats = (Yd-0)/SE; tstats      # The hypothesis difference=0
+onepval = 1-pt(tstats, df); onepval
+twopval = 2*onepval; twopval
+@
+
+The one-sided $p$-value is approximately \Sexpr{round(onepval, 2)} and the two-sided $p$-value is also approximately \Sexpr{round(twopval, 2)}.
+
+We can get the results of ``Summary of Statistical Findings'' (page 29) by using the following code: 
+<<>>=
+t.test(Depth ~ Year, var.equal=TRUE, data=case0201)
+confint(lm(Depth ~ Year, data=case0201))
+@
+
+\section{Anatomical Abnormalities Associated with Schizophrenia}
+
+Is the area of a brain region related to the development of schizophrenia? That's the question being addressed by Case Study 2.2 in the \emph{Sleuth}.
+
+\subsection{Statistical summary and graphical display}
+We begin by reading the data and summarizing the variables.
+
+<<>>=
+summary(case0202)
+@
+
+A total of \Sexpr{2*nrow(case0202)} subjects are included in the data. There are \Sexpr{nrow(case0202)} pairs of twins; in each pair, one twin has schizophrenia and the other does not. So there are \Sexpr{nrow(case0202)} affected subjects and \Sexpr{nrow(case0202)} unaffected subjects.
+
+The difference in area of left hippocampus of these pairs of twins is:
+<<transform>>=
+case0202 = transform(case0202, DIFF = Unaffected - Affected)
+favstats(~ DIFF, data=case0202)
+@
+
+This matches the results on page 31, Display 2.2.
+
+<<>>=
+densityplot(~ DIFF, data=case0202)
+@
+
+\subsection{Inferential procedures (paired t-test)}
+
+We want to calculate the paired $t$-statistic and a 95\% confidence interval.
+
+<<>>=
+# Calculate t-statistics
+difmean = mean(~ DIFF, data=case0202); difmean
+difsd = sd(~ DIFF, data=case0202); difsd
+difn = nrow(case0202); difn
+difSE = difsd/sqrt(difn); difSE
+tscore = (difmean-0)/difSE; tscore         # hypothesis difference=0
+twopvalue = 2*(1-pt(tscore, difn-1)); twopvalue
+# Construct confidence interval
+tstar = qt(0.975, difn-1); tstar
+schizolower = difmean - tstar*difSE; schizolower
+schizoupper = difmean + tstar*difSE; schizoupper
+@
+
+So the two-sided $p$-value is approximately \Sexpr{round(twopvalue, 3)} and the 95\% confidence interval is (\Sexpr{round(schizolower, 2)}, \Sexpr{round(schizoupper, 2)}).
+
+We can also get the results displayed on page 32 by conducting a paired $t$-test:
+
+<<>>=
+with(case0202, t.test(Unaffected, Affected, paired=TRUE))
+@
+
+\end{document}
+
+
+
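
The hand calculations in chapter02-HortonMosaic.Rnw (pooled SD, standard error, $t$-statistic, confidence interval) can be cross-checked against `t.test(..., var.equal=TRUE)`. The block below is a minimal base R sketch of that check, assuming two simulated samples (hypothetical values, not the `case0201` finch data); the variable names are illustrative. Unlike the vignette's `1-pt(tstats, df)`, it takes the tail area from `abs(tstat)`, so the two-sided $p$-value stays valid when the observed difference is negative.

```r
set.seed(123)

# Hypothetical measurements for two independent groups (NOT the case0201 data)
y1 <- rnorm(89, mean = 9.5, sd = 1.0)
y2 <- rnorm(89, mean = 10.1, sd = 0.9)
n1 <- length(y1)
n2 <- length(y2)

# Pooled SD and standard error of the difference in means
sp <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)

# t statistic, two-sided p-value, and 95% confidence interval
dof <- n1 + n2 - 2
est <- mean(y2) - mean(y1)
tstat <- est / se
two_p <- 2 * pt(abs(tstat), dof, lower.tail = FALSE)
ci <- est + c(-1, 1) * qt(0.975, dof) * se

# Built-in equal-variance test; y2 is listed first so the sign matches est
builtin <- t.test(y2, y1, var.equal = TRUE)

# Each comparison should report TRUE
all.equal(unname(builtin$statistic), tstat)
all.equal(builtin$p.value, two_p)
all.equal(as.numeric(builtin$conf.int), ci)
```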

Added: pkg/Sleuth3/vignettes/chapter03-HortonMosaic.Rnw
===================================================================
--- pkg/Sleuth3/vignettes/chapter03-HortonMosaic.Rnw	                        (rev 0)
+++ pkg/Sleuth3/vignettes/chapter03-HortonMosaic.Rnw	2016-01-02 09:13:13 UTC (rev 63)
@@ -0,0 +1,323 @@
+%\VignetteEngine{knitr}
+%\VignetteIndexEntry{Chapter 03, Horton et al. using mosaic}
+%\VignettePackage{Sleuth3}
+\documentclass[11pt]{article}
+
+\usepackage[margin=1in,bottom=.5in,includehead,includefoot]{geometry}
+\usepackage{hyperref}
+\usepackage{language}
+\usepackage{alltt}
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+
+%% Now begin customising things. See the fancyhdr docs for more info.
+
+\chead{}
+\lhead[\sf \thepage]{\sf \leftmark}
+\rhead[\sf \leftmark]{\sf \thepage}
+\lfoot{}
+\cfoot{Statistical Sleuth in R: Chapter 3}
+\rfoot{}
+
+\newcounter{myenumi}
+\newcommand{\saveenumi}{\setcounter{myenumi}{\value{enumi}}}
+\newcommand{\reuseenumi}{\setcounter{enumi}{\value{myenumi}}}
+
+\pagestyle{fancy}
+
+\def\R{{\sf R}}
+\def\Rstudio{{\sf RStudio}}
+\def\RStudio{{\sf RStudio}}
+\def\term#1{\textbf{#1}}
+\def\tab#1{{\sf #1}}
+
+
+\usepackage{relsize}
+
+\newlength{\tempfmlength}
+\newsavebox{\fmbox}
+\newenvironment{fmpage}[1]
+     {
+   \medskip
+   \setlength{\tempfmlength}{#1}
+   \begin{lrbox}{\fmbox}
+     \begin{minipage}{#1}
+     \vspace*{.02\tempfmlength}
+		 \hfill
+	   \begin{minipage}{.95 \tempfmlength}}
+		 {\end{minipage}\hfill
+		 \vspace*{.015\tempfmlength}
+		 \end{minipage}\end{lrbox}\fbox{\usebox{\fmbox}}
+	 \medskip
+	 }
+
+
+\newenvironment{boxedText}[1][.98\textwidth]%
+{%
+\begin{center}
+\begin{fmpage}{#1}
+}%
+{%
+\end{fmpage}
+\end{center}
+}
+
+\newenvironment{boxedTable}[2][tbp]%
+{%
+\begin{table}[#1]
+  \refstepcounter{table}
+  \begin{center}
+\begin{fmpage}{.98\textwidth}
+  \begin{center}
+	\sf \large Box~\expandafter\thetable. #2
+\end{center}
+\medskip
+}%
+{%
+\end{fmpage}
+\end{center}
+\end{table}		% need to do something about exercises that follow boxedTable
+}
+
+
+\newcommand{\cran}{\href{http://www.R-project.org/}{CRAN}}
+
+\title{The Statistical Sleuth in R: \\
+Chapter 3}
+
+\author{
+Linda Loi \and Ruobing Zhang \and Kate Aloisio \and Nicholas J. Horton\thanks{Department of Mathematics and Statistics, Smith College, nhorton at smith.edu}
+} 
+
+\date{\today}
+
+\begin{document}
+
+
+\maketitle
+\tableofcontents
+
+%\parindent=0pt
+
+
+<<setup, include=FALSE, cache=FALSE>>=
+require(knitr)
+opts_chunk$set(
+  dev="pdf",
+  fig.path="figures/",
+        fig.height=3,
+        fig.width=4,
+        out.width=".47\\textwidth",
+        fig.keep="high",
+        fig.show="hold",
+        fig.align="center",
+        prompt=TRUE,  # show the prompts; but perhaps we should not do this
+        comment=NA    # turn off commenting of output (but perhaps we should not do this either)
+  )
+@
+
+
+<<pvalues, echo=FALSE, message=FALSE>>=
+print.pval = function(pval) {
+  threshold = 0.0001
+    return(ifelse(pval < threshold, paste("p<", sprintf("%.4f", threshold), sep=""),
+                ifelse(pval > 0.1, paste("p=",round(pval, 2), sep=""),
+                       paste("p=", round(pval, 3), sep=""))))
+}
+@
+
+<<setup2,echo=FALSE,message=FALSE>>=
+require(Sleuth3)
+require(mosaic)
+trellis.par.set(theme=col.mosaic())  # get a better color scheme 
+set.seed(123)
+# this allows for code formatting inline.  Use \Sexpr{'function(x,y)'}, for example.
+knit_hooks$set(inline = function(x){
+if (is.numeric(x)) return(knitr:::format_sci(x, 'latex'))
+x = as.character(x)
+h = knitr:::hilight_source(x, 'latex', list(prompt=FALSE, size='normalsize'))
+h = gsub("([_#$%&])", "\\\\\\1", h)
+h = gsub('(["\'])', '\\1{}', h)
+gsub('^\\\\begin\\{alltt\\}\\s*|\\\\end\\{alltt\\}\\s*$', '', h)
+})
+showOriginal=FALSE
+showNew=TRUE
+@ 
+
+\section{Introduction}
+
+This document is intended to help describe how to undertake analyses introduced as examples in the Third Edition of the \emph{Statistical Sleuth} (2013) by Fred Ramsey and Dan Schafer.
+More information about the book can be found at \url{http://www.proaxis.com/~panorama/home.htm}.
+This
+file as well as the associated \pkg{knitr} reproducible analysis source file can be found at
+\url{http://www.math.smith.edu/~nhorton/sleuth3}.
+
+
+This work leverages initiatives undertaken by Project MOSAIC (\url{http://www.mosaic-web.org}), an NSF-funded effort to improve the teaching of statistics, calculus, science and computing in the undergraduate curriculum. In particular, we utilize the 
+\pkg{mosaic} package, which was written to simplify the use of R for introductory statistics courses. A short summary of the R needed to teach introductory statistics can be found in the mosaic package vignette (\url{http://cran.r-project.org/web/packages/mosaic/vignettes/MinimalR.pdf}). 
+
+To use a package within R, it must be installed (one time), and loaded (each session). The package can be installed using the following command:
+<<install_mosaic,eval=FALSE>>=
+install.packages('mosaic')               # note the quotation marks
+@
+Once this is installed, it can be loaded by running the command:
+<<load_mosaic,eval=FALSE>>=
+require(mosaic)
+@
+This
+needs to be done once per session.
+
+In addition, the data files for the \emph{Sleuth} case studies can be accessed by installing the \pkg{Sleuth3} package.
+<<install_Sleuth3,eval=FALSE>>=
+install.packages('Sleuth3')               # note the quotation marks
+@
+<<load_Sleuth3,eval=FALSE>>=
+require(Sleuth3)
+@
+
+We also set some options to improve legibility of graphs and output.
+<<eval=TRUE>>=
+trellis.par.set(theme=col.mosaic())  # get a better color scheme for lattice
+options(digits=3, show.signif.stars=FALSE)
+@
+
+The specific goal of this document is to demonstrate how to calculate the quantities described in \emph{Sleuth} Chapter 3: A Closer Look at Assumptions using R.
+
+\section{Cloud Seeding to Increase Rainfall}
+
+Does seeding clouds lead to more rainfall? This is the question being addressed by Case Study 3.1 in the \emph{Sleuth}.
+
+\subsection{Summary statistics and graphical displays (untransformed)}
+
+We begin by reading the data and summarizing the variables.
+
+<<>>=
+summary(case0301)
+favstats(Rainfall ~ Treatment, data=case0301)
+@
+
+A total of \Sexpr{nrow(case0301)} days were included in these data: \Sexpr{nrow(subset(case0301, Treatment=="Seeded"))} seeded days and \Sexpr{nrow(subset(case0301, Treatment=="Unseeded"))} 
+unseeded days (Display 3.1, page 59). 
+
+<<fig.height=8, fig.width=8>>=
+bwplot(Rainfall ~ Treatment, data=case0301)
+@
+
+<<fig.height=8, fig.width=8>>=
+densityplot(~Rainfall, groups=Treatment, auto.key=TRUE, data=case0301)
+@
+
+According to the boxplot and the density plot, rainfall on seeded days tends to be larger than on unseeded days. Both density curves are highly skewed to the right.
+
+\subsection{Summary statistics and graphical display (transformed)}
+
+The skewness suggests that there is a need to apply a logarithmic transformation. The transformed data is shown on page 73 (Display 3.9).
+
+<<>>=
+case0301 = transform(case0301, lograin=log(Rainfall))
+favstats(lograin ~ Treatment, data=case0301)
+@
+
+<<fig.height=8, fig.width=8>>=
+bwplot(lograin ~ Treatment, data=case0301)
+@
+
+<<fig.height=8, fig.width=8>>=
+densityplot(~lograin, groups=Treatment, auto.key=TRUE, data=case0301)
+@
+
+The log transformation reduces the skewness of these two distributions.
+
+\subsection{Inferential procedures (two-sample t-test)}
+
+ 
+<<>>=
+t.test(Rainfall ~ Treatment, var.equal=FALSE, data=case0301)
+t.test(Rainfall ~ Treatment, var.equal=TRUE, data=case0301)
+@
+
+The following corresponds to the calculations on page 73.
+
+<<>>=
+summary(lm(lograin ~ Treatment, data=case0301))
+ttestlog = t.test(lograin ~ Treatment, data=case0301); ttestlog
+@
+
+\subsection{Interpretation of log model}
+
[TRUNCATED]
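
The cloud-seeding chunks shown above in chapter03-HortonMosaic.Rnw log-transform a right-skewed response and run `t.test()` with both `var.equal=FALSE` and `var.equal=TRUE`. The base R sketch below illustrates those two steps on simulated lognormal data (hypothetical values, not the `case0301` rainfall measurements). The `skew` helper is an assumed, crude moment-based skewness measure introduced only for this illustration.

```r
set.seed(123)

# Hypothetical right-skewed data (lognormal), NOT the case0301 rainfall values
rain <- c(rlnorm(26, meanlog = 4, sdlog = 1.6),
          rlnorm(26, meanlog = 5, sdlog = 1.6))
treatment <- factor(rep(c("Unseeded", "Seeded"), each = 26))

# Crude sample skewness: third central moment divided by the cubed SD
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness per group before and after the log transformation
skew_raw <- tapply(rain, treatment, skew)
skew_log <- tapply(log(rain), treatment, skew)
skew_raw
skew_log

# Welch (the t.test default) versus pooled-variance tests on the log scale
welch  <- t.test(log(rain) ~ treatment)
pooled <- t.test(log(rain) ~ treatment, var.equal = TRUE)

# With equal group sizes the two t statistics coincide; only the
# degrees of freedom (and hence the p-values) can differ
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```

With equal group sizes, as here, the Welch and pooled standard errors are algebraically identical, so any difference between the two tests comes from the degrees of freedom alone.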

To get the complete diff run:
    svnlook diff /svnroot/sleuth2 -r 63

