[Lme4-commits] r1716 - in www/slides: . 2012-03-22-Paris

Thu May 3 15:57:23 CEST 2012

Author: dmbates
Date: 2012-05-03 15:57:22 +0200 (Thu, 03 May 2012)
New Revision: 1716

Added:
   www/slides/2012-03-22-Paris/
   www/slides/2012-03-22-Paris/Profiling.Rnw
   www/slides/2012-03-22-Paris/Profiling.pdf
   www/slides/2012-03-22-Paris/Profiling.tex
   www/slides/2012-03-22-Paris/SweaveSlides.cfg
   www/slides/2012-03-22-Paris/SweaveSlides.sty
   www/slides/2012-03-22-Paris/figs/
   www/slides/2012-03-22-Paris/pr1.rda
Log:
Slides from presentation at PODE 2012

Added: www/slides/2012-03-22-Paris/Profiling.Rnw
===================================================================

--- www/slides/2012-03-22-Paris/Profiling.Rnw	                        (rev 0)
+++ www/slides/2012-03-22-Paris/Profiling.Rnw	2012-05-03 13:57:22 UTC (rev 1716)
@@ -0,0 +1,436 @@
+ % NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
+ % likely to be overwritten.
+
+\documentclass[dvipsnames,pdflatex,beamer]{beamer}
+% \documentclass[letterpaper,11pt,notitlepage]{article}\usepackage{beamerarticle}
+\mode<article>{\usepackage[text={6.2in,9in},centering]{geometry}}
+\mode<beamer>{\usetheme{Boadilla}\usecolortheme{seahorse}\usecolortheme{rose}}
+\usepackage{SweaveSlides}
+\title[Scales for D-optimal design]{Design for nonlinear mixed-effects\\Are variances a reasonable scale?}
+\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all}
+\SweaveOpts{prefix=TRUE,prefix.string=figs/profiling,include=TRUE}
+\SweaveOpts{keep.source=TRUE}
+\mode<beamer>{\setkeys{Gin}{width=\textwidth}}
+\mode<article>{\setkeys{Gin}{width=0.8\textwidth}}
+<<preliminaries,echo=FALSE,results=hide>>=
+options(width=69,show.signif.stars=FALSE,str=strOptions(strict.width="cut"))
+library(PKPDmodels)
+library(lme4)
+library(NRAIA)
+lattice.options(default.theme = function() standard.theme())
+# lattice.options(default.theme = function() standard.theme(color=FALSE))
+if (file.exists("pr1.rda")) {
+    load("pr1.rda")
+} else {
+    pr1 <- profile(fm1M <- lmer(Yield ~ 1+(1|Batch), Dyestuff, REML=FALSE))
+    save(pr1, fm1M, file="pr1.rda")
+}
+@ 
+\newcommand{\bLt}{\ensuremath{\bm\Lambda_\theta}}
+\begin{document}
+
+\mode<article>{\maketitle\tableofcontents}
+\mode<presentation>{\frame{\titlepage}}
+\mode<presentation>{\frame{\frametitle{Outline}\tableofcontents[pausesections,hideallsubsections]}}
+
+\section[Overview]{Overview}
+\begin{frame}
+  \frametitle{D-optimal experimental design for fixed-effects models}
+  \begin{itemize}
+  \item The purpose of D-optimal experimental design is to minimize
+    the volume of confidence regions or likelihood contours or HPD
+    regions on the parameters.
+  \item For simple cases (e.g. linear models with no random effects)
+    the choice of parameters does not affect the design.  In some ways
+    the only parameters that make sense are the coefficients of the
+    linear predictor and these are all equivalent up to linear
+    transformation.
+  \item For a nonlinear model the choice of parameters is less
+    obvious.  Nonlinear transformations of parameters can produce
+    dramatically better or worse linear approximations.  In terms of
+    likelihood contours or H.P.D. regions the ideal shape is
+    elliptical (i.e. a locally quadratic deviance function) but the
+    actual shape can be quite different.
+  \end{itemize}
+\end{frame}
+\begin{frame}
+  \frametitle{D-optimal design for mixed-effects models}
+  \begin{itemize}
+  \item For a linear mixed-effects model the choice of scale of the
+    variance components affects the shape of deviance or posterior
+    density contours.
+  \item For a nonlinear mixed-effects model, both the scale of the
+    variance components and the choice of model parameters affect the
+    shape of such contours.
+  \item These distorsions of shape are more dramatic when there are
+    fewer observations per group (i.e. per \code{Subject} or whatever
+    is the grouping factor).  But that is exactly the situation we are
+    trying to achieve.
+  \end{itemize}
+\end{frame}
+
+\section{Profiling nonlinear regression models}
+\label{sec:nonlinear}
+
+\begin{frame}[fragile]
+  \frametitle{Profiling nonlinear regression models}
+  \begin{itemize}
+  \item This is a very brief example of profiling nonlinear regression
+    models with a change of parameters.
+  \item Take data from a single subject in the \code{Theoph} data set
+    in \code{R}
+  \end{itemize}
+<<Theophdat,fig=TRUE,echo=FALSE,height=5>>=
+xyplot(conc ~ Time, subset(Theoph, Subject==1), type=c("p","g"), xlab="Time since drug administration (hr)", ylab="Theophylline concentration")
+@     
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{My initial naive fit}
+<<fm1>>=
+Theo.1 <- droplevels(subset(Theoph, Subject==1))
+summary(fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), Theo.1), corr=TRUE)
+@   
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Following a suggestion from France Mentr\'{e}}
+<<gm1>>=
+oral1cptSdlkalVlCl <- PKmod("oral", "sd", list(ka ~ exp(lka), k ~ exp(lCl)/V, V ~ exp(lV)))
+summary(gm1 <- nls(conc ~ oral1cptSdlkalVlCl(Dose, Time, lV, lka, lCl), Theo.1, start=c(lV=-1, lka=0.5, lCl=-4)), corr=TRUE)
+@   
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Contours based on profiling the objective, original}
+<<fm1splom,fig=TRUE,echo=FALSE>>=
+prf1 <- profile(fm1, alpha=0.001)
+splom(prf1,sub=NULL)
+@   
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Contours based on profiling the objective, revised}
+<<gm1splom,fig=TRUE,echo=FALSE>>=
+prg1 <- profile(gm1, alpha=0.001)
+splom(prg1,sub=NULL)
+@   
+\end{frame}
+\begin{frame}
+  \frametitle{Estimates based on optimizing a criterion}
+  \begin{itemize}
+  \item Maximum-likelihood estimators are an example of estimators
+    defined as the values that optimize a criterion -- maximizing the
+    log-likelihood or, equivalently, minimizing the deviance (negative
+    twice the log-likelihood).
+  \item Deriving the distribution of such an estimator can be
+    difficult (which is why we fall back on asymptotic properties) but,
+    for a given data set and model, we can assess the sensitivity of
+    the objective (e.g. the deviance) to the values of the parameters.
+  \item We can do this systematically by evaluating one-dimensional
+    ``profiles'' of the objective, through conditional optimization of
+    the objective.
+  \end{itemize}
+\end{frame}
+\begin{frame}
+  \frametitle{Profiling the objective}
+  \begin{itemize}
+  \item Profiling is based on conditional optimization of the
+    objective, fixing one or more parameters at particular values and
+    optimizing over the rest.
+  \item We will concentrate on one-dimensional profiles of the
+    deviance for mixed-effects models but the technique can be used
+    more generally.
+  \item We write the deviance as $d(\bm\phi|\bm y)$ where $\bm\phi$ is
+    the parameter vector of length $p$ and $\bm y$ is the vector of observed
+    responses.  Write the individual components of $\bm\phi$ as
+    $\phi_k,k=1,\dots,p$ and the complement of $\phi_i$ as $\bm\phi_{-i}$.
+  \item The profile deviance is $\tilde{d}_i(\phi_i)=\min_{\bm\phi_{-i}}d((\phi_i,\bm\phi_{-i})|\bm y)$.
+    The values of the other parameters at the optimum form the
+    \emph{profile traces}
+  \item If estimates and standard errors are an adequate summary then
+    the deviance should be a quadratic function of $\bm\phi$, i.e.{}
+    $\tilde{d}_i(\phi_i)$ should be a quadratic centered at $\hat{\phi}_i$
+    and the profile traces should be straight.
+  \end{itemize}
+\end{frame}
+
+\section{A Simple Example}
+
+\begin{frame}[fragile]
+  \frametitle{A Simple Example: the Dyestuff data}
+  The \code{Dyestuff} data in the \pkg{lme4} package for \proglang{R} are from the 
+  the classic book \Emph{Statistical Methods in
+  Research and Production}, edited by O.L. Davies  and first published
+  in 1947.
+  \mode<article>{Figure~\ref{fig:Dyestuffplot} shows these data}
+  \begin{figure}[tb]
+    \centering
+<<Dyestuffplot,fig=TRUE,echo=FALSE,height=3.5>>=
+set.seed(1234543)
+print(dotplot(reorder(Batch, Yield) ~ Yield, Dyestuff,
+              ylab = "Batch", jitter.y = TRUE, pch = 21, aspect = 0.32,
+              xlab = "Yield of dyestuff (grams of standard color)",
+              type = c("p", "a")))
+@ 
+    \mode<article>{\caption{The \code{Dyestuff} data from the \pkg{lme4} package for \proglang{R}.}\label{fig:Dyestuffplot}}
+  \end{figure}
+The line joins the mean yields of the six batches, which have been
+reordered by increasing mean yield.
+\end{frame}
+
+\begin{frame}
+  \frametitle{The effect of the batches}
+  \begin{itemize}
+  \item The particular batches observed are just a selection of the
+    possible batches and are entirely used up during the course of
+    the experiment.  
+  \item It is not particularly important to estimate and compare
+    yields from these batches.  Instead we wish to estimate the
+    variability in yields due to batch-to-batch variability.
+  \item The \code{Batch} factor will be used in \Emph{random-effects}
+    terms in models that we fit.
+  \item In the ``subscript fest'' notation such a model is
+    \begin{displaymath}
+      y_{i,j}=\mu + b_i+\epsilon_{i.j},\quad i=1,\dots,6;\: j=1,\dots,5
+    \end{displaymath}
+    with $\epsilon_{i,j}\sim\mathcal{N}(0,\sigma^2)$ and $b_i\sim\mathcal{N}(0,\sigma_1^2)$.
+  \item We obtain the maximum-likelihood estimates for such a model
+    using \code{lmer} with the optional argument, \code{REML=FALSE}.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+  \frametitle{Fitted model}
+<<fm1>>=
+(fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML=FALSE))
+@   
+\end{frame}
+
+\section{Profiling the fitted model}
+
+
+\begin{frame}[fragile]
+  \frametitle{Profiling the fitted model}
+<<pr1headshow,eval=FALSE>>=
+head(pr1 <- profile(fm1))
+@
+<<pr1head,echo=FALSE>>=
+head(pr1)
+@ 
+$\dots$
+<<tailpr1,echo=FALSE>>=
+print(tail(pr1), colnames=FALSE)
+@ 
+\end{frame}
+
+\begin{frame}
+  \frametitle{Reconstructing the profiled deviance}
+  In \code{pr1} the profiled deviance, $\tilde{d}_i(\phi_i)$ is
+  expressed on the \emph{signed square root} scale
+  \begin{displaymath}
+    \zeta_i(\phi_i)=\sgn(\phi_i-\hat{\phi_i})\sqrt{\tilde{d}_i(\phi_i)-d(\widehat{\bm\phi}|\bm y)}
+  \end{displaymath}
+  In the original scale, $\tilde{d}_i(\phi_i)$, the profiles are
+  \mode<article>{as shown in Figure~\ref{fig:pr1dev}}
+  \begin{figure}[tb]
+    \centering
+<<pr1dev,fig=TRUE,echo=FALSE,height=3.8>>=
+spl <- attr(pr1, "forward")
+dev <- deviance(fm1)
+fr <- do.call(rbind, lapply(levels(pr1$.par)[c(2:3,1L)], function(nm)
+                            cbind(within(as.data.frame(predict(spl[[nm]],
+                                                               seq(min(pr1[[nm]]), max(pr1[[nm]]), len=101L))),
+                               y <- dev + y^2), .par=nm)))
+print(xyplot(y ~ x|.par, fr, type=c("g","l"), scales=list(x=list(relation="free")), xlab=NULL,
+       ylab="Profiled deviance", layout=c(3,1),
+       strip=strip.custom(factor.levels = expression(sigma[1], sigma, "(Intercept)"))))
+
+@     
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1dev}}
+  \end{figure}
+\end{frame}
+
+\begin{frame}
+  \frametitle{After applying the square root}
+On the scale of $\sqrt{\tilde{d}_i(\phi_i)-d(\widehat{\bm\phi}|\bm y)}$ the profiles are
+  \mode<article>{as shown in Figure~\ref{fig:pr1abs}}
+  \begin{figure}[tb]
+    \centering
+<<pr1abs,fig=TRUE,echo=FALSE,height=2.5>>=
+print(xyplot(pr1, aspect=0.7, absVal=TRUE, strip=FALSE, strip.left=TRUE,layout=c(3,1)))
+@ 
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1abs}}
+  \end{figure}
+  We have added intervals corresponding to 50\%, 80\%, 90\%,
+  95\% and 99\% confidence intervals derived from the profiled
+  deviance.
+\end{frame}
+
+\begin{frame}
+  \frametitle{And, finally, on the $\zeta$ scale}
+  \mode<article>{the profiles are as in Figure~\ref{fig:pr1zeta}}
+  \begin{figure}[tb]
+    \centering
+<<pr1zeta,fig=TRUE,echo=FALSE,height=4.8>>=
+print(xyplot(pr1, aspect=1.3, layout=c(3,1)))
+@ 
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1zeta}}
+  \end{figure}
+  The intervals are created by ``inverting'' likelihood ratio tests on
+  particular values of the parameter.
+\end{frame}
+
+\section{Representing profiles as densities}
+
+\begin{frame}
+  \frametitle{Representing profiles as densities}
+  \begin{itemize}
+  \item A univariate profile $\zeta$ plot is read like a normal probability plot
+    \begin{itemize}
+    \item a sigmoidal (elongated ``S''-shaped) pattern like that for
+      the \code{(Intercept)} parameter indicates overdispersion
+      relative to the normal distribution.
+    \item a bending pattern, usually flattening to the right of the
+      estimate, indicates skewness of the estimator and warns us that
+      the confidence intervals will be asymmetric
+    \item a straight line indicates that the confidence intervals
+      based on the quantiles of the standard normal distribution are suitable
+    \end{itemize}
+  \item If we map the $\zeta$ scale through the cdf, $\Phi$, for the
+    standard normal, $\mathcal{N}(0,1)$, we can derive a cdf and a
+    density for a distribution that would produce this profile.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profiles for parameters in \code{fm1} as densities}}
+  \mode<article>{The profiles for the parameters in model \code{fm1} are shown as densities in Fig.~\ref{fig:pr1density}}
+\begin{figure}[tb]
+  \centering
+<<pr1density,echo=FALSE,fig=TRUE,height=6>>=
+print(densityplot(pr1, layout=c(1,3), strip=FALSE, strip.left=TRUE))
+@   
+  \mode<article>{\caption{Profile densities for model \code{fm1}}\label{fig:pr1density}}
+\end{figure}
+\end{frame}
+
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profile $\zeta$ plots on the scale of the variance components}}
+
+  Usually the variability estimates in a mixed-effects model are
+  quoted on the scale of the ``variance components'', $\sigma_1^2$ and
+  $\sigma^2$, not the standard deviations as we have shown.  On the variance scale the profiles are
+  \mode<article>{as shown in Fig.~\ref{fig:varianceProf}}
+  \begin{figure}[tb]
+    \centering
+<<varianceprof,fig=TRUE,echo=FALSE,height=4>>=
+print(xyplot(varianceProf(pr1), aspect=0.7, absVal=TRUE,
+             strip=strip.custom(factor.levels=expression(sigma[1]^2, sigma^2))))
+@   
+    \mode<article>{\caption{Profile $\zeta$ plots for the variance components in model \code{fm1ML}}\label{fig:varianceProf}}
+  \end{figure}
+\end{frame}
+
+\begin{frame}[fragile]
+  \frametitle{Densities of variance components}
+\begin{figure}[tb]
+  \centering
+<<pr1vardensity,echo=FALSE,fig=TRUE,height=6>>=
+print(densityplot(varianceProf(pr1), layout=c(1,2), strip=FALSE,
+                  strip.left=strip.custom(factor.levels=expression(sigma[1]^2, sigma^2))))
+@   
+  \mode<article>{\caption{Profile densities of variance components in model \code{fm1}}\label{fig:pr1vardensity}}
+\end{figure}
+\end{frame}
+
+\section{Practical implications}
+
+\begin{frame}
+  \frametitle{Some practical implications}
+  \begin{itemize}
+  \item We have been using maximum likelihood estimates.  For linear
+    mixed-effects models the REML estimates are often preferred
+    because they are assumed to be less biased.  (Many people assume
+    they are unbiased but, except in certain special cases, they're not.)
+  \item But bias is a property of the expected value of the
+    estimator.  So bias of a variance estimator relates to the mean of
+    one of those badly skewed distributions.  Why should we use the mean?
+  \item Similarly, it is common in simulation studies to compare
+    estimators or computational methods based on mean squared error.
+    That's not a meaningful criterion for skewed distributions of
+    estimators.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}
+  \frametitle{A more reasonable scale} 
+  Distributions of the estimators are closer to being symmetric on the
+  scale of $\log(\sigma)$ and $\log(\sigma_1)$ (or, equivalently,
+  $\log(\sigma^2)$ and $\log(\sigma_1^2)$) except when 0 is in the
+  range of reasonable values,
+  \mode<article>{as shown in Fig.~\ref{fig:logsigProf}}
+  \begin{figure}[tb]
+    \centering
+<<logsigProf,fig=TRUE,echo=FALSE,height=4>>=
+print(xyplot(log(pr1), aspect=0.7, absVal=TRUE))
+@   
+    \mode<article>{\caption{Profile $\zeta$ plots for the logarithms of the variance components in model \code{fm1}}\label{fig:logsigProf}}
+  \end{figure}
+\end{frame}
+
+\begin{frame}
+  \mode<beamer>{\frametitle{Densities on the logarithm scale}}
+  \mode<article>{The profile densities on the scale of the logarithms of the variance components are whown in Fig.~\ref{fig:logsigpr1density}}
+    \begin{figure}[tb]
+      \centering
+<<logsigpr1density,echo=FALSE,fig=TRUE,height=6>>=
+print(densityplot(log(pr1), layout=c(1,3), strip=FALSE, strip.left=TRUE))
+@   
+  \mode<article>{\caption{Profile densities for model \code{fm1} using the logarithm scale for variance components}\label{fig:logsigpr1density}}
+    \end{figure}
+\end{frame}
+
+\section{Profile pairs}
+
+\begin{frame}\frametitle{Profile pairs plots}
+  \begin{itemize}
+  \item The information from the profile can be used to produce
+    pairwise projections of likelihood contours.  These correspond to
+    pairwise joint confidence regions.
+  \item Such a plot (next slide) can be somewhat confusing at first
+    glance.
+  \item Concentrate initially on the panels above the diagonal where
+    the axes are the parameters in the scale shown in the diagonal
+    panels.  The contours correspond to 50\%, 80\%, 90\%, 95\% and
+    99\% pairwise confidence regions.
+  \item The two lines in each panel are ``profile traces'', which are
+    the conditional estimate of one parameter given a value of the other.
+  \item The actual interpolation of the contours is performed on the
+    $\zeta$ scale which is shown in the panels below the diagonal.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profile pairs for model}}
+  \mode<article>{Figure~\ref{fig:pr1pairs} is produced by}
+<<pr1pairsshow,eval=FALSE>>=
+splom(pr1)
+@   
+\begin{figure}[tb]
+  \centering
+<<pr1pairs,echo=FALSE,fig=TRUE,height=6>>=
+print(splom(pr1))
+@   
+  \mode<article>{\caption{Profile pairs for model \code{fm1}}\label{fig:pr1pairs}}
+\end{figure}
+\end{frame}
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profile pairs for the variance components}}
+  \mode<article>{The corresponding figure for the variance components is shown in Fig.~\ref{fig:pr1varpairs}}
+\begin{figure}[tb]
+  \centering
+<<pr1varpairs,echo=FALSE,fig=TRUE,height=6>>=
+print(splom(varianceProf(pr1)))
+@   
+  \mode<article>{\caption{Profile pairs for variance components for fmodel \code{fm1} }\label{fig:pr1varpairs}}
+\end{figure}
+\end{frame}
+
+\end{document}

Added: www/slides/2012-03-22-Paris/Profiling.pdf
===================================================================
(Binary files differ)


Property changes on: www/slides/2012-03-22-Paris/Profiling.pdf
___________________________________________________________________
Added: svn:mime-type
   + application/octet-stream

Added: www/slides/2012-03-22-Paris/Profiling.tex
===================================================================
--- www/slides/2012-03-22-Paris/Profiling.tex	                        (rev 0)
+++ www/slides/2012-03-22-Paris/Profiling.tex	2012-05-03 13:57:22 UTC (rev 1716)
@@ -0,0 +1,456 @@
+ % NOTE -- ONLY EDIT THE .Rnw FILE!!!  The .tex file is
+ % likely to be overwritten.
+
+\documentclass[dvipsnames,pdflatex,beamer]{beamer}
+% \documentclass[letterpaper,11pt,notitlepage]{article}\usepackage{beamerarticle}
+\mode<article>{\usepackage[text={6.2in,9in},centering]{geometry}}
+\mode<beamer>{\usetheme{Boadilla}\usecolortheme{seahorse}\usecolortheme{rose}}
+\usepackage{SweaveSlides}
+\title[Scales for D-optimal design]{Design for nonlinear mixed-effects\\Are variances a reasonable scale?}
+
+
+
+\mode<beamer>{\setkeys{Gin}{width=\textwidth}}
+\mode<article>{\setkeys{Gin}{width=0.8\textwidth}}
+\newcommand{\bLt}{\ensuremath{\bm\Lambda_\theta}}
+\begin{document}
+
+\mode<article>{\maketitle\tableofcontents}
+\mode<presentation>{\frame{\titlepage}}
+\mode<presentation>{\frame{\frametitle{Outline}\tableofcontents[pausesections,hideallsubsections]}}
+
+\section[Overview]{Overview}
+\begin{frame}
+  \frametitle{D-optimal experimental design for fixed-effects models}
+  \begin{itemize}
+  \item The purpose of D-optimal experimental design is to minimize
+    the volume of confidence regions or likelihood contours or HPD
+    regions on the parameters.
+  \item For simple cases (e.g. linear models with no random effects)
+    the choice of parameters does not affect the design.  In some ways
+    the only parameters that make sense are the coefficients of the
+    linear predictor and these are all equivalent up to linear
+    transformation.
+  \item For a nonlinear model the choice of parameters is less
+    obvious.  Nonlinear transformations of parameters can produce
+    dramatically better or worse linear approximations.  In terms of
+    likelihood contours or H.P.D. regions the ideal shape is
+    elliptical (i.e. a locally quadratic deviance function) but the
+    actual shape can be quite different.
+  \end{itemize}
+\end{frame}
+\begin{frame}
+  \frametitle{D-optimal design for mixed-effects models}
+  \begin{itemize}
+  \item For a linear mixed-effects model the choice of scale of the
+    variance components affects the shape of deviance or posterior
+    density contours.
+  \item For a nonlinear mixed-effects model, both the scale of the
+    variance components and the choice of model parameters affect the
+    shape of such contours.
+  \item These distorsions of shape are more dramatic when there are
+    fewer observations per group (i.e. per \code{Subject} or whatever
+    is the grouping factor).  But that is exactly the situation we are
+    trying to achieve.
+  \end{itemize}
+\end{frame}
+
+\section{Profiling nonlinear regression models}
+\label{sec:nonlinear}
+
+\begin{frame}[fragile]
+  \frametitle{Profiling nonlinear regression models}
+  \begin{itemize}
+  \item This is a very brief example of profiling nonlinear regression
+    models with a change of parameters.
+  \item Take data from a single subject in the \code{Theoph} data set
+    in \code{R}
+  \end{itemize}
+\includegraphics{figs/profiling-Theophdat}
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{My initial naive fit}
+\begin{Schunk}
+\begin{Sinput}
+> Theo.1 <- droplevels(subset(Theoph, Subject==1))
+> summary(fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), Theo.1), corr=TRUE)
+\end{Sinput}
+\begin{Soutput}
+Formula: conc ~ SSfol(Dose, Time, lKe, lKa, lCl)
+Parameters:
+    Estimate Std. Error t value Pr(>|t|)
+lKe  -2.9196     0.1709 -17.085 1.40e-07
+lKa   0.5752     0.1728   3.328   0.0104
+lCl  -3.9159     0.1273 -30.768 1.35e-09
+
+Residual standard error: 0.732 on 8 degrees of freedom
+
+Correlation of Parameter Estimates:
+    lKe   lKa  
+lKa -0.56      
+lCl  0.96 -0.43
+
+Number of iterations to convergence: 8 
+Achieved convergence tolerance: 4.907e-06 
+\end{Soutput}
+\end{Schunk}
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Following a suggestion from France Mentr\'{e}}
+\begin{Schunk}
+\begin{Sinput}
+> oral1cptSdlkalVlCl <- PKmod("oral", "sd", list(ka ~ exp(lka), k ~ exp(lCl)/V, V ~ exp(lV)))
+> summary(gm1 <- nls(conc ~ oral1cptSdlkalVlCl(Dose, Time, lV, lka, lCl), Theo.1, start=c(lV=-1, lka=0.5, lCl=-4)), corr=TRUE)
+\end{Sinput}
+\begin{Soutput}
+Formula: conc ~ oral1cptSdlkalVlCl(Dose, Time, lV, lka, lCl)
+Parameters:
+    Estimate Std. Error t value Pr(>|t|)
+lV  -0.99624    0.06022 -16.543 1.80e-07
+lka  0.57516    0.17282   3.328   0.0104
+lCl -3.91586    0.12727 -30.768 1.35e-09
+
+Residual standard error: 0.732 on 8 degrees of freedom
+
+Correlation of Parameter Estimates:
+    lV    lka  
+lka  0.68      
+lCl -0.61 -0.43
+
+Number of iterations to convergence: 9 
+Achieved convergence tolerance: 4.684e-06 
+\end{Soutput}
+\end{Schunk}
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Contours based on profiling the objective, original}
+\includegraphics{figs/profiling-fm1splom}
+\end{frame}
+\begin{frame}[fragile]
+  \frametitle{Contours based on profiling the objective, revised formulation}
+\includegraphics{figs/profiling-gm1splom}
+\end{frame}
+\begin{frame}
+  \frametitle{Estimates based on optimizing a criterion}
+  \begin{itemize}
+  \item Maximum-likelihood estimators are an example of estimators
+    defined as the values that optimize a criterion -- maximizing the
+    log-likelihood or, equivalently, minimizing the deviance (negative
+    twice the log-likelihood).
+  \item Deriving the distribution of such an estimator can be
+    difficult (which is why we fall back on asymptotic properties) but,
+    for a given data set and model, we can assess the sensitivity of
+    the objective (e.g. the deviance) to the values of the parameters.
+  \item We can do this systematically by evaluating one-dimensional
+    ``profiles'' of the objective, through conditional optimization of
+    the objective.
+  \end{itemize}
+\end{frame}
+\begin{frame}
+  \frametitle{Profiling the objective}
+  \begin{itemize}
+  \item Profiling is based on conditional optimization of the
+    objective, fixing one or more parameters at particular values and
+    optimizing over the rest.
+  \item We will concentrate on one-dimensional profiles of the
+    deviance for mixed-effects models but the technique can be used
+    more generally.
+  \item We write the deviance as $d(\bm\phi|\bm y)$ where $\bm\phi$ is
+    the parameter vector of length $p$ and $\bm y$ is the vector of observed
+    responses.  Write the individual components of $\bm\phi$ as
+    $\phi_k,k=1,\dots,p$ and the complement of $\phi_i$ as $\bm\phi_{-i}$.
+  \item The profile deviance is $\tilde{d}_i(\phi_i)=\min_{\bm\phi_{-i}}d((\phi_i,\bm\phi_{-i})|\bm y)$.
+    The values of the other parameters at the optimum form the
+    \emph{profile traces}
+  \item If estimates and standard errors are an adequate summary then
+    the deviance should be a quadratic function of $\bm\phi$, i.e.{}
+    $\tilde{d}_i(\phi_i)$ should be a quadratic centered at $\hat{\phi}_i$
+    and the profile traces should be straight.
+  \end{itemize}
+\end{frame}
+
+\section{A Simple Example}
+
+\begin{frame}[fragile]
+  \frametitle{A Simple Example: the Dyestuff data}
+  The \code{Dyestuff} data in the \pkg{lme4} package for \proglang{R} are from the 
+  the classic book \Emph{Statistical Methods in
+  Research and Production}, edited by O.L. Davies  and first published
+  in 1947.
+  \mode<article>{Figure~\ref{fig:Dyestuffplot} shows these data}
+  \begin{figure}[tb]
+    \centering
+\includegraphics{figs/profiling-Dyestuffplot}
+    \mode<article>{\caption{The \code{Dyestuff} data from the \pkg{lme4} package for \proglang{R}.}\label{fig:Dyestuffplot}}
+  \end{figure}
+The line joins the mean yields of the six batches, which have been
+reordered by increasing mean yield.
+\end{frame}
+
+\begin{frame}
+  \frametitle{The effect of the batches}
+  \begin{itemize}
+  \item The particular batches observed are just a selection of the
+    possible batches and are entirely used up during the course of
+    the experiment.  
+  \item It is not particularly important to estimate and compare
+    yields from these batches.  Instead we wish to estimate the
+    variability in yields due to batch-to-batch variability.
+  \item The \code{Batch} factor will be used in \Emph{random-effects}
+    terms in models that we fit.
+  \item In the ``subscript fest'' notation such a model is
+    \begin{displaymath}
+      y_{i,j}=\mu + b_i+\epsilon_{i.j},\quad i=1,\dots,6;\: j=1,\dots,5
+    \end{displaymath}
+    with $\epsilon_{i,j}\sim\mathcal{N}(0,\sigma^2)$ and $b_i\sim\mathcal{N}(0,\sigma_1^2)$.
+  \item We obtain the maximum-likelihood estimates for such a model
+    using \code{lmer} with the optional argument, \code{REML=FALSE}.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+  \frametitle{Fitted model}
+\begin{Schunk}
+\begin{Sinput}
+> (fm1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff, REML=FALSE))
+\end{Sinput}
+\begin{Soutput}
+Linear mixed model fit by maximum likelihood ['lmerMod']
+Formula: Yield ~ 1 + (1 | Batch) 
+   Data: Dyestuff 
+      AIC       BIC    logLik  deviance 
+ 333.3271  337.5307 -163.6635  327.3271 
+
+Random effects:
+ Groups   Name        Variance Std.Dev.
+ Batch    (Intercept) 1388     37.26   
+ Residual             2451     49.51   
+Number of obs: 30, groups: Batch, 6
+
+Fixed effects:
+            Estimate Std. Error t value
+(Intercept)  1527.50      17.69   86.33
+\end{Soutput}
+\end{Schunk}
+\end{frame}
+
+\section{Profiling the fitted model}
+
+
+\begin{frame}[fragile]
+  \frametitle{Profiling the fitted model}
+\begin{Schunk}
+\begin{Sinput}
+> head(pr1 <- profile(fm1))
+\end{Sinput}
+\end{Schunk}
+\begin{Schunk}
+\begin{Soutput}
+       .zeta    .sig01   .sigma (Intercept)   .par
+1 -2.3243980  0.000000 61.96437      1527.5 .sig01
+2 -2.2270119  6.315107 60.74307      1527.5 .sig01
+3 -1.9527642 12.317030 57.71369      1527.5 .sig01
+4 -1.5986116 17.365631 54.69985      1527.5 .sig01
+5 -1.1872929 22.297821 52.32154      1527.5 .sig01
+6 -0.7326184 27.624184 50.72564      1527.5 .sig01
+\end{Soutput}
+\end{Schunk}
+$\dots$
+\begin{Schunk}
+\begin{Soutput}
+      .zeta    .sig01  .sigma (Intercept)        .par
+55 1.950491  55.23929 49.5101    1568.280 (Intercept)
+56 2.305893  63.77264 49.5101    1579.255 (Intercept)
+57 2.653119  74.71123 49.5101    1592.257 (Intercept)
+58 2.993207  88.72455 49.5101    1608.022 (Intercept)
+59 3.327036 106.75187 49.5101    1627.538 (Intercept)
+60 3.655342 130.10234 49.5101    1652.153 (Intercept)
+\end{Soutput}
+\end{Schunk}
+\end{frame}
+
+\begin{frame}
+  \frametitle{Reconstructing the profiled deviance}
+  In \code{pr1} the profiled deviance, $\tilde{d}_i(\phi_i)$ is
+  expressed on the \emph{signed square root} scale
+  \begin{displaymath}
+    \zeta_i(\phi_i)=\sgn(\phi_i-\hat{\phi_i})\sqrt{\tilde{d}_i(\phi_i)-d(\widehat{\bm\phi}|\bm y)}
+  \end{displaymath}
+  In the original scale, $\tilde{d}_i(\phi_i)$, the profiles are
+  \mode<article>{as shown in Figure~\ref{fig:pr1dev}}
+  \begin{figure}[tb]
+    \centering
+\includegraphics{figs/profiling-pr1dev}
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1dev}}
+  \end{figure}
+\end{frame}
+
+\begin{frame}
+  \frametitle{After applying the square root}
+On the scale of $\sqrt{\tilde{d}_i(\phi_i)-d(\widehat{\bm\phi}|\bm y)}$ the profiles are
+  \mode<article>{as shown in Figure~\ref{fig:pr1abs}}
+  \begin{figure}[tb]
+    \centering
+\includegraphics{figs/profiling-pr1abs}
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1abs}}
+  \end{figure}
+  We have added intervals corresponding to 50\%, 80\%, 90\%,
+  95\% and 99\% confidence intervals derived from the profiled
+  deviance.
+\end{frame}
+
+\begin{frame}
+  \frametitle{And, finally, on the $\zeta$ scale}
+  \mode<article>{the profiles are as in Figure~\ref{fig:pr1zeta}}
+  \begin{figure}[tb]
+    \centering
+\includegraphics{figs/profiling-pr1zeta}
+    \mode<article>{\caption{Profiled deviance for each of the parameters in fitted model \code{fm1}}\label{fig:pr1zeta}}
+  \end{figure}
+  The intervals are created by ``inverting'' likelihood ratio tests on
+  particular values of the parameter.
+\end{frame}
+
+\section{Representing profiles as densities}
+
+\begin{frame}
+  \frametitle{Representing profiles as densities}
+  \begin{itemize}
+  \item A univariate profile $\zeta$ plot is read like a normal probability plot
+    \begin{itemize}
+    \item a sigmoidal (elongated ``S''-shaped) pattern like that for
+      the \code{(Intercept)} parameter indicates overdispersion
+      relative to the normal distribution.
+    \item a bending pattern, usually flattening to the right of the
+      estimate, indicates skewness of the estimator and warns us that
+      the confidence intervals will be asymmetric
+    \item a straight line indicates that the confidence intervals
+      based on the quantiles of the standard normal distribution are suitable
+    \end{itemize}
+  \item If we map the $\zeta$ scale through the cdf, $\Phi$, for the
+    standard normal, $\mathcal{N}(0,1)$, we can derive a cdf and a
+    density for a distribution that would produce this profile.
+  \end{itemize}
+\end{frame}
+
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profiles for parameters in \code{fm1} as densities}}
+  \mode<article>{The profiles for the parameters in model \code{fm1} are shown as densities in Fig.~\ref{fig:pr1density}}
+\begin{figure}[tb]
+  \centering
+\includegraphics{figs/profiling-pr1density}
+  \mode<article>{\caption{Profile densities for model \code{fm1}}\label{fig:pr1density}}
+\end{figure}
+\end{frame}
+
+\begin{frame}[fragile]
+  \mode<beamer>{\frametitle{Profile $\zeta$ plots on the scale of the variance components}}
+
+  Usually the variability estimates in a mixed-effects model are
+  quoted on the scale of the ``variance components'', $\sigma_1^2$ and
+  $\sigma^2$, not the standard deviations as we have shown.  On the variance scale the profiles are
+  \mode<article>{as shown in Fig.~\ref{fig:varianceProf}}
+  \begin{figure}[tb]
+    \centering
+\includegraphics{figs/profiling-varianceprof}
+    \mode<article>{\caption{Profile $\zeta$ plots for the variance components in model \code{fm1ML}}\label{fig:varianceProf}}
+  \end{figure}
+\end{frame}
+
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/lme4 -r 1716