[Lme4-commits] r1480 - www/JSS

noreply at r-forge.r-project.org
Wed Dec 21 00:42:06 CET 2011


Author: dmbates
Date: 2011-12-21 00:42:05 +0100 (Wed, 21 Dec 2011)
New Revision: 1480

Modified:
   www/JSS/lmer.Rnw
Log:
Added material on the Laplace approximation and a figure.


Modified: www/JSS/lmer.Rnw
===================================================================
--- www/JSS/lmer.Rnw	2011-12-16 19:33:56 UTC (rev 1479)
+++ www/JSS/lmer.Rnw	2011-12-20 23:42:05 UTC (rev 1480)
@@ -64,10 +64,11 @@
 \newcommand*{\eq}[1]{eqn.~\ref{#1}}% or just {(\ref{#1})}
 %
 \SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all}
-\SweaveOpts{prefix=TRUE,prefix.string=figs/simple,include=TRUE}
+\SweaveOpts{prefix=TRUE,prefix.string=figs/,include=TRUE}
 \SweaveOpts{keep.source=TRUE}
 <<preliminaries,echo=FALSE,results=hide>>=
 options(width=69,show.signif.stars=FALSE,str=strOptions(strict.width="cut"))
+library(lme4Eigen)
 @
 \begin{document}
 \section{Introduction}
@@ -80,12 +81,13 @@
 that incorporates both fixed-effects terms and random-effects terms in
 a linear predictor expression from which the conditional mean of the
 response can be evaluated.  In this paper we describe the formulation
-and representation of linear mixed models.  The techniques used for
-the other types of models will be described separately.
+and representation of linear and generalized linear mixed models.  The
+techniques used for nonlinear mixed models will be described
+separately.
 
 Just as a linear model can be described in terms of the distribution
 of $\mc{Y}$, the vector-valued random variable whose observed value is
-$\bm y$, the observed response vector, a linear mixed model can be
+$\yobs$, the observed response vector, a linear mixed model can be
 described by the distribution of two vector-valued random variables:
 $\mc{Y}$, the response and $\mc{B}$, the vector of random effects.  In
 a linear model the distribution of $\mc Y$ is multivariate normal,
@@ -200,10 +202,10 @@
   \end{aligned}
 \end{equation}
 
-To obtain an expression for the likelihood it is convenient to
-distinguish between a general argument, $\bm y$, and the particular
-value of the observed response, which we will write as $\yobs$
-for the remainder of this section.
+% To obtain an expression for the likelihood it is convenient to
+% distinguish between a general argument, $\bm y$, and the particular
+% value of the observed response, which we will write as $\yobs$
+% for the remainder of this section.
 The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
 $\sigma^2$, given the observed data is the value of the
 marginal density of $\mc Y$, evaluated at $\yobs$.  That is
@@ -253,9 +255,9 @@
     \begin{bmatrix}\bm Z\bLt\\\bm I_q\end{bmatrix}
     \bm u\right\|^2.
 \end{equation}
-The contribution to the residual sum of squares from the additional
-observations, or pseudo-data, is exactly the penalty term, $\left\|\bm
-  u\right\|^2$.
+The contribution to the residual sum of squares from the ``pseudo''
+observations appended to $\yobs-\bm X\bm\beta$ is exactly the penalty
+term, $\left\|\bm u\right\|^2$.
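+
+In code, the pseudo-data formulation is an ordinary least squares
+problem for an augmented system.  A conceptual dense-matrix sketch
+(\code{Z}, \code{Lambda}, \code{X}, \code{beta} and \code{yobs} are
+hypothetical stand-ins; the actual computation uses the sparse
+Cholesky factorization described below):
+<<pseudoSketch,eval=FALSE>>=
+q <- ncol(Z)
+ustar <- qr.solve(rbind(Z %*% Lambda, diag(q)),      # augmented model matrix
+                  c(yobs - X %*% beta, numeric(q)))  # response plus q pseudo-zeros
+@ 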
 
 From \eq{eq:pseudoData} we can see that the conditional mean satisfies
 \begin{equation}
@@ -290,10 +292,10 @@
 than the numeric phase.  During the course of determining the maximum
 likelihood or REML estimates of the parameters in a linear
 mixed-effects model we may need to evaluate $\bm L_\theta$ for many
-different values of $\bm\theta$, but doing so only involves the
-numeric phase.  Changing $\bm\theta$ can change the values of the
-non-zero elements in $\bm L$ but does not change their positions.
-That is, the symbolic phase only needs to be done once.
+different values of $\bm\theta$, but each evaluation after the first
+requires only the numeric phase.  Changing $\bm\theta$ can change the
+values of the non-zero elements in $\bm L$ but does not change their
+positions.  Hence, the symbolic phase must be done only once.
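+
+For example, the factorization can be created once and then updated
+cheaply for each new $\bm\theta$ using the \code{Cholesky} function
+and its \code{update} method in the \pkg{Matrix} package (described
+next).  A sketch, in which \code{Zt}, \code{Lambdat} and
+\code{Lambdat2} are hypothetical sparse matrices representing $\bm
+Z\trans$ and $\bLt\trans$ at two values of $\bm\theta$:
+<<cholUpdateSketch,eval=FALSE>>=
+library(Matrix)
+## symbolic analysis plus numeric factorization of Lambda't Z'Z Lambda + I
+L <- Cholesky(tcrossprod(Lambdat %*% Zt), Imult=1, perm=TRUE)
+## for a new theta only the numeric phase is repeated
+L <- update(L, tcrossprod(Lambdat2 %*% Zt), mult=1)
+@ 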
 
 The \code{Cholesky} function in the \pkg{Matrix} package for
 \proglang{R} performs both the symbolic and numeric phases of the
@@ -373,7 +375,7 @@
 straightforward, we can form the conditional estimate
 \begin{equation}
   \label{eq:conddev}
-  \widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n}
+  \widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n} ,
 \end{equation}
 producing the \emph{profiled deviance}
 \begin{equation}
@@ -425,7 +427,7 @@
 As before we will use the sparse Cholesky decomposition producing,
 $\bm L_\theta$, the sparse Cholesky factor, and $\bm P$, the
 permutation matrix, satisfying $\bm L_\theta\bm L_\theta\trans=\bm
-P(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I)$ and $\bm c_{\bm u}$, the
+P(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I)\bm P\trans$ and $\bm c_{\bm u}$, the
 solution to $\bm L_\theta\bm c_{\bm u}=\bm P\bLt\trans\bm Z\trans\yobs$.
 
 We extend the decomposition with the $q\times p$ matrix $\bm
@@ -495,12 +497,12 @@
 % \label{sec:GLMMs}
 
 The structures in \pkg{lme4} for representing mixed-models are
-somewhat more general than is required for linear mixed models.  In
-the remainder of this section we briefly describe some of the
-computational requirements for generalized linear mixed models, to
-show why these more general structures are employed.
+somewhat more general than is required for linear mixed models.  They
+are designed to also be used for generalized linear mixed models
+(GLMMs) and nonlinear mixed models (NLMMs), which we describe in the
+next sections.
 
-\subsection{Definition of GLMMs}
+\section{Generalized Linear Mixed Models}
 \label{sec:GLMMdef}
 
 The generalized linear mixed models (GLMMs) that can be fit by the
@@ -509,7 +511,8 @@
 (eqn.~\ref{eq:LMMuncondB}).  Because most families used for the conditional
 distribution, $\mc Y|\mc B=\bm b$, do not incorporate a separate scale
 factor, $\sigma$, we remove it from the definition of $\bm\Sigma$ and
-from the distribution of the spherical random effects, $\mc U$.  That is
+from the distribution of the spherical random effects, $\mc U$.  That
+is, 
 \begin{equation}
   \label{eq:UdistGLMM}
   \mc U\sim\mc N(\bm0,\bm I_q)
@@ -579,18 +582,47 @@
 to $\bm u$ so the unscaled conditional density is indeed well-defined
 as a density, up to a scale factor.
 
-\subsection{Determining the conditional mode}
-\label{sec:conditionalMode}
+To evaluate the integrand in (\ref{eq:GLMMlike}) we use the value of
+the \code{dev.resids} function in the GLM family.  This vector,
+$\bm d(\yobs,\bm u)$, with elements $d_i(\yobs,\bm u),\,i=1,\dots,n$,
+provides the deviance of a generalized linear model as
+\begin{displaymath}
+  \sum_{i=1}^n d_i(\yobs,\bm u) .
+\end{displaymath}
+(We should note that there is
+some confusion in \proglang{R} (and in its predecessor,
+\proglang{S}) about what exactly the deviance residuals
+for a family are.  As indicated above, we will use this name for the
+value of the \code{dev.resids} function in the family.  The signed
+square root of this vector, using the signs of $\yobs-\bm\mu$, is returned
+from \code{residuals} applied to a fitted model
+of class \code{"glm"} when \code{type="deviance"}, the
+default, is specified.  Both are called ``deviance residuals''
+in the documentation but, although they are related, they are not the same.)
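+
+As a small illustration of the distinction, consider the
+\code{binomial} family.  (The values of \code{y} and \code{mu} below
+are arbitrary, chosen only for the illustration.)
+<<devresid>>=
+fam <- binomial()
+y <- c(0, 1); mu <- c(0.2, 0.7); wt <- c(1, 1)
+(dr <- fam$dev.resids(y, mu, wt))  # what we call the deviance residuals
+sign(y - mu) * sqrt(dr)            # what residuals(., type="deviance") returns
+@ 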
 
+The likelihood can now be expressed as
+\begin{equation}
+  \label{eq:GLMMlike1}
+  L(\bm\beta,\bm\theta|\yobs)=
+  \int_{\mathbb{R}^q}\exp\left(-\frac{\sum_{i=1}^nd_i(\yobs,\bm u)+\|\bm u\|^2}{2}\right)\,(2\pi)^{-q/2}\,d\bm u .
+\end{equation}
+
 As for linear mixed models, we simplify evaluation of the integral
 (\ref{eq:GLMMlike}) by determining the value, $\tilde{\bm
-  u}_{\beta,\theta}$, that maximizes the unscaled conditional density.
+  u}_{\beta,\theta}$, that maximizes the integrand.
 When the conditional density, $\mc U|\mc Y=\yobs$, is multivariate
 Gaussian, this conditional mode will also be the conditional mean.
 However, for most families used in GLMMs, the mode and the mean need
 not coincide, so we use the more general term and call $\tilde{\bm
-  u}_{\beta,\theta}$ the \emph{conditional mode}.
+  u}_{\beta,\theta}$ the \emph{conditional mode}.  We first describe
+the numerical methods for determining the conditional mode using the
+Penalized Iteratively Reweighted Least Squares (PIRLS) algorithm,
+then return to the question of evaluating the integral
+(\ref{eq:GLMMlike}).
 
+\subsection{Determining the conditional mode}
+\label{sec:conditionalMode}
+
 The iteratively reweighted least squares (IRLS) algorithm is a
 remarkably efficient method of determining the maximum likelihood
 estimates of the coefficients in a generalized linear model.  We
@@ -656,17 +688,137 @@
   \bm L_{\beta,\theta}\bm L_{\beta,\theta}\trans =
   \bm P\left(\bLt\trans\bm Z\trans\bm M\bm W\bm M\bm Z\bLt+\bm I_q\right)\bm P\trans
 \end{equation}
-The integrand in the likelihood (\ref{eq:GLMMlike}) is approximately a constant
-times the density of the $\mathcal{N}(\tilde{\bm u},\bm L\bm L\trans)$
-distribution.  The \emph{Laplace approximation} to the deviance is
+
+\subsection{Evaluating the likelihood for GLMMs using the Laplace approximation}
+\label{sec:Laplace}
+
+A second-order Taylor series approximation to $-2\log[f_{\mc
+  Y,\mc U}(\yobs,\bm u)]$ at $\tilde{\bm u}$,
+\begin{displaymath}
+  -2\log[f_{\mc Y,\mc U}(\yobs,\bm u)]\approx -2\log\tilde{f}+
+  (\bm u-\tilde{\bm u})\trans\bm P\trans\bm L\bm L\trans\bm P(\bm u-\tilde{\bm u}) ,
+\end{displaymath}
+where $\tilde{f}=f_{\mc Y,\mc U}(\yobs,\tilde{\bm u})$, approximates
+the unscaled conditional density by a multiple of the density of the
+multivariate Gaussian $\mathcal{N}\left(\tilde{\bm u},
+  \bm P\trans(\bm L\bm L\trans)^{-1}\bm P\right)$.
+The change of variable
 \begin{equation}
-  \label{eq:Laplace}
-  d(\bm\beta,\bm\theta|\bm y)=d_g(\yobs,\bm\mu(\tilde{\bm u}))+
-  \|\tilde{\bm u}\|^2+\log(|\bm L|^2)
+  \label{eq:LaplaceChg}
+  \bm u = \tilde{\bm u} + \bm P\trans(\bm L\trans)^{-1}\bm z
 \end{equation}
-where $d_g(\yobs,\bm\mu(\tilde{\bm u}))$ is the GLM deviance for
-$\yobs$ and $\bm\mu(\tilde{\bm u})$.
+provides, because this change of variable has Jacobian with absolute
+determinant $1/|\bm L|$,
+\begin{equation}
+  \label{eq:GLMMLaplace}
+  \begin{aligned}
+    L(\bm\beta,\bm\theta|\yobs)&=\int_{\mathbb{R}^q}f_{\mc Y,\mc U}(\yobs,\bm u)\,d\bm u\\
+    &\approx \frac{\tilde{f}}{|\bm L|}\int_{\mathbb{R}^q}e^{-\|\bm z\|^2/2}\,d\bm z\\
+    &=\tilde{f}\,\frac{(2\pi)^{q/2}}{|\bm L|}
+  \end{aligned}
+\end{equation}
+or, on the deviance scale,
+\begin{equation}
+  \label{eq:LaplaceDev}
+  -2\ell(\bm\beta,\bm\theta|\yobs)\approx\sum_{i=1}^nd_i(\yobs,\tilde{\bm u}) +
+    \|\tilde{\bm u}\|^2 + \log(|\bm L|^2) .
+\end{equation}
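+
+(The factor $(2\pi)^{-q/2}$ in $\tilde{f}$ cancels the $(2\pi)^{q/2}$
+from the Gaussian integral, so no constant involving $2\pi$ remains.)
+Once the PIRLS iterations have converged, the three terms in
+(\ref{eq:LaplaceDev}) are cheap to evaluate.  A minimal sketch, in
+which \code{dres}, \code{u} and \code{L} are hypothetical stand-ins
+for the deviance residuals at the conditional mode, the conditional
+mode itself and the sparse Cholesky factor:
+<<laplaceSketch,eval=FALSE>>=
+laplaceDev <- function(dres, u, L) {
+    ldL2 <- 2 * as.numeric(determinant(L, logarithm=TRUE)$modulus)
+    sum(dres) + sum(u * u) + ldL2  # deviance residuals + penalty + log-determinant
+}
+@ 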
 
+\subsubsection{Decomposing the deviance for simple models}
+\label{sec:simplescalar}
+
+A special, but not uncommon, case is that of scalar random effects
+associated with levels of a single grouping factor, $\bm h$.  In this
+case the dimension, $q$, of the random effects is the number of levels
+of $\bm h$ --- i.e.{} there is exactly one random effect associated
+with each level of $\bm h$.  We will write the vector of
+variance-covariance parameters, which is one-dimensional, as a scalar,
+$\theta$.  The matrix $\bm\Lambda_{\bm\theta}$ is a multiple of the
+identity, $\theta\bm I_q$, and $\bm Z$ is the $n\times q$ matrix of
+indicators of the levels of $\bm h$.  The permutation matrix, $\bm
+P$, can be set to the identity and $\bm L$ is diagonal, but not
+necessarily a multiple of the identity.  
+
+Because each element of $\bm\mu$ depends on only one element of $\bm
+u$ and the elements of $\mc Y$ are conditionally independent, given
+$\mc U=\bm u$, the conditional densities of the $u_j,j=1,\dots,q$
+given $\mc Y=\yobs$ are independent.  We partition the indices
+$1,\dots,n$ as $\mathbb{I}_j,j=1,\dots,q$ according to the levels of
+$\bm h$.  That is, the index $i$ is in $\mathbb{I}_j$ if $h_i=j$.
+This partitioning also applies to the deviance residuals in that
+the $i$th deviance residual depends only on $u_j$ when $i\in\mathbb{I}_j$.
+
+Writing the univariate conditional densities as
+\begin{equation}
+  \label{eq:univariateCondDens}
+  f_j(\yobs,u_j)=\exp\left(-\frac{\sum_{i\in\mathbb{I}_j}d_i(\yobs, u_j)+u_j^2}{2}\right)(2\pi)^{-1/2}
+\end{equation}
+we have
+\begin{equation}
+  \label{eq:vectorCondDens}
+  f_{\mc Y,\mc U}(\yobs,\bm u)=\prod_{j=1}^q f_j(\yobs,u_j)
+\end{equation}
+and
+\begin{equation}
+  \label{eq:ssLike}
+  L(\bm\beta,\bm\theta|\yobs)=\prod_{j=1}^q\int_{\mathbb{R}}f_j(\yobs,u)\,du .
+\end{equation}
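+
+Because each factor in (\ref{eq:ssLike}) is a one-dimensional
+integral, the likelihood in this special case can be evaluated to
+high accuracy by standard quadrature, providing a benchmark for the
+Laplace approximation.  A sketch, assuming a hypothetical vectorized
+function \code{fj(u, j)} that evaluates the univariate density
+(\ref{eq:univariateCondDens}) for level $j$:
+<<ssLikeSketch,eval=FALSE>>=
+## -2 log-likelihood as a sum of one-dimensional integrals over the q levels
+dev <- -2 * sum(vapply(seq_len(q), function(j)
+    log(integrate(function(u) fj(u, j), -Inf, Inf)$value), 0))
+@ 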
+
+We consider this special case both because it occurs frequently and
+because, for some software, it is the only type of GLMM that can be
+fit.  Also, in this particular case we can graphically assess the
+quality of the Laplace approximation by comparing the actual integrand
+to its approximation.
+
+Consider the \code{cbpp} data on contagious bovine pleuropneumonia
+incidence according to period and herd, available in the \pkg{lme4} package,
+<<strcbpp>>=
+str(cbpp)
+@ 
+and the model
+<<m1>>=
+(m1 <- glmer(cbind(incidence, size-incidence) ~ period + (1|herd),
+             cbpp, binomial))
+@ 
+As indicated, this model has been fit using the Laplace approximation.
+One way of assessing the quality of the Laplace approximation is to
+evaluate the unscaled conditional density at $u_j(z)=\tilde{u}_j +
+z/\bm L_{j,j}$ and compare it to $e^{-z^2/2}/\sqrt{2\pi}$.  To avoid
+complications with the vertical scale, we plot the ratio
+$f_j(\yobs,u_j(z))/(\tilde{f}_j\sqrt{2\pi})$, where
+$\tilde{f}_j=f_j(\yobs,\tilde{u}_j)$.
+\begin{figure}[p]
+  \centering
+<<fig=TRUE,echo=FALSE>>=
+zeta <- function(m, zmin=-3, zmax=3, npts=301L) {
+    stopifnot(is(m, "glmerMod"),
+              length(m@flist) == 1L,     # single grouping factor
+              length(m@cnms[[1]]) == 1L) # single column for that grouping factor
+    pp <- m@pp
+    rr <- m@resp
+    u0 <- pp$u0
+    sd <- 1/pp$L()@x
+    ff <- as.integer(m@flist[[1]])
+    fc <- pp$X %*% pp$beta0             # fixed-effects contribution to linear predictor
+    ZL <- t(pp$Lambdat %*% pp$Zt)
+    dc <- function(z) { # per-level contributions to the deviance at u0 + z * sd
+        uu <- u0 + z * sd
+        rr$updateMu(fc + ZL %*% uu)
+        unname(as.vector(tapply(rr$devResid(), ff, sum))) + uu * uu
+    }
+    zvals <- seq(zmin, zmax, length.out = npts)
+    devmat <- vapply(zvals, dc, numeric(length(u0)), USE.NAMES=FALSE)
+    d0 <- dc(0) # evaluated last so the model is restored to the conditional mode
+    list(zvals=zvals,
+         sqrtmat=t(sqrt(devmat - d0)) *  # signed square root of the deviance change
+             array(ifelse(zvals < 0, -1, 1), c(npts, length(u0))))
+}
+zm <- zeta(m1)
+dmat <- exp(-0.5*zm$sqrtmat^2)/sqrt(2*pi)
+xyplot(as.vector(dmat) ~ rep.int(zm$zvals, ncol(dmat)) |
+           gl(ncol(dmat), nrow(dmat), labels=levels(cbpp$herd)),
+       type=c("g","l"), xlab="z", ylab="density ratio",
+       panel=function(...){
+           panel.lines(zm$zvals, dnorm(zm$zvals), lty=2)
+           panel.xyplot(...)
+       })
+@ 
+  \caption{Comparison of univariate integrands (solid line) and standard normal density (dashed line)}
+  \label{fig:densities}
+\end{figure}
+
+\section{Adaptive Gauss-Hermite quadrature for GLMMs}
+
+In a GLMM with a single, scalar random-effects term the Laplace
+approximation can be enhanced using the one-dimensional version of
+\emph{adaptive Gauss-Hermite quadrature}.
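+
+The idea is to recenter and rescale the Gauss-Hermite nodes at the
+conditional mode, $\tilde{u}_j$, with scale $1/\bm L_{j,j}$, so that
+the nodes fall where the integrand has most of its mass.  A
+one-dimensional sketch (the integrand \code{g} and the scale \code{s}
+are hypothetical; the nodes and weights here come from the
+\pkg{statmod} package, not from \pkg{lme4Eigen}):
+<<agqSketch,eval=FALSE>>=
+library(statmod)
+gh <- gauss.quad(9L, kind="hermite") # nodes x_k, weights w_k for weight exp(-x^2)
+## with u = utilde + s*x:  int g(u) du = s * int exp(-x^2) [exp(x^2) g(utilde + s*x)] dx
+agq1 <- function(g, utilde, s)
+    s * sum(gh$weights * exp(gh$nodes^2) * vapply(utilde + s * gh$nodes, g, 0))
+@ 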
+
 \bibliography{lmer}
 \end{document}
 


