[Lme4-commits] r1840 - www/JSS
noreply at r-forge.r-project.org
Sun Aug 18 02:13:11 CEST 2013
Author: walker
Date: 2013-08-18 02:13:11 +0200 (Sun, 18 Aug 2013)
New Revision: 1840
Modified:
www/JSS/glmerAppendix.tex
Log:
bit better notation now, but not perfect at all
Modified: www/JSS/glmerAppendix.tex
===================================================================
--- www/JSS/glmerAppendix.tex 2013-08-17 22:58:05 UTC (rev 1839)
+++ www/JSS/glmerAppendix.tex 2013-08-18 00:13:11 UTC (rev 1840)
@@ -134,14 +134,138 @@
is a function of $\bm\eta$; this function is conventionally referred
to as the inverse link function.
-Instead of maximizing $L(\bm\beta, \bm\theta | \bm y, \bm u)$, we
-typically think of minimizing the penalized
-deviance,
+Our goal is to find the values of $\bm u$ that maximize the unscaled
+conditional density, for given $\bm\theta$ and $\bm\beta$
+vectors. These maximizers are the conditional modes, which we require
+for the Laplace approximation and adaptive Gauss-Hermite
+quadrature. To do this maximization we use a variant of the Fisher
+scoring method, which underlies the iteratively reweighted least
+squares algorithm for generalized linear models. Fisher scoring is
+itself based on Newton's method, which we develop first.
+
+\subsection{Newton's method}
+
+To apply Newton's method, we need the gradient and the Hessian of the
+unscaled conditional log-likelihood. Following standard GLM theory
+(e.g. McCullagh and Nelder 1989), we use the chain rule,
+\begin{displaymath}
+\frac{d L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u} =
+\frac{d L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm\psi}
+\frac{d \bm\psi}{d \bm\mu}
+\frac{d \bm\mu}{d \bm\eta}
+\frac{d \bm\eta}{d \bm u}
+\end{displaymath}
+The first derivative in this chain follows from basic results in GLM theory,
+\begin{displaymath}
+\frac{d L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm\psi} =
+(\bm y - \bm\mu)^\top \bm A
+\end{displaymath}
+Again from standard GLM theory, the next two derivatives define the
+inverse of the diagonal variance matrix,
+\begin{displaymath}
+\frac{d \bm\psi}{d \bm\mu} = \bm V^{-1}
+\end{displaymath}
+and the diagonal Jacobian matrix,
+\begin{displaymath}
+\frac{d \bm\mu}{d \bm\eta} = \bm M
+\end{displaymath}
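+For example (standard results for these families): for the Poisson
+family with log link, $\mu_i = \exp(\eta_i)$, so that
+\begin{displaymath}
+\bm V = \bm M = \mathrm{diag}(\mu_i),
+\end{displaymath}
+while for the binomial family with logit link,
+$\bm V = \bm M = \mathrm{diag}\left(\mu_i(1-\mu_i)\right)$; both links
+are canonical, and the equality $\bm V = \bm M$ is exploited below.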
+Finally, because $\bm u$ affects $\bm\eta$ only linearly,
+\begin{displaymath}
+\frac{d \bm\eta}{d \bm u} = \bm Z \bm\Lambda_\theta
+\end{displaymath}
+Therefore we have,
\begin{equation}
-\mathrm{PDEV} = -2 L(\bm\beta, \bm\theta | \bm y, \bm u)
+\frac{d L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u} =
+(\bm y - \bm\mu)^\top \bm A
+\bm V^{-1}
+\bm M
+\bm Z \bm\Lambda_\theta -
+\bm u^\top
+\label{eq:dPDEVdu}
\end{equation}
-To minimize PDEV, we use a penalized iteratively reweighted least
-squares (PIRLS) algorithm. There are five steps to deriving PIRLS:
+This is very similar to the gradient for a GLM with respect to the
+fixed-effects coefficients, $\bm\beta$. The only difference induced by
+differentiating with respect to the random effects, $\bm u$, is the
+additional $-\bm u^\top$ term, which arises from the penalty on $\bm u$
+in the unscaled conditional density.
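+For example, when the link is canonical (so that $\bm V = \bm M$, as
+noted above) and $\bm A = \bm I_n$, this gradient simplifies to
+\begin{displaymath}
+\frac{d L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u} =
+(\bm y - \bm\mu)^\top \bm Z \bm\Lambda_\theta - \bm u^\top .
+\end{displaymath}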
+
+Again we apply the chain rule to take the Hessian,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} =
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm\mu}
+\frac{d \bm\mu}{d \bm\eta}
+\frac{d \bm\eta}{d \bm u} - \bm I_q
+\end{equation}
+which leads to,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} =
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm\mu}\bm M \bm Z \bm\Lambda_\theta - \bm I_q
+\end{equation}
+The first factor in this chain can be expressed as,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm\mu} =
+-\bm\Lambda_\theta^\top \bm Z^\top \bm M \bm V^{-1} \bm A +
+\bm\Lambda_\theta^\top \bm Z^\top \left[ \frac{d \bm M \bm V^{-1}}{d \bm\mu} \right] \bm A \bm R
+\end{equation}
+where $\bm R$ is the diagonal residual matrix with $\bm y-\bm\mu$ on the
+diagonal. The two terms arise from the product rule: we first
+differentiate the residual vector, $\bm y-\bm\mu$, and then the diagonal
+matrix, $\bm M \bm V^{-1}$, with respect to $\bm\mu$.
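+As a concrete illustration, consider the Poisson family with the
+(noncanonical) identity link, for which $\bm M = \bm I_n$ and
+$\bm V = \mathrm{diag}(\mu_i)$, so that
+\begin{displaymath}
+\bm M \bm V^{-1} = \mathrm{diag}(1/\mu_i), \qquad
+\left[ \frac{d \bm M \bm V^{-1}}{d \bm\mu} \right] = \mathrm{diag}(-1/\mu_i^2) ,
+\end{displaymath}
+and the second term does not vanish; for canonical links it does, as
+shown below.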
+
+The Hessian can therefore be expressed as,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} =
+-\bm \Lambda_\theta^\top \bm Z^\top \bm M \bm A^{1/2}\bm V^{-1/2}\left(
+\bm I_n -
+\bm V \bm M^{-1}\left[ \frac{d \bm M \bm V^{-1}}{d \bm\mu} \right] \bm R
+\right) \bm V^{-1/2}\bm A^{1/2} \bm M \bm Z \bm\Lambda_\theta - \bm I_q
+\label{eq:betaHessian}
+\end{equation}
+This result can be simplified by expressing it in terms of a weighted
+random-effects design matrix, $\bm U = \bm A^{1/2}\bm V^{-1/2}\bm M \bm Z \bm\Lambda_\theta$,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} =
+-\bm U^\top\left(
+\bm I_n -
+\bm V \bm M^{-1}\left[ \frac{d \bm M \bm V^{-1}}{d \bm\mu} \right] \bm R
+\right) \bm U - \bm I_q
+\label{eq:betaHessianU}
+\end{equation}
+
+\subsection{Fisher-like scoring}
+
+There are two ways to further simplify this expression for the
+Hessian. The first is to use the canonical link function for the family
+being used. Canonical links have the property that $\bm V = \bm M$,
+in which case,
+\begin{equation}
+\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} =
+-\bm U^\top\left(
+\bm I_n -
+\bm I_n \left[ \frac{d \bm I_n}{d \bm\mu} \right] \bm R
+\right) \bm U - \bm I_q = -\left(\bm U^\top \bm U + \bm I_q\right)
+\end{equation}
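+For instance, for the Poisson family with log (canonical) link and
+$\bm A = \bm I_n$,
+\begin{displaymath}
+\bm U = \mathrm{diag}(\mu_i^{1/2}) \bm Z \bm\Lambda_\theta,
+\qquad
+\bm U^\top \bm U + \bm I_q =
+\bm\Lambda_\theta^\top \bm Z^\top \mathrm{diag}(\mu_i) \bm Z \bm\Lambda_\theta + \bm I_q .
+\end{displaymath}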
+The second way to simplify the Hessian is to take its expectation with
+respect to the distribution of the response, conditional on the current
+values of the spherical random-effects coefficients, $\bm u$. The
+diagonal residual matrix, $\bm R$, has expectation $\bm 0$. Therefore,
+because the response enters the expression for the Hessian only via
+$\bm R$, we have,
+\begin{equation}
+E\left(\frac{d^2 L(\bm\beta, \bm\theta | \bm y, \bm u)}{d \bm u d \bm u} \Big| \bm u \right) =
+-\bm U^\top\left(
+\bm I_n -
+\bm V \bm M^{-1}\left[ \frac{d \bm M \bm V^{-1}}{d \bm\mu} \right] E(\bm R)
+\right) \bm U - \bm I_q = -\left(\bm U^\top \bm U + \bm I_q\right)
+\label{eq:betaHessianE}
+\end{equation}
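+As a minimal illustration of the resulting scoring iteration (a sketch,
+not the implementation in lme4), the following base R code assumes the
+Poisson family with log link and $\bm A = \bm I_n$, holds $\bm\beta$ and
+$\bm\Lambda_\theta$ fixed, and iterates the update
+$\bm u \leftarrow \bm u + (\bm U^\top \bm U + \bm I_q)^{-1}
+\left(\bm\Lambda_\theta^\top \bm Z^\top (\bm y - \bm\mu) - \bm u\right)$:
+\begin{verbatim}
+## Fisher scoring for the conditional modes u of a toy Poisson GLMM
+## with log link, so V = M = diag(mu) and A = I; all objects here are
+## illustrative, not lme4 internals.
+set.seed(1)
+n <- 50; q <- 3
+g <- factor(sample(q, n, replace = TRUE))  # grouping factor
+Z <- model.matrix(~ 0 + g)                 # random-effects design
+Lambda <- diag(0.5, q)                     # Lambda_theta, held fixed
+eta0 <- rep(0.2, n)                        # fixed-effects part, X beta
+y <- rpois(n, exp(eta0))                   # simulated response
+u <- rep(0, q)                             # start at the prior mode
+for (iter in 1:25) {
+  eta  <- eta0 + as.vector(Z %*% (Lambda %*% u))
+  mu   <- exp(eta)                         # inverse link
+  U    <- sqrt(mu) * (Z %*% Lambda)        # A^{1/2} V^{-1/2} M Z Lambda
+  grad <- as.vector(t(Lambda) %*% t(Z) %*% (y - mu)) - u
+  H    <- crossprod(U) + diag(q)           # U'U + I_q
+  step <- solve(H, grad)                   # scoring step
+  u    <- u + step
+  if (sqrt(sum(step^2)) < 1e-10) break
+}
+u                                          # approximate conditional modes
+\end{verbatim}
+Because the link here is canonical, this scoring step coincides with the
+Newton step derived above.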
+
+\subsection{Gauss-Markov}
+
+\subsection{Outline of derivation (inconsistent currently)}
+
\begin{enumerate}
\item \textbf{Stationary points of PDEV}
\begin{itemize}