[Lme4-commits] r1817 - www/JSS
noreply at r-forge.r-project.org
Sun Jun 2 18:30:52 CEST 2013
Author: bbolker
Date: 2013-06-02 18:30:52 +0200 (Sun, 02 Jun 2013)
New Revision: 1817
Modified:
www/JSS/glmer.Rnw
www/JSS/lmer.Rnw
Log:
updates (linenomath solution, tweaks)
Modified: www/JSS/glmer.Rnw
===================================================================
--- www/JSS/glmer.Rnw 2013-05-31 17:30:10 UTC (rev 1816)
+++ www/JSS/glmer.Rnw 2013-06-02 16:30:52 UTC (rev 1817)
@@ -387,7 +387,7 @@
panel.xyplot(...)}
))
@
- \caption{Comparison of univariate integrands (solid line) and standard normal density function (dashed line) \bmb{is something wrong? Do these agree TOO well?}}
+ \caption{Comparison of univariate integrands (solid line) and standard normal density function (dashed line) \bmb{consider Q-Q plots to emphasize deviations from normality?}}
\label{fig:densities}
\end{figure}
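One possible shape for such a Q-Q comparison (a hypothetical sketch using a stand-in density, not the chunk that produced the figure above):
<<qqSketch, eval=FALSE>>=
## hypothetical stand-in for one univariate integrand, evaluated on a grid
z    <- seq(-4, 4, length.out = 401)
dens <- dnorm(z, mean = 0.1, sd = 1.05)
dens <- dens / sum(dens * (z[2] - z[1]))   # normalize numerically
cdf  <- cumsum(dens) * (z[2] - z[1])       # numerical CDF
p    <- seq(0.01, 0.99, by = 0.01)
q.int  <- approx(cdf, z, xout = p)$y       # quantiles implied by the integrand
q.norm <- qnorm(p)                         # standard normal quantiles
plot(q.norm, q.int, xlab = "standard normal quantiles",
     ylab = "integrand quantiles")
abline(0, 1, lty = 2)                      # perfect agreement = identity line
@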
Modified: www/JSS/lmer.Rnw
===================================================================
--- www/JSS/lmer.Rnw 2013-05-31 17:30:10 UTC (rev 1816)
+++ www/JSS/lmer.Rnw 2013-06-02 16:30:52 UTC (rev 1817)
@@ -96,9 +96,9 @@
that incorporates both fixed-effects terms and random-effects terms in
a linear predictor expression from which the conditional mean of the
response can be evaluated. In this paper we describe the formulation
-and representation of linear and generalized linear mixed models. The
-techniques used for nonlinear mixed models will be described
-separately.
+and representation of linear mixed models. The
+techniques used for generalized linear and nonlinear
+mixed models will be described separately.
Just as a linear model can be described in terms of the distribution
of $\mc{Y}$, the vector-valued random variable whose observed value is
@@ -106,10 +106,12 @@
described by the distribution of two vector-valued random variables:
$\mc{Y}$, the response and $\mc{B}$, the vector of random effects. In
a linear model the distribution of $\mc Y$ is multivariate normal,
+\begin{linenomath}
\begin{equation}
\label{eq:linearmodel}
\mc Y\sim\mc{N}(\bm X\bm\beta,\sigma^2\bm I_n),
\end{equation}
+\end{linenomath}
where $n$ is the dimension of the response vector, $\bm I_n$ is the
identity matrix of size $n$, $\bm\beta$ is a $p$-dimensional
coefficient vector and $\bm X$ is an $n\times p$ model matrix. The
@@ -118,29 +120,35 @@
In a linear mixed model it is the \emph{conditional} distribution of
$\mc Y$ given $\mc B=\bm b$ that has such a form,
+\begin{linenomath}
\begin{equation}
\label{eq:LMMcondY}
(\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n)
\end{equation}
+\end{linenomath}
where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
vector-valued random effects variable, $\mc B$, whose value we are
fixing at $\bm b$. The unconditional distribution of $\mc B$ is also
multivariate normal with mean zero and a parameterized $q\times q$
variance-covariance matrix, $\bm\Sigma$,
+\begin{linenomath}
\begin{equation}
\label{eq:LMMuncondB}
\mc B\sim\mc N(\bm0,\bm\Sigma) .
\end{equation}
+\end{linenomath}
As a variance-covariance matrix, $\bm\Sigma$ must be positive
semidefinite. It is convenient to express the model in terms of a
\emph{relative covariance factor}, $\bm\Lambda_\theta$, which is a
$q\times q$ matrix, depending on the \emph{variance-component
parameter}, $\bm\theta$, and generating the symmetric $q\times q$
variance-covariance matrix, $\bm\Sigma$, according to
+\begin{linenomath}
\begin{equation}
\label{eq:relcovfac}
\bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans ,
\end{equation}
+\end{linenomath}
where $\sigma$ is the same scale factor as in the conditional
distribution (\ref{eq:LMMcondY}).
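For concreteness, a small sketch of \eq{eq:relcovfac} with hypothetical values, assuming a single two-dimensional random-effects term whose $\bm\theta$ fills the lower triangle of $\bm\Lambda_\theta$ column-wise:
<<relcovSketch, eval=FALSE>>=
theta  <- c(0.9, 0.3, 0.5)                  # hypothetical covariance parameters
Lambda <- matrix(0, 2, 2)
Lambda[lower.tri(Lambda, diag = TRUE)] <- theta
sigma2 <- 1.2                               # hypothetical residual variance
Sigma  <- sigma2 * Lambda %*% t(Lambda)     # eq. (relcovfac)
Sigma                                       # positive semidefinite by construction
@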
@@ -168,16 +176,16 @@
\hline
$\bm X$ & Fixed-effect design matrix & \code{getME(.,"X")} \\
$\bm Z$ & Sparse random-effect design matrix & \code{getME(.,"Z")} \\
- $\bm \theta$ & Variance-covariance parameter vector (log-Cholesky scale) &
+ $\bm \theta$ & Variance-covariance parameter vector \newline (Cholesky scale) &
\code{getME(.,"theta")} \\
$\bm \beta$ & Fixed-effect coefficients & \code{fixef(.)} [\code{getME(.,"beta")}]\\
$\sigma^2$ & Residual variance & \verb+sigma(.)^2+ \\
- $\mc{Y}$ & Response variable & \\
- $\bm\Lambda_{\bm\theta}$ & Relative covariance factor & \\
- $\bm L_\theta$ & Sparse Cholesky factor & \code{getME(.,"Lambda")} \\
+ $\mc{Y}$ & Response variable & \code{getME(.,"y")} \\
+ $\bm\Lambda_{\bm\theta}$ & Relative covariance factor & \code{getME(.,"Lambda")} \\
+ $\bm L_\theta$ & Sparse Cholesky factor & \code{getME(.,"L")}\\
$\mc B$ & Random effects & \\
$\mc U$ & Spherical random effects & \\
- $\bm u$ & Conditional modes of random effects & \code{ranef(.)} [\code{getME(.,"u")}] ? \\
+ $\bm u$ & Conditional modes of random effects & \code{getME(.,"u")} \\
$\bm P$ & Fill-reducing permutation &
\end{tabular}
\caption[Table of notation]{Table of notation and \pkg{lme4} equivalents
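For reference, a quick sketch of the accessors listed in the table, using the \code{sleepstudy} example shipped with \pkg{lme4} (assumes a current \pkg{lme4}; not part of the manuscript code):
<<getMEsketch, eval=FALSE>>=
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
X      <- getME(fm, "X")        # fixed-effect design matrix
Z      <- getME(fm, "Z")        # sparse random-effect design matrix
theta  <- getME(fm, "theta")    # covariance parameters (Cholesky scale)
beta   <- fixef(fm)             # or getME(fm, "beta")
s2     <- sigma(fm)^2           # residual variance
y      <- getME(fm, "y")        # response
Lambda <- getME(fm, "Lambda")   # relative covariance factor
L      <- getME(fm, "L")        # sparse Cholesky factor
u      <- getME(fm, "u")        # conditional modes of spherical random effects
b      <- ranef(fm)             # conditional modes of B, by grouping factor
@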
@@ -197,15 +205,13 @@
distributions are called ``spherical'' because contours of the
probability density are spheres.}
\emph{random effects} variable, $\mc U$, with distribution
+
+\begin{linenomath}
\begin{equation}
- \label{eq:sphericalRE}
- \mc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q),
-\end{equation}
-and set
-\begin{equation}
\label{eq:BU}
\mc B=\bm\Lambda_\theta\mc U,
\end{equation}
+\end{linenomath}
then $\mc B$ will have the desired $\mathcal{N}(\bm
0,\bm\Sigma_\theta)$ distribution.
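A toy simulation check of this change of variables, with a hypothetical $\bm\Lambda_\theta$:
<<BfromUsketch, eval=FALSE>>=
set.seed(101)
sigma  <- 1.1
Lambda <- matrix(c(0.9, 0.3, 0, 0.5), 2, 2)       # hypothetical lower-triangular factor
U <- matrix(rnorm(2 * 1e5, sd = sigma), nrow = 2) # spherical: N(0, sigma^2 I)
B <- Lambda %*% U                                 # eq. (BU)
cov(t(B))                        # approximately equal to ...
sigma^2 * Lambda %*% t(Lambda)   # ... Sigma_theta from eq. (relcovfac)
@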
@@ -221,11 +227,14 @@
optimization of the criterion.
The model can now be defined in terms of
+\begin{linenomath}
\begin{equation}
\label{eq:LMMcondYU}
(\mc Y|\mc U=\bm u)\sim\mc{N}(\bm Z\bm\Lambda_\theta\bm u+\bm X\bm\beta,\sigma^2\bm I_n)
\end{equation}
+\end{linenomath}
producing the joint density function
+\begin{linenomath}
\begin{equation}
\label{eq:jointDens}
\begin{aligned}
@@ -239,6 +248,7 @@
{(2\pi\sigma^2)^{(n+q)/2}} .
\end{aligned}
\end{equation}
+\end{linenomath}
% To obtain an expression for the likelihood it is convenient to
% distinguish between a general argument, $\bm y$, and the particular
@@ -247,19 +257,23 @@
The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
$\sigma^2$, given the observed data is the value of the
marginal density of $\mc Y$, evaluated at $\yobs$. That is
+\begin{linenomath}
\begin{equation}
\label{eq:likelihoodLMM}
L(\bm\theta,\bm\beta,\sigma^2|\yobs)=\int_{\mathbb{R}^q}f_{\mc Y,\mc
U}(\yobs,\bm u)\,d\bm u .
\end{equation}
+\end{linenomath}
The integrand of \eq{eq:likelihoodLMM} is the \emph{unscaled
conditional density} of $\mc U$ given $\mc Y=\yobs$. The
conditional density of $\mc U$ given $\mc Y=\yobs$ is
+\begin{linenomath}
\begin{equation}
\label{eq:condDens}
f_{\mc U|\mc Y}(\bm u|\yobs)=\frac{f_{\mc Y,\mc U}(\yobs,\bm u)}
{\int f_{\mc Y,\mc U}(\yobs,\bm u)\,d\bm u}
\end{equation}
+\end{linenomath}
which is, up to a scale factor, the joint density, $f_{\mc Y,\mc
U}(\yobs,\bm u)$. The unscaled conditional density will be, up to a
scale factor, a $q$-dimensional multivariate Gaussian with an integral
@@ -272,12 +286,14 @@
the conditional mode, and hence the conditional mean, by maximizing
the unscaled conditional density. This is in the form of a
\emph{penalized linear least squares} problem,
+\begin{linenomath}
\begin{equation}
\label{eq:PLSprob}
\bm\mu_{\mc U|\mc Y=\yobs}=\arg\min_{\bm u}
\left(\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
\left\|\bm u\right\|^2\right) .
\end{equation}
+\end{linenomath}
\subsection{Solving the penalized least squares problem}
\label{sec:solvingPLS}
@@ -285,6 +301,7 @@
In the so-called ``pseudo-data'' approach to penalized least squares
problems we write the objective as a residual sum of squares for an
extended response vector and model matrix
+\begin{linenomath}
\begin{equation}
\label{eq:pseudoData}
\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
@@ -293,16 +310,19 @@
\begin{bmatrix}\bm Z\bLt\\\bm I_q\end{bmatrix}
\bm u\right\|^2.
\end{equation}
+\end{linenomath}
The contribution to the residual sum of squares from the ``pseudo''
observations appended to $\yobs-\bm X\bm\beta$, is exactly the penalty
term, $\left\|\bm u\right\|^2$.
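A toy dense-matrix sketch of \eq{eq:pseudoData} (sizes and values are hypothetical; the package itself works with sparse matrices):
<<pseudoDataSketch, eval=FALSE>>=
set.seed(3)
n <- 40; q <- 5
Z      <- matrix(rbinom(n * q, 1, 0.4), n, q)
Lambda <- diag(0.7, q)                  # hypothetical relative covariance factor
resid0 <- rnorm(n)                      # stand-in for yobs - X beta
ystar  <- c(resid0, rep(0, q))          # extended response: q pseudo-observations
Xstar  <- rbind(Z %*% Lambda, diag(q))  # extended model matrix
mu.u   <- qr.solve(Xstar, ystar)        # ordinary LS solution = penalized LS solution
@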
From \eq{eq:pseudoData} we can see that the conditional mean satisfies
+\begin{linenomath}
\begin{equation}
\label{eq:PLSsol}
\left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
\bm\mu_{\mc U|\mc Y=\yobs}=\bLt\trans\bm Z\trans(\yobs-\bm X\bm\beta),
\end{equation}
+\end{linenomath}
which would be interesting, but not terribly useful, were it not for
the fact that we can determine the solution to \eq{eq:PLSsol}
quickly and accurately, even when $q$, the size of the system to
@@ -312,12 +332,14 @@
The key to solving \eq{eq:PLSsol} is the \emph{sparse Cholesky
factor}, $\bm L_\theta$, which is a sparse, lower-triangular matrix
such that
+\begin{linenomath}
\begin{equation}
\label{eq:sparseChol}
\bm L_\theta\bm L\trans_\theta=\bm P
\left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
\bm P\trans,
\end{equation}
+\end{linenomath}
where $\bm P$ is a permutation matrix representing a fill-reducing
permutation~\citep[Ch.~7]{davis06:csparse_book}.
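As a rough illustration only (toy simulated data; not the package internals), \eq{eq:sparseChol} and \eq{eq:PLSsol} can be mimicked with the \pkg{Matrix} package:
<<sparseCholSketch, eval=FALSE>>=
library(Matrix)
set.seed(2)
n <- 50; q <- 6
Z      <- Matrix(matrix(rbinom(n * q, 1, 0.3), n, q), sparse = TRUE)
Lambda <- Diagonal(q, 0.8)              # hypothetical relative covariance factor
resid0 <- rnorm(n)                      # stand-in for yobs - X beta
ZL <- Z %*% Lambda
## factor Lambda' Z' Z Lambda + I with a fill-reducing permutation
L  <- Cholesky(crossprod(ZL), perm = TRUE, LDL = FALSE, Imult = 1)
## solve P' L L' P mu = Lambda' Z' (yobs - X beta), cf. eq. (Cholsol)
mu.u <- solve(L, crossprod(ZL, resid0), system = "A")
@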
@@ -335,6 +357,7 @@
values of the non-zero elements in $\bm L$ but does not change their
positions. Hence, the symbolic phase must be done only once.
+\bmb{Update: refer to RcppEigen methods rather than Matrix methods?}
The \code{Cholesky} function in the \pkg{Matrix} package for
\proglang{R} performs both the symbolic and numeric phases of the
factorization to produce $\bm L_\theta$ from $\bLt\trans\bm Z\trans\bm
@@ -351,12 +374,14 @@
Although the \code{solve} method for the \code{"CHMfactor"} class has
an option to evaluate $\bm\mu_{\mc U|\mc Y=\yobs}$ directly as the solution to
+\begin{linenomath}
\begin{equation}
\label{eq:Cholsol}
\bm P\trans\bm L_\theta\bm L\trans_\theta\bm P
\bm\mu_{\mc U|\mc Y=\yobs}=
\bLt\trans\bm Z\trans(\bm y-\bm X\bm\beta) ,
\end{equation}
+\end{linenomath}
we will express the solution in two stages:
\begin{enumerate}
\item Solve $\bm L\bm c_{\bm u}=\bm P\bLt\trans\bm Z\trans(\bm y-\bm
@@ -371,12 +396,14 @@
After solving for $\bm\mu_{\mc U|\mc Y=\yobs}$ the exponent in $f_{\mc Y,\mc
U}(\yobs, \bm u)$ can be written
+\begin{linenomath}
\begin{equation}
\label{eq:PLS}
\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=
r^2(\bm\theta,\bm\beta)+
\|\bm L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})\|^2.
\end{equation}
+\end{linenomath}
where $r^2(\bm\theta,\bm\beta)=\|\yobs-\bm X\bm\beta-
\bm Z\bm\Lambda_\theta\bm\mu_{\mc U|\mc Y=\yobs}\|^2+
\|\bm\mu_{\mc U|\mc Y=\yobs}\|^2$ is the minimum
@@ -386,6 +413,7 @@
With expression (\ref{eq:PLS}) and the change of variable $\bm v=\bm
L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})$, for which $d\bm
v=\abs(|\bm L||\bm P|)\,d\bm u$, we have
+\begin{linenomath}
\begin{equation}
\label{eq:intExpr}
\int\frac{\exp\left(\frac{-\|\bm L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y})\|^2}
@@ -396,6 +424,7 @@
v}{\abs(|\bm L||\bm P|)} = \frac{1}{\abs(|\bm L||\bm
P|)}=\frac{1}{|\bm L|}
\end{equation}
+\end{linenomath}
because $\abs|\bm P|=1$ (one property of a permutation matrix is $|\bm
P|=\pm1$) and $|\bm L|$, which, because $\bm L$ is triangular, is the
product of its diagonal elements, all of which are positive, is
@@ -403,24 +432,30 @@
Using this expression we can write the deviance (negative twice the
log-likelihood) as
+\begin{linenomath}
\begin{equation}
\label{eq:deviance}
-2\ell(\bm\theta,\bm\beta,\sigma^2|\yobs)=-2\log L(\bm\theta,\bm\beta,\sigma^2|\yobs)=
n\log(2\pi\sigma^2)+\frac{r^2(\bm\theta,\bm\beta)}{\sigma^2}+
\log(|\bm L_\theta|^2) .
\end{equation}
+\end{linenomath}
Because the dependence of \eq{eq:deviance} on $\sigma^2$ is
straightforward, we can form the conditional estimate
+\begin{linenomath}
\begin{equation}
\label{eq:conddev}
\widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n} ,
\end{equation}
+\end{linenomath}
producing the \emph{profiled deviance}
+\begin{linenomath}
\begin{equation}
\label{eq:profdev1}
-2\tilde{\ell}(\bm\theta,\bm\beta|\yobs)=\log(|\bm L_\theta|^2)+
n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right] .
\end{equation}
+\end{linenomath}
However, observing that \eq{eq:profdev1} depends on $\bm\beta$
only through $r^2(\bm\theta,\bm\beta)$ provides a much greater
@@ -429,18 +464,22 @@
deviance. The conditional estimate, $\widehat{\bm\beta}_\theta$, is
the value of $\bm\beta$ at the solution of the joint penalized least
squares problem
+\begin{linenomath}
\begin{equation}
\label{eq:jointPLS}
r^2_\theta=\min_{\bm u,\bm\beta}
\left(\left\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
\left\|\bm u\right\|^2\right) ,
\end{equation}
+\end{linenomath}
producing the profiled deviance,
+\begin{linenomath}
\begin{equation}
\label{eq:profdev2}
-2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
\end{equation}
+\end{linenomath}
which is a function of $\bm\theta$ only. Eqn.~\ref{eq:profdev2} is a
remarkably compact expression for the deviance.
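A sketch of evaluating \eq{eq:profdev2} directly, assuming a current \pkg{lme4} with the \code{devFunOnly} argument to \code{lmer} (an assumption, not manuscript code):
<<profDevSketch, eval=FALSE>>=
library(lme4)
## profiled deviance as a function of theta alone, cf. eq. (profdev2)
devf <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
             REML = FALSE, devFunOnly = TRUE)
devf(c(1, 0, 1))                 # evaluate at a trial value of theta
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
devf(getME(fm, "theta"))         # agrees with deviance(fm) at the ML estimate
deviance(fm)
@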
@@ -449,6 +488,7 @@
The solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, of the joint penalized least squares
problem (\ref{eq:jointPLS}) satisfy
+\begin{linenomath}
\begin{equation}
\label{eq:jointPLSeqn}
\begin{bmatrix}
@@ -462,6 +502,7 @@
\begin{bmatrix}\bLt\trans\bm Z\trans\yobs\\\bm X\trans\yobs
\end{bmatrix} .
\end{equation}
+\end{linenomath}
As before we will use the sparse Cholesky decomposition producing,
$\bm L_\theta$, the sparse Cholesky factor, and $\bm P$, the
permutation matrix, satisfying $\bm L_\theta\bm L_\theta\trans=\bm
@@ -477,6 +518,7 @@
\bm R_X\trans\bm c_{\bm\beta}&=\bm X\trans\yobs-\bm R_{ZX}\trans\bm c_{\bm u}
\end{align*}
so that
+\begin{linenomath}
\begin{equation}
\label{eq:fulldecomp}
\begin{bmatrix}
@@ -492,6 +534,7 @@
\bm X\trans\bm Z\bLt & \bm X\trans\bm X
\end{bmatrix} ,
\end{equation}
+\end{linenomath}
and the solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, satisfy
\begin{align}
@@ -505,11 +548,13 @@
\citet{laird_ware_1982} show that the criterion to be optimized by the
REML estimates can be expressed as
+\begin{linenomath}
\begin{equation}
\label{eq:REMLcrit}
L_R(\bm\theta,\sigma^2|\yobs)=\int
L(\bm\theta,\bm\beta,\sigma^2|\yobs)\,d\bm\beta .
\end{equation}
+\end{linenomath}
Because the joint solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, to the penalized least squares problem
@@ -525,17 +570,19 @@
we can use a change of variable, similar to that in
\eq{eq:intExpr}, to evaluate the profiled REML criterion. On the
deviance scale the criterion can be evaluated as
+\begin{linenomath}
\begin{equation}
\label{eq:profiledREML}
-2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm R_X|^2)+
(n-p)\left[1+\log\left(\frac{2\pi r^2_\theta}{n-p}\right)\right] .
\end{equation}
+\end{linenomath}
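Analogously, a sketch for \eq{eq:profiledREML}, assuming \code{devFunOnly} and \code{REMLcrit} are available in the installed \pkg{lme4}:
<<profREMLSketch, eval=FALSE>>=
library(lme4)
devfR <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              REML = TRUE, devFunOnly = TRUE)
fmR <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)  # REML is the default
devfR(getME(fmR, "theta"))   # profiled REML criterion at the estimate
REMLcrit(fmR)                # same value, from the fitted object
@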
The structures in \pkg{lme4} for representing mixed-models are
somewhat more general than is required for linear mixed models. They
are designed to also be used for generalized linear mixed models
-(GLMMs) and nonlinear mixed models (NLMMs), which we describe in the
-next sections.
+(GLMMs) and nonlinear mixed models (NLMMs), which we will describe
+in future publications.
\section{Implementation details}
@@ -573,7 +620,8 @@
\begin{itemize}
\item example \#1 (small/simple: \code{Orthodont}?)
-\item example \#2 (large/complex/crossed)
+\item example \#2 (large/complex/crossed: maybe the \code{ScotsSec} example from the \code{mlmRev} package? See the tentative sketch after this list.)
+  (Are we going to need to re-implement sparse fixed-effect matrices to get this one done? \code{sparse=TRUE} is used in Doug's Potsdam notes at \url{http://www.stat.wisc.edu/~bates/PotsdamGLMM/Large.Rnw} \ldots)
\end{itemize}
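A tentative sketch of the two candidate examples (the \code{ScotsSec} formula below is only a guess at a crossed specification, not a settled choice):
<<exampleSketch, eval=FALSE>>=
library(lme4)
library(nlme)      # Orthodont data
library(mlmRev)    # ScotsSec data
fm1 <- lmer(distance ~ age + (age | Subject), Orthodont)                    # small/simple
fm2 <- lmer(attain ~ verbal * sex + (1 | primary) + (1 | second), ScotsSec) # large/crossed
@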
Show model calls: fixed and var-cov estimates; conditional modes;