[Lme4-commits] r1817 - www/JSS
noreply at r-forge.r-project.org
Sun Jun 2 18:30:52 CEST 2013
Author: bbolker
Date: 2013-06-02 18:30:52 +0200 (Sun, 02 Jun 2013)
New Revision: 1817
Modified:
www/JSS/glmer.Rnw
www/JSS/lmer.Rnw
Log:
updates (linenomath solution, tweaks)
Modified: www/JSS/glmer.Rnw
===================================================================
--- www/JSS/glmer.Rnw 2013-05-31 17:30:10 UTC (rev 1816)
+++ www/JSS/glmer.Rnw 2013-06-02 16:30:52 UTC (rev 1817)
@@ -387,7 +387,7 @@
panel.xyplot(...)}
))
@
- \caption{Comparison of univariate integrands (solid line) and standard normal density function (dashed line) \bmb{is something wrong? Do these agree TOO well?}}
+ \caption{Comparison of univariate integrands (solid line) and standard normal density function (dashed line) \bmb{consider Q-Q plots to emphasize deviations from normality?}}
\label{fig:densities}
\end{figure}
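One possible shape for such a Q-Q comparison (a hypothetical sketch using a stand-in density, not the chunk that produced the figure above):
<<qqSketch, eval=FALSE>>=
## hypothetical stand-in for one univariate integrand, evaluated on a grid
z    <- seq(-4, 4, length.out = 401)
dens <- dnorm(z, mean = 0.1, sd = 1.05)
dens <- dens / sum(dens * (z[2] - z[1]))   # normalize numerically
cdf  <- cumsum(dens) * (z[2] - z[1])       # numerical CDF
p    <- seq(0.01, 0.99, by = 0.01)
q.int  <- approx(cdf, z, xout = p)$y       # quantiles implied by the integrand
q.norm <- qnorm(p)                         # standard normal quantiles
plot(q.norm, q.int, xlab = "standard normal quantiles",
     ylab = "integrand quantiles")
abline(0, 1, lty = 2)                      # perfect agreement = identity line
@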
Modified: www/JSS/lmer.Rnw
===================================================================
--- www/JSS/lmer.Rnw 2013-05-31 17:30:10 UTC (rev 1816)
+++ www/JSS/lmer.Rnw 2013-06-02 16:30:52 UTC (rev 1817)
@@ -96,9 +96,9 @@
that incorporates both fixed-effects terms and random-effects terms in
a linear predictor expression from which the conditional mean of the
response can be evaluated. In this paper we describe the formulation
-and representation of linear and generalized linear mixed models. The
-techniques used for nonlinear mixed models will be described
-separately.
+and representation of linear mixed models. The
+techniques used for generalized linear and nonlinear
+mixed models will be described separately.
Just as a linear model can be described in terms of the distribution
of $\mc{Y}$, the vector-valued random variable whose observed value is
@@ -106,10 +106,12 @@
described by the distribution of two vector-valued random variables:
$\mc{Y}$, the response and $\mc{B}$, the vector of random effects. In
a linear model the distribution of $\mc Y$ is multivariate normal,
+\begin{linenomath}
\begin{equation}
\label{eq:linearmodel}
\mc Y\sim\mc{N}(\bm X\bm\beta,\sigma^2\bm I_n),
\end{equation}
+\end{linenomath}
where $n$ is the dimension of the response vector, $\bm I_n$ is the
identity matrix of size $n$, $\bm\beta$ is a $p$-dimensional
coefficient vector and $\bm X$ is an $n\times p$ model matrix. The
@@ -118,29 +120,35 @@
In a linear mixed model it is the \emph{conditional} distribution of
$\mc Y$ given $\mc B=\bm b$ that has such a form,
+\begin{linenomath}
\begin{equation}
\label{eq:LMMcondY}
(\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n)
\end{equation}
+\end{linenomath}
where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
vector-valued random effects variable, $\mc B$, whose value we are
fixing at $\bm b$. The unconditional distribution of $\mc B$ is also
multivariate normal with mean zero and a parameterized $q\times q$
variance-covariance matrix, $\bm\Sigma$,
+\begin{linenomath}
\begin{equation}
\label{eq:LMMuncondB}
\mc B\sim\mc N(\bm0,\bm\Sigma) .
\end{equation}
+\end{linenomath}
As a variance-covariance matrix, $\bm\Sigma$ must be positive
semidefinite. It is convenient to express the model in terms of a
\emph{relative covariance factor}, $\bm\Lambda_\theta$, which is a
$q\times q$ matrix, depending on the \emph{variance-component
parameter}, $\bm\theta$, and generating the symmetric $q\times q$
variance-covariance matrix, $\bm\Sigma$, according to
+\begin{linenomath}
\begin{equation}
\label{eq:relcovfac}
\bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans ,
\end{equation}
+\end{linenomath}
where $\sigma$ is the same scale factor as in the conditional
distribution (\ref{eq:LMMcondY}).
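For concreteness, a small sketch of \eq{eq:relcovfac} with hypothetical values, assuming a single two-dimensional random-effects term whose $\bm\theta$ fills the lower triangle of $\bm\Lambda_\theta$ column-wise:
<<relcovSketch, eval=FALSE>>=
theta  <- c(0.9, 0.3, 0.5)                  # hypothetical covariance parameters
Lambda <- matrix(0, 2, 2)
Lambda[lower.tri(Lambda, diag = TRUE)] <- theta
sigma2 <- 1.2                               # hypothetical residual variance
Sigma  <- sigma2 * Lambda %*% t(Lambda)     # eq. (relcovfac)
Sigma                                       # positive semidefinite by construction
@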
@@ -168,16 +176,16 @@
\hline
$\bm X$ & Fixed-effect design matrix & \code{getME(.,"X")} \\
$\bm Z$ & Sparse random-effect design matrix & \code{getME(.,"Z")} \\
- $\bm \theta$ & Variance-covariance parameter vector (log-Cholesky scale) &
+ $\bm \theta$ & Variance-covariance parameter vector \newline (Cholesky scale) &
\code{getME(.,"theta")} \\
$\bm \beta$ & Fixed-effect coefficients & \code{fixef(.)} [\code{getME(.,"beta")}]\\
$\sigma^2$ & Residual variance & \verb+sigma(.)^2+ \\
- $\mc{Y}$ & Response variable & \\
- $\bm\Lambda_{\bm\theta}$ & Relative covariance factor & \\
- $\bm L_\theta$ & Sparse Cholesky factor & \code{getME(.,"Lambda")} \\
+ $\mc{Y}$ & Response variable & \code{getME(.,"y")} \\
+ $\bm\Lambda_{\bm\theta}$ & Relative covariance factor & \code{getME(.,"Lambda")} \\
+ $\bm L_\theta$ & Sparse Cholesky factor & \code{getME(.,"L")}\\
$\mc B$ & Random effects & \\
$\mc U$ & Spherical random effects & \\
- $\bm u$ & Conditional modes of random effects & \code{ranef(.)} [\code{getME(.,"u")}] ? \\
+ $\bm u$ & Conditional modes of random effects & \code{getME(.,"u")} \\
$\bm P$ & Fill-reducing permutation &
\end{tabular}
\caption[Table of notation]{Table of notation and \pkg{lme4} equivalents
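For reference, a quick sketch of the accessors listed in the table, using the \code{sleepstudy} example shipped with \pkg{lme4} (assumes a current \pkg{lme4}; not part of the manuscript code):
<<getMEsketch, eval=FALSE>>=
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
X      <- getME(fm, "X")        # fixed-effect design matrix
Z      <- getME(fm, "Z")        # sparse random-effect design matrix
theta  <- getME(fm, "theta")    # covariance parameters (Cholesky scale)
beta   <- fixef(fm)             # or getME(fm, "beta")
s2     <- sigma(fm)^2           # residual variance
y      <- getME(fm, "y")        # response
Lambda <- getME(fm, "Lambda")   # relative covariance factor
L      <- getME(fm, "L")        # sparse Cholesky factor
u      <- getME(fm, "u")        # conditional modes of spherical random effects
b      <- ranef(fm)             # conditional modes of B, by grouping factor
@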
@@ -197,15 +205,13 @@
distributions are called ``spherical'' because contours of the
probability density are spheres.}
\emph{random effects} variable, $\mc U$, with distribution
+
+\begin{linenomath}
\begin{equation}
- \label{eq:sphericalRE}
- \mc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q),
-\end{equation}
-and set
-\begin{equation}
\label{eq:BU}
\mc B=\bm\Lambda_\theta\mc U,
\end{equation}
+\end{linenomath}
then $\mc B$ will have the desired $\mathcal{N}(\bm
0,\bm\Sigma_\theta)$ distribution.
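A toy simulation check of this change of variables, with a hypothetical $\bm\Lambda_\theta$:
<<BfromUsketch, eval=FALSE>>=
set.seed(101)
sigma  <- 1.1
Lambda <- matrix(c(0.9, 0.3, 0, 0.5), 2, 2)       # hypothetical lower-triangular factor
U <- matrix(rnorm(2 * 1e5, sd = sigma), nrow = 2) # spherical: N(0, sigma^2 I)
B <- Lambda %*% U                                 # eq. (BU)
cov(t(B))                        # approximately equal to ...
sigma^2 * Lambda %*% t(Lambda)   # ... Sigma_theta from eq. (relcovfac)
@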
@@ -221,11 +227,14 @@
optimization of the criterion.
The model can now be defined in terms of
+\begin{linenomath}
\begin{equation}
\label{eq:LMMcondYU}
(\mc Y|\mc U=\bm u)\sim\mc{N}(\bm Z\bm\Lambda_\theta\bm u+\bm X\bm\beta,\sigma^2\bm I_n)
\end{equation}
+\end{linenomath}
producing the joint density function
+\begin{linenomath}
\begin{equation}
\label{eq:jointDens}
\begin{aligned}
@@ -239,6 +248,7 @@
{(2\pi\sigma^2)^{(n+q)/2}} .
\end{aligned}
\end{equation}
+\end{linenomath}
% To obtain an expression for the likelihood it is convenient to
% distinguish between a general argument, $\bm y$, and the particular
@@ -247,19 +257,23 @@
The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
$\sigma^2$, given the observed data is the value of the
marginal density of $\mc Y$, evaluated at $\yobs$. That is
+\begin{linenomath}
\begin{equation}
\label{eq:likelihoodLMM}
L(\bm\theta,\bm\beta,\sigma^2|\yobs)=\int_{\mathbb{R}^q}f_{\mc Y,\mc
U}(\yobs,\bm u)\,d\bm u .
\end{equation}
+\end{linenomath}
The integrand of \eq{eq:likelihoodLMM} is the \emph{unscaled
conditional density} of $\mc U$ given $\mc Y=\yobs$. The
conditional density of $\mc U$ given $\mc Y=\yobs$ is
+\begin{linenomath}
\begin{equation}
\label{eq:condDens}
f_{\mc U|\mc Y}(\bm u|\yobs)=\frac{f_{\mc Y,\mc U}(\yobs,\bm u)}
{\int f_{\mc Y,\mc U}(\yobs,\bm u)\,d\bm u}
\end{equation}
+\end{linenomath}
which is, up to a scale factor, the joint density, $f_{\mc Y,\mc
U}(\yobs,\bm u)$. The unscaled conditional density will be, up to a
scale factor, a $q$-dimensional multivariate Gaussian with an integral
@@ -272,12 +286,14 @@
the conditional mode, and hence the conditional mean, by maximizing
the unscaled conditional density. This is in the form of a
\emph{penalized linear least squares} problem,
+\begin{linenomath}
\begin{equation}
\label{eq:PLSprob}
\bm\mu_{\mc U|\mc Y=\yobs}=\arg\min_{\bm u}
\left(\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
\left\|\bm u\right\|^2\right) .
\end{equation}
+\end{linenomath}
\subsection{Solving the penalized least squares problem}
\label{sec:solvingPLS}
@@ -285,6 +301,7 @@
In the so-called ``pseudo-data'' approach to penalized least squares
problems we write the objective as a residual sum of squares for an
extended response vector and model matrix
+\begin{linenomath}
\begin{equation}
\label{eq:pseudoData}
\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
@@ -293,16 +310,19 @@
\begin{bmatrix}\bm Z\bLt\\\bm I_q\end{bmatrix}
\bm u\right\|^2.
\end{equation}
+\end{linenomath}
The contribution to the residual sum of squares from the ``pseudo''
observations appended to $\yobs-\bm X\bm\beta$, is exactly the penalty
term, $\left\|\bm u\right\|^2$.
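A toy dense-matrix sketch of \eq{eq:pseudoData} (sizes and values are hypothetical; the package itself works with sparse matrices):
<<pseudoDataSketch, eval=FALSE>>=
set.seed(3)
n <- 40; q <- 5
Z      <- matrix(rbinom(n * q, 1, 0.4), n, q)
Lambda <- diag(0.7, q)                  # hypothetical relative covariance factor
resid0 <- rnorm(n)                      # stand-in for yobs - X beta
ystar  <- c(resid0, rep(0, q))          # extended response: q pseudo-observations
Xstar  <- rbind(Z %*% Lambda, diag(q))  # extended model matrix
mu.u   <- qr.solve(Xstar, ystar)        # ordinary LS solution = penalized LS solution
@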
From \eq{eq:pseudoData} we can see that the conditional mean satisfies
+\begin{linenomath}
\begin{equation}
\label{eq:PLSsol}
\left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
\bm\mu_{\mc U|\mc Y=\yobs}=\bLt\trans\bm Z\trans(\yobs-\bm X\bm\beta),
\end{equation}
+\end{linenomath}
which would be interesting, but not terribly useful, were it not for
the fact that we can determine the solution to \eq{eq:PLSsol}
quickly and accurately, even when $q$, the size of the system to
@@ -312,12 +332,14 @@
The key to solving \eq{eq:PLSsol} is the \emph{sparse Cholesky
factor}, $\bm L_\theta$, which is a sparse, lower-triangular matrix
such that
+\begin{linenomath}
\begin{equation}
\label{eq:sparseChol}
\bm L_\theta\bm L\trans_\theta=\bm P
\left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
\bm P\trans,
\end{equation}
+\end{linenomath}
where $\bm P$ is a permutation matrix representing a fill-reducing
permutation~\citep[Ch.~7]{davis06:csparse_book}.
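As a rough illustration only (toy simulated data; not the package internals), \eq{eq:sparseChol} and \eq{eq:PLSsol} can be mimicked with the \pkg{Matrix} package:
<<sparseCholSketch, eval=FALSE>>=
library(Matrix)
set.seed(2)
n <- 50; q <- 6
Z      <- Matrix(matrix(rbinom(n * q, 1, 0.3), n, q), sparse = TRUE)
Lambda <- Diagonal(q, 0.8)              # hypothetical relative covariance factor
resid0 <- rnorm(n)                      # stand-in for yobs - X beta
ZL <- Z %*% Lambda
## factor Lambda' Z' Z Lambda + I with a fill-reducing permutation
L  <- Cholesky(crossprod(ZL), perm = TRUE, LDL = FALSE, Imult = 1)
## solve P' L L' P mu = Lambda' Z' (yobs - X beta), cf. eq. (Cholsol)
mu.u <- solve(L, crossprod(ZL, resid0), system = "A")
@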
@@ -335,6 +357,7 @@
values of the non-zero elements in $\bm L$ but does not change their
positions. Hence, the symbolic phase must be done only once.
+\bmb{Update: refer to RcppEigen methods rather than Matrix methods?}
The \code{Cholesky} function in the \pkg{Matrix} package for
\proglang{R} performs both the symbolic and numeric phases of the
factorization to produce $\bm L_\theta$ from $\bLt\trans\bm Z\trans\bm
@@ -351,12 +374,14 @@
Although the \code{solve} method for the \code{"CHMfactor"} class has
an option to evaluate $\bm\mu_{\mc U|\mc Y=\yobs}$ directly as the solution to
+\begin{linenomath}
\begin{equation}
\label{eq:Cholsol}
\bm P\trans\bm L_\theta\bm L\trans_\theta\bm P
\bm\mu_{\mc U|\mc Y=\yobs}=
\bLt\trans\bm Z\trans(\bm y-\bm X\bm\beta) ,
\end{equation}
+\end{linenomath}
we will express the solution in two stages:
\begin{enumerate}
\item Solve $\bm L\bm c_{\bm u}=\bm P\bLt\trans\bm Z\trans(\bm y-\bm
@@ -371,12 +396,14 @@
After solving for $\bm\mu_{\mc U|\mc Y=\yobs}$ the exponent in $f_{\mc Y,\mc
U}(\yobs, \bm u)$ can be written
+\begin{linenomath}
\begin{equation}
\label{eq:PLS}
\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=
r^2(\bm\theta,\bm\beta)+
\|\bm L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})\|^2.
\end{equation}
+\end{linenomath}
where $r^2(\bm\theta,\bm\beta)=\|\yobs-\bm X\bm\beta-
\bm Z\bm\Lambda_\theta\bm\mu_{\mc U|\mc Y=\yobs}\|^2+
\|\bm\mu_{\mc U|\mc Y=\yobs}\|^2$ is the minimum
@@ -386,6 +413,7 @@
With expression (\ref{eq:PLS}) and the change of variable $\bm v=\bm
L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})$, for which $d\bm
v=\abs(|\bm L||\bm P|)\,d\bm u$, we have
+\begin{linenomath}
\begin{equation}
\label{eq:intExpr}
\int\frac{\exp\left(\frac{-\|\bm L\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y})\|^2}
@@ -396,6 +424,7 @@
v}{\abs(|\bm L||\bm P|)} = \frac{1}{\abs(|\bm L||\bm
P|)}=\frac{1}{|\bm L|}
\end{equation}
+\end{linenomath}
because $\abs|\bm P|=1$ (one property of a permutation matrix is $|\bm
P|=\pm1$) and $|\bm L|$, which, because $\bm L$ is triangular, is the
product of its diagonal elements, all of which are positive, is
@@ -403,24 +432,30 @@
Using this expression we can write the deviance (negative twice the
log-likelihood) as
+\begin{linenomath}
\begin{equation}
\label{eq:deviance}
-2\ell(\bm\theta,\bm\beta,\sigma^2|\yobs)=-2\log L(\bm\theta,\bm\beta,\sigma^2|\yobs)=
n\log(2\pi\sigma^2)+\frac{r^2(\bm\theta,\bm\beta)}{\sigma^2}+
\log(|\bm L_\theta|^2) .
\end{equation}
+\end{linenomath}
Because the dependence of \eq{eq:deviance} on $\sigma^2$ is
straightforward, we can form the conditional estimate
+\begin{linenomath}
\begin{equation}
\label{eq:conddev}
\widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n} ,
\end{equation}
+\end{linenomath}
producing the \emph{profiled deviance}
+\begin{linenomath}
\begin{equation}
\label{eq:profdev1}
-2\tilde{\ell}(\bm\theta,\bm\beta|\yobs)=\log(|\bm L_\theta|^2)+
n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right] .
\end{equation}
+\end{linenomath}
However, observing that \eq{eq:profdev1} depends on $\bm\beta$
only through $r^2(\bm\theta,\bm\beta)$ provides a much greater
@@ -429,18 +464,22 @@
deviance. The conditional estimate, $\widehat{\bm\beta}_\theta$, is
the value of $\bm\beta$ at the solution of the joint penalized least
squares problem
+\begin{linenomath}
\begin{equation}
\label{eq:jointPLS}
r^2_\theta=\min_{\bm u,\bm\beta}
\left(\left\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
\left\|\bm u\right\|^2\right) ,
\end{equation}
+\end{linenomath}
producing the profiled deviance,
+\begin{linenomath}
\begin{equation}
\label{eq:profdev2}
-2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
\end{equation}
+\end{linenomath}
which is a function of $\bm\theta$ only. Eqn.~\ref{eq:profdev2} is a
remarkably compact expression for the deviance.
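A sketch of evaluating \eq{eq:profdev2} directly, assuming a current \pkg{lme4} with the \code{devFunOnly} argument to \code{lmer} (an assumption, not manuscript code):
<<profDevSketch, eval=FALSE>>=
library(lme4)
## profiled deviance as a function of theta alone, cf. eq. (profdev2)
devf <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
             REML = FALSE, devFunOnly = TRUE)
devf(c(1, 0, 1))                 # evaluate at a trial value of theta
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
devf(getME(fm, "theta"))         # agrees with deviance(fm) at the ML estimate
deviance(fm)
@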
@@ -449,6 +488,7 @@
The solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, of the joint penalized least squares
problem (\ref{eq:jointPLS}) satisfy
+\begin{linenomath}
\begin{equation}
\label{eq:jointPLSeqn}
\begin{bmatrix}
@@ -462,6 +502,7 @@
\begin{bmatrix}\bLt\trans\bm Z\trans\yobs\\\bm X\trans\yobs
\end{bmatrix} .
\end{equation}
+\end{linenomath}
As before we will use the sparse Cholesky decomposition producing,
$\bm L_\theta$, the sparse Cholesky factor, and $\bm P$, the
permutation matrix, satisfying $\bm L_\theta\bm L_\theta\trans=\bm
@@ -477,6 +518,7 @@
\bm R_X\trans\bm c_{\bm\beta}&=\bm X\trans\yobs-\bm R_{ZX}\trans\bm c_{\bm u}
\end{align*}
so that
+\begin{linenomath}
\begin{equation}
\label{eq:fulldecomp}
\begin{bmatrix}
@@ -492,6 +534,7 @@
\bm X\trans\bm Z\bLt & \bm X\trans\bm X
\end{bmatrix} ,
\end{equation}
+\end{linenomath}
and the solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, satisfy
\begin{align}
@@ -505,11 +548,13 @@
\citet{laird_ware_1982} show that the criterion to be optimized by the
REML estimates can be expressed as
+\begin{linenomath}
\begin{equation}
\label{eq:REMLcrit}
L_R(\bm\theta,\sigma^2|\yobs)=\int
L(\bm\theta,\bm\beta,\sigma^2|\yobs)\,d\bm\beta .
\end{equation}
+\end{linenomath}
Because the joint solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
$\widehat{\bm\beta}_\theta$, to the penalized least squares problem
@@ -525,17 +570,19 @@
we can use a change of variable, similar to that in
\eq{eq:intExpr}, to evaluate the profiled REML criterion. On the
deviance scale the criterion can be evaluated as
+\begin{linenomath}
\begin{equation}
\label{eq:profiledREML}
-2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm R_X|^2)+
(n-p)\left[1+\log\left(\frac{2\pi r^2_\theta}{n-p}\right)\right] .
\end{equation}
+\end{linenomath}
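Analogously, a sketch for \eq{eq:profiledREML}, assuming \code{devFunOnly} and \code{REMLcrit} are available in the installed \pkg{lme4}:
<<profREMLSketch, eval=FALSE>>=
library(lme4)
devfR <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              REML = TRUE, devFunOnly = TRUE)
fmR <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)  # REML is the default
devfR(getME(fmR, "theta"))   # profiled REML criterion at the estimate
REMLcrit(fmR)                # same value, from the fitted object
@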
The structures in \pkg{lme4} for representing mixed-models are
somewhat more general than is required for linear mixed models. They
are designed to also be used for generalized linear mixed models
-(GLMMs) and nonlinear mixed models (NLMMs), which we describe in the
-next sections.
+(GLMMs) and nonlinear mixed models (NLMMs), which we will describe
+in future publications.
\section{Implementation details}
@@ -573,7 +620,8 @@
\begin{itemize}
\item example \#1 (small/simple: \code{Orthodont}?)
-\item example \#2 (large/complex/crossed)
+\item example \#2 (large/complex/crossed: maybe the \code{ScotsSec} example from the \code{mlmRev} package? See the tentative sketch after this list.)
+  (Are we going to need to re-implement sparse fixed-effect matrices to get this one done? \code{sparse=TRUE} is used in Doug's Potsdam notes at \url{http://www.stat.wisc.edu/~bates/PotsdamGLMM/Large.Rnw} \ldots)
\end{itemize}
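A tentative sketch of the two candidate examples (the \code{ScotsSec} formula below is only a guess at a crossed specification, not a settled choice):
<<exampleSketch, eval=FALSE>>=
library(lme4)
library(nlme)      # Orthodont data
library(mlmRev)    # ScotsSec data
fm1 <- lmer(distance ~ age + (age | Subject), Orthodont)                    # small/simple
fm2 <- lmer(attain ~ verbal * sex + (1 | primary) + (1 | second), ScotsSec) # large/crossed
@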
Show model calls: fixed and var-cov estimates; conditional modes;