[Lme4-commits] r1820 - www/JSS

noreply at r-forge.r-project.org
Wed Jun 5 23:47:08 CEST 2013


Author: dmbates
Date: 2013-06-05 23:47:08 +0200 (Wed, 05 Jun 2013)
New Revision: 1820

Modified:
   www/JSS/lmer.Rnw
Log:
Write in terms of the formulation with vectors of matrices.


Modified: www/JSS/lmer.Rnw
===================================================================
--- www/JSS/lmer.Rnw	2013-06-05 21:46:40 UTC (rev 1819)
+++ www/JSS/lmer.Rnw	2013-06-05 21:47:08 UTC (rev 1820)
@@ -5,7 +5,7 @@
 \newcommand{\scw}[1]{{\color{blue} \emph{#1}}}
 \usepackage[american]{babel}  %% for texi2dvi ~ bug
 \usepackage{bm,amsmath,thumbpdf,amsfonts}
-\author{Douglas Bates\\University of Wisconsin - Madison\And
+\author{Douglas Bates\\U. of Wisconsin - Madison\And
   Martin M\"achler\\ETH Zurich\And
   Ben Bolker\\McMaster University\And
   Steven C. Walker\\McMaster University
@@ -28,7 +28,7 @@
   is optimized, using one of the constrained optimization functions in
   \proglang{R}, to provide the parameter estimates.  We describe the
   structure of the model, the steps in evaluating the profiled
-  deviance or REML criterion and the structure of the class
+  deviance or REML criterion and the structure of the S4 class
   that represents such a model.  Sufficient detail is
   included to allow specialization of these structures by those who
   wish to write functions to fit specialized linear mixed models, such
@@ -89,22 +89,23 @@
 \section{Introduction}
 \label{sec:intro}
 
-The \code{lme4} package for \proglang{R} provides functions to fit and
-analyze linear mixed models (LMMs), generalized linear mixed models
-(GLMMs) and nonlinear mixed models (NLMMs).  In each of these names,
-the term ``mixed'' or, more fully, ``mixed-effects'', denotes a model
-that incorporates both fixed-effects terms and random-effects terms in
-a linear predictor expression from which the conditional mean of the
+The \pkg{lme4} package for \proglang{R} and the \pkg{MixedModels}
+package for \proglang{Julia} provide functions to fit and analyze
+linear mixed models (LMMs), generalized linear mixed models (GLMMs)
+and nonlinear mixed models (NLMMs).  In each of these names, the term
+``mixed'' or, more fully, ``mixed-effects'', denotes a model that
+incorporates both fixed-effects terms and random-effects terms in a
+linear predictor expression from which the conditional mean of the
 response can be evaluated.  In this paper we describe the formulation
-and representation of linear mixed models.  The
-techniques used for generalized linear  and nonlinear 
-mixed models will be described separately.
+and representation of linear mixed models.  The techniques used for
+generalized linear and nonlinear mixed models will be described
+separately.
 
 Just as a linear model can be described in terms of the distribution
 of $\mc{Y}$, the vector-valued random variable whose observed value is
 $\yobs$, the observed response vector, a linear mixed model can be
 described by the distribution of two vector-valued random variables:
-$\mc{Y}$, the response and $\mc{B}$, the vector of random effects.  In
+$\mc{Y}$, the response, and $\mc{B}$, the vector of random effects.  In
 a linear model the distribution of $\mc Y$ is multivariate normal,
 \begin{linenomath}
 \begin{equation}
@@ -123,7 +124,7 @@
 \begin{linenomath}
 \begin{equation}
   \label{eq:LMMcondY}
-  (\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n)
+  (\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n),
 \end{equation}
 \end{linenomath}
 where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
@@ -168,31 +169,8 @@
 \ref{eq:LMMuncondB} is particularly useful, we first show that the
 profiled deviance (negative twice the log-likelihood) and the profiled
 REML criterion can be expressed as a function of $\bm\theta$ only.
-Furthermore, these criteria can be evaluated quickly and accurately.
+Furthermore these criteria can be evaluated quickly and accurately.
 
-\begin{table}
-  \begin{tabular}{cp{3in}l}
-    \textbf{Symbol} & \textbf{Meaning} & \textbf{lme4 equivalent} \\
-    \hline
-    $\bm X$ & Fixed-effect design matrix & \code{getME(.,"X")} \\
-    $\bm Z$ & Sparse random-effect design matrix & \code{getME(.,"Z")} \\
-    $\bm \theta$ & Variance-covariance parameter vector \newline (Cholesky scale) &
-    \code{getME(.,"theta")} \\
-    $\bm \beta$ & Fixed-effect coefficients & \code{fixef(.)}  [\code{getME(.,"beta")}]\\
-    $\sigma^2$ & Residual variance & \verb+sigma(.)^2+ \\
-    $\mc{Y}$ & Response variable &  \\
-    $\bm\Lambda_{\bm\theta}$ & Relative covariance factor & \code{getME(.,"Lambda")} \\
-    $\bm L_\theta$ & Sparse Cholesky factor & \code{getME(.,"L")}\\
-    $\mc B$ & Random effects & \\
-    $\mc U$ & Spherical random effects & \\
-    $\bm u$ & Conditional modes of random effects & \code{getME(.,"u")} \\
-    $\bm P$ & Fill-reducing permutation & 
-  \end{tabular}
-  \caption[Table of notation]{Table of notation and \pkg{lme4} equivalents
-  \bmb{finish this table!}
-}
-  \label{tab:notation}
-\end{table}
 \section{Profiling the deviance and the REML criterion}
 \label{sec:profdev}
 
@@ -205,9 +183,15 @@
   distributions are called ``spherical'' because contours of the
   probability density are spheres.}
 \emph{random effects} variable, $\mc U$, with distribution
-
 \begin{linenomath}
 \begin{equation}
+  \label{eq:sphericalRE}
+  \mc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q),
+\end{equation}
+\end{linenomath}
+and set
+\begin{linenomath}
+\begin{equation}
   \label{eq:BU}
   \mc B=\bm\Lambda_\theta\mc U,
 \end{equation}
@@ -250,10 +234,6 @@
 \end{equation}
 \end{linenomath}
 
-% To obtain an expression for the likelihood it is convenient to
-% distinguish between a general argument, $\bm y$, and the particular
-% value of the observed response, which we will write as $\yobs$
-% for the remainder of this section.
 The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
 $\sigma^2$, given the observed data is the value of the
 marginal density of $\mc Y$, evaluated at $\yobs$.  That is
@@ -264,7 +244,7 @@
     U}(\yobs,\bm u)\,d\bm u .
 \end{equation}
 \end{linenomath}
-The integrand of \eq{eq:likelihoodLMM} is the \emph{unscaled
+The integrand in \eq{eq:likelihoodLMM} is the \emph{unscaled
   conditional density} of $\mc U$ given $\mc Y=\yobs$.  The
 conditional density of $\mc U$ given $\mc Y=\yobs$ is
 \begin{linenomath}
@@ -277,8 +257,8 @@
 which is, up to a scale factor, the joint density, $f_{\mc Y,\mc
   U}(\yobs,\bm u)$.  The unscaled conditional density will be, up to a
 scale factor, a $q$-dimensional multivariate Gaussian with an integral
-that is easily evaluated if we know the mean and variance-covariance
-of the conditional density.
+that is easily evaluated if we know its mean and variance-covariance
+matrix.
 
 The conditional mean, $\bm\mu_{\mc U|\mc Y=\yobs}$, is also the mode
 of the conditional distribution.  Because a constant factor in a
@@ -295,12 +275,9 @@
 \end{equation}
 \end{linenomath}
 
-\subsection{Solving the penalized least squares problem}
-\label{sec:solvingPLS}
-
-In the so-called ``pseudo-data'' approach to penalized least squares
-problems we write the objective as a residual sum of squares for an
-extended response vector and model matrix
+In the so-called ``pseudo-data'' approach to solving such penalized
+least squares problems we write the objective as a residual sum of
+squares for an extended response vector and model matrix
 \begin{linenomath}
 \begin{equation}
   \label{eq:pseudoData}
@@ -324,12 +301,19 @@
 \end{equation}
 \end{linenomath}
 which would be interesting, but not terribly useful, were it not for
-the fact that we can determine the solution to \eq{eq:PLSsol}
-quickly and accurately, even when $q$, the size of the system to
-solve, is very large indeed.  (We have done so in cases where $q$ is
-in the millions.)
+the fact that we can determine the solution to \eq{eq:PLSsol} quickly
+and accurately, even when $q$, the size of the system to solve, is
+very large indeed.  As described in Section~\ref{sec:PLSnumer}, the
+special structures of the matrices $\bm Z$ and $\bLt$ for
+mixed-effects models described by a mixed-model formula result in
+$\bLt\trans\bm Z\trans\bm Z\bLt$ being very sparse, so that the
+sparse Cholesky factorization~\citep[Ch.~4]{davis06:csparse_book} can
+be used to solve eqn.~\ref{eq:PLSsol}.
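+
+As a small illustration (a sketch only, not the \pkg{lme4} internals,
+which use compiled code), the sparse Cholesky methods in the
+\pkg{Matrix} package can solve a system of the form \eq{eq:PLSsol}
+directly.  Here \code{ZL}, the sparse product $\bm Z\bLt$ at the
+current $\bm\theta$, and \code{X}, \code{y} and \code{beta} are
+assumed to be available.
+<<PLSsketch, eval = FALSE>>=
+library(Matrix)
+## factor ZL'ZL + I, with a fill-reducing permutation chosen automatically
+ch <- Cholesky(crossprod(ZL), Imult = 1)
+## conditional mode of U: solve (ZL'ZL + I) u = ZL'(y - X beta)
+u <- solve(ch, crossprod(ZL, y - X %*% beta))
+@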
 
-The key to solving \eq{eq:PLSsol} is the \emph{sparse Cholesky
+We can write the general approach in terms of a fill-reducing
+permutation matrix $\bm P$~\citep[Ch.~7]{davis06:csparse_book}, which
+is determined by the pattern of non-zeros in $\bm Z\bLt$ but does
+not depend on $\bm\theta$ itself, and the \emph{sparse Cholesky
   factor}, $\bm L_\theta$, which is a sparse, lower-triangular matrix
 such that
 \begin{linenomath}
@@ -337,70 +321,16 @@
   \label{eq:sparseChol}
   \bm L_\theta\bm L\trans_\theta=\bm P
   \left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
-  \bm P\trans,
+  \bm P\trans.
 \end{equation}
 \end{linenomath}
-where $\bm P$ is a permutation matrix representing a fill-reducing
-permutation~\citep[Ch.~7]{davis06:csparse_book}.
 
-As for most sparse matrix methods, the sparse Cholesky factorization
-can be split into two phases: a symbolic phase in which the positions
-of the non-zero elements in the result are determined and a numeric
-phase in which the actual numeric values in these positions are
-determined.  Determining the fill-reducing permutation represented by
-$\bm P$ is part of the symbolic phase, which often takes much longer
-than the numeric phase.  During the course of determining the maximum
-likelihood or REML estimates of the parameters in a linear
-mixed-effects model we may need to evaluate $\bm L_\theta$ for many
-different values of $\bm\theta$, but each evaluation after the first
-requires only the numeric phase.  Changing $\bm\theta$ can change the
-values of the non-zero elements in $\bm L$ but does not change their
-positions.  Hence, the symbolic phase must be done only once.
-
-\bmb{Finish updating to 
-  refer to [Rcpp]Eigen methods rather than Matrix methods}
-%% The \code{Cholesky} function in the \pkg{Matrix} package for
-%%\proglang{R} performs both the symbolic and numeric phases of the
-%% factorization to produce $\bm L_\theta$ from $\bLt\trans\bm Z\trans\bm
-%% Z\bLt$.  The resulting object has S4 class \code{"CHMsuper"} or
-%% \code{"CHMsimp"} depending on whether it is in the
-%% supernodal~\citep[\S~4.8]{davis06:csparse_book} or simplicial form.
-%% Both these classes inherit from the virtual class \code{"CHMfactor"}.
-%% Optional arguments to the \code{Cholesky} function control
-%% determination of a fill-reducing permutation and addition of multiple
-%% of the identity to the symmetric matrix before factorization.  
 The \code{analyzePattern} method from the \pkg{Eigen} linear algebra
 package performs the symbolic decomposition of the sparsity pattern,
 including determination of the fill-reducing permutation.  Once
 the pattern has been analyzed for the initial value, $\bm\theta_0$,
 the factor can be updated for new values of $\bm\theta$ in a single
 call to the \code{factorize} method.
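+
+The same symbolic/numeric split can be sketched from \proglang{R} with
+the \pkg{Matrix} package, again with sparse products \code{ZL0} and
+\code{ZL1} of $\bm Z\bLt$ at two values of $\bm\theta$ assumed to be
+available:
+<<CholUpdate, eval = FALSE>>=
+library(Matrix)
+ch <- Cholesky(crossprod(ZL0), Imult = 1)   # symbolic and numeric phases
+ch <- update(ch, crossprod(ZL1), mult = 1)  # numeric phase only
+@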
-
-Although the  \code{solve} method for the \code{"CHMfactor"} class has
-an option to evaluate $\bm\mu_{\mc U|\mc Y=\yobs}$ directly as the solution to
-\begin{linenomath}
-\begin{equation}
-  \label{eq:Cholsol}
-  \bm P\trans\bm L_\theta\bm L\trans_\theta\bm P
-  \bm\mu_{\mc U|\mc Y=\yobs}=
-  \bLt\trans\bm Z\trans(\bm y-\bm X\bm\beta) .
-\end{equation}
-\end{linenomath}
-we will express the solution in two stages:
-\begin{enumerate}
-\item Solve $\bm L\bm c_{\bm u}=\bm P\bLt\trans\bm Z\trans(\bm y-\bm
-  X\bm\beta)$ for $\bm c_{\bm u}$.
-\item Solve $\bm L\trans\bm P\bm\mu_{\mc U|\mc Y=\yobs}=\bm c_{\bm u}$ for $\bm
-  P\bm\mu_{\mc U|\mc Y=\yobs}$ and then for
-  $\bm\mu_{\mc U|\mc Y=\yobs}=\bm P\trans\left(\bm P\bm\mu_{\mc U|\mc Y=\yobs}\right)$.
-\end{enumerate}
-
 \subsection{Evaluating the likelihood}
 \label{sec:evallike}
 
-After solving for $\bm\mu_{\mc U|\mc Y=\yobs}$ the exponent in $f_{\mc Y,\mc
-  U}(\yobs, \bm u)$ can be written
+After solving eqn.~\ref{eq:PLSsol} for $\bm\mu_{\mc U|\mc Y=\yobs}$,
+the exponent of $f_{\mc Y,\mc U}(\yobs, \bm u)$
+(eqn.~\ref{eq:jointDens}) can be written
 \begin{linenomath}
 \begin{equation}
   \label{eq:PLS}
@@ -412,7 +342,7 @@
 where $r^2(\bm\theta,\bm\beta)=\|\yobs-\bm X\bm\beta-
 \bm Z\bm\Lambda_\theta\bm\mu_{\mc U|\mc Y=\yobs}\|^2+
 \|\bm\mu_{\mc U|\mc Y=\yobs}\|^2$, is the minimum
-penalized residual sum of squares for these values of $\bm\theta$ and
+penalized residual sum of squares for the given values of $\bm\theta$ and
 $\bm\beta$.
 
 With expression (\ref{eq:PLS}) and the change of variable $\bm v=\bm
@@ -458,38 +388,25 @@
 \begin{equation}
   \label{eq:profdev1}
   -2\tilde{\ell}(\bm\theta,\bm\beta|\yobs)=\log(|\bm L_\theta|^2)+
-  n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right]
+  n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right] .
 \end{equation}
 \end{linenomath}
 
-However, observing that \eq{eq:profdev1} depends on $\bm\beta$
-only through $r^2(\bm\theta,\bm\beta)$ provides a much greater
-simplification because it allows us to ``profile out'' the
-fixed-effects parameter, $\bm\beta$, from the evaluation of the
-deviance.  The conditional estimate, $\widehat{\bm\beta}_\theta$, is
-the value of $\bm\beta$ at the solution of the joint penalized least
-squares problem
+However, observing that \eq{eq:profdev1} depends on $\bm\beta$ only
+through $r^2(\bm\theta,\bm\beta)$ allows us to perform a much greater
+simplification by ``profiling out'' the fixed-effects parameter,
+$\bm\beta$, from the evaluation of the deviance.  The conditional
+estimate, $\widehat{\bm\beta}_\theta$, is the value of $\bm\beta$ at
+the solution of the joint penalized least squares problem
 \begin{linenomath}
 \begin{equation}
   \label{eq:jointPLS}
   r^2_\theta=\min_{\bm u,\bm\beta}
   \left(\left\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
-    \left\|\bm u\right\|^2\right) ,
+    \left\|\bm u\right\|^2\right).
 \end{equation}
 \end{linenomath}
-producing the profiled deviance,
-\begin{linenomath}
-\begin{equation}
-  \label{eq:profdev2}
-  -2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
-  n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
-\end{equation}
-\end{linenomath}
-which is a function of $\bm\theta$ only.  Eqn.~\ref{eq:profdev2} is a
-remarkably compact expression for the deviance.
 
-\subsection{Solving the joint penalized least squares problem}
-\label{sec:jointPLS}
 The solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
 $\widehat{\bm\beta}_\theta$, of the joint penalized least squares
 problem (\ref{eq:jointPLS}) satisfy
@@ -508,45 +425,18 @@
   \end{bmatrix}
 \end{equation}
 \end{linenomath}
-As before we will use the sparse Cholesky decomposition producing,
-$\bm L_\theta$, the sparse Cholesky factor, and $\bm P$, the
-permutation matrix, satisfying $\bm L_\theta\bm L_\theta\trans=\bm
-P(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I)\bm P\trans$ and $\bm c_{\bm u}$, the
-solution to $\bm L_\theta\bm c_{\bm u}=\bm P\bLt\trans\bm Z\trans\yobs$.
 
-We extend the decomposition with the $q\times p$ matrix $\bm
-R_{ZX}$, the upper triangular $p\times p$ matrix $\bm R_X$, and
-the $p$-vector $\bm c_{\bm\beta}$ satisfying
-\begin{align*}
-  \bm L\bm R_{ZX}&=\bm P\bLt\trans\bm Z\trans\bm X\\
-  \bm R_X\trans\bm R_X&=\bm X\trans\bm X-\bm R_{ZX}\trans\bm R_{ZX}\\
-  \bm R_X\trans\bm c_{\bm\beta}&=\bm X\trans\yobs-\bm R_{ZX}\trans\bm c_{\bm u}
-\end{align*}
-so that
+After solving eqn.~\ref{eq:jointPLS} we can write the \emph{profiled
+  deviance} as
 \begin{linenomath}
 \begin{equation}
-  \label{eq:fulldecomp}
-  \begin{bmatrix}
-    \bm P\trans\bm L& \bm 0\\
-    \bm R_{ZX}\trans & \bm R_X\trans
-  \end{bmatrix}
-  \begin{bmatrix}
-    \bm L\trans\bm P & \bm R_{ZX}\\
-    \bm 0            & \bm R_X
-  \end{bmatrix}=
-  \begin{bmatrix}
-    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I & \bLt\trans\bm Z\trans\bm X\\
-    \bm X\trans\bm Z\bLt       & \bm X\trans\bm X
-  \end{bmatrix} ,
+  \label{eq:profdev2}
+  -2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
+  n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
 \end{equation}
 \end{linenomath}
-and the solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
-$\widehat{\bm\beta}_\theta$, satisfy
-\begin{align}
-  \bm R_X\widehat{\bm\beta}_\theta&=\bm c_{\bm\beta}\\
-  \bm L\trans\bm P\bm\mu_{\mc U|\mc Y=\yobs}&=\bm c_{\bm u}-\bm
-  R_{ZX}\widehat{\bm\beta}_\theta .
-\end{align}
+which is a function of $\bm\theta$ only.  Eqn.~\ref{eq:profdev2} is a
+remarkably compact expression for the deviance.
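+
+In \pkg{lme4} this profiled deviance is available directly as an
+\proglang{R} function of $\bm\theta$ through the \code{devFunOnly}
+argument of \code{lmer}; a minimal sketch using the \code{sleepstudy}
+data from the package:
+<<profDev, eval = FALSE>>=
+library(lme4)
+devf <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy,
+             REML = FALSE, devFunOnly = TRUE)
+fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)
+devf(getME(fm, "theta"))  # equals deviance(fm) at the estimates
+@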
 
 \subsection{The profiled REML criterion}
 \label{sec:profiledREML}
@@ -561,89 +451,198 @@
 \end{equation}
 \end{linenomath}
 
+Let us write a partitioned Cholesky factorization of the system matrix
+for the joint penalized least squares problem, eqn.~\ref{eq:jointPLSeqn},
+as
+\begin{linenomath}
+\begin{equation}
+  \label{eq:fulldecomp}
+  \begin{bmatrix}
+    \bm P\trans\bm L& \bm 0\\
+    \bm L_{ZX} & \bm L_X
+  \end{bmatrix}
+  \begin{bmatrix}
+    \bm L\trans\bm P & \bm L_{ZX}\trans\\
+    \bm 0            & \bm L_X\trans
+  \end{bmatrix}=
+  \begin{bmatrix}
+    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I & \bLt\trans\bm Z\trans\bm X\\
+    \bm X\trans\bm Z\bLt       & \bm X\trans\bm X
+  \end{bmatrix} .
+\end{equation}
+\end{linenomath}
+When the fixed-effects model matrix $\bm X$ is dense, the $p\times q$
+matrix $\bm L_{ZX}$ and the $p\times p$ lower triangular matrix $\bm
+L_X$ will also be dense.  Occasionally it makes sense to store $\bm X$
+as a sparse matrix, in which case $\bm L_{ZX}$ and $\bm L_X$ could also
+be sparse.
+
 Because the joint solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
-$\widehat{\bm\beta}_\theta$, to the penalized least squares problem
-allow us to express
+$\widehat{\bm\beta}_\theta$, to eqn.~\ref{eq:jointPLSeqn} allow us to express
+\begin{linenomath}
 \begin{multline}
   \label{eq:PLS2}
   \|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=\\
   r^2_\theta+
   \left\|\bm L\trans\bm P\left[\bm u-\bm\mu_{\mc U|\mc Y=\yobs}-
-  \bm R_{ZX}(\bm\beta - \widehat{\bm\beta}_\theta)\right]\right\|^2+
-  \left\|\bm R_X(\bm\beta - \widehat{\bm\beta}_\theta)\right\|^2
+  \bm L_{ZX}\trans(\bm\beta - \widehat{\bm\beta}_\theta)\right]\right\|^2+
+  \left\|\bm L_X\trans(\bm\beta - \widehat{\bm\beta}_\theta)\right\|^2
 \end{multline}
+\end{linenomath}
 we can use a change of variable, similar to that in
 \eq{eq:intExpr}, to evaluate the profiled REML criterion.  On the
 deviance scale the criterion can be evaluated as
 \begin{linenomath}
 \begin{equation}
   \label{eq:profiledREML}
-  -2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm R_X|^2)+
+  -2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm L_X|^2)+
   (n-p)\left[1+\log\left(\frac{2\pi r^2_\theta}{n-p}\right)\right] .
 \end{equation}
 \end{linenomath}
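+
+The value of the criterion at the estimates can be checked against a
+REML fit; a minimal sketch using the \code{sleepstudy} data:
+<<REMLcrit, eval = FALSE>>=
+library(lme4)
+fmR <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)  # REML is the default
+REMLcrit(fmR)        # -2 times the REML log-likelihood
+-2 * c(logLik(fmR))  # the same value
+@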
 
-The structures in \pkg{lme4} for representing mixed-models are
-somewhat more general than is required for linear mixed models.  They
-are designed to also be used for generalized linear mixed models
-(GLMMs) and nonlinear mixed models (NLMMs), which we will describe
-in future publications.
+\section{Random-effects terms and resulting matrix properties}
+\label{sec:PLSnumer}
 
-\section{Implementation details}
+In fitting linear mixed models an instance of the penalized least
+squares (PLS) problem (\ref{eq:jointPLSeqn}) must be solved at each
+evaluation of the objective function - either the profiled deviance or
+the profiled REML criterion - during the optimization with respect to
+$\bm\theta$.  Because this operation must be performed many times it
+is worthwhile studying the properties of $\bm Z$ and $\bLt$ in general
+and for common special cases.
 
-\subsection{Setting up the deviance function}
+\subsection{Random-effects terms in mixed-model formulas}
+\label{sec:LMMmatrix}
 
-\subsection{Optimization}
+A mixed-model formula incorporates $k\ge1$ random-effects terms of the
+form \code{(r|f)} where \code{r} is a linear model formula and
+\code{f} is an expression that evaluates to a factor, called the
+\code{grouping factor}, for the term.  In practice, fixed-effects
+model matrices and random-effects terms are evaluated with respect to
+a \emph{model frame}, ensuring that any expressions for grouping
+factors have been evaluated to factors and any unused levels of these
+factors have been dropped.  That is, $\ell_i$, the number of levels in
+the grouping factor for the $i$th random-effects term, is well-defined.
+The vector $\bm i_i$ of \emph{factor indices} for the $i$th term is an
+$n$-vector of values from $1,\dots,\ell_i$.  Let $\bm J_i$ be the
+$n\times \ell_i$ matrix of indicator columns for $\bm i_i$.
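+
+In \proglang{R} the transpose of such an indicator matrix can be
+constructed directly from the grouping factor with \code{fac2sparse}
+from the \pkg{Matrix} package; a minimal sketch, with \code{f}
+standing for the evaluated grouping factor:
+<<fac2sparse, eval = FALSE>>=
+library(Matrix)
+Jt <- fac2sparse(f, drop.unused.levels = TRUE)  # ell_i by n matrix, t(J_i)
+@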
 
-\subsection{Working with a fitted model}
+When $k>1$ we order the random-effects terms so that
+$\ell_1\ge\ell_2\ge\dots\ge\ell_k$.
 
+Let $\bm X_i$ of size $n\times p_i$ be the model matrix of the linear
+model formula, \code{r}, from the $i$th random-effects term,
+$i=1,\dots,k$.  A term is said to be a \emph{scalar} random-effects
+term when $p_i=1$, otherwise it is \emph{vector-valued}.  For a
+\emph{simple, scalar} random-effects term of the form \code{(1|f)},
+$\bm X_i$ is a matrix with a single column, all of whose elements are
+unity.
 
-\begin{table}
-  \begin{tabular}{cp{3in}l}
-    \textbf{Method} & \textbf{Use} & \textbf{lme4 equivalent} \\
-    \hline
-    
-  \end{tabular}
-  \caption{Methods for \code{lmerMod} objects}
-  \label{tab:methods}
-\end{table}
+The $i$th random-effects term contributes $q_i=\ell_ip_i$ columns,
+which we will write as $\bm Z_i$, to
+the model matrix $\bm Z$.  Thus $q$, the number of columns in $\bm Z$
+and the dimension of the random variables $\mc{B}$ and $\mc{U}$, is
+\begin{linenomath}
+  \begin{equation}
+    \label{eq:qcalc}
+    q=\sum_{i=1}^k q_i = \sum_{i=1}^k \ell_i\,p_i .
+  \end{equation}
+\end{linenomath}
+The creation of $\bm Z_i$ from $\bm X_i$ and $\bm i_i$ is a
+straightforward concept that is somewhat difficult to describe.
+Consider $\bm Z_i$ as being further decomposed into $\ell_i$ blocks of
+$p_i$ columns.  The rows in the first block are the rows of $\bm X_i$
+multiplied by the 0/1 values in the first column of $\bm J_i$.
+Similarly for the subsequent blocks.
 
-\section{Examples}
+In particular, for a simple, scalar term, $\bm Z_i$ is exactly $\bm
+J_i$, the matrix of indicator columns.  For other scalar terms, $\bm
+Z_i$ is formed by element-wise multiplication of the single column of
+$\bm X_i$ by each of the columns of $\bm J_i$.
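+
+This construction can be expressed compactly, for scalar and
+vector-valued terms alike, with the column-wise Kronecker
+(Khatri-Rao) product from the \pkg{Matrix} package; a sketch, in
+which \code{Xi} is the dense model matrix $\bm X_i$ and \code{f} is
+the grouping factor of the term:
+<<KhatriRaoZ, eval = FALSE>>=
+library(Matrix)
+Zti <- KhatriRao(fac2sparse(f), t(Xi))  # q_i by n sparse matrix, t(Z_i)
+Zi <- t(Zti)
+@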
 
-It is challenging to pick examples, not because there are too few
-examples but because there are too many.  Should we pick a simple
-example with a small data set, one categorical predictor, and a single
-random grouping factor, so that the data can be examined easily and
-the results compared with classical method-of-moments results or with
-other mixed modeling software?  Or should we pick a
-complex example, with a large data set, a mixture of
-categorical and continuous predictors, and a complicated grouping
-structure (multi-level, crossed or partially crossed) to show off
-\pkg{lme4}'s capabilities? There are many potential degrees
-of freedom (\{ single, nested, or crossed random factors \} $\times$
-\{ categorical, continuous, or both types of predictors \} $\times$
-data set size): we choose one example from each end of the spectrum.
+Because each $\bm Z_i,i=1,\dots,k$ is generated from indicator
+columns, its cross-product, $\bm Z_i\trans\bm Z_i$, is block-diagonal,
+consisting of $\ell_i$ diagonal blocks each of size $p_i$.
+Note that this means that when $k=1$ (i.e., there is only one
+random-effects term), $\bm Z\trans\bm Z$ will be block-diagonal.
 
-\begin{itemize}
-\item example \#1 (small/simple: \code{Orthodont}?)
-\item example \#2 (large/complex/crossed: Maybe the \code{ScotsSec} example from the \code{mlmRev} package?
-  (Are we going to need to re-implement sparse fixed-effect matrices to get this one done? \code{sparse=TRUE} is used in Doug's Potsdam notes at \url{http://www.stat.wisc.edu/~bates/PotsdamGLMM/Large.Rnw} \ldots)
- \end{itemize}
+The $q\times q$ covariance factor, $\bLt$, is a block-diagonal matrix
+whose $i$th diagonal block, $\bm\Lambda_i$, is of size $q_i$, $i=1,\dots,k$.
+Furthermore, $\bm\Lambda_i$ is itself block-diagonal, consisting of
+$\ell_i$ lower-triangular blocks of size $p_i$, all of which are
+identical.  Thus $\bm\Lambda_i$ is determined by $p_i(p_i+1)/2$
+parameters, corresponding to the positions on and below the diagonal
+of the $p_i\times p_i$ lower-triangular template for its $\ell_i$
+diagonal blocks.  To provide a unique representation we require that
+the diagonal elements of the lower-triangular template be
+non-negative.
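+
+For illustration only (the implementation never forms $\bm\Lambda_i$
+explicitly, as discussed below), a block with $p_i=2$ can be expanded
+from its template, with \code{th} the template parameters and
+\code{ell} the number of levels $\ell_i$ assumed given:
+<<LambdaSketch, eval = FALSE>>=
+library(Matrix)
+Ti <- matrix(c(th[1], th[2], 0, th[3]), 2, 2)  # 2 by 2 lower-triangular template
+Lambdai <- kronecker(Diagonal(ell), Ti)        # ell identical diagonal blocks
+@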
 
-Show model calls: fixed and var-cov estimates; conditional modes;
-predictions?
+\subsection{General computational structures and methods}
+\label{sec:generalcomputational}
 
-\section{Discussion}
-What do we want to say here?
+The heavily structured nature of $\bm Z$ and $\bLt$ allows us to
+represent the data compactly and to provide for general update
+methods.  To represent $\bm Z$ we only need the index vectors, $\bm i_i$,
+and the dense model matrices, $\bm X_i,i=1,\dots,k$.  To determine
+$\bLt$ we only need the lower-triangular template
+matrices of sizes $p_i,i=1,\dots,k$.  In this form we can evaluate the non-zero
+elements of $\bm Z_i\bm\Lambda_i$ from the dense $n\times p_i$ product
+of $\bm X_i$ and the $i$th template matrix for $\bLt$.
 
-\begin{itemize}
-\item Point to alternatives in the R/FOSS ecosystem
-\item State $p$-value issues for the record
-\item \emph{Advantages} of \pkg{lme4}: fast, handles multiple/crossed RE, 
-  provides profiling/parametric bootstrapping; flexible (sort of)
-\item \emph{Disadvantages}: no R-side effects; (no $p$-values); not as flexible as e.g. AS-REML
-\end{itemize}
+The random-effects value, $\bm u$, is represented as matrices $\bm
+U_i,i=1,\dots,k$ of sizes $\ell_i\times p_i$.  Although when displaying the
+conditional modes of $\mc{U}$ or $\mc{B}$ we usually transpose these
+matrices, it is more convenient to manipulate them in this form.
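+
+For a fitted model, assumed here to be called \code{fm} with grouping
+factor \code{Subject}, the \code{ranef} extractor returns the
+conditional modes of $\mc B$ in this per-term layout, one row per
+level of the grouping factor:
+<<ranefLayout, eval = FALSE>>=
+library(lme4)
+str(ranef(fm)$Subject)  # an ell_i by p_i data frame
+@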
 
+There is an advantage in delaying the expansion of $\bm\Lambda_i$ to
+its repeated block-diagonal form: the non-zero values of
+$\bm Z_i\bm\Lambda_i,i=1,\dots,k$ can be evaluated as the dense
+$n\times p_i$ product of $\bm X_i$ and the template block of
+$\bm\Lambda_i$, so the block-diagonal matrix $\bm\Lambda_i$ never
+needs to be created explicitly.  Furthermore, if there are case
+weights to be applied they can be applied to $\bm X_i$ before
+expansion.  Case weights are not an important consideration when
+fitting LMMs because they can be applied to the model matrices and
+response when setting up the numerical representation.  However,
+solving iteratively reweighted penalized least squares problems is
+central to fitting GLMMs, and easy updating for new case weights is
+important there.
+
+\subsection{Models with a single random-effects term}
+\label{sec:1term}
+
+If there is only one random-effects term in the model, the matrices
+$\bm Z\trans\bm Z$, $\bLt$ and $\bm L$ will all be diagonal, for a
+single, scalar term, or block-diagonal, for a single vector-valued
+term.  Handling this case separately results in sufficient
+simplification to make doing so worthwhile.
+
+The simplest special case is a mixed-model formula with a single, simple, scalar
+random-effects term, \code{(1|f)}.  In this case $q=\ell_1$, the number of levels of
+the factor \code{f}, and the $n\times q$ model matrix, $\bm Z$, is $\bm J_1$, the
+indicator matrix for the levels of \code{f}.  The cross-product,
+$\bm Z\trans\bm Z$, is diagonal, and its diagonal elements are
+the frequencies of the levels of \code{f}.  The variance-component
+parameter vector, $\bm\theta$, is one-dimensional so we represent it
+as a scalar, $\theta$, subject to the constraint $\theta\ge0$.
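+
+These properties are easy to verify on a fitted model; a small sketch
+using the \code{sleepstudy} data:
+<<diagZtZ, eval = FALSE>>=
+library(lme4)
+fm1 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy)
+ZtZ <- tcrossprod(getME(fm1, "Zt"))  # Z'Z
+isDiagonal(ZtZ)                      # TRUE: no off-diagonal non-zeros
+all(diag(ZtZ) == as.vector(table(sleepstudy$Subject)))  # level frequencies
+@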
+
+The covariance factor is a multiple of the identity,
+$\bLt=\theta\bm I_q$.  The Cholesky factor, $\bm L$, will be diagonal
+for any permutation, $\bm P$.  That is, there is no fill-in during the
+sparse Cholesky decomposition, so we use the identity permutation, $\bm P=\bm I_q$.
+
+It is very unusual for a model to have a single scalar term that is
+not a simple, scalar term.  In this case $\bm Z$ would have the
+same pattern of non-zeros as does $\bm J_1$ but not all the non-zeros
+would be unity.  The cross-product, $\bm Z\trans\bm Z$, will be diagonal
+with easily calculated diagonal elements. Both the covariance factor,
+$\bLt$, and the Cholesky factor, $\bm L$, are diagonal.
+
 \bibliography{lmer}
 \end{document}
 


