[Lme4-commits] r1846 - www/JSS

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Fri Aug 23 22:54:20 CEST 2013


Author: dmbates
Date: 2013-08-23 22:54:20 +0200 (Fri, 23 Aug 2013)
New Revision: 1846

Added:
   www/JSS/lmer.tex
Log:
Add LaTeX file using minted (this can be backed out if undesired).


Added: www/JSS/lmer.tex
===================================================================
--- www/JSS/lmer.tex	                        (rev 0)
+++ www/JSS/lmer.tex	2013-08-23 20:54:20 UTC (rev 1846)
@@ -0,0 +1,1116 @@
+\documentclass{jss}
+%% need no \usepackage{Sweave.sty}
+%\usepackage{lineno}
+%\newenvironment{linenomath}{}{}
+\newcommand{\bmb}[1]{{\color{red} \emph{#1}}}
+\newcommand{\scw}[1]{{\color{blue} \emph{#1}}}
+\usepackage[american]{babel}  %% for texi2dvi ~ bug
+\usepackage{bm,amsmath,thumbpdf,amsfonts,minted}
+\author{Douglas Bates\\U. of Wisconsin - Madison\And
+  Martin M\"achler\\ETH Zurich\And
+  Ben Bolker\\McMaster University\And
+  Steven C. Walker\\McMaster University
+}
+\Plainauthor{Douglas Bates, Martin Maechler, Ben Bolker, Steven C. Walker}
+\title{Fitting linear mixed-effects models using \pkg{lme4}}
+\Plaintitle{Fitting linear mixed-effects models using lme4}
+\Shorttitle{Linear mixed models with lme4}
+\Abstract{%
+  Maximum likelihood or restricted maximum likelihood (REML)
+  estimates of the parameters in linear
+  mixed-effects models can be determined using the \code{lmer}
+  function in the \pkg{lme4} package for \proglang{R}. As in most
+  model-fitting functions, the model is described in an \code{lmer}
+  call by a formula, in this case including both fixed-effects terms
+  and random-effects terms. The formula and data together determine a
+  numerical representation of the model from which the profiled
+  deviance or the profiled REML criterion can be evaluated as a
+  function of some of the model parameters.  The appropriate criterion
+  is optimized, using one of the constrained optimization functions in
+  \proglang{R}, to provide the parameter estimates.  We describe the
+  structure of the model, the steps in evaluating the profiled
+  deviance or REML criterion and the structure of the S4 class
+  that represents such a model.  Sufficient detail is
+  included to allow specialization of these structures by those who
+  wish to write functions to fit specialized linear mixed models, such
+  as models incorporating pedigrees or smoothing splines, that are not
+  easily expressible in the formula language used by \code{lmer}.
+}
+\Keywords{%
+  sparse matrix methods,
+  linear mixed models,
+  penalized least squares,
+  Cholesky decomposition}
+\Address{
+  Douglas Bates\\
+  Department of Statistics, University of Wisconsin\\
+  1300 University Ave.\\
+  Madison, WI 53706, U.S.A.\\
+  E-mail: \email{bates@stat.wisc.edu}
+  \par\bigskip
+  Martin M\"achler\\
+  Seminar f\"ur Statistik, HG G~16\\
+  ETH Zurich\\
+  8092 Zurich, Switzerland\\
+  E-mail: \email{maechler@stat.math.ethz.ch}\\
+  % URL: \url{http://stat.ethz.ch/people/maechler}
+  \par\bigskip
+  Benjamin M. Bolker\\
+  Departments of Mathematics \& Statistics and Biology \\
+  McMaster University \\
+  1280 Main Street W \\
+  Hamilton, ON L8S 4K1, Canada \\
+  E-mail: \email{bolker@mcmaster.ca}
+  \par\bigskip
+  Steven C. Walker\\
+  Department of Mathematics \& Statistics \\
+  McMaster University \\
+  1280 Main Street W \\
+  Hamilton, ON L8S 4K1, Canada \\
+  E-mail: \email{scwalker@math.mcmaster.ca}
+}
+\newcommand{\Var}{\operatorname{Var}}
+\newcommand{\abs}{\operatorname{abs}}
+\newcommand{\bLt}{\ensuremath{\bm\Lambda_\theta}}
+\newcommand{\mc}[1]{\ensuremath{\mathcal{#1}}}
+\newcommand{\trans}{\ensuremath{^\prime}}
+\newcommand{\yobs}{\ensuremath{\bm y_{\mathrm{obs}}}}
+\newcommand*{\eq}[1]{eqn.~\ref{#1}}% or just {(\ref{#1})}
+%
+%\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all}
+%\SweaveOpts{prefix=TRUE,prefix.string=figs/lmer,include=TRUE}
+%\SweaveOpts{keep.source=TRUE}
+%\setkeys{Gin}{width=\textwidth}
+%<<preliminaries,echo=FALSE,results=hide>>=
+%options(width=69,show.signif.stars=FALSE,str=strOptions(strict.width="cut"))
+%library(lme4)
+%@
+\begin{document}
+%\linenumbers
+\section{Introduction}
+\label{sec:intro}
+
+The \pkg{lme4} package for \proglang{R} and the \pkg{MixedModels}
+package for \proglang{Julia} provide functions to fit and analyze
+linear mixed models (LMMs), generalized linear mixed models (GLMMs)
+and nonlinear mixed models (NLMMs).  In each of these names, the term
+``mixed'' or, more fully, ``mixed-effects'', denotes a model that
+incorporates both fixed-effects terms and random-effects terms in a
+linear predictor expression from which the conditional mean of the
+response can be evaluated.  In this paper we describe the formulation
+and representation of linear mixed models.  The techniques used for
+generalized linear and nonlinear mixed models will be described
+separately.
+
+Just as a linear model can be described in terms of the distribution
+of $\mc{Y}$, the vector-valued random variable whose observed value is
+$\yobs$, the observed response vector, a linear mixed model can be
+described by the distribution of two vector-valued random variables:
+$\mc{Y}$, the response, and $\mc{B}$, the vector of random effects.  In
+a linear model the distribution of $\mc Y$ is multivariate normal,%\begin{linenomath}
+\begin{equation}
+  \label{eq:linearmodel}
+  \mc Y\sim\mc{N}(\bm X\bm\beta,\sigma^2\bm I_n),
+\end{equation}%\end{linenomath}
+where $n$ is the dimension of the response vector, $\bm I_n$ is the
+identity matrix of size $n$, $\bm\beta$ is a $p$-dimensional
+coefficient vector and $\bm X$ is an $n\times p$ model matrix. The
+parameters of the model are the coefficients, $\bm\beta$, and the
+scale parameter, $\sigma$.
+
+In a linear mixed model it is the \emph{conditional} distribution of
+$\mc Y$ given $\mc B=\bm b$ that has such a form,%\begin{linenomath}
+\begin{equation}
+  \label{eq:LMMcondY}
+  (\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n),
+\end{equation}%\end{linenomath}
+where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
+vector-valued random effects variable, $\mc B$, whose value we are
+fixing at $\bm b$.  The unconditional distribution of $\mc B$ is also
+multivariate normal with mean zero and a parameterized $q\times q$
+variance-covariance matrix, $\bm\Sigma$,%\begin{linenomath}
+\begin{equation}
+  \label{eq:LMMuncondB}
+  \mc B\sim\mc N(\bm0,\bm\Sigma) .
+\end{equation}%\end{linenomath}
+As a variance-covariance matrix, $\bm\Sigma$ must be positive
+semidefinite.  It is convenient to express the model in terms of a
+\emph{relative covariance factor}, $\bm\Lambda_\theta$, which is a
+$q\times q$ matrix, depending on the \emph{variance-component
+  parameter}, $\bm\theta$, and generating the symmetric $q\times q$
+variance-covariance matrix, $\bm\Sigma_\theta$, according to%\begin{linenomath}
+\begin{equation}
+  \label{eq:relcovfac}
+  \bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans ,
+\end{equation}%\end{linenomath}
+where $\sigma$ is the same scale factor as in the conditional
+distribution (\ref{eq:LMMcondY}).
+
+Although $q$, the number of columns in $\bm Z$ and the
+size of $\bm\Sigma_{\bm\theta}$, can be very large indeed, the
+dimension of $\bm\theta$ is small, frequently less than 10.
+
+In calls to the \code{lm} function for fitting linear models the form
+of the model matrix $\bm X$ is determined by the \code{formula} and
+\code{data} arguments. The right-hand side of the formula consists of
+one or more terms that each generate one or more columns in the model
+matrix, $\bm X$.  For \code{lmer} the formula language is extended to
+allow for random-effects terms that generate the model matrix $\bm Z$
+and the mapping from $\bm\theta$ to $\bm\Lambda_{\bm\theta}$.
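+
+For example, the following call (a sketch using the \code{sleepstudy}
+data set shipped with \pkg{lme4}) combines a fixed-effects term for
+\code{Days} with a vector-valued random-effects term grouped by
+\code{Subject}:
+\begin{minted}{r}
+library(lme4)
+## random intercept and slope for Days within each level of Subject
+fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
+\end{minted}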
+
+To understand why the formulation in equations \ref{eq:LMMcondY} and
+\ref{eq:LMMuncondB} is particularly useful, we first show that the
+profiled deviance (negative twice the log-likelihood) and the profiled
+REML criterion can be expressed as a function of $\bm\theta$ only.
+Furthermore, these criteria can be evaluated quickly and accurately.
+
+\section{Profiling the deviance and the REML criterion}
+\label{sec:profdev}
+
+As stated above, $\bm\theta$ determines the $q\times q$ matrix
+$\bm\Lambda_\theta$ which, together with a value of $\sigma^2$,
+determines
+$\Var(\mc B)=\bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans$.
+If we define a \emph{spherical}%
+\footnote{$\mathcal{N}(\bm\mu,\sigma^2\bm I)$
+  distributions are called ``spherical'' because contours of the
+  probability density are spheres.}
+\emph{random effects} variable, $\mc U$, with distribution%\begin{linenomath}
+\begin{equation}
+  \label{eq:sphericalRE}
+  \mc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q),
+\end{equation}%\end{linenomath}
+and set%\begin{linenomath}
+\begin{equation}
+  \label{eq:BU}
+  \mc B=\bm\Lambda_\theta\mc U,
+\end{equation}%\end{linenomath}
+then $\mc B$ will have the desired $\mathcal{N}(\bm
+0,\bm\Sigma_\theta)$ distribution.
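+
+As an illustration, the following \proglang{R} sketch (with a
+hypothetical $2\times2$ lower-triangular \code{Lambda}) simulates
+\eq{eq:BU} and confirms that the sample covariance of $\bm b$
+approaches $\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans$:
+\begin{minted}{r}
+set.seed(1)
+sigma  <- 2
+Lambda <- matrix(c(1.5, 0.3, 0, 0.8), 2, 2) # hypothetical 2x2 factor
+u <- matrix(rnorm(2e5, sd = sigma), nrow = 2) # draws from N(0, sigma^2 I)
+b <- Lambda %*% u                             # b = Lambda u, eq:BU
+cov(t(b))                    # approaches sigma^2 Lambda Lambda'
+sigma^2 * tcrossprod(Lambda)
+\end{minted}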
+
+Although it may seem more natural to define $\mc U$ in terms of $\mc
+B$, we must write the relationship as in \eq{eq:BU} because
+$\bm\Lambda_\theta$ may be singular.  In fact, it is important to
+allow for $\bm\Lambda_\theta$ to be singular because situations
+where the parameter estimates, $\widehat{\bm\theta}$, produce a
+singular $\bm\Lambda_{\widehat{\theta}}$ do occur in practice.  Even
+if the parameter estimates do not correspond to a singular
+$\bm\Lambda_\theta$, it may be necessary to evaluate the estimation
+criterion at such values during the course of the numerical
+optimization of the criterion.
+
+The model can now be defined in terms of%\begin{linenomath}
+\begin{equation}
+  \label{eq:LMMcondYU}
+  (\mc Y|\mc U=\bm u)\sim\mc{N}(\bm Z\bm\Lambda_\theta\bm u+\bm X\bm\beta,\sigma^2\bm I_n)
+\end{equation}%\end{linenomath}
+producing the joint density function%\begin{linenomath}
+\begin{equation}
+  \label{eq:jointDens}
+  \begin{aligned}
+    f_{\mc Y,\mc U}(\bm y,\bm u)&
+    =f_{\mc Y|\mc U}(\bm y|\bm u)\,f_{\mc U}(\bm u)\\
+    &=\frac{\exp(-\frac{1}{2\sigma^2}\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\|^2)}
+    {(2\pi\sigma^2)^{n/2}}\;
+    \frac{\exp(-\frac{1}{2\sigma^2}\|\bm u\|^2)}{(2\pi\sigma^2)^{q/2}}\\
+    &=\frac{\exp(-
+      \left[\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2\right]/(2\sigma^2))}
+    {(2\pi\sigma^2)^{(n+q)/2}} .
+  \end{aligned}
+\end{equation}%\end{linenomath}
+
+The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
+$\sigma^2$, given the observed data is the value of the
+marginal density of $\mc Y$, evaluated at $\yobs$.  That is%\begin{linenomath}
+\begin{equation}
+  \label{eq:likelihoodLMM}
+  L(\bm\theta,\bm\beta,\sigma^2|\yobs)=\int_{\mathbb{R}^q}f_{\mc Y,\mc
+    U}(\yobs,\bm u)\,d\bm u .
+\end{equation}%\end{linenomath}
+The integrand in \eq{eq:likelihoodLMM} is the \emph{unscaled
+  conditional density} of $\mc U$ given $\mc Y=\yobs$.  The
+conditional density of $\mc U$ given $\mc Y=\yobs$ is%\begin{linenomath}
+\begin{equation}
+  \label{eq:condDens}
+  f_{\mc U|\mc Y}(\bm u|\yobs)=\frac{f_{\mc Y,\mc U}(\yobs,\bm u)}
+  {\int f_{\mc Y,\mc U}(\yobs,\bm u)\,d\bm u}
+\end{equation}%\end{linenomath}
+which is, up to a scale factor, the joint density, $f_{\mc Y,\mc
+  U}(\yobs,\bm u)$.  The unscaled conditional density is, up to a
+scale factor, a $q$-dimensional multivariate Gaussian density whose
+integral is easily evaluated once we know its mean and
+variance-covariance matrix.
+
+The conditional mean, $\bm\mu_{\mc U|\mc Y=\yobs}$, is also the mode
+of the conditional distribution.  Because a constant factor in a
+function does not affect the location of the optimum, we can determine
+the conditional mode, and hence the conditional mean, by maximizing
+the unscaled conditional density.  This takes the form of a
+\emph{penalized linear least squares} problem,%\begin{linenomath}
+\begin{equation}
+  \label{eq:PLSprob}
+  \bm\mu_{\mc U|\mc Y=\yobs}=\arg\min_{\bm u}
+  \left(\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
+    \left\|\bm u\right\|^2\right) .
+\end{equation}%\end{linenomath}
+
+In the so-called ``pseudo-data'' approach to solving such penalized
+least squares problems we write the objective as a residual sum of
+squares for an extended response vector and model matrix%\begin{linenomath}
+\begin{equation}
+  \label{eq:pseudoData}
+  \left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
+  \left\|\bm u\right\|^2 =
+  \left\| \begin{bmatrix}\yobs-\bm X\bm\beta\\\bm 0\end{bmatrix}-
+    \begin{bmatrix}\bm Z\bLt\\\bm I_q\end{bmatrix}
+    \bm u\right\|^2.
+\end{equation}%\end{linenomath}
+The contribution to the residual sum of squares from the ``pseudo''
+observations appended to $\yobs-\bm X\bm\beta$ is exactly the penalty
+term, $\left\|\bm u\right\|^2$.
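+
+This formulation translates directly into a dense least squares
+computation.  A minimal \proglang{R} sketch, assuming \code{ZL} holds
+$\bm Z\bLt$ as a dense matrix and \code{X}, \code{beta}, \code{yobs}
+and \code{q} hold the corresponding quantities defined above:
+\begin{minted}{r}
+## augmented model matrix and response: the q pseudo-observations
+## appended below contribute exactly the penalty term |u|^2
+Xaug  <- rbind(ZL, diag(q))
+yaug  <- c(yobs - X %*% beta, rep(0, q))
+u_hat <- qr.coef(qr(Xaug), yaug)   # conditional mean/mode of U
+\end{minted}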
+
+From \eq{eq:pseudoData} we can see that the conditional mean satisfies%\begin{linenomath}
+\begin{equation}
+  \label{eq:PLSsol}
+  \left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
+  \bm\mu_{\mc U|\mc Y=\yobs}=\bLt\trans\bm Z\trans(\yobs-\bm X\bm\beta),
+\end{equation}%\end{linenomath}
+which would be interesting, but not terribly useful, were it not for
+the fact that we can determine the solution to \eq{eq:PLSsol} quickly
+and accurately, even when $q$, the size of the system to solve, is
+very large indeed.  As described in sect.~\ref{sec:PLSnumer}, the
+special structures of the matrices $\bm Z$ and $\bLt$ for
+mixed-effects models described by a mixed-model formula result in
+$\bLt\trans\bm Z\trans\bm Z\bLt$ being very sparse, so that the
+sparse Cholesky factorization~\citep[Ch.~4]{davis06:csparse_book} can
+be used to solve eqn.~\ref{eq:PLSsol}.
+
+We can write the general approach in terms of a fill-reducing
+permutation matrix $\bm P$~\citep[Ch.~7]{davis06:csparse_book}, which
+is determined by the pattern of non-zeros in $\bm Z\bLt$ but does
+not depend on $\bm\theta$ itself, and the \emph{sparse Cholesky
+  factor}, $\bm L_\theta$, which is a sparse, lower-triangular matrix
+such that%\begin{linenomath}
+\begin{equation}
+  \label{eq:sparseChol}
+  \bm L_\theta\bm L\trans_\theta=\bm P
+  \left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
+  \bm P\trans.
+\end{equation}%\end{linenomath}
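+
+This factorization is available through the \pkg{Matrix} package for
+\proglang{R}.  A sketch, assuming the sparse matrix \code{ZLt} holds
+$\bLt\trans\bm Z\trans$ and \code{X}, \code{beta} and \code{yobs} are
+as above:
+\begin{minted}{r}
+library(Matrix)
+## L L' = P (Lambda' Z' Z Lambda + I) P'; perm = TRUE chooses a
+## fill-reducing permutation and Imult = 1 adds the identity
+LL <- Cholesky(tcrossprod(ZLt), LDL = FALSE, perm = TRUE, Imult = 1)
+## conditional mean of U: system = "A" applies P and P' internally
+u_hat <- solve(LL, ZLt %*% (yobs - X %*% beta), system = "A")
+\end{minted}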
+
+\subsection{Evaluating the likelihood}
+\label{sec:evallike}
+
+After solving eqn.~\ref{eq:PLSsol} for $\bm\mu_{\mc U|\mc Y=\yobs}$,
+the exponent of $f_{\mc Y,\mc U}(\yobs, \bm u)$
+(eqn.~\ref{eq:jointDens}) can be written%\begin{linenomath}
+\begin{equation}
+  \label{eq:PLS}
+  \|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=
+  r^2(\bm\theta,\bm\beta)+
+  \|\bm L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})\|^2,
+\end{equation}%\end{linenomath}
+where $r^2(\bm\theta,\bm\beta)=\|\yobs-\bm X\bm\beta-
+\bm Z\bm\Lambda_\theta\bm\mu_{\mc U|\mc Y=\yobs}\|^2+
+\|\bm\mu_{\mc U|\mc Y=\yobs}\|^2$ is the minimum
+penalized residual sum of squares for the given values of $\bm\theta$ and
+$\bm\beta$.
+
+With expression (\ref{eq:PLS}) and the change of variable $\bm v=\bm
+L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})$, for which $d\bm
+v=\abs(|\bm L_\theta||\bm P|)\,d\bm u$, we have%\begin{linenomath}
+\begin{equation}
+  \label{eq:intExpr}
+  \int\frac{\exp\left(\frac{-\|\bm L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y})\|^2}
+      {2\sigma^2}\right)}
+  {(2\pi\sigma^2)^{q/2}}\,d\bm u
+  = \int\frac{\exp\left(\frac{-\|\bm
+        v\|^2}{2\sigma^2}\right)}{(2\pi\sigma^2)^{q/2}}\,\frac{d\bm
+    v}{\abs(|\bm L_\theta||\bm P|)} = \frac{1}{\abs(|\bm L_\theta||\bm
+    P|)}=\frac{1}{|\bm L_\theta|}
+\end{equation}%\end{linenomath}
+because $\abs|\bm P|=1$ (one property of a permutation matrix is $|\bm
+P|=\pm1$) and $|\bm L_\theta|$ is positive, since $\bm L_\theta$ is
+triangular and its determinant is therefore the product of its
+diagonal elements, all of which are positive.
+
+Using this expression we can write the deviance (negative twice the
+log-likelihood) as%\begin{linenomath}
+\begin{equation}
+  \label{eq:deviance}
+  -2\ell(\bm\theta,\bm\beta,\sigma^2|\yobs)=-2\log L(\bm\theta,\bm\beta,\sigma^2|\yobs)=
+  n\log(2\pi\sigma^2)+\frac{r^2(\bm\theta,\bm\beta)}{\sigma^2}+
+  \log(|\bm L_\theta|^2) .
+\end{equation}%\end{linenomath}
+Because the dependence of \eq{eq:deviance} on $\sigma^2$ is
+straightforward, we can form the conditional estimate%\begin{linenomath}
+\begin{equation}
+  \label{eq:conddev}
+  \widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n} ,
+\end{equation}%\end{linenomath}
+producing the \emph{profiled deviance}%\begin{linenomath}
+\begin{equation}
+  \label{eq:profdev1}
+  -2\tilde{\ell}(\bm\theta,\bm\beta|\yobs)=\log(|\bm L_\theta|^2)+
+  n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right] .
+\end{equation}%\end{linenomath}
+
+However, observing that \eq{eq:profdev1} depends on $\bm\beta$ only
+through $r^2(\bm\theta,\bm\beta)$ allows us to perform a much greater
+simplification by ``profiling out'' the fixed-effects parameter,
+$\bm\beta$, from the evaluation of the deviance.  The conditional
+estimate, $\widehat{\bm\beta}_\theta$, is the value of $\bm\beta$ at
+the solution of the joint penalized least squares problem%\begin{linenomath}
+\begin{equation}
+  \label{eq:jointPLS}
+  r^2_\theta=\min_{\bm u,\bm\beta}
+  \left(\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
+    \left\|\bm u\right\|^2\right).
+\end{equation}%\end{linenomath}
+
+The solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
+$\widehat{\bm\beta}_\theta$, of the joint penalized least squares
+problem (\ref{eq:jointPLS}) satisfy%\begin{linenomath}
+\begin{equation}
+  \label{eq:jointPLSeqn}
+  \begin{bmatrix}
+    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q &
+    \bLt\trans\bm Z\trans\bm X\\
+    \bm X\trans\bm Z\bLt & \bm X\trans\bm X
+  \end{bmatrix}
+  \begin{bmatrix}
+    \bm\mu_{\mc U|\mc Y=\yobs}\\\widehat{\bm\beta}_\theta
+  \end{bmatrix}=
+  \begin{bmatrix}\bLt\trans\bm Z\trans\yobs\\\bm X\trans\yobs
+  \end{bmatrix} .
+\end{equation}%\end{linenomath}
+
+After solving eqn.~\ref{eq:jointPLS} we can write the \emph{profiled
+  deviance} as%\begin{linenomath}
+\begin{equation}
+  \label{eq:profdev2}
+  -2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
+  n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
+\end{equation}%\end{linenomath}
+which is a function of $\bm\theta$ only.  Eqn.~\ref{eq:profdev2} is a
+remarkably compact expression for the deviance.
+
+\subsection{The profiled REML criterion}
+\label{sec:profiledREML}
+
+\citet{laird_ware_1982} show that the criterion to be optimized by the
+REML estimates can be expressed as%\begin{linenomath}
+\begin{equation}
+  \label{eq:REMLcrit}
+  L_R(\bm\theta,\sigma^2|\yobs)=\int
+  L(\bm\theta,\bm\beta,\sigma^2|\yobs)\,d\bm\beta .
+\end{equation}%\end{linenomath}
+
+Let us write a partitioned Cholesky factorization of the system matrix
+for the joint penalized least squares problem, eqn.~\ref{eq:jointPLSeqn},
+as%\begin{linenomath}
+\begin{equation}
+  \label{eq:fulldecomp}
+  \begin{bmatrix}
+    \bm P\trans\bm L& \bm 0\\
+    \bm R_{ZX}\trans & \bm R_X\trans
+  \end{bmatrix}
+  \begin{bmatrix}
+    \bm L\trans\bm P & \bm R_{ZX}\\
+    \bm 0            & \bm R_X
+  \end{bmatrix}=
+  \begin{bmatrix}
+    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q & \bLt\trans\bm Z\trans\bm X\\
+    \bm X\trans\bm Z\bLt       & \bm X\trans\bm X
+  \end{bmatrix} .
+\end{equation}%\end{linenomath}
+When the fixed-effects model matrix $\bm X$ is dense, the $q\times p$
+matrix $\bm R_{ZX}$ and the $p\times p$ upper triangular matrix $\bm
+R_X$ will also be dense.  Occasionally it makes sense to store $\bm X$
+as a sparse matrix in which case $\bm R_{ZX}$ and $\bm R_X$ could also
+be sparse.
+
+The joint solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
+$\widehat{\bm\beta}_\theta$, to eqn.~\ref{eq:jointPLSeqn} allow us to express%\begin{linenomath}
+\begin{multline}
+  \label{eq:PLS2}
+  \|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=\\
+  r^2_\theta+
+  \left\|\bm L_\theta\trans\bm P\left[\bm u-\bm\mu_{\mc U|\mc Y=\yobs}-
+  \bm R_{ZX}(\bm\beta - \widehat{\bm\beta}_\theta)\right]\right\|^2+
+  \left\|\bm R_X(\bm\beta - \widehat{\bm\beta}_\theta)\right\|^2
+\end{multline}%\end{linenomath}
+so that we can use a change of variable, similar to that in
+\eq{eq:intExpr}, to evaluate the profiled REML criterion.  On the
+deviance scale the criterion can be evaluated as%\begin{linenomath}
+\begin{equation}
+  \label{eq:profiledREML}
+  -2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm R_X|^2)+
+  (n-p)\left[1+\log\left(\frac{2\pi r^2_\theta}{n-p}\right)\right] .
+\end{equation}%\end{linenomath}
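+
+As a concrete summary, the following \proglang{R} sketch (dense
+linear algebra for clarity, not the sparse approach used in practice,
+and assuming $[\bm Z\bLt\;\bm X]$ has full column rank) solves the
+joint problem by the pseudo-data method and evaluates the profiled
+deviance (\ref{eq:profdev2}) or the profiled REML criterion
+(\ref{eq:profiledREML}); \code{ZL} is assumed to hold $\bm Z\bLt$ for
+the given $\bm\theta$:
+\begin{minted}{r}
+profiled_crit <- function(ZL, X, yobs, REML = FALSE) {
+    n <- length(yobs); p <- ncol(X); q <- ncol(ZL)
+    ## joint PLS problem via pseudo-data: the q appended rows penalize
+    ## u, so crossprod(Xaug) is the system matrix of the joint problem
+    Xaug <- cbind(rbind(ZL, diag(q)), rbind(X, matrix(0, q, p)))
+    qrf  <- qr(Xaug)
+    r2   <- sum(qr.resid(qrf, c(yobs, rep(0, q)))^2) # r^2_theta
+    R    <- qr.R(qrf)   # dense analogue of [L' R_ZX; 0 R_X]
+    ldL2 <- 2 * sum(log(abs(diag(R)[1:q])))          # log |L|^2
+    if (!REML)
+        return(ldL2 + n * (1 + log(2 * pi * r2 / n)))
+    ldRX2 <- 2 * sum(log(abs(diag(R)[(q + 1):(q + p)]))) # log |R_X|^2
+    ldL2 + ldRX2 + (n - p) * (1 + log(2 * pi * r2 / (n - p)))
+}
+\end{minted}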
+
+\section{Random-effects terms and resulting matrix properties}
+\label{sec:PLSnumer}
+
+In fitting linear mixed models an instance of the penalized least
+squares (PLS) problem (\ref{eq:jointPLSeqn}) must be solved at each
+evaluation of the objective function (either the profiled deviance or
+the profiled REML criterion) during the optimization with respect to
+$\bm\theta$.  Because this operation must be performed many times it
+is worthwhile studying the properties of $\bm Z$ and $\bLt$ in general
+and for common special cases.
+
+\subsection{Random-effects terms in mixed-model formulas}
+\label{sec:LMMmatrix}
+
+A mixed-model formula incorporates $k\ge1$ random-effects terms of the
+form \code{(r|f)} where \code{r} is a linear model formula and
+\code{f} is an expression that evaluates to a factor, called the
+\emph{grouping factor}, for the term.  In practice, fixed-effects
+model matrices and random-effects terms are evaluated with respect to
+a \emph{model frame}, ensuring that any expressions for grouping
+factors have been evaluated to factors and any unused levels of these
+factors have been dropped.  That is, $\ell_i$, the number of levels in
+the grouping factor for the $i$th random-effects term, is well-defined.
+The vector $\bm i_i$ of \emph{factor indices} for the $i$th term is an
+$n$-vector of values from $1,\dots,\ell_i$.  Let $\bm J_i$ be the
+$n\times \ell_i$ matrix of indicator columns for $\bm i_i$.
+
+When $k>1$ we order the random-effects terms so that
+$\ell_1\ge\ell_2\ge\dots\ge\ell_k$.
+
+Let $\bm X_i$ of size $n\times p_i$ be the model matrix of the linear
+model formula, \code{r}, from the $i$th random-effects term,
+$i=1,\dots,k$.  A term is said to be a \emph{scalar} random-effects
+term when $p_i=1$, otherwise it is \emph{vector-valued}.  For a
+\emph{simple, scalar} random-effects term of the form \code{(1|f)},
+$\bm X_i$ is the $n\times 1$ matrix of ones.
+
+The $i$th random effects term contributes $q_i=\ell_ip_i$ columns,
+which we will write as $\bm Z_i$, to
+the model matrix $\bm Z$.  Thus $q$, the number of columns in $\bm Z$
+and the dimension of the random variables $\mc{B}$ and $\mc{U}$, is%\begin{linenomath}
+  \begin{equation}
+    \label{eq:qcalc}
+    q=\sum_{i=1}^k q_i = \sum_{i=1}^k \ell_i\,p_i .
+  \end{equation}%\end{linenomath}
+The creation of $\bm Z_i$ from $\bm X_i$ and $\bm i_i$ is a
+straightforward concept that is nonetheless somewhat difficult to
+describe.  Consider $\bm Z_i$ as being further decomposed into
+$\ell_i$ blocks of $p_i$ columns.  The rows in the first block are the
+rows of $\bm X_i$ multiplied by the 0/1 values in the first column of
+$\bm J_i$, the rows in the second block are those of $\bm X_i$
+multiplied by the second column of $\bm J_i$, and so on for the
+remaining blocks.
+
+In particular, for a simple, scalar term, $\bm Z_i$ is exactly $\bm
+J_i$, the matrix of indicator columns.  For other scalar terms, $\bm
+Z_i$ is formed by element-wise multiplication of the single column of
+$\bm X_i$ by each of the columns of $\bm J_i$.
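+
+In the \pkg{Matrix} package for \proglang{R} this construction can be
+expressed compactly; a sketch for the $i$th term, assuming \code{Xi}
+holds $\bm X_i$ and \code{f} the grouping factor:
+\begin{minted}{r}
+library(Matrix)
+Jt <- as(as.factor(f), "sparseMatrix") # l_i by n indicator matrix J_i'
+## Z_i' via the column-wise Kronecker (Khatri-Rao) product: column j
+## is the j-th column of J_i' scaled by the j-th row of X_i
+Zt <- KhatriRao(Jt, t(Xi))
+\end{minted}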
+
+Because each $\bm Z_i,i=1,\dots,k$ is generated from indicator
+columns, its cross-product, $\bm Z_i\trans\bm Z_i$, is block diagonal,
+consisting of $\ell_i$ diagonal blocks each of size $p_i$.
+Note that this means that when $k=1$ (i.e. there is only one
+random-effects term), $\bm Z\trans\bm Z$ will be block diagonal.
+
+The $q\times q$ covariance factor, $\bLt$, is a block diagonal matrix
+whose $i$th diagonal block, $\bm\Lambda_i$, is of size
+$q_i,i=1,\dots,k$.  Furthermore, $\bm\Lambda_i$ is a \emph{homogeneous
+  block diagonal} matrix with each of the $\ell_i$ lower-triangular
+blocks on the diagonal being a copy of a $p_i\times p_i$ lower-triangular
+template, $\bm T_i$.  The covariance parameter vector, $\bm\theta$,
+consists of the elements in the lower triangles of the $\bm
+T_i,i=1,\dots,k$.  To provide a unique representation we require that
+the diagonal elements of the $\bm T_i,i=1,\dots,k$ be non-negative.
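+
+The template structure is easy to generate.  A sketch for the $i$th
+diagonal block, assuming \code{theta_i} holds the elements of the
+lower triangle of $\bm T_i$ in column-major order:
+\begin{minted}{r}
+library(Matrix)
+Ti <- matrix(0, p_i, p_i)
+Ti[lower.tri(Ti, diag = TRUE)] <- theta_i # fill the template T_i
+## Lambda_i: l_i diagonal copies of the p_i x p_i template
+Lambda_i <- kronecker(Diagonal(l_i), Ti)
+## the full Lambda_theta is bdiag() of the k blocks Lambda_i
+\end{minted}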
+
+\subsection{General computational structures and methods}
+\label{sec:generalcomputational}
+
+Although special structures can be exploited to reduce the
+computational load for certain classes of mixed-effects models, we
+will start with the general case to determine suitable data
+structures.  
+
+In a complex model fit to a large data set, the dominant calculation
+in the evaluation of the profiled deviance (eqn.~\ref{eq:profdev2}) or
+REML criterion (eqn.~\ref{eq:profiledREML}) is the sparse Cholesky
+factorization (eqn.~\ref{eq:fulldecomp}) followed by the solution of
+the penalized least squares problem~(eqn.~\ref{eq:jointPLS}).
+
+For fitting other types of models, such as GLMMs, we consider the more
+general factorization%\begin{linenomath}
+\begin{equation}
+  \label{eq:sparseCholW}
+  \bm L_\theta\bm L\trans_\theta=\bm P
+  \left(\bLt\trans\bm Z\trans\bm W\bm Z\bLt+\bm I_q\right)
+  \bm P\trans,
+\end{equation}%\end{linenomath}
+where the $n\times n$ matrix $\bm W$ may change over the course of the
+iterations to optimize the objective.
+
+When fitting a GLMM, $\bm W$ is the diagonal matrix of case weights in
+the Penalized Iteratively-Reweighted Least Squares (PIRLS) algorithm.
+Another example of a varying $\bm W$ would be fitting an LMM in which
+the conditional distribution of the response incorporates a
+correlation matrix that itself depends on
+parameters~\citep[Ch.~5]{R:Pinheiro+Bates:2000}.
+
+In any case, we will assume that $\bm W$ is symmetric and 
+positive semidefinite so that some type of ``square root'' factor,
+$\bm W^{1/2}$, satisfying%\begin{linenomath}
+\begin{equation}
+  \label{eq:Wsqrt}
+  \bm W = \bm W^{1/2}\left(\bm W^{1/2}\right)\trans,
+\end{equation}%\end{linenomath}
+is available.
+
+We wish to use structures and algorithms that allow us to take a new
+value of $\bm\theta$ or a new value of $\bm W^{1/2}$ or both and
+evaluate the updated factor $\bm L_\theta$ (eqn.~\ref{eq:sparseCholW})
+efficiently.  The key to doing so is the special structure of
+$\bLt\trans\bm Z\trans\bm W^{1/2}$.  To understand why this matrix,
+and not its transpose, is of interest we describe the sparse
+matrix structures used in \proglang{Julia} and in the \pkg{Matrix}
+package for \proglang{R}.
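+
+In the \pkg{Matrix} package, for example, such an update can be
+applied to an existing factor without re-analyzing the sparsity
+pattern.  A sketch, assuming \code{LL} is a CHOLMOD factor created
+earlier (e.g., by \code{Cholesky(..., Imult = 1)}) and the sparse
+matrix \code{ZLtW} holds the updated $\bLt\trans\bm Z\trans\bm W^{1/2}$:
+\begin{minted}{r}
+## refactor numerically: the permutation P is fixed by the sparsity
+## pattern, so only the numeric factorization is redone;
+## L L' = P (A A' + I) P' with A = ZLtW
+LL <- update(LL, ZLtW, mult = 1)
+\end{minted}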
+
+\subsubsection{Compressed sparse column-oriented matrices}
+\label{sec:CSCmats}
+
+Dense matrices are stored in \proglang{Julia} and in \proglang{R} as a
+one-dimensional array of contiguous storage locations addressed in
+\emph{column-major order}.  This means that elements in the same
+column are in adjacent storage locations whereas elements in the same
+row can be widely separated in memory.  For this reason, algorithms
+that work column-wise are preferred to those that work row-wise.
+
+Although a sparse matrix can be stored in a \emph{triplet} format,
+where the row position, column position and element value of each
+nonzero is recorded, the preferred storage forms for actual
+computation with sparse matrices are compressed sparse column (CSC) or
+compressed sparse row (CSR)~\citep[Ch.~2]{davis06:csparse_book}.
+
+The sparse matrix capabilities in \proglang{Julia} and in the
+\pkg{Matrix} package for \proglang{R} are based on the
+\pkg{SuiteSparse} collection of sparse matrix
+routines~\citep{davis06:csparse_book},
+which uses the CSC storage format.  In this format the non-zero
+elements in a column are in adjacent storage locations resulting in
+access to a column being much easier than access to a row.
+
+The matrices $\bm Z$ and $\bm Z\bLt$ have the property that the number
+of nonzeros in each row, $\sum_{i=1}^k p_i$, is constant.  For CSC
+matrices we want consistency in the columns rather than the rows,
+hence we work with $\bm Z\trans$ and $\bLt\trans\bm Z\trans$ for
+linear mixed-effects models ($\bLt\trans\bm Z\trans\bm W^{1/2}$ for
+GLMMs).
+
+An arbitrary $m\times n$ sparse matrix in CSC format is expressed as
+two vectors of indices, the row indices and the column pointers, and a
+numeric vector of the non-zero values.  The elements of the row
+indices and the nonzeros are aligned and are ordered first by
+increasing column number then by increasing row number within column.
+The column pointers are a vector of size $n+1$ giving the location of
+the start of each column in the vectors of row indices and nonzeros.
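+
+These three vectors are visible as the \code{i} (row indices),
+\code{p} (column pointers) and \code{x} (nonzeros) slots of a
+\code{dgCMatrix} object in the \pkg{Matrix} package; a small
+illustration:
+\begin{minted}{r}
+library(Matrix)
+M <- Matrix(c(1, 0, 0, 2,
+              0, 3, 0, 0,
+              4, 0, 5, 6), 3, 4, byrow = TRUE, sparse = TRUE)
+M@i # 0-based row indices: 0 2 1 2 0 2
+M@p # column pointers:     0 2 3 4 6
+M@x # nonzero values:      1 4 3 5 2 6
+\end{minted}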
+
+Because the number of nonzeros in each column of $\bm Z\trans$, and in
+each column of matrices such as $\bLt\trans\bm Z\trans\bm W^{1/2}$
+derived from $\bm Z\trans$, is constant, the vector of nonzeros in the
+CSC format can be viewed as a dense matrix, say $\bm N$, of size
+$\left(\sum_{i=1}^k p_i\right)\times n$.  We do not need to store the
+column pointers because the columns of $\bm Z\trans$ correspond to
+columns of $\bm N$.  All we need is $\bm N$, the dense matrix of
+nonzeros, and the row indices, which are derived from the grouping
+factor index vectors $\bm i_i,i=1,\dots,k$ and can also be arranged as
+a dense matrix of size $\left(\sum_{i=1}^k p_i\right)\times n$.
+Matrices like $\bm Z\trans$, with the property that there are the same
+number of nonzeros in each column, are sometimes called \emph{regular
+  sparse column-oriented} (RSC) matrices.
+
+\subsubsection{Representing the random-effects vector}
+\label{sec:revector}
+
+Although we solve for the conditional mode, $\tilde{\bm u}$, as a
+vector, it is useful to view this vector as being partitioned into $k$
+sections of sizes $p_i\ell_i,i=1,\dots,k$ and consider each section as
+a $p_i\times\ell_i$ matrix.  This allows us to index by term and by
+level of grouping factor within term.
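+
+In \proglang{R} this view is simply a reshaping of the corresponding
+section of the vector; a sketch for the $i$th term, with a
+hypothetical \code{offset} marking where its section starts:
+\begin{minted}{r}
+## p_i x l_i matrix view: column j holds the p_i coefficients
+## for the j-th level of the i-th grouping factor
+Ui <- matrix(u[offset + seq_len(p_i * l_i)], p_i, l_i)
+\end{minted}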
+% We usually consider these
+% matrices transposed (for example, the \code{ranef} extractor method
+% returns the transposes of these matrices) but the ordering of elements
+% within $\tilde{\bm u}$ is more conveniently expressed in this
+% orientation.
+
+\subsection{Minimal representation for general LMMs}
+\label{sec:MinimalRep}
+
+In listing~\ref{lst:LMMGeneral}
+\begin{listing}[tbp]
+  \begin{minted}[linenos=true]{julia}
type LMMGeneral{Ti<:Union(Int32,Int64)}
    L::CholmodFactor{Float64,Ti}        # sparse Cholesky factor
    LambdatZt::CholmodSparse{Float64,Ti} # Lambda'Z' (sparse)
    RX::Cholesky{Float64}               # dense Cholesky factor R_X
    X::ModelMatrix                      # fixed-effects model matrix
    Xs::Vector{Matrix{Float64}}         # X_1,X_2,...,X_k
    beta::Vector{Float64}               # fixed-effects coefficients
    inds::Vector{Any}                   # grouping factor indices i_1,...,i_k
    lambda::Vector{Matrix{Float64}}     # k lower-triangular matrices
    mu::Vector{Float64}                 # fitted mean response
    u::Vector{Matrix{Float64}}          # spherical random effects
    y::Vector{Float64}                  # observed response
    REML::Bool                          # use the REML criterion?
end
+  \end{minted}
+  \caption{User-defined \proglang{Julia} type for general linear
+    mixed-effects models}
+  \label{lst:LMMGeneral}
+\end{listing}
+we show a \proglang{Julia} user-defined type that contains the minimal
+information to represent a linear mixed-effects model as described
+above.  The declaration of the type in line 1 creates a parametric
+(templated) type whose concrete instantiations use either 32-bit or
+64-bit integers in the sparse matrix and its factor.  Because these
+can be very large objects containing long vectors of indices, it is
+worthwhile using 32-bit integers when possible.  The fields \code{L} and \code{LambdatZt}
+contain the current sparse Cholesky factor, $\bm L_\theta$, and the
+sparse matrix, $\bLt\trans\bm Z\trans$, in \proglang{Julia} types
+corresponding to CHOLMOD structures \code{cholmod\_factor} and
+\code{cholmod\_sparse}.  The \code{X} field is a \code{ModelMatrix}
+type, similar to a \code{model.matrix} in \proglang{R}, containing the
+matrix itself (\code{X.m}) and the \code{assign} vector associating
+columns with fixed-effects terms in the model.  The random-effects
+model matrix, $\bm Z$, is represented by the vector of matrices,
+\code{Xs}, containing $k$ matrices of sizes $n\times p_i,i=1,\dots,k$
+and the \code{inds} vector of length $k$ containing the indices, $\bm
+i_i,i=1,\dots,k$, each a numeric vector of length $n$.  These are
+stored as a \code{Vector\{Any\}} (similar to an unnamed \code{list} in
+\proglang{R}) because they may be of different types.  The indices
+themselves are vectors of unsigned integers, either \code{Uint8} or
+\code{Uint16} or \code{Uint32} or \code{Uint64}, whichever size is
+sufficiently large to hold the largest index for a particular grouping
+factor.
+
+The \code{lambda} field contains the $k$ lower-triangular template
+matrices, $\bm T_i$, of size $p_i,i=1,\dots,k$.  The \code{u} field
+stores $\bm u$ in the form described in the previous section.  The
+\code{beta} and \code{y} fields are self-explanatory.  The \code{REML}
+field is \code{true} if the REML criterion is to be used for parameter
+estimation (see listing~\ref{lst:objective}, p.~\pageref{lst:objective}).
+
+We say this is a minimal representation.  In practice it is worthwhile
+caching several of the values calculated during each evaluation of the
+objective function.  To avoid clutter we omit many such fields at this
+point.  We do include the \code{RX} and \code{mu} fields that,
+strictly speaking, can be omitted from a minimal representation, so
+that some of the updates in listings~\ref{lst:solve!} and
+\ref{lst:linpred!} can be done in place.  The \code{RX} field is of
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/lme4 -r 1846

