[Lme4-commits] r1848 - www/JSS

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Fri Aug 30 00:34:58 CEST 2013


Author: mmaechler
Date: 2013-08-30 00:34:57 +0200 (Fri, 30 Aug 2013)
New Revision: 1848

Added:
   www/JSS/lmer.Rnw_old
Removed:
   www/JSS/lmer.Rnw
Modified:
   www/JSS/Makefile
   www/JSS/glmer.Rnw
Log:
update Makefile; add current targets

Modified: www/JSS/Makefile
===================================================================
--- www/JSS/Makefile	2013-08-26 14:45:46 UTC (rev 1847)
+++ www/JSS/Makefile	2013-08-29 22:34:57 UTC (rev 1848)
@@ -3,7 +3,7 @@
 TOUCH = touch
 REXE = R --vanilla
 RCMD = $(REXE) CMD
-RSCRIPT = Rscript --vanilla
+RSCRIPT = Rscript --no-save --no-restore
 
 CC = gcc
 CXX = g++
@@ -14,8 +14,8 @@
 FFLAGS = -O4
 INSTALL = install
 
-LATEX = latex
-PDFLATEX = pdflatex
+LATEX = latex -shell-escape
+PDFLATEX = pdflatex -shell-escape
 BIBTEX = bibtex
 MAKEIDX = makeindex
 
@@ -25,13 +25,13 @@
 TTH = htlatex # tth -E2 -e2
 TTM = ttm -E2 -e2
 
-default:
+default: lmer.pdf glmer.pdf glmerAppendix.pdf
 
 .PHONY:
 
 %.tex: %.Rnw
-	$(RSCRIPT) -e "Sweave(\"$*.Rnw\")"
-## $(RSCRIPT) -e "library(knitr); knit(\"$*.Rnw\")"
+	$(RSCRIPT) -e "library(knitr); knit(\"$*.Rnw\")"
+##	$(RSCRIPT) -e "Sweave(\"$*.Rnw\")"
 
 %.tex: %.txt
 	pandoc -s -o $*.tex $*.txt
@@ -89,6 +89,6 @@
 	$(FC) $(FFLAGS) $*.f $(LDLIBS) -o $*
 
 clean:
-	$(RM) *.o *.tex *.log *.aux *.toc *.blg *.bbl *.out *-???.pdf Rplots.ps
+	$(RM) *.o *.log *.aux *.toc *.blg *.bbl *.out *-???.pdf Rplots.ps
 
 .SECONDARY:

Modified: www/JSS/glmer.Rnw
===================================================================
--- www/JSS/glmer.Rnw	2013-08-26 14:45:46 UTC (rev 1847)
+++ www/JSS/glmer.Rnw	2013-08-29 22:34:57 UTC (rev 1848)
@@ -69,13 +69,13 @@
 %
 
 <<preliminaries,include=FALSE,cache=FALSE>>=
-options(width=69,show.signif.stars=FALSE,str=strOptions(strict.width="cut"))
+options(width=69, show.signif.stars=FALSE, str=strOptions(strict.width="cut"))
 library(knitr)
 library(lme4)
 opts_chunk$set(engine='R',dev='pdf',fig.width=10,
               fig.height=6.5,strip.white=TRUE,
                cache=TRUE,tidy=FALSE,prompt=TRUE,comment=NA)
-@
+@ % $ <- for emacs ESS
 \setkeys{Gin}{width=\textwidth}
 \begin{document}
 \section{Introduction}
@@ -88,12 +88,12 @@
 The generalized linear mixed models (GLMMs) that can be fit by the
 \pkg{lme4} package preserve the multivariate Gaussian unconditional
 distribution of the random effects, $\mc B$
-(eqn.~\ref{eq:LMMuncondB}).  
+(eqn.~\ref{eq:LMMuncondB}).
 Because most families used for the conditional
 distribution, $\mc Y|\mc B=\bm b$, do not incorporate a separate scale
 factor, $\sigma$, we remove it from the definition of $\bm\Sigma$ and
 from the distribution of the spherical random effects, $\mc U$.  That
-is, 
+is,
 \begin{equation}
   \label{eq:UdistGLMM}
   \mc U\sim\mc N(\bm0,\bm I_q)
@@ -163,7 +163,7 @@
 to $\bm u$ so the unscaled conditional density is indeed well-defined
 as a density, up to a scale factor.
 
-To evaluate the integrand in (\ref{eq:GLMMlike}) 
+To evaluate the integrand in (\ref{eq:GLMMlike})
 we use the value of
 the \code{dev.resids} function in the GLM family.  This vector,
 $\bm d(\yobs,\bm u)$, with elements, $d_i(\yobs,\bm u), i=1,\dots,n$,
@@ -171,7 +171,7 @@
 \begin{displaymath}
   \sum_{i=1}^n d_i(\yobs,\bm u) .
 \end{displaymath}
-(We should note that there 
+(We should note that there is
 some confusion in \proglang{R} (and in its predecessor,
 \proglang{S}) about what exactly the deviance residuals
 for a family are.  As indicated above, we will use this name for the
@@ -196,7 +196,7 @@
 Gaussian, this conditional mode will also be the conditional mean.
 However, for most families used in GLMMs, the mode and the mean need
 not coincide, so we use the more general term and call $\tilde{\bm
-  u}_{\beta,\theta}$ the \emph{conditional mode}.  We first describe 
+  u}_{\beta,\theta}$ the \emph{conditional mode}.  We first describe
 the numerical methods
 for determining the conditional mode using the Penalized Iteratively
 Reweighted Least Squares (PIRLS) algorithm then return to the question
@@ -312,7 +312,7 @@
 identity, $\theta\bm I_q$, and $\bm Z$ is the $n\times q$ matrix of
 indicators of the levels of $\bm f$.  The permutation matrix, $\bm
 P$, can be set to the identity and $\bm L$ is diagonal, but not
-necessarily a multiple of the identity.  
+necessarily a multiple of the identity.
 
 Because each element of $\bm\mu$ depends on only one element of $\bm
 u$ and the elements of $\mc Y$ are conditionally independent, given
@@ -351,7 +351,7 @@
 incidence according to season and herd, available in the \pkg{lme4} package.
 <<strcbpp>>=
 str(cbpp)
-@ 
+@
 and the model
 <<m1>>=
 print(m1 <- glmer(cbind(incidence, size-incidence) ~ period + (1|herd),
@@ -397,7 +397,7 @@
            panel.lines(zm$zvals, dnorm(zm$zvals), lty=2)
            panel.xyplot(...)}
        )
-@ 
+@
   \caption{Comparison of univariate integrands (solid line) and standard normal density function (dashed line)}
   \label{fig:densities}
 \end{figure}
@@ -425,7 +425,7 @@
 standard normal kernel for orders $k$ from 1 to 25.  For example,
 <<GHrule5>>=
 GHrule(5)
-@ 
+@
 
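 As a quick numerical check (a sketch, not part of the paper's code): a
 three-point rule reproduces the fourth moment of the standard Gaussian,
 $\mathrm{E}[Z^4]=3$,
 <<GHcheck,eval=FALSE>>=
 rule <- GHrule(3)
 sum(rule[, "w"] * rule[, "z"]^4)  # 3, up to floating-point error
 @
 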
 The choice of the value of $k$ depends on the behavior of the function
 $t(z)$.  If $t(z)$ is a polynomial of degree $k-1$ then the
@@ -453,7 +453,7 @@
 xyplot(as.vector(dmat/dnorm(zm$zvals)) ~ rep.int(zm$zvals, ncol(dmat))|gl(ncol(dmat), nrow(dmat)),
        type=c("g","l"), aspect=0.6, layout=c(5,3),
        xlab="z", ylab="t(z)")
-@ 
+@
   \caption{The function $t(z)$, which is the ratio of the normalized
     unscaled conditional density to the standard normal density, for
     each of the univariate integrals in the evaluation of the deviance
@@ -468,21 +468,21 @@
 
 <<cbpp>>=
 head(cbpp)
-@ 
+@
 <<cbppPlot>>=
 library(ggplot2)
-ggplot(cbpp) + facet_wrap(~herd) + 
+ggplot(cbpp) + facet_wrap(~herd) +
   geom_point(aes(period, incidence/size, size = size)) +
   scale_y_continuous(limits = c(0, 1))
 boxplot(incidence/size ~ period, data = cbpp, las = 1,
         xlab = 'Period', ylab = 'Probability of sero-positivity')
-@ 
+@
 To account for repeated measures
 <<cbppModelI>>=
 (gm1 <- glmer(incidence/size ~ period + (1 | herd), family = binomial,
       data = cbpp, weights = size))
 summary(gm1)
-@ 
+@
 <<cbppConfidence>>=
 #profile.gm1 <- profile(gm1)
 #confint(gm1)

Deleted: www/JSS/lmer.Rnw
===================================================================
--- www/JSS/lmer.Rnw	2013-08-26 14:45:46 UTC (rev 1847)
+++ www/JSS/lmer.Rnw	2013-08-29 22:34:57 UTC (rev 1848)
@@ -1,741 +0,0 @@
-\documentclass{jss}
-%% need no \usepackage{Sweave.sty}
-\usepackage{lineno}
-\newcommand{\bmb}[1]{{\color{red} \emph{#1}}}
-\newcommand{\scw}[1]{{\color{blue} \emph{#1}}}
-\usepackage[american]{babel}  %% for texi2dvi ~ bug
-\usepackage{bm,amsmath,thumbpdf,amsfonts}
-\author{Douglas Bates\\U. of Wisconsin - Madison\And
-  Martin M\"achler\\ETH Zurich\And
-  Ben Bolker\\McMaster University\And
-  Steven C. Walker\\McMaster University
-}
-\Plainauthor{Douglas Bates, Martin M\"achler, Ben Bolker, Steve Walker}
-\title{Fitting linear mixed-effects models using \pkg{lme4}}
-\Plaintitle{Fitting linear mixed-effects models using lme4}
-\Shorttitle{Linear mixed models with lme4}
-\Abstract{%
-  Maximum likelihood or restricted maximum likelihood (REML)
-  estimates of the parameters in linear
-  mixed-effects models can be determined using the \code{lmer}
-  function in the \pkg{lme4} package for \proglang{R}. As in most
-  model-fitting functions, the model is described in an \code{lmer}
-  call by a formula, in this case including both fixed-effects terms
-  and random-effects terms. The formula and data together determine a
-  numerical representation of the model from which the profiled
-  deviance or the profiled REML criterion can be evaluated as a
-  function of some of the model parameters.  The appropriate criterion
-  is optimized, using one of the constrained optimization functions in
-  \proglang{R}, to provide the parameter estimates.  We describe the
-  structure of the model, the steps in evaluating the profiled
-  deviance or REML criterion and the structure of the S4 class
-  that represents such a model.  Sufficient detail is
-  included to allow specialization of these structures by those who
-  wish to write functions to fit specialized linear mixed models, such
-  as models incorporating pedigrees or smoothing splines, that are not
-  easily expressible in the formula language used by \code{lmer}.
-}
-\Keywords{%
-  sparse matrix methods,
-  linear mixed models,
-  penalized least squares,
-  Cholesky decomposition}
-\Address{
-  Douglas Bates\\
-  Department of Statistics, University of Wisconsin\\
-  1300 University Ave.\\
-  Madison, WI 53706, U.S.A.\\
-  E-mail: \email{bates at stat.wisc.edu}
-  \par\bigskip
-  Martin M\"achler\\
-  Seminar f\"ur Statistik, HG G~16\\
-  ETH Zurich\\
-  8092 Zurich, Switzerland\\
-  E-mail: \email{maechler at stat.math.ethz.ch}\\
-  % URL: \url{http://stat.ethz.ch/people/maechler}
-  \par\bigskip
-  Benjamin M. Bolker\\
-  Departments of Mathematics \& Statistics and Biology \\
-  McMaster University \\
-  1280 Main Street W \\
-  Hamilton, ON L8S 4K1, Canada \\
-  E-mail: \email{bolker at mcmaster.ca}
-  \par\bigskip
-  Steven C. Walker\\
-  Department of Mathematics \& Statistics \\
-  McMaster University \\
-  1280 Main Street W \\
-  Hamilton, ON L8S 4K1, Canada \\
-  E-mail: \email{scwalker at math.mcmaster.ca }
-}
-\newcommand{\Var}{\operatorname{Var}}
-\newcommand{\abs}{\operatorname{abs}}
-\newcommand{\bLt}{\ensuremath{\bm\Lambda_\theta}}
-\newcommand{\mc}[1]{\ensuremath{\mathcal{#1}}}
-\newcommand{\trans}{\ensuremath{^\prime}}
-\newcommand{\yobs}{\ensuremath{\bm y_{\mathrm{obs}}}}
-\newcommand*{\eq}[1]{eqn.~\ref{#1}}% or just {(\ref{#1})}
-%
-
-\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=10,height=6.5,strip.white=all}
-\SweaveOpts{prefix=TRUE,prefix.string=figs/lmer,include=TRUE}
-\SweaveOpts{keep.source=TRUE}
-\setkeys{Gin}{width=\textwidth}
-
-\begin{document}
-
-<<preliminaries>>=
-options(width=69,show.signif.stars=FALSE,str=strOptions(strict.width="cut"))
-library(lme4)
-@
-
-\linenumbers
-\section{Introduction}
-\label{sec:intro}
-
-The \pkg{lme4} package for \proglang{R} and the \pkg{MixedModels}
-package for \proglang{Julia} provide functions to fit and analyze
-linear mixed models (LMMs), generalized linear mixed models (GLMMs)
-and nonlinear mixed models (NLMMs).  In each of these names, the term
-``mixed'' or, more fully, ``mixed-effects'', denotes a model that
-incorporates both fixed-effects terms and random-effects terms in a
-linear predictor expression from which the conditional mean of the
-response can be evaluated.  In this paper we describe the formulation
-and representation of linear mixed models.  The techniques used for
-generalized linear and nonlinear mixed models will be described
-separately.
-
-Just as a linear model can be described in terms of the distribution
-of $\mc{Y}$, the vector-valued random variable whose observed value is
-$\yobs$, the observed response vector, a linear mixed model can be
-described by the distribution of two vector-valued random variables:
-$\mc{Y}$, the response, and $\mc{B}$, the vector of random effects.  In
-a linear model the distribution of $\mc Y$ is multivariate normal,
-\begin{linenomath}
-\begin{equation}
-  \label{eq:linearmodel}
-  \mc Y\sim\mc{N}(\bm X\bm\beta,\sigma^2\bm I_n),
-\end{equation}
-\end{linenomath}
-where $n$ is the dimension of the response vector, $\bm I_n$ is the
-identity matrix of size $n$, $\bm\beta$ is a $p$-dimensional
-coefficient vector and $\bm X$ is an $n\times p$ model matrix. The
-parameters of the model are the coefficients, $\bm\beta$, and the
-scale parameter, $\sigma$.
-
-In a linear mixed model it is the \emph{conditional} distribution of
-$\mc Y$ given $\mc B=\bm b$ that has such a form,
-\begin{linenomath}
-\begin{equation}
-  \label{eq:LMMcondY}
-  (\mc Y|\mc B=\bm b)\sim\mc{N}(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n),
-\end{equation}
-\end{linenomath}
-where $\bm Z$ is the $n\times q$ model matrix for the $q$-dimensional
-vector-valued random effects variable, $\mc B$, whose value we are
-fixing at $\bm b$.  The unconditional distribution of $\mc B$ is also
-multivariate normal with mean zero and a parameterized $q\times q$
-variance-covariance matrix, $\bm\Sigma$,
-\begin{linenomath}
-\begin{equation}
-  \label{eq:LMMuncondB}
-  \mc B\sim\mc N(\bm0,\bm\Sigma) .
-\end{equation}
-\end{linenomath}
-As a variance-covariance matrix, $\bm\Sigma$ must be positive
-semidefinite.  It is convenient to express the model in terms of a
-\emph{relative covariance factor}, $\bm\Lambda_\theta$, which is a
-$q\times q$ matrix, depending on the \emph{variance-component
-  parameter}, $\bm\theta$, and generating the symmetric $q\times q$
-variance-covariance matrix, $\bm\Sigma$, according to
-\begin{linenomath}
-\begin{equation}
-  \label{eq:relcovfac}
-  \bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans ,
-\end{equation}
-\end{linenomath}
-where $\sigma$ is the same scale factor as in the conditional
-distribution (\ref{eq:LMMcondY}).
-
-Although $q$, the number of columns in $\bm Z$ and the
-size of $\bm\Sigma_{\bm\theta}$, can be very large indeed, the
-dimension of $\bm\theta$ is small, frequently less than 10.
-
-In calls to the \code{lm} function for fitting linear models the form
-of the model matrix $\bm X$ is determined by the \code{formula} and
-\code{data} arguments. The right-hand side of the formula consists of
-one or more terms that each generate one or more columns in the model
-matrix, $\bm X$.  For \code{lmer} the formula language is extended to
-allow for random-effects terms that generate the model matrix $\bm Z$
-and the mapping from $\bm\theta$ to $\bm\Lambda_{\bm\theta}$.
-
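-As a concrete illustration (a sketch, not part of the paper's code), the
-matrices generated from a random-effects term can be extracted from a
-fitted model,
-<<ZLambdaSketch,eval=FALSE>>=
-fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
-getME(fm, "Zt")      # sparse transpose of the model matrix Z
-getME(fm, "Lambdat") # transpose of the relative covariance factor
-getME(fm, "theta")   # the variance-component parameter vector
-@
-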
-To understand why the formulation in equations \ref{eq:LMMcondY} and
-\ref{eq:LMMuncondB} is particularly useful, we first show that the
-profiled deviance (negative twice the log-likelihood) and the profiled
-REML criterion can be expressed as a function of $\bm\theta$ only.
-Furthermore these criteria can be evaluated quickly and accurately.
-
-\section{Profiling the deviance and the REML criterion}
-\label{sec:profdev}
-
-As stated above, $\bm\theta$ determines the $q\times q$ matrix
-$\bm\Lambda_\theta$ which, together with a value of $\sigma^2$,
-determines
-$\Var(\mc B)=\bm\Sigma_\theta=\sigma^2\bm\Lambda_\theta\bm\Lambda_\theta\trans$.
-If we define a \emph{spherical}%
-\footnote{$\mathcal{N}(\bm\mu,\sigma^2\bm I)$
-  distributions are called ``spherical'' because contours of the
-  probability density are spheres.}
-\emph{random effects} variable, $\mc U$, with distribution
-\begin{linenomath}
-\begin{equation}
-  \label{eq:sphericalRE}
-  \mc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q),
-\end{equation}
-\end{linenomath}
-and set
-\begin{linenomath}
-\begin{equation}
-  \label{eq:BU}
-  \mc B=\bm\Lambda_\theta\mc U,
-\end{equation}
-\end{linenomath}
-then $\mc B$ will have the desired $\mathcal{N}(\bm
-0,\bm\Sigma_\theta)$ distribution.
-
-Although it may seem more natural to define $\mc U$ in terms of $\mc
-B$, we must write the relationship as in \eq{eq:BU} because
-$\bm\Lambda_\theta$ may be singular.  In fact, it is important to
-allow for $\bm\Lambda_\theta$ to be singular because situations
-where the parameter estimates, $\widehat{\bm\theta}$, produce a
-singular $\bm\Lambda_{\widehat{\theta}}$ do occur in practice.  And
-even if the parameter estimates do not correspond to a singular
-$\bm\Lambda_\theta$, it may be necessary to evaluate the estimation
-criterion at such values during the course of the numerical
-optimization of the criterion.
-
-The model can now be defined in terms of
-\begin{linenomath}
-\begin{equation}
-  \label{eq:LMMcondYU}
-  (\mc Y|\mc U=\bm u)\sim\mc{N}(\bm Z\bm\Lambda_\theta\bm u+\bm X\bm\beta,\sigma^2\bm I_n)
-\end{equation}
-\end{linenomath}
-producing the joint density function
-\begin{linenomath}
-\begin{equation}
-  \label{eq:jointDens}
-  \begin{aligned}
-    f_{\mc Y,\mc U}(\bm y,\bm u)&
-    =f_{\mc Y|\mc U}(\bm y|\bm u)\,f_{\mc U}(\bm u)\\
-    &=\frac{\exp(-\frac{1}{2\sigma^2}\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\|^2)}
-    {(2\pi\sigma^2)^{n/2}}\;
-    \frac{\exp(-\frac{1}{2\sigma^2}\|\bm u\|^2)}{(2\pi\sigma^2)^{q/2}}\\
-    &=\frac{\exp(-
-      \left[\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2\right]/(2\sigma^2))}
-    {(2\pi\sigma^2)^{(n+q)/2}} .
-  \end{aligned}
-\end{equation}
-\end{linenomath}
-
-The \emph{likelihood} of the parameters, $\bm\theta$, $\bm\beta$ and
-$\sigma^2$, given the observed data is the value of the
-marginal density of $\mc Y$, evaluated at $\yobs$.  That is
-\begin{linenomath}
-\begin{equation}
-  \label{eq:likelihoodLMM}
-  L(\bm\theta,\bm\beta,\sigma^2|\yobs)=\int_{\mathbb{R}^q}f_{\mc Y,\mc
-    U}(\yobs,\bm u)\,d\bm u .
-\end{equation}
-\end{linenomath}
-The integrand in \eq{eq:likelihoodLMM} is the \emph{unscaled
-  conditional density} of $\mc U$ given $\mc Y=\yobs$.  The
-conditional density of $\mc U$ given $\mc Y=\yobs$ is
-\begin{linenomath}
-\begin{equation}
-  \label{eq:condDens}
-  f_{\mc U|\mc Y}(\bm u|\yobs)=\frac{f_{\mc Y,\mc U}(\yobs,\bm u)}
-  {\int f_{\mc Y,\mc U}(\yobs,\bm u)\,d\bm u}
-\end{equation}
-\end{linenomath}
-which is, up to a scale factor, the joint density, $f_{\mc Y,\mc
-  U}(\yobs,\bm u)$.  The unscaled conditional density will be, up to a
-scale factor, a $q$-dimensional multivariate Gaussian with an integral
-that is easily evaluated if we know its mean and variance-covariance
-matrix.
-
-The conditional mean, $\bm\mu_{\mc U|\mc Y=\yobs}$, is also the mode
-of the conditional distribution.  Because a constant factor in a
-function does not affect the location of the optimum, we can determine
-the conditional mode, and hence the conditional mean, by maximizing
-the unscaled conditional density.  This takes the form of a
-\emph{penalized linear least squares} problem,
-\begin{linenomath}
-\begin{equation}
-  \label{eq:PLSprob}
-  \bm\mu_{\mc U|\mc Y=\yobs}=\arg\min_{\bm u}
-  \left(\left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
-    \left\|\bm u\right\|^2\right) .
-\end{equation}
-\end{linenomath}
-
-In the so-called ``pseudo-data'' approach to solving such penalized
-least squares problems we write the objective as a residual sum of
-squares for an extended response vector and model matrix
-\begin{linenomath}
-\begin{equation}
-  \label{eq:pseudoData}
-  \left\|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
-  \left\|\bm u\right\|^2 =
-  \left\| \begin{bmatrix}\yobs-\bm X\bm\beta\\\bm 0\end{bmatrix}-
-    \begin{bmatrix}\bm Z\bLt\\\bm I_q\end{bmatrix}
-    \bm u\right\|^2.
-\end{equation}
-\end{linenomath}
-The contribution to the residual sum of squares from the ``pseudo''
-observations appended to $\yobs-\bm X\bm\beta$ is exactly the penalty
-term, $\left\|\bm u\right\|^2$.
-
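-As a dense-matrix sketch of the pseudo-data solution (not part of the
-paper's code; \code{Z}, \code{Lambdat}, \code{X}, \code{beta} and \code{y}
-are assumed given),
-<<pseudoDataSketch,eval=FALSE>>=
-ZL   <- Z %*% t(Lambdat)                     # Z Lambda_theta
-Xaug <- rbind(as.matrix(ZL), diag(ncol(ZL))) # [ Z Lambda_theta ; I_q ]
-yaug <- c(y - X %*% beta, rep(0, ncol(ZL)))  # [ y - X beta ; 0 ]
-u    <- qr.coef(qr(Xaug), yaug)              # the conditional mean/mode
-@
-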
-From \eq{eq:pseudoData} we can see that the conditional mean satisfies
-\begin{linenomath}
-\begin{equation}
-  \label{eq:PLSsol}
-  \left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
-  \bm\mu_{\mc U|\mc Y=\yobs}=\bLt\trans\bm Z\trans(\yobs-\bm X\bm\beta),
-\end{equation}
-\end{linenomath}
-which would be interesting, but not terribly useful, were it not for
-the fact that we can determine the solution to \eq{eq:PLSsol} quickly
-and accurately, even when $q$, the size of the system to solve, is
-very large indeed.  As described in sect.~\ref{sect:PLSnumer}, the
-special structures of the matrices $\bm Z$ and $\bLt$ for
-mixed-effects models described by a mixed-model formula result in
-$\bLt\trans\bm Z\trans\bm Z\bLt$ being very sparse, so that the
-sparse Cholesky factorization~\citep[Ch.~4]{davis06:csparse_book} can
-be used to solve eqn.~\ref{eq:PLSsol}.
-
-We can write the general approach in terms of a fill-reducing
-permutation matrix $\bm P$~\citep[Ch.~7]{davis06:csparse_book}, which
-is determined by the pattern of non-zeros in $\bm Z\bLt$ but does
-not depend on $\bm\theta$ itself, and the \emph{sparse Cholesky
-  factor}, $\bm L_\theta$, which is a sparse, lower-triangular matrix
-such that
-\begin{linenomath}
-\begin{equation}
-  \label{eq:sparseChol}
-  \bm L_\theta\bm L\trans_\theta=\bm P
-  \left(\bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q\right)
-  \bm P\trans.
-\end{equation}
-\end{linenomath}
-
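-As a sketch of this computation with the \pkg{Matrix} package (not part of
-the paper's code; \code{Zt}, \code{Lambdat}, \code{X}, \code{beta} and
-\code{y} are assumed given),
-<<sparseCholSketch,eval=FALSE>>=
-LtZt <- Lambdat %*% Zt                     # Lambda_theta' Z'
-L <- Cholesky(tcrossprod(LtZt), Imult = 1) # P (A + I) P' = L L'
-u <- solve(L, LtZt %*% (y - X %*% beta))   # solves the system in eq:PLSsol
-@
-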
-\subsection{Evaluating the likelihood}
-\label{sec:evallike}
-
-After solving eqn.~\ref{eq:PLSsol} for $\bm\mu_{\mc U|\mc Y=\yobs}$,
-the exponent of $f_{\mc Y,\mc U}(\yobs, \bm u)$
-(eqn.~\ref{eq:jointDens}) can be written
-\begin{linenomath}
-\begin{equation}
-  \label{eq:PLS}
-  \|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=
-  r^2(\bm\theta,\bm\beta)+
-  \|\bm L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})\|^2,
-\end{equation}
-\end{linenomath}
-where $r^2(\bm\theta,\bm\beta)=\|\yobs-\bm X\bm\beta-
-\bm Z\bm\Lambda_\theta\bm\mu_{\mc U|\mc Y=\yobs}\|^2+
-\|\bm\mu_{\mc U|\mc Y=\yobs}\|^2$ is the minimum
-penalized residual sum of squares for the given values of $\bm\theta$ and
-$\bm\beta$.
-
-With expression (\ref{eq:PLS}) and the change of variable $\bm v=\bm
-L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y=\yobs})$, for which $d\bm
-v=\abs(|\bm L_\theta||\bm P|)\,d\bm u$, we have
-\begin{linenomath}
-\begin{equation}
-  \label{eq:intExpr}
-  \int\frac{\exp\left(\frac{-\|\bm L_\theta\trans\bm P(\bm u-\bm\mu_{\mc U|\mc Y})\|^2}
-      {2\sigma^2}\right)}
-  {(2\pi\sigma^2)^{q/2}}\,d\bm u
-  = \int\frac{\exp\left(\frac{-\|\bm
-        v\|^2}{2\sigma^2}\right)}{(2\pi\sigma^2)^{q/2}}\,\frac{d\bm
-    v}{\abs(|\bm L_\theta||\bm P|)} = \frac{1}{\abs(|\bm L_\theta||\bm
-    P|)}=\frac{1}{|\bm L_\theta|}
-\end{equation}
-\end{linenomath}
-because $\abs|\bm P|=1$ (a permutation matrix has determinant $\pm1$) and
-because $|\bm L_\theta|$ is positive: $\bm L_\theta$ is triangular, so its
-determinant is the product of its diagonal elements, all of which are
-positive.
-
-Using this expression we can write the deviance (negative twice the
-log-likelihood) as
-\begin{linenomath}
-\begin{equation}
-  \label{eq:deviance}
-  -2\ell(\bm\theta,\bm\beta,\sigma^2|\yobs)=-2\log L(\bm\theta,\bm\beta,\sigma^2|\yobs)=
-  n\log(2\pi\sigma^2)+\frac{r^2(\bm\theta,\bm\beta)}{\sigma^2}+
-  \log(|\bm L_\theta|^2) .
-\end{equation}
-\end{linenomath}
-Because the dependence of \eq{eq:deviance} on $\sigma^2$ is
-straightforward, we can form the conditional estimate
-\begin{linenomath}
-\begin{equation}
-  \label{eq:conddev}
-  \widehat{\sigma^2}(\bm\theta,\bm\beta)=\frac{r^2(\bm\theta,\bm\beta)}{n} ,
-\end{equation}
-\end{linenomath}
-producing the \emph{profiled deviance}
-\begin{linenomath}
-\begin{equation}
-  \label{eq:profdev1}
-  -2\tilde{\ell}(\bm\theta,\bm\beta|\yobs)=\log(|\bm L_\theta|^2)+
-  n\left[1+\log\left(\frac{2\pi r^2(\bm\theta,\bm\beta)}{n}\right)\right] .
-\end{equation}
-\end{linenomath}
-
-However, observing that \eq{eq:profdev1} depends on $\bm\beta$ only
-through $r^2(\bm\theta,\bm\beta)$ allows us to perform a much greater
-simplification by ``profiling out'' the fixed-effects parameter,
-$\bm\beta$, from the evaluation of the deviance.  The conditional
-estimate, $\widehat{\bm\beta}_\theta$, is the value of $\bm\beta$ at
-the solution of the joint penalized least squares problem
-\begin{linenomath}
-\begin{equation}
-  \label{eq:jointPLS}
-  r^2_\theta=\min_{\bm u,\bm\beta}
-  \left(\left\|\bm y-\bm X\bm\beta-\bm Z\bLt\bm u\right\|^2 +
-    \left\|\bm u\right\|^2\right).
-\end{equation}
-\end{linenomath}
-
-The solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
-$\widehat{\bm\beta}_\theta$, of the joint penalized least squares
-problem (\ref{eq:jointPLS}) satisfy
-\begin{linenomath}
-\begin{equation}
-  \label{eq:jointPLSeqn}
-  \begin{bmatrix}
-    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I_q &
-    \bLt\trans\bm Z\trans\bm X\\
-    \bm X\trans\bm Z\bLt & \bm X\trans\bm X
-  \end{bmatrix}
-  \begin{bmatrix}
-    \bm\mu_{\mc U|\mc Y=\yobs}\\\widehat{\bm\beta}_\theta
-  \end{bmatrix}=
-  \begin{bmatrix}\bLt\trans\bm Z\trans\yobs\\\bm X\trans\yobs
-  \end{bmatrix} .
-\end{equation}
-\end{linenomath}
-
-After solving eqn.~\ref{eq:jointPLS} we can write the \emph{profiled
-  deviance} as
-\begin{linenomath}
-\begin{equation}
-  \label{eq:profdev2}
-  -2\tilde{\ell}(\bm\theta)=\log(|\bm L_\theta|^2)+
-  n\left[1+\log\left(\frac{2\pi r^2_\theta}{n}\right)\right],
-\end{equation}
-\end{linenomath}
-which is a function of $\bm\theta$ only.  Eqn.~\ref{eq:profdev2} is a
-remarkably compact expression for the deviance.
-
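-The modular functions in \pkg{lme4} expose this function of $\bm\theta$.
-As a sketch (not part of the paper's code),
-<<devfunSketch,eval=FALSE>>=
-lf   <- lFormula(Reaction ~ Days + (Days | Subject), sleepstudy)
-devf <- do.call(mkLmerDevfun, lf) # profiled deviance as a function of theta
-devf(c(1, 0, 1))                  # evaluate at a trial value of theta
-@
-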
-\subsection{The profiled REML criterion}
-\label{sec:profiledREML}
-
-\citet{laird_ware_1982} show that the criterion to be optimized by the
-REML estimates can be expressed as
-\begin{linenomath}
-\begin{equation}
-  \label{eq:REMLcrit}
-  L_R(\bm\theta,\sigma^2|\yobs)=\int
-  L(\bm\theta,\bm\beta,\sigma^2|\yobs)\,d\bm\beta .
-\end{equation}
-\end{linenomath}
-
-Let us write a partitioned Cholesky factorization of the system matrix
-for the joint penalized least squares problem, eqn.~\ref{eq:jointPLSeqn},
-as
-\begin{linenomath}
-\begin{equation}
-  \label{eq:fulldecomp}
-  \begin{bmatrix}
-    \bm P\trans\bm L& \bm 0\\
-    \bm R_{ZX}\trans & \bm R_X\trans
-  \end{bmatrix}
-  \begin{bmatrix}
-    \bm L\trans\bm P & \bm R_{ZX}\\
-    \bm 0            & \bm R_X
-  \end{bmatrix}=
-  \begin{bmatrix}
-    \bLt\trans\bm Z\trans\bm Z\bLt+\bm I & \bLt\trans\bm Z\trans\bm X\\
-    \bm X\trans\bm Z\bLt       & \bm X\trans\bm X
-  \end{bmatrix} .
-\end{equation}
-\end{linenomath}
-When the fixed-effects model matrix $\bm X$ is dense, the $q\times p$
-matrix $\bm R_{ZX}$ and the $p\times p$ upper triangular matrix $\bm
-R_X$ will also be dense.  Occasionally it makes sense to store $\bm X$
-as a sparse matrix in which case $\bm R_{ZX}$ and $\bm R_X$ could also
-be sparse.
-
-Because the joint solutions, $\bm\mu_{\mc U|\mc Y=\yobs}$ and
-$\widehat{\bm\beta}_\theta$, to eqn.~\ref{eq:jointPLSeqn} allow us to express
-\begin{linenomath}
-\begin{multline}
-  \label{eq:PLS2}
-  \|\yobs-\bm X\bm\beta-\bm Z\bLt\bm u\|^2+\|\bm u\|^2=\\
-  r^2_\theta+
-  \left\|\bm L_\theta\trans\bm P\left[\bm u-\bm\mu_{\mc U|\mc Y=\yobs}-
-  \bm R_{ZX}(\bm\beta - \widehat{\bm\beta}_\theta)\right]\right\|^2+
-  \left\|\bm R_X(\bm\beta - \widehat{\bm\beta}_\theta)\right\|^2
-\end{multline}
-\end{linenomath}
-we can use a change of variable, similar to that in
-\eq{eq:intExpr}, to evaluate the profiled REML criterion.  On the
-deviance scale the criterion can be evaluated as
-\begin{linenomath}
-\begin{equation}
-  \label{eq:profiledREML}
-  -2\tilde{\ell}_R(\bm\theta)=\log(|\bm L|^2)+\log(|\bm R_X|^2)+
-  (n-p)\left[1+\log\left(\frac{2\pi r^2_\theta}{n-p}\right)\right] .
-\end{equation}
-\end{linenomath}
-
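-In \code{lmer} the choice between optimizing \eq{eq:profdev2} or
-\eq{eq:profiledREML} is controlled by the \code{REML} argument.  As a
-sketch (not part of the paper's code),
-<<REMLsketch,eval=FALSE>>=
-fm.REML <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy) # default
-fm.ML   <- update(fm.REML, REML = FALSE)
-c(REMLcrit(fm.REML), deviance(fm.ML)) # the two criteria at their optima
-@
-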
-\section{Random-effects terms and resulting matrix properties}
-\label{sec:PLSnumer}
-
-In fitting linear mixed models an instance of the penalized least
-squares (PLS) problem (\ref{eq:jointPLSeqn}) must be solved at each
-evaluation of the objective function - either the profiled deviance or
-the profiled REML criterion - during the optimization with respect to
-$\bm\theta$.  Because this operation must be performed many times it
-is worthwhile studying the properties of $\bm Z$ and $\bLt$ in general
-and for common special cases.
-
-\subsection{Random-effects terms in mixed-model formulas}
-\label{sec:LMMmatrix}
-
-A mixed-model formula incorporates $k\ge1$ random-effects terms of the
-form \code{(r|f)} where \code{r} is a linear model formula and
-\code{f} is an expression that evaluates to a factor, called the
-\emph{grouping factor}, for the term.  In practice, fixed-effects
-model matrices and random-effects terms are evaluated with respect to
-a \emph{model frame}, ensuring that any expressions for grouping
-factors have been evaluated to factors and any unused levels of these
-factors have been dropped.  That is, $\ell_i$, the number of levels in
-the grouping factor for the $i$th random-effects term, is well-defined.
-The vector $\bm i_i$ of \emph{factor indices} for the $i$th term is an
-$n$-vector of values from $1,\dots,\ell_i$.  Let $\bm J_i$ be the
-$n\times \ell_i$ matrix of indicator columns for $\bm i_i$.
-
-When $k>1$ we order the random-effects terms so that
-$\ell_1\ge\ell_2\ge\dots\ge\ell_k$.
-
-Let $\bm X_i$ of size $n\times p_i$ be the model matrix of the linear
-model formula, \code{r}, from the $i$th random-effects term,
-$i=1,\dots,k$.  A term is said to be a \emph{scalar} random-effects
-term when $p_i=1$, otherwise it is \emph{vector-valued}.  For a
-\emph{simple, scalar} random-effects term of the form \code{(1|f)},
-$\bm X_i$ is the $n\times 1$ matrix of ones.
-
-The $i$th random effects term contributes $q_i=\ell_ip_i$ columns,
-which we will write as $\bm Z_i$, to
-the model matrix $\bm Z$.  Thus $q$, the number of columns in $\bm Z$
-and the dimension of the random variables $\mc{B}$ and $\mc{U}$, is
-\begin{linenomath}
-  \begin{equation}
-    \label{eq:qcalc}
-    q=\sum_{i=1}^k q_i = \sum_{i=1}^k \ell_i\,p_i .
-  \end{equation}
-\end{linenomath}
-The creation of $\bm Z_i$ from $\bm X_i$ and $\bm i_i$ is a
-straightforward concept that is somewhat difficult to describe.
-Consider $\bm Z_i$ as being further decomposed into $\ell_i$ blocks of
-$p_i$ columns.  The rows in the first block are the rows of $\bm X_i$
-multiplied by the 0/1 values in the first column of $\bm J_i$.
-Similarly for the subsequent blocks.
-
-In particular, for a simple, scalar term, $\bm Z_i$ is exactly $\bm
-J_i$, the matrix of indicator columns.  For other scalar terms, $\bm
-Z_i$ is formed by element-wise multiplication of the single column of
-$\bm X_i$ by each of the columns of $\bm J_i$.
-
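-This construction can be expressed compactly as a Khatri-Rao product of
-the transposed matrices.  As a sketch (not part of the paper's code),
-<<ZiSketch,eval=FALSE>>=
-f  <- gl(3, 2)                         # grouping factor with l_i = 3 levels
-Ji <- t(as(f, Class = "sparseMatrix")) # n x l_i indicator matrix
-Xi <- cbind(1, seq_len(6))             # X_i for a term like (1 + x | f)
-Zi <- t(KhatriRao(t(Ji), t(Xi)))       # the n x (l_i p_i) block of Z
-@
-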
-Because each $\bm Z_i,i=1,\dots,k$ is generated from indicator
-columns, its cross-product, $\bm Z_i\trans\bm Z_i$ is block-diagonal
-consisting of $\ell_i$ diagonal blocks each of size $p_i$.
-Note that this means that when $k=1$ (i.e. there is only one
-random-effects term), $\bm Z\trans\bm Z$ will be block diagonal.
-
-The $q\times q$ covariance factor, $\bLt$, is a block diagonal matrix
-whose $i$th diagonal block, $\bm\Lambda_i$, is of size
-$q_i,i=1,\dots,k$.  Furthermore, $\bm\Lambda_i$ is a \emph{homogeneous
-  block diagonal} matrix with each of the $\ell_i$ lower-triangular
-blocks on the diagonal being a copy of a $p_i\times p_i$ lower-triangular
-template, $\bm T_i$.  The covariance parameter vector, $\bm\theta$,
-consists of the elements in the lower triangles of the $\bm
-T_i,i=1,\dots,k$.  To provide a unique representation we require that
-the diagonal elements of the $\bm T_i,i=1,\dots,k$ be non-negative.
-
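-The templates and the non-negativity constraint are visible in a fitted
-model.  As a sketch (not part of the paper's code),
-<<thetaSketch,eval=FALSE>>=
-fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
-getME(fm, "theta") # lower triangle of the 2 x 2 template T_1
-getME(fm, "lower") # bounds: 0 for diagonal elements, -Inf off-diagonal
-@
-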
-\subsection{General computational structures and methods}
-\label{sec:generalcomputational}
-
-Although special structures can be exploited to reduce the
-computational load for certain classes of mixed-effects models, we
-will start with the general case to determine suitable data
-structures.  
-
-In a complex model fit to a large data set, the dominant calculation
-in the evaluation of the profiled deviance (eqn.~\ref{eq:profdev2}) or
-REML criterion (eqn.~\ref{eq:profiledREML}) is the sparse Cholesky
-factorization (eqn.~\ref{eq:sparseChol}).  For fitting other types of
-models, such as GLMMs, we consider the more general problem
-\begin{linenomath}
-\begin{equation}
-  \label{eq:sparseCholW}
-  \bm L_\theta\bm L\trans_\theta=\bm P
-  \left(\bLt\trans\bm Z\trans\bm W\bm Z\bLt+\bm I_q\right)
-  \bm P\trans,
-\end{equation}
-\end{linenomath}
-where the $n\times n$ matrix $\bm W$ may change during the iterations
-to optimize the objective.
-
-When fitting a GLMM, $\bm W$ is the diagonal matrix of case weights in
-the Penalized Iteratively-Reweighted Least Squares (PIRLS) algorithm.
-Another example of a varying $\bm W$ would be fitting an LMM in which
-the conditional distribution of the response incorporates a
-correlation matrix that itself depends on
-parameters~\citep[Ch.~5]{R:Pinheiro+Bates:2000}.
-
-In any case, we will assume that $\bm W$ is symmetric and 
-positive semidefinite so that some type of ``square root'' factor,
-$\bm W^{1/2}$, satisfying
-\begin{linenomath}
-  \begin{equation}
-    \label{eq:Wsqrt}
-    \bm W = \bm W^{1/2}\left(\bm W^{1/2}\right)\trans,
-  \end{equation}
-\end{linenomath}
-is available.
-
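-For the diagonal $\bm W$ of case weights arising in PIRLS this factor is
-immediate.  As a sketch (not part of the paper's code; \code{wts} is a
-hypothetical vector of non-negative case weights),
-<<WsqrtSketch,eval=FALSE>>=
-W     <- Diagonal(x = wts)       # diagonal weight matrix
-Whalf <- Diagonal(x = sqrt(wts)) # then W == Whalf %*% t(Whalf)
-@
-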
-We wish to use structures and algorithms that allow us to take a new
-value of $\bm\theta$ or a new value of $\bm W^{1/2}$, or both, and
-evaluate $\bm L_\theta$ (eqn.~\ref{eq:sparseCholW})
-efficiently.  The key to doing so is the special structure of
-$\bLt\trans\bm Z\trans\bm W^{1/2}$.  To understand why this matrix,
-and not its transpose, is of interest, we describe the sparse
-matrix structures used in \proglang{Julia} and in the \pkg{Matrix}
-package for \proglang{R}.
-
-\subsubsection{Compressed sparse column-oriented matrices}
-\label{sec:CSCmats}
-
-Dense matrices are stored in \proglang{Julia} and in \proglang{R} as a
-one-dimensional array of contiguous storage locations addressed in
-\emph{column-major order}.  This means that elements in the same
-column are in adjacent storage locations whereas elements in the same
-row can be widely separated in memory.  For this reason, algorithms
-that work column-wise are preferred to those that work row-wise.
-
-Although a sparse matrix can be stored in a \emph{triplet} format,
-where the row position, column position and element value of each
-nonzero is recorded, the preferred storage forms for actual
-computation with sparse matrices are compressed sparse column (CSC) or
-compressed sparse row (CSR)~\citep[Ch.~2]{davis06:csparse_book}.
-
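-As a small illustration (a sketch, not part of the paper's code), the CSC
-slots of a sparse matrix can be examined directly,
-<<CSCsketch,eval=FALSE>>=
-A <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(10, 20, 30))
-str(A) # @i: 0-based row indices, @p: column pointers, @x: nonzero values
-@
-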
-The sparse matrix capabilities in \proglang{Julia} and in the
-\pkg{Matrix} package for \proglang{R} are based on SuiteSparse % figure out how to cite this
-which uses the CSC storage format.  In this format the non-zero
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/lme4 -r 1848

