[Lme4-commits] r1431 - in pkg/lme4Eigen: . inst/doc vignettes
noreply at r-forge.r-project.org
Wed Nov 2 19:18:14 CET 2011
Author: dmbates
Date: 2011-11-02 19:18:14 +0100 (Wed, 02 Nov 2011)
New Revision: 1431
Added:
pkg/lme4Eigen/vignettes/
pkg/lme4Eigen/vignettes/PLSvGLS.Rnw
pkg/lme4Eigen/vignettes/PLSvGLS.pdf
pkg/lme4Eigen/vignettes/Theory.Rnw
pkg/lme4Eigen/vignettes/Theory.pdf
pkg/lme4Eigen/vignettes/lme4.bib
Removed:
pkg/lme4Eigen/inst/doc/PLSvGLS.Rnw
pkg/lme4Eigen/inst/doc/PLSvGLS.pdf
pkg/lme4Eigen/inst/doc/Theory.Rnw
pkg/lme4Eigen/inst/doc/Theory.pdf
pkg/lme4Eigen/inst/doc/lme4.bib
Log:
Move vignette sources to a vignettes directory per recommendations for R-2.14.0
Deleted: pkg/lme4Eigen/inst/doc/PLSvGLS.Rnw
===================================================================
--- pkg/lme4Eigen/inst/doc/PLSvGLS.Rnw 2011-10-18 20:34:01 UTC (rev 1430)
+++ pkg/lme4Eigen/inst/doc/PLSvGLS.Rnw 2011-11-02 18:18:14 UTC (rev 1431)
@@ -1,432 +0,0 @@
-\documentclass[12pt]{article}
-\usepackage{Sweave,amsmath,amsfonts,bm}
-\usepackage[authoryear,round]{natbib}
-\bibliographystyle{plainnat}
-\DeclareMathOperator \tr {tr}
-\DefineVerbatimEnvironment{Sinput}{Verbatim}
-{formatcom={\vspace{-1ex}},fontshape=sl,
- fontfamily=courier,fontseries=b, fontsize=\footnotesize}
-\DefineVerbatimEnvironment{Soutput}{Verbatim}
-{formatcom={\vspace{-1ex}},fontfamily=courier,fontseries=b,%
- fontsize=\footnotesize}
-%%\VignetteIndexEntry{PLS vs GLS for LMMs}
-%%\VignetteDepends{lme4Eigen}
-\title{Penalized least squares versus generalized least squares
- representations of linear mixed models}
-\author{Douglas Bates\\Department of Statistics\\%
- University of Wisconsin -- Madison}
-\begin{document}
-\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=true,keep.source=TRUE}
-\SweaveOpts{include=FALSE}
-\setkeys{Gin}{width=\textwidth}
-\newcommand{\code}[1]{\texttt{\small{#1}}}
-\newcommand{\package}[1]{\textsf{\small{#1}}}
-\newcommand{\trans}{\ensuremath{^\prime}}
-<<preliminaries,echo=FALSE,results=hide>>=
-options(width=65,digits=5)
-#library(lme4)
-@
-
-\maketitle
-
-\begin{abstract}
- The methods in the \code{lme4} package for \code{R} for fitting
- linear mixed models are based on sparse matrix methods, especially
- the Cholesky decomposition of sparse positive-semidefinite matrices,
- in a penalized least squares representation of the conditional model
- for the response given the random effects. The representation is
- similar to that in Henderson's mixed-model equations. An
- alternative representation of the calculations is as a generalized
- least squares problem. We describe the two representations, show
- their equivalence and explain why we feel that the penalized least
- squares approach is more versatile and more computationally
- efficient.
-\end{abstract}
-
-\section{Definition of the model}
-\label{sec:Definition}
-
-We consider linear mixed models in which the random effects are
-represented by a $q$-dimensional random vector, $\bm{\mathcal{B}}$, and
-the response is represented by an $n$-dimensional random vector,
-$\bm{\mathcal{Y}}$. We observe a value, $\bm y$, of the response. The
-random effects are unobserved.
-
-For our purposes, we will assume a ``spherical'' multivariate normal
-conditional distribution of $\bm{\mathcal{Y}}$, given
-$\bm{\mathcal{B}}$. That is, we assume the variance-covariance matrix
-of $\bm{\mathcal{Y}}|\bm{\mathcal{B}}$ is simply $\sigma^2\bm I_n$,
-where $\bm I_n$ denotes the identity matrix of order $n$. (The term
-``spherical'' refers to the fact that contours of the conditional
-density are concentric spheres.)
-
-The conditional mean,
-$\mathrm{E}[\bm{\mathcal{Y}}|\bm{\mathcal{B}}=\bm b]$, is a linear
-function of $\bm b$ and the $p$-dimensional fixed-effects parameter,
-$\bm\beta$,
-\begin{equation}
- \label{eq:condmean}
- \mathrm{E}[\bm{\mathcal{Y}}|\bm{\mathcal{B}}=\bm b]=
- \bm X\bm\beta+\bm Z\bm b ,
-\end{equation}
-where $\bm X$ and $\bm Z$ are known model matrices of sizes $n\times
-p$ and $n\times q$, respectively. Thus
-\begin{equation}
- \label{eq:yconditional}
- \bm{\mathcal{Y}}|\bm{\mathcal{B}}\sim
- \mathcal{N}\left(\bm X\bm\beta+\bm Z\bm b,\sigma^2\bm I_n\right) .
-\end{equation}
-
-The marginal distribution of the random effects
-\begin{equation}
- \label{eq:remargin}
- \bm{\mathcal{B}}\sim\mathcal{N}\left(\bm 0,\sigma^2\bm\Sigma(\bm\theta)\right)
-\end{equation}
-is also multivariate normal, with mean $\bm 0$ and variance-covariance
-matrix $\sigma^2\bm\Sigma(\bm\theta)$. The scalar, $\sigma^2$, in
-(\ref{eq:remargin}) is the same as the $\sigma^2$ in
-(\ref{eq:yconditional}). As described in the next section, the
-relative variance-covariance matrix, $\bm\Sigma(\bm\theta)$, is a
-$q\times q$ positive semidefinite matrix depending on a parameter
-vector, $\bm\theta$. Typically the dimension of $\bm\theta$ is much,
-much smaller than $q$.
-
-
-\subsection{Variance-covariance of the random effects}
-\label{sec:revarcov}
-
-The relative variance-covariance matrix, $\bm\Sigma(\bm\theta)$, must
-be symmetric and positive semidefinite (i.e. $\bm x\trans\bm\Sigma\bm
-x\ge0,\forall\bm x\in\mathbb{R}^q$). Because the estimate of a
-variance component can be zero, it is important to allow for a
-semidefinite $\bm\Sigma$. We do not assume that $\bm\Sigma$ is
-positive definite (i.e. $\bm x\trans\bm\Sigma\bm x>0,\forall\bm
-x\in\mathbb{R}^q, \bm x\ne\bm 0$) and, hence, we cannot assume that $\bm\Sigma^{-1}$
-exists.
-
-A positive semidefinite matrix such as $\bm\Sigma$ has a Cholesky
-decomposition of the so-called ``LDL$\trans$'' form. We use a
-slight modification of this form,
-\begin{equation}
- \label{eq:TSdef}
- \bm\Sigma(\bm\theta)=\bm T(\bm\theta)\bm S(\bm\theta)\bm
- S(\bm\theta)\bm T(\bm\theta)\trans ,
-\end{equation}
-where $\bm T(\bm\theta)$ is a unit lower-triangular $q\times q$ matrix
-and $\bm S(\bm\theta)$ is a diagonal $q\times q$ matrix with
-nonnegative diagonal elements that act as scale factors. (They are
-the relative standard deviations of certain linear combinations of the
-random effects.) Thus, $\bm T$ is a triangular matrix and $\bm S$ is
-a scale matrix.
-
-Both $\bm T$ and $\bm S$ are highly patterned.
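-
-As a small illustration, the factorization (\ref{eq:TSdef}) can be
-formed directly in \code{R}. The \code{theta} values below are
-arbitrary, chosen only to show the structure, and the chunk is not
-evaluated.
-<<TSdemo,eval=FALSE>>=
-## Toy q = 3 example of Sigma = T S S T' with hypothetical theta values
-theta <- c(0.5, 0.3, 1.2)
-Tmat <- diag(3); Tmat[2, 1] <- theta[2]        # unit lower triangular T
-Smat <- diag(c(theta[1], theta[1], theta[3]))  # nonnegative scales S
-Sigma <- Tmat %*% Smat %*% Smat %*% t(Tmat)
-all(eigen(Sigma, symmetric = TRUE)$values >= 0) # positive semidefinite
-@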
-
-\subsection{Orthogonal random effects}
-\label{sec:orthogonal}
-
-Let us define a $q$-dimensional random vector, $\bm{\mathcal{U}}$, of
-orthogonal random effects with marginal distribution
-\begin{equation}
- \label{eq:Udist}
- \bm{\mathcal{U}}\sim\mathcal{N}\left(\bm 0,\sigma^2\bm I_q\right)
-\end{equation}
-and, for a given value of $\bm\theta$, express $\bm{\mathcal{B}}$ as a
-linear transformation of $\bm{\mathcal{U}}$,
-\begin{equation}
- \label{eq:UtoB}
- \bm{\mathcal{B}}=\bm T(\bm\theta)\bm S(\bm\theta)\bm{\mathcal{U}} .
-\end{equation}
-Note that the transformation (\ref{eq:UtoB}) gives the desired
-distribution of $\bm{\mathcal{B}}$ in that
-$\mathrm{E}[\bm{\mathcal{B}}]=\bm T\bm
-S\mathrm{E}[\bm{\mathcal{U}}]=\bm 0$ and
-\begin{displaymath}
- \mathrm{Var}(\bm{\mathcal{B}})=\mathrm{E}[\bm{\mathcal{B}}\bm{\mathcal{B}}\trans]
- =\bm T\bm S\mathrm{E}[\bm{\mathcal{U}}\bm{\mathcal{U}}\trans]\bm
-  S\bm T\trans=\sigma^2\bm T\bm S\bm S\bm T\trans=\sigma^2\bm\Sigma .
-\end{displaymath}
-
-The conditional distribution, $\bm{\mathcal{Y}}|\bm{\mathcal{U}}$, can
-be derived from $\bm{\mathcal{Y}}|\bm{\mathcal{B}}$ as
-\begin{equation}
- \label{eq:YgivenU}
- \bm{\mathcal{Y}}|\bm{\mathcal{U}}\sim\mathcal{N}\left(\bm X\bm\beta+\bm
-  Z\bm T\bm S\bm u, \sigma^2\bm I\right) .
-\end{equation}
-We will write the transpose of $\bm Z\bm T\bm S$ as $\bm A$. Because
-the matrices $\bm T$ and $\bm S$ depend on the parameter $\bm\theta$,
-$\bm A$ is also a function of $\bm\theta$,
-\begin{equation}
- \label{eq:Adef}
- \bm A\trans(\bm\theta)=\bm Z\bm T(\bm\theta)\bm S(\bm\theta) .
-\end{equation}
-
-In applications, the matrix $\bm Z$ is derived from indicator columns
-of the levels of one or more factors in the data and is a
-\emph{sparse} matrix, in the sense that most of its elements are zero.
-The matrix $\bm A$ is also sparse. In fact, the structures of $\bm T$
-and $\bm S$ are such that the pattern of nonzeros in $\bm A$ is the
-same as that in $\bm Z\trans$.
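-
-A toy example (with an assumed three-level grouping factor and a
-single scalar random effect per level, so that $\bm T=\bm I$ and $\bm
-S=\theta_1\bm I$) illustrates that the nonzero pattern is preserved:
-<<Apattern,eval=FALSE>>=
-## Sketch: A' = Z T S has the same nonzero pattern as Z' (toy data)
-library(Matrix)
-f <- factor(c("a", "a", "b", "c", "c"))
-Z <- Matrix(model.matrix(~ 0 + f), sparse = TRUE) # n x q indicators
-theta <- 0.7                                      # hypothetical value
-A <- t(Z %*% (theta * Diagonal(3)))               # A' = Z T S, T = I
-all((as.matrix(A) != 0) == (as.matrix(t(Z)) != 0))
-@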
-
-\subsection{Sparse matrix methods}
-\label{sec:sparseMatrix}
-
-We define $\bm A$ as the transpose of a model matrix because $\bm A$
-is stored and manipulated as a sparse matrix. In the
-compressed column-oriented storage form that we use for sparse
-matrices, there are advantages to storing $\bm A$ as a matrix of $n$
-columns and $q$ rows. In particular, the CHOLMOD sparse matrix
-library allows us to evaluate the sparse Cholesky factor, $\bm
-L(\bm\theta)$, a sparse lower triangular matrix that satisfies
-\begin{equation}
- \label{eq:SparseChol}
- \bm L(\bm\theta)\bm L(\bm\theta)\trans=
- \bm P\left(\bm A(\bm\theta)\bm A(\bm\theta)\trans+\bm I_q\right)\bm P\trans ,
-\end{equation}
-directly from $\bm A(\bm\theta)$.
-
-In (\ref{eq:SparseChol}) the $q\times q$ matrix $\bm P$ is a
-``fill-reducing'' permutation matrix determined from the pattern of
-nonzeros in $\bm Z$. $\bm P$ does not affect the statistical theory
-(if $\bm{\mathcal{U}}\sim\mathcal{N}(\bm 0,\sigma^2\bm I)$ then $\bm
-P\trans\bm{\mathcal{U}}$ also has a $\mathcal{N}(\bm 0,\sigma^2\bm I)$
-distribution because $\bm P\bm P\trans=\bm P\trans\bm P=\bm I$) but,
-because it affects the number of nonzeros in $\bm L$, it can have a
-tremendous impact on the amount of storage required for $\bm L$ and the
-time required to evaluate $\bm L$ from $\bm A$. Indeed, it is
-precisely because $\bm L(\bm\theta)$ can be evaluated quickly, even
-for complex models applied to large data sets, that the \code{lmer}
-function is effective in fitting such models.
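-
-A sketch of this computation, using the CHOLMOD-based \code{Cholesky}
-function in the \package{Matrix} package on the toy $\bm A$ above
-(the \code{Imult} argument adds the $\bm I_q$ term):
-<<sparseChol,eval=FALSE>>=
-## Sparse Cholesky L L' = P (A A' + I) P' with fill-reducing P
-L <- Cholesky(tcrossprod(A), perm = TRUE, LDL = FALSE, Imult = 1)
-str(expand(L), max.level = 1)  # recovers the factor L and permutation P
-@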
-
-\section{The penalized least squares approach to linear mixed models}
-\label{sec:Penalized}
-
-Given a value of $\bm\theta$ we form $\bm A(\bm\theta)$ from which we
-evaluate $\bm L(\bm\theta)$. We can then solve for the $q\times p$
-matrix, $\bm R_{\bm{ZX}}$, in the system of equations
-\begin{equation}
- \label{eq:RZX}
-  \bm L(\bm\theta)\bm R_{\bm{ZX}}=\bm P\bm A(\bm\theta)\bm X
-\end{equation}
-and for the $p\times p$ upper triangular matrix, $\bm R_{\bm X}$, satisfying
-\begin{equation}
- \label{eq:RX}
- \bm R_{\bm X}\trans\bm R_{\bm X}=
-  \bm X\trans\bm X-\bm R_{\bm{ZX}}\trans\bm R_{\bm{ZX}} .
-\end{equation}
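-
-A dense toy version of these solves (taking $\bm P=\bm I$ for
-simplicity and replacing the sparse toy $\bm A$ above by a small
-dense one; all values are simulated):
-<<plsFactor,eval=FALSE>>=
-## Toy dense versions of (eq:RZX) and (eq:RX) with P = I
-set.seed(1)
-A <- matrix(rnorm(3 * 5), 3, 5)        # q = 3, n = 5
-X <- cbind(1, rnorm(5))                # n x p model matrix, p = 2
-y <- rnorm(5)                          # simulated response
-L <- t(chol(tcrossprod(A) + diag(3)))  # lower triangular, LL' = AA' + I
-RZX <- solve(L, A %*% X)               # solves L RZX = A X
-RX <- chol(crossprod(X) - crossprod(RZX)) # RX'RX = X'X - RZX'RZX
-@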
-
-The conditional mode, $\tilde{\bm u}(\bm\theta)$, of the
-orthogonal random effects and the conditional mle,
-$\widehat{\bm\beta}(\bm\theta)$, of the fixed-effects parameters
-can be determined simultaneously as the solutions to a penalized least
-squares problem,
-\begin{equation}
- \label{eq:PLS}
- \begin{bmatrix}
- \tilde{\bm u}(\bm\theta)\\
- \widehat{\bm\beta}(\bm\theta)
- \end{bmatrix}=
- \arg\min_{\bm u,\bm\beta}\left\|
- \begin{bmatrix}\bm y\\\bm 0\end{bmatrix} -
- \begin{bmatrix}
- \bm A\trans\bm P\trans & \bm X\\
- \bm I_q & \bm 0
- \end{bmatrix}
-    \begin{bmatrix}\bm u\\\bm\beta\end{bmatrix}
-  \right\|^2 ,
-\end{equation}
-for which the solution satisfies
-\begin{equation}
- \label{eq:PLSsol}
- \begin{bmatrix}
- \bm P\left(\bm A\bm A\trans+\bm I\right)\bm P\trans &
- \bm P\bm A\bm X\\
- \bm X\trans\bm A\trans\bm P\trans & \bm X\trans\bm X
- \end{bmatrix}
- \begin{bmatrix}
- \tilde{\bm u}(\bm\theta)\\
- \widehat{\bm\beta}(\bm\theta)
- \end{bmatrix}=
- \begin{bmatrix}\bm P\bm A\bm y\\\bm X\trans\bm y\end{bmatrix} .
-\end{equation}
-The Cholesky factor of the system matrix for the PLS problem can be
-expressed using $\bm L$, $\bm R_{\bm Z\bm X}$ and $\bm R_{\bm X}$, because
-\begin{equation}
- \label{eq:PLSChol}
- \begin{bmatrix}
- \bm P\left(\bm A\bm A\trans+\bm I\right)\bm P\trans & \bm P\bm
- A\bm X\\
- \bm X\trans\bm A\trans\bm P\trans & \bm X\trans\bm X
- \end{bmatrix} =
- \begin{bmatrix}
- \bm L & \bm 0\\
- \bm R_{\bm Z\bm X}\trans & \bm R_{\bm X}\trans
- \end{bmatrix}
- \begin{bmatrix}
- \bm L\trans & \bm R_{\bm Z\bm X}\\
- \bm 0 & \bm R_{\bm X}
- \end{bmatrix} .
-\end{equation}
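-
-This factorization is easily checked numerically on the toy example
-(again with $\bm P=\bm I$):
-<<blockChol,eval=FALSE>>=
-## The block lower-triangular factor reproduces the PLS system matrix
-Lfull <- rbind(cbind(L, matrix(0, 3, 2)),
-               cbind(t(RZX), t(RX)))
-Mfull <- rbind(cbind(tcrossprod(A) + diag(3), A %*% X),
-               cbind(t(A %*% X), crossprod(X)))
-all.equal(Lfull %*% t(Lfull), Mfull)
-@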
-
-In the \code{lme4} package the \code{"mer"} class is the
-representation of a mixed-effects model. Several slots in this class
-are matrices corresponding directly to the matrices in the preceding
-equations. The \code{A} slot contains the sparse matrix $\bm
-A(\bm\theta)$ and the \code{L} slot contains the sparse Cholesky
-factor, $\bm L(\bm\theta)$. The \code{RZX} and \code{RX} slots contain
-$\bm R_{\bm Z\bm X}(\bm\theta)$ and $\bm R_{\bm X}(\bm\theta)$,
-respectively, stored as dense matrices.
-
-It is not necessary to solve for $\tilde{\bm u}(\bm\theta)$ and
-$\widehat{\bm\beta}(\bm\theta)$ to evaluate the \emph{profiled}
-log-likelihood, which is the log-likelihood evaluated at $\bm\theta$ and
-at the conditional estimates of the other parameters,
-$\widehat{\bm\beta}(\bm\theta)$ and $\widehat{\sigma^2}(\bm\theta)$.
-All that is needed for evaluation of the profiled log-likelihood is
-the (penalized) residual sum of squares, $r^2$, from the penalized
-least squares problem (\ref{eq:PLS}) and the determinant $|\bm A\bm
-A\trans+\bm I|=|\bm L|^2$. Because $\bm L$ is triangular, its
-determinant is easily evaluated as the product
-of its diagonal elements. Furthermore, $|\bm L|^2 > 0$ because it is
-equal to $|\bm A\bm A\trans + \bm I|$, which is the determinant of a
-positive definite matrix. Thus $\log(|\bm L|^2)$ is both well-defined
-and easily calculated from $\bm L$.
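-
-On the toy example:
-<<logdet,eval=FALSE>>=
-## log(|L|^2) from the diagonal of the triangular factor
-ldL2 <- 2 * sum(log(diag(L)))
-all.equal(ldL2, as.numeric(determinant(tcrossprod(A) + diag(3))$modulus))
-@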
-
-The profiled deviance (negative twice the profiled log-likelihood), as
-a function of $\bm\theta$ only ($\bm\beta$ and $\sigma^2$ at their
-conditional estimates), is
-\begin{equation}
- \label{eq:profiledDev}
-  d(\bm\theta|\bm y)=\log(|\bm L|^2)+
-  n\left(1+\log\left(\frac{2\pi r^2}{n}\right)\right) .
-\end{equation}
-The maximum likelihood estimates, $\widehat{\bm\theta}$, satisfy
-\begin{equation}
- \label{eq:thetamle}
-  \widehat{\bm\theta}=\arg\min_{\bm\theta}d(\bm\theta|\bm y) .
-\end{equation}
-Once the value of $\widehat{\bm\theta}$ has been determined, the mle
-of $\bm\beta$ is evaluated from (\ref{eq:PLSsol}) and the mle of
-$\sigma^2$ as $\widehat{\sigma^2}=r^2(\widehat{\bm\theta})/n$.
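-
-Putting these pieces together, a dense (and therefore only
-illustrative) version of the profiled deviance can be written as a
-short \code{R} function; \code{Amat} is a hypothetical user-supplied
-function returning $\bm A(\bm\theta)$, and $\bm P=\bm I$ throughout:
-<<profDev,eval=FALSE>>=
-profDev <- function(theta, y, X, Amat) {
-  A <- Amat(theta)                     # hypothetical: builds A(theta)
-  L <- t(chol(tcrossprod(A) + diag(nrow(A))))
-  RZX <- solve(L, A %*% X)
-  RX <- chol(crossprod(X) - crossprod(RZX))
-  cu <- solve(L, A %*% y)              # forward solves for the PLS rhs
-  cb <- backsolve(RX, crossprod(X, y) - crossprod(RZX, cu),
-                  transpose = TRUE)
-  r2 <- sum(y^2) - sum(cu^2) - sum(cb^2) # penalized residual sum of squares
-  2 * sum(log(diag(L))) + length(y) * (1 + log(2 * pi * r2 / length(y)))
-}
-@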
-
-Note that nothing has been said about the form of the sparse model
-matrix, $\bm Z$, other than the fact that it is sparse. In contrast
-to other methods for linear mixed models, these results apply to
-models where $\bm Z$ is derived from crossed or partially crossed
-grouping factors, in addition to models with multiple, nested grouping
-factors.
-
-The system (\ref{eq:PLSsol}) is similar to Henderson's ``mixed-model
-equations'' (reference?). One important difference between
-(\ref{eq:PLSsol}) and Henderson's formulation is that Henderson
-represented his system of equations in terms of $\bm\Sigma^{-1}$ and,
-in important practical examples, $\bm\Sigma^{-1}$ does not exist at
-the parameter estimates. Also, Henderson assumed that equations like
-(\ref{eq:PLSsol}) would need to be solved explicitly and, as we have
-seen, only the decomposition of the system matrix is needed for
-evaluation of the profiled log-likelihood. The same is true of the
-logarithm of the profiled REML criterion, which we define later.
-
-\section{The generalized least squares approach to linear mixed models}
-\label{sec:GLS}
-
-Another common approach to linear mixed models is to derive the
-marginal variance-covariance matrix of $\bm{\mathcal{Y}}$ as a
-function of $\bm\theta$ and use that to determine the conditional
-estimates, $\widehat{\bm\beta}(\bm\theta)$, as the solution of a
-generalized least squares (GLS) problem. In the notation of
-\S\ref{sec:Definition} the marginal mean of $\bm{\mathcal{Y}}$ is
-$\mathrm{E}[\bm{\mathcal{Y}}]=\bm X\bm\beta$ and the marginal
-variance-covariance matrix is
-\begin{equation}
- \label{eq:marginalvarcovY}
- \mathrm{Var}(\bm{\mathcal{Y}})=\sigma^2\left(\bm I_n+\bm Z\bm T\bm
- S\bm S\bm T\trans\bm Z\trans\right)=\sigma^2\left(\bm I_n+\bm
- A\trans\bm A\right) =\sigma^2\bm V(\bm\theta) ,
-\end{equation}
-where $\bm V(\bm\theta)=\bm I_n+\bm A\trans\bm A$.
-
-The conditional estimates of $\bm\beta$ are often written as
-\begin{equation}
- \label{eq:condbeta}
- \widehat{\bm\beta}(\bm\theta)=\left(\bm X\trans\bm V^{-1}\bm
- X\right)^{-1}\bm X\trans\bm V^{-1}\bm y
-\end{equation}
-but, of course, this formula is not suitable for computation. The
-matrix $\bm V(\bm\theta)$ is a symmetric $n\times n$ positive definite
-matrix and hence has a Cholesky factor. However, this factor is
-$n\times n$, not $q\times q$, and $n$ is always larger than $q$ ---
-sometimes orders of magnitude larger. Blithely writing a formula in
-terms of $\bm V^{-1}$ when $\bm V$ is $n\times n$, and $n$ can be in
-the millions, does not a computational formula make.
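-
-Nonetheless, on a problem small enough to invert $\bm V$ directly,
-the naive GLS formula (\ref{eq:condbeta}) and back-substitution in
-the PLS system agree, as they must (continuing the dense toy
-example):
-<<glsCheck,eval=FALSE>>=
-## Naive GLS estimate versus back-substitution in the PLS system
-V <- diag(5) + crossprod(A)            # V = I + A'A, n x n
-betaGLS <- solve(t(X) %*% solve(V) %*% X, t(X) %*% solve(V) %*% y)
-cu <- solve(L, A %*% y)
-cb <- backsolve(RX, crossprod(X, y) - crossprod(RZX, cu), transpose = TRUE)
-betaPLS <- backsolve(RX, cb)           # solves RX beta = cb
-all.equal(betaGLS, betaPLS)
-@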
-
-\subsection{Relating the GLS approach to the Cholesky factor}
-\label{sec:GLStoL}
-
-We can use the fact that
-\begin{equation}
- \label{eq:Vinv}
- \bm V^{-1}(\bm\theta)=\left(\bm I_n+\bm A\trans\bm A\right)^{-1}=
- \bm I_n-\bm A\trans\left(\bm I_q+\bm A\bm A\trans\right)^{-1}\bm A
-\end{equation}
-to relate the GLS problem to the PLS problem. One way to establish
-(\ref{eq:Vinv}) is simply to show that the product
-\begin{multline*}
- (\bm I+\bm A\trans\bm A)\left(\bm I-\bm A\trans\left(\bm I+\bm A\bm
- A\trans\right)^{-1}\bm A\right)\\
- \begin{aligned}
- =&\bm I+\bm A\trans\bm A-\bm A\trans\left(\bm I+\bm A\bm
- A\trans\right)
- \left(\bm I+\bm A\bm A\trans\right)^{-1}\bm A\\
- =&\bm I+\bm A\trans\bm A-\bm A\trans\bm A\\
- =&\bm I .
- \end{aligned}
-\end{multline*}
-Incorporating the permutation matrix $\bm P$ we have
-\begin{equation}
- \label{eq:PLA}
- \begin{aligned}
- \bm V^{-1}(\bm\theta)=&\bm I_n-\bm A\trans\bm P\trans\bm P\left(\bm
- I_q+\bm A\bm A\trans\right)^{-1}\bm P\trans\bm P\bm A\\
- =&\bm I_n-\bm A\trans\bm P\trans(\bm L\bm L\trans)^{-1}\bm P\bm A\\
- =&\bm I_n-\left(\bm L^{-1}\bm P\bm A\right)\trans\bm L^{-1}\bm P\bm A .
- \end{aligned}
-\end{equation}
-Even in this form we would not want to routinely evaluate $\bm
-V^{-1}$. However, (\ref{eq:PLA}) does allow us to simplify many
-common expressions.
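-
-The identity (\ref{eq:Vinv}) is easily verified numerically on the
-toy example:
-<<VinvCheck,eval=FALSE>>=
-## (I + A'A)^{-1} = I - A'(I + AA')^{-1} A
-Vinv <- solve(diag(5) + crossprod(A))
-all.equal(Vinv, diag(5) - t(A) %*% solve(diag(3) + tcrossprod(A), A))
-@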
-
-For example, the variance-covariance matrix of the estimator $\widehat{\bm
- \beta}$, conditional on $\bm\theta$ and $\sigma$, can be expressed as
-\begin{equation}
- \label{eq:varcovbeta}
- \begin{aligned}
- \sigma^2\left(\bm X\trans\bm V^{-1}(\bm\theta)\bm X\right)^{-1}
- =&\sigma^2\left(\bm X\trans\bm X-\left(\bm L^{-1}\bm P\bm
- A\bm X\right)\trans\left(\bm L^{-1}\bm P\bm A\bm
- X\right)\right)^{-1}\\
- =&\sigma^2\left(\bm X\trans\bm X-\bm R_{\bm Z\bm X}\trans\bm
- R_{\bm Z\bm X}\right)^{-1}\\
- =&\sigma^2\left(\bm R_{\bm X}\trans\bm R_{\bm X}\right)^{-1} .
- \end{aligned}
-\end{equation}
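-
-On the toy example this chain of equalities (up to the factor
-$\sigma^2$) can be checked directly:
-<<varbetaCheck,eval=FALSE>>=
-all.equal(t(X) %*% Vinv %*% X, crossprod(RX))  # X'V^{-1}X = RX'RX
-@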
-
-\section{Trace of the ``hat'' matrix}
-\label{sec:hatTrace}
-
-Another quantity of interest in some applications is the
-trace of the ``hat'' matrix, which can be written as
-\begin{multline}
- \label{eq:hatTrace}
- \tr\left(\begin{bmatrix}\bm A\trans&\bm X\end{bmatrix}
- \left(\begin{bmatrix}\bm A\trans&\bm X\\\bm I&\bm0\end{bmatrix}\trans
- \begin{bmatrix}\bm A\trans&\bm X\\\bm I&\bm0\end{bmatrix}\right)^{-1}
- \begin{bmatrix}\bm A\\\bm X\trans\end{bmatrix}\right)\\
- = \tr\left(\begin{bmatrix}\bm A\trans&\bm X\end{bmatrix}
- \left(\begin{bmatrix}\bm L&\bm0\\
- \bm R_{\bm{ZX}}\trans&\bm R_{\bm X}\trans\end{bmatrix}
- \begin{bmatrix}\bm L\trans&\bm R_{\bm{ZX}}\\
- \bm0&\bm R_{\bm X}\end{bmatrix}\right)^{-1}
-    \begin{bmatrix}\bm A\\\bm X\trans\end{bmatrix}\right) .
-\end{multline}
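-
-Using the toy objects from above (with $\bm P=\bm I$), this trace can
-be computed as
-<<hatTrace,eval=FALSE>>=
-B <- rbind(A, t(X))                 # stacked [A ; X'], (q + p) x n
-sum(diag(t(B) %*% solve(Mfull, B))) # trace of the hat matrix
-@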
-
-\end{document}
Deleted: pkg/lme4Eigen/inst/doc/PLSvGLS.pdf
===================================================================
(Binary files differ)
Deleted: pkg/lme4Eigen/inst/doc/Theory.Rnw
===================================================================
--- pkg/lme4Eigen/inst/doc/Theory.Rnw 2011-10-18 20:34:01 UTC (rev 1430)
+++ pkg/lme4Eigen/inst/doc/Theory.Rnw 2011-11-02 18:18:14 UTC (rev 1431)
@@ -1,1260 +0,0 @@
-\documentclass[12pt]{article}
-\usepackage{Sweave,amsmath,amsfonts,bm}
-\usepackage[authoryear,round]{natbib}
-\bibliographystyle{plainnat}
-\DefineVerbatimEnvironment{Sinput}{Verbatim}
-{formatcom={\vspace{-1ex}},fontshape=sl,
- fontfamily=courier,fontseries=b, fontsize=\footnotesize}
-\DefineVerbatimEnvironment{Soutput}{Verbatim}
-{formatcom={\vspace{-1ex}},fontfamily=courier,fontseries=b,%
- fontsize=\footnotesize}
-%%\VignetteIndexEntry{Computational Methods}
-%%\VignetteDepends{lme4Eigen}
-\title{Computational methods for mixed models}
-\author{Douglas Bates\\Department of Statistics\\%
- University of Wisconsin -- Madison}
-\begin{document}
-\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=true,keep.source=TRUE}
-\SweaveOpts{include=FALSE}
-\setkeys{Gin}{width=\textwidth}
-\newcommand{\code}[1]{\texttt{\small{#1}}}
-\newcommand{\package}[1]{\textsf{\small{#1}}}
-\newcommand{\trans}{\ensuremath{^\prime}}
-\renewcommand{\vec}{\operatorname{vec}}
-\newcommand{\diag}{\operatorname{diag}}
-\newcommand{\bc}[1]{\ensuremath{\bm{\mathcal{#1}}}}
-<<preliminaries,echo=FALSE,results=hide>>=
-options(width=65,digits=5)
-#library(lme4)
-@
-\maketitle
-\begin{abstract}
- The \package{lme4} package provides R functions to fit and analyze
- several different types of mixed-effects models, including linear
- mixed models, generalized linear mixed models and nonlinear mixed
- models. In this vignette we describe the formulation of these
- models and the computational approach used to evaluate or
- approximate the log-likelihood of a model/data/parameter value
- combination.
-\end{abstract}
-
-\section{Introduction}
-\label{sec:intro}
-
-The \package{lme4} package provides \code{R} functions to fit and analyze
-linear mixed models, generalized linear mixed models and nonlinear
-mixed models. These models are called \emph{mixed-effects models} or,
-more simply, \emph{mixed models} because they incorporate both
-\emph{fixed-effects} parameters, which apply to an entire population
-or to certain well-defined and repeatable subsets of a population, and
-\emph{random effects}, which apply to the particular experimental
-units or observational units in the study. Such models are also
-called \emph{multilevel} models because the random effects represent
-levels of variation in addition to the per-observation noise term that
-is incorporated in common statistical models such as linear
-regression models, generalized linear models and nonlinear regression
-models.
-
-We begin by describing common properties of these mixed models and the
-general computational approach used in the \package{lme4} package. The
-estimates of the parameters in a mixed model are determined as the
-values that optimize an objective function --- either the likelihood
-of the parameters given the observed data, for maximum likelihood
-(ML) estimates, or a related objective function called the REML
-criterion. Because this objective function must be evaluated at many
-different values of the model parameters during the optimization
-process, we focus on the evaluation of the objective function and a
-critical computation in this evaluation --- determining the solution to a
-penalized, weighted least squares (PWLS) problem.
-
-The dimension of the solution of the PWLS problem can be very large,
-perhaps in the millions. Furthermore, such problems must be solved
-repeatedly during the optimization process to determine parameter
-estimates. The whole approach would be infeasible were it not for the
-fact that the matrices determining the PWLS problem are sparse and we
-can use sparse matrix storage formats and sparse matrix computations
-\citep{davis06:csparse_book}. In particular, the whole computational
-approach hinges on the extraordinarily efficient methods for
-determining the Cholesky decomposition of sparse, symmetric,
-positive-definite matrices embodied in the CHOLMOD library of C
-functions \citep{Cholmod}.
-
-% The three types of mixed models -- linear, generalized linear and
-% nonlinear -- share common characteristics in that the model is
-% specified in whole or in part by a \emph{mixed model formula} that
-% describes a \emph{linear predictor} and a variance-covariance
-% structure for the random effects. In the next section we describe
-% the mixed model formula and the forms of these matrices. The
-% following section presents a general formulation of the Laplace
-% approximation to the log-likelihood of a mixed model.
-
-% In subsequent sections we describe computational methods for specific
-% kinds of mixed models. In particular, we show how a profiled
-% log-likelihood for linear mixed models, and for some nonlinear mixed
-% models, can be evaluated exactly.
-
-In the next section we describe the general form of the mixed models
-that can be represented in the \package{lme4} package and the
-computational approach embodied in the package. In the following
-section we describe a particular form of mixed model,
-called a linear mixed model, and the computational details for those
-models. In the fourth section we describe computational methods for
-generalized linear mixed models, nonlinear mixed models and
-generalized nonlinear mixed models.
-
-
-\section{Formulation of mixed models}
-\label{sec:form-mixed-models}
-
-A mixed-effects model incorporates two vector-valued random variables:
-the $n$-dimensional response vector, $\bc Y$, and the $q$-dimensional
-random effects vector, $\bc B$. We observe the value, $\bm y$, of $\bc Y$.
-We do not observe the value of $\bc B$.
-
-The random variable $\bc Y$ may be continuous or discrete. That is,
-the observed data, $\bm y$, may be on a continuous scale or they may
-be on a discrete scale, such as binary responses or responses
-representing a count. In our formulation, the random variable $\bc B$
-is always continuous.
-
-We specify a mixed model by describing the unconditional distribution
-of $\bc B$ and the conditional distribution $(\bc Y|\bc B=\bm b)$.
-
-\subsection{The unconditional distribution of $\bc B$}
-\label{sec:uncond-distr-B}
-
-In our formulation, the unconditional distribution of $\bc B$ is
-always a $q$-dimensional multivariate Gaussian (or ``normal'')
-distribution with mean $\bm 0$ and with a parameterized covariance
-matrix,
-\begin{equation}
- \label{eq:2}
- \bc B\sim\mathcal{N}\left(\bm 0,\sigma^2\bm\Lambda(\bm\theta)
- \bm\Lambda\trans(\bm\theta)\right) .
-\end{equation}
-
-The scalar, $\sigma$, in (\ref{eq:2}), is called the \emph{common scale
- parameter}. As we will see later, not all types of mixed models
-incorporate this parameter. We will include $\sigma^2$ in the general
-form of the unconditional distribution of $\bc B$ with the
-understanding that, in some models, $\sigma\equiv 1$.
-
-The $q\times q$ matrix $\bm\Lambda(\bm\theta)$, which is a left factor
-of the covariance matrix (when $\sigma=1$) or the relative covariance
-matrix (when $\sigma\ne 1$), depends on an $m$-dimensional parameter
-$\bm\theta$. Typically $m\ll q$; in the examples we show below it is
-always the case that $m<5$, even when $q$ is in the thousands. The
-fact that $m$ is very small is important because, as we shall see,
-determining the parameter estimates in a mixed model can be expressed
-as an optimization problem with respect to $\bm\theta$ only.
-
-The parameter $\bm\theta$ may be, and typically is, subject to
-constraints. For ease of computation, we require that the constraints
-be expressed as ``box'' constraints of the form
-$\theta_{iL}\le\theta_i\le\theta_{iU},i=1,\dots,m$ for constants
-$\theta_{iL}$ and $\theta_{iU}, i=1,\dots,m$. We shall write the set
-of such constraints as $\bm\theta_L\le\bm\theta\le\bm\theta_U$. The matrix
-$\bm\Lambda(\bm\theta)$ is required to be non-singular
-(i.e.{} invertible) when $\bm\theta$ is not on the boundary.
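-
-As a concrete (toy) sketch, $\bm\Lambda(\bm\theta)$ for a model with
-correlated random intercepts and slopes for $k$ groups is block
-diagonal with repeated $2\times2$ lower-triangular blocks; the box
-constraints are $\theta_1\ge0$ and $\theta_3\ge0$, with $\theta_2$
-unconstrained:
-<<LambdaDemo,eval=FALSE>>=
-library(Matrix)
-Lambda <- function(theta, k) {
-  blk <- matrix(c(theta[1], theta[2], 0, theta[3]), 2, 2)
-  bdiag(replicate(k, blk, simplify = FALSE))   # sparse block diagonal
-}
-Lambda(c(1.0, 0.4, 0.8), k = 3)  # theta off the boundary: nonsingular
-@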
-
-\subsection{The conditional distribution, $(\bc Y|\bc B=\bm b)$}
-\label{sec:cond-distr-YB}
-
-The conditional distribution, $(\bc Y|\bc B=\bm b)$, must satisfy:
-\begin{enumerate}
-\item The conditional mean, $\bm\mu_{\bc Y|\bc B}(\bm b) =
- \mathrm{E}[\bc Y|\bc B=\bm b]$, depends on $\bm b$ only through the
- value of the \emph{linear predictor}, $\bm Z\bm b+\bm X\bm\beta$,
- where $\bm\beta$ is the $p$-dimensional \emph{fixed-effects}
- parameter vector and the \emph{model matrices}, $\bm Z$ and $\bm X$,
- are fixed matrices of the appropriate dimension. That is, the
- two model matrices must have the same number of rows and must have
- $q$ and $p$ columns, respectively. The number of rows
- in $\bm Z$ and $\bm X$ is a multiple of $n$, the dimension of $\bm y$.
-\item The scalar distributions, $(\mathcal{Y}_i|\bc B=\bm
- b),i=1,\dots,n$, all have the same form and are completely
- determined by the conditional mean, $\bm\mu_{\bc Y|\bc B}(\bm b)$
- and, at most, one additional parameter, $\sigma$, which is the
- common scale parameter.
-\item The scalar distributions, $(\mathcal{Y}_i|\bc B=\bm
- b),i=1,\dots,n$, are independent. That is, the components of
- $\bc Y$ are \emph{conditionally independent} given $\bc B$.
-\end{enumerate}
-
-An important special case of the conditional distribution is the
-multivariate Gaussian distribution of the form
-\begin{equation}
- \label{eq:1}
- (\bc Y|\bc B=\bm b)\sim\mathcal{N}(\bm Z\bm b+\bm
- X\bm\beta,\sigma^2\bm I_n)
-\end{equation}
-where $\bm I_n$ denotes the identity matrix of size $n$.
-In this case the conditional mean, $\bm\mu_{\bc Y|\bc B}(\bm b)$, is
-exactly the linear predictor, $\bm Z\bm b+\bm X\bm\beta$, a situation
-we will later describe as being an ``identity link'' between the
-conditional mean and the linear predictor. Models with conditional
-distribution (\ref{eq:1}) are called \emph{linear mixed models}.
-
-\subsection{A change of variable to ``spherical'' random effects}
-\label{sec:change-vari-spher}
-
-Because the conditional distribution $(\bc Y|\bc B=\bm b)$ depends on
-$\bm b$ only through the linear predictor, it is easy to express the
-model in terms of a linear transformation of $\bc B$. We define the
-linear transformation from a $q$-dimensional ``spherical'' Gaussian
-random variable, $\bc U$, to $\bc B$ as
-\begin{equation}
- \label{eq:3}
- \bc B=\bm\Lambda(\bm\theta)\bc U,\quad
- \bc U\sim\mathcal{N}(\bm 0,\sigma^2\bm I_q).
-\end{equation}
-(The term ``spherical'' refers to the fact that contours of constant
-probability density for $\bc U$ are spheres centered at the mean ---
-in this case, $\bm0$.)
-
-When $\bm\theta$ is not on the boundary this is an invertible
-transformation. When $\bm\theta$ is on the boundary the
-transformation can fail to be invertible. However, we will only need
-to be able to express $\bc B$ in terms of $\bc U$ and that
-transformation is well-defined, even when $\bm\theta$ is on the
-boundary.
-
-The linear predictor, as a function of $\bm u$, is
-\begin{equation}
- \label{eq:4}
- \bm\gamma(\bm u)=\bm Z\bm\Lambda(\bm\theta)\bm u
- + \bm X\bm\beta.
-\end{equation}
-When we wish to emphasize the role of the model parameters,
-$\bm\theta$ and $\bm\beta$, in the formulation of $\bm\gamma$, we will
-write the linear predictor as $\bm\gamma(\bm u,\bm\theta,\bm\beta)$.
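-
-As an \code{R} sketch (assuming conformable toy matrices \code{Z}, of
-size $n\times q$, and \code{X}, of size $n\times p$, and the
-\code{Lambda} function sketched above with $q=2k$):
-<<linpred,eval=FALSE>>=
-gammaFun <- function(u, theta, beta, Z, X, k)
-  as.vector(Z %*% Lambda(theta, k) %*% u + X %*% beta)
-@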
-
-\subsection{The conditional density $(\bc U|\bc Y=\bm y)$}
-\label{sec:cond-dens-bc}
-
-Because we observe $\bm y$ and do not observe $\bm b$ or $\bm u$, the
-conditional distribution of interest, for the purposes of statistical
-inference, is $(\bc U|\bc Y=\bm y)$ (or, equivalently, $(\bc B|\bc
-Y=\bm y)$). This conditional distribution is always a continuous
-distribution with conditional probability density $f_{\bc U|\bc
- Y}(\bm u|\bm y)$.
-
-We can evaluate $f_{\bc U|\bc Y}(\bm u|\bm y)$, up to a constant, as
-the product of the unconditional density, $f_{\bc U}(\bm u)$, and the
-conditional density (or the probability mass function, whichever is
-appropriate), $f_{\bc Y|\bc U}(\bm y|\bm u)$. We write this
-unnormalized conditional density as
-\begin{equation}
- \label{eq:5}
- h(\bm u|\bm y,\bm\theta,\bm\beta,\sigma) =
- f_{\bc Y|\bc U}(\bm y|\bm u,\bm\theta,\bm\beta,\sigma)
- f_{\bc U}(\bm u|\sigma) .
-\end{equation}
-
-We say that $h$ is the ``unnormalized'' conditional density because
-all we know is that the conditional density is proportional to $h(\bm
-u|\bm y,\bm\theta,\bm\beta,\sigma)$. To obtain the conditional
-density we must normalize $h$ by dividing by the value of the integral
-\begin{equation}
- \label{eq:6}
- L(\bm\theta,\bm\beta,\sigma|\bm y) =
- \int_{\mathbb{R}^q}h(\bm u|\bm y,\bm\theta,\bm\beta,\sigma)\,d\bm u .
-\end{equation}
-We write the value of the integral (\ref{eq:6}) as
-$L(\bm\theta,\bm\beta,\sigma|\bm y)$ because it is exactly the
-\emph{likelihood} of the parameters $\bm\theta$, $\bm\beta$ and
-$\sigma$, given the observed data $\bm y$. The \emph{maximum
- likelihood (ML) estimates} of these parameters are the values that
-maximize $L$.
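-
-For the Gaussian, identity-link case of \S\ref{sec:cond-distr-YB} the
-unnormalized density (\ref{eq:5}) has a simple form; a sketch, using
-the hypothetical \code{gammaFun} above:
-<<hDensity,eval=FALSE>>=
-h <- function(u, y, theta, beta, sigma, Z, X, k)
-  prod(dnorm(y, gammaFun(u, theta, beta, Z, X, k), sigma)) *
-    prod(dnorm(u, 0, sigma))
-@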
-
-\subsection{Determining the ML estimates}
-\label{sec:DeterminingML}
-
-The general problem of maximizing $L(\bm\theta,\bm\beta,\sigma|\bm y)$
-with respect to $\bm\theta$, $\bm\beta$ and $\sigma$ can be formidable
-because each evaluation of this function involves a potentially
-high-dimensional integral and because the dimension of $\bm\beta$ can
-be large. However, this general optimization problem can be split
-into manageable subproblems. Given a value of $\bm\theta$ we can
-determine the \emph{conditional mode}, $\tilde{\bm u}(\bm\theta)$, of
-$\bm u$ and the \emph{conditional estimate},
-$\tilde{\bm\beta}(\bm\theta)$ simultaneously using \emph{penalized,
- iteratively re-weighted least squares} (PIRLS). The conditional
-mode and the conditional estimate are defined as
-\begin{equation}
- \label{eq:condmode}
- \begin{bmatrix}
- \tilde{\bm u}(\bm\theta)\\
- \tilde{\bm\beta}(\bm\theta)
- \end{bmatrix}=\arg\max_{\bm u,\bm\beta}h(\bm u|\bm
- y,\bm\theta,\bm\beta,\sigma) .
-\end{equation}
-(It may look as if we have missed the dependence on $\sigma$ on the
[TRUNCATED]
To get the complete diff run:
svnlook diff /svnroot/lme4 -r 1431