[Depmix-commits] r310 - papers/jss

Wed Aug 19 13:48:29 CEST 2009

Author: ingmarvisser
Date: 2009-08-19 13:48:28 +0200 (Wed, 19 Aug 2009)
New Revision: 310

Modified:
   papers/jss/article.tex
Log:
... jss final version, to be submitted.

Modified: papers/jss/article.tex
===================================================================

--- papers/jss/article.tex	2009-08-13 12:47:29 UTC (rev 309)
+++ papers/jss/article.tex	2009-08-19 11:48:28 UTC (rev 310)
@@ -36,10 +36,10 @@
 	multivariate data with distributions from the \code{glm} family,
 	the logistic multinomial, or the multivariate normal distribution.
 	Other distributions can be added easily, and an example is
-	provided with the exgaus distribution.  Parameter are estimated by
+	provided with the exgaus distribution.  Parameters are estimated by
 	the EM algorithm or, when (linear) constraints are imposed on the
 	parameters, by direct numerical optimization with the
-	\pkg{Rdonlp2} routine.  }
+	\pkg{Rdonlp2} routine.}
 
 \Keywords{hidden Markov model, dependent mixture model, mixture model}
 
@@ -105,20 +105,20 @@
 \citealp{Ghysels1994}).  Further applications include speech
 recognition \citep{Rabiner1989}, EEG analysis \citep{Rainer2000}, and
 genetics \citep{Krogh1998}.  In these latter areas of application,
-latent Markov models are usually referred to as hidden Markov models. 
-See for example \citet{Fruhwirth2006} for an overview of hidden Markov models 
-with extensions. Further examples of applications can be found in e.g.\
-\citet[][chapter~1]{Cappe2005}.  
+latent Markov models are usually referred to as hidden Markov models.
+See for example \citet{Fruhwirth2006} for an overview of hidden Markov
+models with extensions.  Further examples of applications can be found
+in e.g.\ \citet[][chapter~1]{Cappe2005}.
 
 The \pkg{depmixS4} package was motivated by the fact that while Markov
 models are used commonly in the social sciences, no comprehensive
 package was available for fitting such models.  Existing software for
 estimating Markovian models includes Panmark \citep{Pol1996}, and for
-latent class models Latent Gold \citep{Vermunt2003}. These programs
+latent class models Latent Gold \citep{Vermunt2003}.  These programs
 lack a number of important features, besides not being freely
-available. There are currently some packages in \proglang{R} that
+available.  There are currently some packages in \proglang{R} that
 handle hidden Markov models but they lack a number of features that we
-needed in our research. In particular, \pkg{depmixS4} was designed to
+needed in our research.  In particular, \pkg{depmixS4} was designed to
 meet the following goals:
 
 \begin{enumerate}
@@ -140,30 +140,30 @@
 
 Although \pkg{depmixS4} was designed to deal with longitudinal or time
 series data, for say $T>100$, it can also handle the limit case when
-$T=1$.  In this case, there are no time dependencies between
-observed data and the model reduces to a finite mixture or
-latent class model. While there are specialized packages to
-deal with mixture data, as far as we know these 
-don't allow the inclusion of covariates on the prior probabilities of
-class membership. The possibility to estimate the effects of covariates on 
-prior and transition probabilities is a distinguishing feature of 
-\pkg{depmixS4}. In the next section, we provide an outline of the model and 
-likelihood equations.
+$T=1$.  In this case, there are no time dependencies between observed
+data and the model reduces to a finite mixture or latent class model.
+While there are specialized packages to deal with mixture data, as far
+as we know these do not allow the inclusion of covariates on the prior
+probabilities of class membership.  The possibility to estimate the
+effects of covariates on prior and transition probabilities is a
+distinguishing feature of \pkg{depmixS4}.  In the next section, we
+provide an outline of the model and likelihood equations.
 
 
 \section{The dependent mixture model}
 
-The data considered here have the general form $\vc{O}_{1:T}= (O_{1}^{1}, \ldots,
-O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1},
-\ldots, O_{T}^{m})$ for an $m$-variate time series of length $T$.  As
-an example, consider a time series of responses generated by a single
-participant in a psychological response time experiment. The data consists of three
-variables, response time, response accuracy, and a covariate which is a pay-off
-variable reflecting the relative reward for speeded and/or accurate responding. 
-These variables are measured on 168, 134 and 137 occasions respectively (the 
-first part of this series is plotted in
-Figure~\ref{fig:speed}). These data
-are more fully described in \citet{Dutilh2009}. 
+The data considered here have the general form $\vc{O}_{1:T}=
+(O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
+\ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
+of length $T$.  As an example, consider a time series of responses
+generated by a single participant in a psychological response time
+experiment.  The data consist of three variables, response time,
+response accuracy, and a covariate which is a pay-off variable
+reflecting the relative reward for speeded and/or accurate responding.
+These variables are measured on 168, 134 and 137 occasions
+respectively (the first part of this series is plotted in
+Figure~\ref{fig:speed}).  These data are more fully described in
+\citet{Dutilh2009}.
 
 \begin{figure}[htbp]
   \begin{center}
@@ -175,26 +175,26 @@
 \end{figure}
 
 The latent Markov model is usually associated with data of this type,
-in particular for multinomially distributed responses. % response variables are considered.
-However, commonly employed estimation procedures
-\citep[e.g.,][]{Pol1996}, are not suitable for long time series due to
-underflow problems. In contrast, the hidden Markov model is typically
-only used for `long' univariate time series \citep[][chapter~1]{Cappe2005}. 
-We use the term ``dependent mixture model'' because one of the authors 
-(Ingmar Visser) thought it
-was time for a new name to relate these models\footnote{Only later did I
-find out that \citet{Leroux1992} already coined the term dependent
-mixture models in an application with hidden Markov mixtures of
-Poisson count data.}.
+in particular for multinomially distributed responses.  However,
+commonly employed estimation procedures \citep[e.g.,][]{Pol1996} are
+not suitable for long time series due to underflow problems.  In
+contrast, the hidden Markov model is typically only used for `long'
+univariate time series \citep[][chapter~1]{Cappe2005}.  We use the
+term ``dependent mixture model'' because one of the authors (Ingmar
+Visser) thought it was time for a new name to relate these
+models\footnote{Only later did I find out that \citet{Leroux1992}
+already coined the term dependent mixture models in an application
+with hidden Markov mixtures of Poisson count data.}.
 
-The fundamental assumption of a dependent mixture model is that at any time 
-point, the observations are distributed as a mixture with $n$ components 
-(or states), and that time-dependencies between the observations are due to 
-time-dependencies between the mixture components (i.e., transition probabilities
-between the components). These latter dependencies are assumed to
-follow a first-order Markov process. In the models we are considering here, the 
-mixture distributions, the initial mixture probabilities and transition 
-probabilities can all dependent on covariates $\vc{z}_t$. 
+The fundamental assumption of a dependent mixture model is that at any
+time point, the observations are distributed as a mixture with $n$
+components (or states), and that time-dependencies between the
+observations are due to time-dependencies between the mixture
+components (i.e., transition probabilities between the components).
+These latter dependencies are assumed to follow a first-order Markov
+process.  In the models we are considering here, the mixture
+distributions, the initial mixture probabilities and transition
+probabilities can all depend on covariates $\vc{z}_t$.
 
 %transition probability functions $a_{ij}$ and the initial state
 %probability functions $\greekv{\pi}$ may depend on covariates as well
@@ -202,10 +202,10 @@
 %\citet{Fruhwirth2006} for an overview of hidden Markov models with
 %extensions.
 
-In a dependent mixture model, the joint likelihood of observations $\vc{O}_{1:T}$ and 
-latent states $\vc{S}_{1:T} = (S_1,\ldots,S_T)$, given model parameters 
-$\greekvec{\theta}$ and covariates $\vc{z}_{1:T} = (\vc{z}_1,\ldots,\vc{z}_T)$, 
-can be written as
+In a dependent mixture model, the joint likelihood of observations
+$\vc{O}_{1:T}$ and latent states $\vc{S}_{1:T} = (S_1,\ldots,S_T)$,
+given model parameters $\greekv{\theta}$ and covariates $\vc{z}_{1:T}
+= (\vc{z}_1,\ldots,\vc{z}_T)$, can be written as:
 \begin{equation}
 	\Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\greekv{\theta},\vc{z}_{1:T}) =  
 	\pi_{i}(\vc{z}_1) \vc{b}_{S_1}(\vc{O}_1|\vc{z}_{1})
@@ -215,22 +215,22 @@
 \begin{enumerate}
 	
 	\item $S_{t}$ is an element of $\mathcal{S}=\{1, \ldots, n\}$, a set
-	of $n$ latent classes or states. %; we write for short $S_{i}$ to
-	%denote $S_{t}=i$.
+	of $n$ latent classes or states.
 	
-	\item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
-	giving the probability of class/state $i$ at time $t=1$ with
-	covariate $\vc{z}_1$.
+	\item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$, giving the
+	probability of class/state $i$ at time $t=1$ with covariate
+	$\vc{z}_1$.
 	
 	\item $a_{ij}(\vc{z}_t) = \Prob(S_{t+1}=j|S_{t}=i,\vc{z}_t)$,
 	giving the probability of a transition from state $i$ to state
 	$j$ with covariate $\vc{z}_t$.
 	
 	\item $\vc{b}_{S_t}$ is a vector of observation densities
-	$b_{j}^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
-	conditional densities of observations $O_{t}^k$ associated with
-	latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
-	$k=1, \ldots, m$. 
+	$b_{j}^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that
+	provide the conditional densities of observations $O_{t}^k$
+	associated with latent class/state $j$ and covariate $\vc{z}_t$,
+	$j=1, \ldots, n$, $k=1, \ldots, m$.
+	
 \end{enumerate}
 
 %In the next paragraphs, the likelihood
@@ -291,23 +291,21 @@
 %density function for $O_{t}$ conditional on the hidden state $S_{t}$.
 
 For the example data above, $b_j^k$ could be a Gaussian distribution
-function for the response time variable, and a Bernoulli distribution for the
-accuracy variable.  In the models we are considering here, both the
-transition probability functions $a_{ij}$ and the initial state
-probability functions $\greekv{\pi}$ may depend on covariates as well
-as the response distributions $b_{j}^{k}$.  See for example
-\citet{Fruhwirth2006} for an overview of hidden Markov models with
-extensions.
+function for the response time variable, and a Bernoulli distribution
+for the accuracy variable.  In the models we are considering here,
+both the transition probability functions $a_{ij}$ and the initial
+state probability functions $\greekv{\pi}$ may depend on covariates as
+well as the response distributions $b_{j}^{k}$.
 
 \subsection{Likelihood}
 
-To obtain maximum likelihood estimates of the model parameters, we need the
-marginal likelihood of the observations. For hidden Markov models, this 
-marginal (log-)likelihood is usually computed by the
+To obtain maximum likelihood estimates of the model parameters, we
+need the marginal likelihood of the observations.  For hidden Markov
+models, this marginal (log-)likelihood is usually computed by the
 so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or
-rather by the forward part of this algorithm. \cite{Lystig2002}
+rather by the forward part of this algorithm.  \citet{Lystig2002}
 changed the forward algorithm in such a way as to allow computing the
-gradients of the log-likelihood at the same time. They start by
+gradients of the log-likelihood at the same time.  They start by
 rewriting the likelihood as follows (for ease of exposition the
 dependence on the model parameters and covariates is dropped here):
 \begin{equation}
@@ -350,13 +348,14 @@
 \subsection{Parameter estimation}
 
 Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
-through the use of a general Newton-Raphson optimizer.  In the EM algorithm, 
-parameters are estimated by iteratively maximising the 
-expected joint likelihood of the parameters given the observations and states. 
-Let $\greekv{\theta} = (\greekv{\theta}_1, \greekv{\theta}_2,\greekv{\theta}_3)$
-be the general parameter vector consisting of three subvectors with parameters 
-for the prior model, transition model, and response model respectively. The 
-joint log-likelihood can be written as
+through the use of a general Newton-Raphson optimizer.  In the EM
+algorithm, parameters are estimated by iteratively maximising the
+expected joint likelihood of the parameters given the observations and
+states.  Let $\greekv{\theta} = (\greekv{\theta}_1,
+\greekv{\theta}_2,\greekv{\theta}_3)$ be the general parameter vector
+consisting of three subvectors with parameters for the prior model,
+transition model, and response model respectively.  The joint
+log-likelihood can be written as
 \begin{equation}
 \log \Prob(\vc{O}_{1:T}, \vc{S}_{1:T}|\vc{z}_{1:T},\greekv{\theta}) = \log 
 \Prob(S_1|\vc{z}_{1},\greekv{\theta}_1) 
@@ -366,7 +365,7 @@
 This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the 
 Expectation step, we replace these with their expected values given a set of 
 (initial) parameters $\greekv{\theta}' = (\greekv{\theta}'_1, 
-\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $O_{1:T}$. The expected 
+\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. The expected 
 log likelihood 
 \begin{equation}
 Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'} 
@@ -411,11 +410,11 @@
 
 
 The EM algorithm, however, has some drawbacks.  First, it can be slow to
-converge towards the end of optimization.  Second,
-applying constraints to parameters can be problematic; in particular,
-EM can lead to wrong parameter estimates when applying constraints.
-Hence, in \pkg{depmixS4}, EM is used by default in unconstrained
-models, but otherwise, direct optimization is done using \pkg{Rdonlp2}
+converge towards the end of optimization.  Second, applying
+constraints to parameters can be problematic; in particular, EM can
+lead to wrong parameter estimates when applying constraints.  Hence,
+in \pkg{depmixS4}, EM is used by default in unconstrained models, but
+otherwise, direct optimization is done using \pkg{Rdonlp2}
 \citep{Tamura2009,Spellucci2002}, because it handles general linear
 (in)equality constraints, and optionally also non-linear constraints.
 
@@ -436,30 +435,30 @@
 	
 	\item  model fitting with function \code{fit}
 \end{enumerate}
-We have separated the stages of model specification and model fitting because
-fitting large models can be fairly time-consuming and it is hence useful to be
-able to check the model specification before actually fitting the model. 
+We have separated the stages of model specification and model fitting
+because fitting large models can be fairly time-consuming and it is
+hence useful to be able to check the model specification before
+actually fitting the model.
 
-\subsection{Example data: speed}
+\subsection[Example data: speed]{Example data: \code{speed}}
 
-Throughout this article a data set called \code{speed} is used.  As already 
-indicated in the Introduction, it
-consists of three time series with three variables: response time,
-accuracy, and a covariate Pacc which defines the relative pay-off for
-speeded and accurate responding.  The participant in this experiment
-switches between fast responding at chance level and relatively slower
-responding at a high level of accuracy.  Interesting hypotheses to
-test are: is the switching regime symmetric?  Is there evidence for
-two states or does one state suffice?  Is the guessing state actually
-a guessing state, i.e., is the probability of a correct response at
-chance level (0.5)?
+Throughout this article a data set called \code{speed} is used.  As
+already indicated in the Introduction, it consists of three time
+series with three variables: response time, accuracy, and a covariate
+Pacc which defines the relative pay-off for speeded versus accurate
+responding.  The participant in this experiment switches between fast
+responding at chance level and relatively slower responding at a high
+level of accuracy.  Interesting hypotheses to test are: is the
+switching regime symmetric?  Is there evidence for two states or does
+one state suffice?  Is the guessing state actually a guessing state,
+i.e., is the probability of a correct response at chance level (0.5)?
 
 \subsection{A simple model}
 
-A dependent mixture model is defined by the number of states and the initial 
-state, state transition, and response distribution functions. A default 
-dependent mixture model can be created with the \code{depmix}-function as 
-follows:
+A dependent mixture model is defined by the number of states and the
+initial state, state transition, and response distribution functions.
+A dependent mixture model can be created with the
+\code{depmix}-function as follows:
 
 \begin{CodeChunk}
 \begin{CodeInput}
@@ -471,10 +470,10 @@
 The \code{depmix} function returns an object of class \code{depmix}
 which contains the model specification (and not a fitted model!).
 Note also that start values for the transition parameters are provided
-in this call using the \code{trstart} argument. At this time, the package does 
-not provide automatic starting values. 
+in this call using the \code{trstart} argument.  At this time, the
+package does not provide automatic starting values.
 
-The so-defined models needs to be \code{fit}ted with the following
+The so-defined model needs to be \code{fit}ted with the following
 line of code:
 \begin{CodeChunk}
 \begin{CodeInput}
@@ -494,17 +493,23 @@
 \end{CodeInput}
 \begin{CodeOutput}
 Convergence info: Log likelihood converged to within tol. 
-'log Lik.' -84.34175 (df=7)
-AIC:  182.6835 
-BIC:  211.275 
+'log Lik.' -84.34 (df=7)
+AIC:  182.68 
+BIC:  211.28 
 \end{CodeOutput}
 \end{CodeChunk}
-These statistics can also be extracted using \code{logLik},
-\code{AIC} and \code{BIC}, respectively.
+These statistics can also be extracted using \code{logLik}, \code{AIC}
+and \code{BIC}, respectively.  By comparison, a 1-state model for
+these data, i.e., assuming there is no mixture, has a log-likelihood of
+$-305.33$, and an AIC and BIC of 614.66 and 622.83, respectively.
+Hence, the 2-state model fits the data much better than the 1-state
+model.  Note that the 1-state model can be specified using \code{mod <-
+depmix(rt~1, data=speed, nstates=1)}, although this model is trivial
+as it will simply return the mean and the standard deviation of the \code{rt} variable.
 
-The \code{summary} method of \code{fit}ted models provides the parameter
-estimates, first for the prior probabilities model, second for the
-transition model, and third for the response models.
+The \code{summary} method of \code{fit}ted models provides the
+parameter estimates, first for the prior probabilities model, second
+for the transition model, and third for the response models.
 
 \begin{CodeChunk}
 \begin{CodeInput}
@@ -515,68 +520,67 @@
 Model of type multinomial, formula: ~1
 Coefficients: 
      [,1]      [,2]
-[1,]    0 -11.25688
+[1,]    0 -11.25
 Probalities at zero values of the covariates.
-0.999987 1.291798e-05 
+0.999 1.292e-05 
 
 Transition model for state (component) 1 
 Model of type multinomial, formula: ~1
 Coefficients: 
-[1]  0.000000 -2.392455
+[1]  0.000 -2.392
 Probalities at zero values of the covariates.
-0.9162501 0.08374986 
+0.9163 0.0837 
 
 Transition model for state (component) 2 
 Model of type multinomial, formula: ~1
 Coefficients: 
-[1] 0.000000 2.139255
+[1] 0.000 2.139
 Probalities at zero values of the covariates.
-0.1053396 0.8946604 
+0.105 0.895 
 
 Response model(s) for state 1 
 
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 6.385492
-sd  0.2439376 
+[1] 6.385
+sd  0.244
 
 Response model(s) for state 2 
 
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 5.511151
-sd  0.1926063 
+[1] 5.511
+sd  0.193 
 \end{CodeOutput}
 \end{CodeChunk}
 
-Note that, since no further arguments were specified, the initial state, 
-state transition and response distributions were set to their defaults 
-(multinomial distributions for the first two, and Gaussian distributions for the
-response distributions). 
+Note that, since no further arguments were specified, the initial
+state, state transition and response distributions were set to their
+defaults (multinomial distributions for the first two, and Gaussian
+distributions for the response distributions).
 
 \subsection{Covariates on transition parameters}
 
-By default, the transition probabilities and the initial state probabilities
-are parameterized using the multinomial
-logistic model. More precisely, each row of the transition matrix is
-parameterized by a baseline category logistic multinomial, 
-meaning that the parameter for the
-base category is fixed at zero \citep[see][p.\ 267
-ff., for multinomial logistic models and various
-parameterizations]{Agresti2002}. The default baseline category is the 
-first state. Hence, for example, for a 3-state model, the initial state probability
-model would have three parameters of which the first is fixed at zero
-and the other two are freely estimated. 
+By default, the transition probabilities and the initial state
+probabilities are parameterized using the multinomial logistic model.
+More precisely, each row of the transition matrix is parameterized by
+a baseline category logistic multinomial, meaning that the parameter
+for the base category is fixed at zero \citep[see][p.\ 267 ff., for
+multinomial logistic models and various
+parameterizations]{Agresti2002}.  The default baseline category is the
+first state.  Hence, for example, for a 3-state model, the initial
+state probability model would have three parameters of which the first
+is fixed at zero and the other two are freely estimated.
 
-The multinomial logistic model allows us to include covariates 
-on the initial state and transition probabilities. \citet{Chung2007} discuss a 
-related latent transition model for repeated measurement
-data ($T=2$) using logistic regression on the transition 
-parameters; they rely on Bayesian methods of estimation. 
-Covariates on the transition probabilities can be specified using a
-one-sided formula as in the following example:
+The multinomial logistic model allows us to include covariates on the
+initial state and transition probabilities.  \citet{Chung2007} discuss
+a related latent transition model for repeated measurement data
+($T=2$) using logistic regression on the transition parameters; they
+rely on Bayesian methods of estimation.  Covariates on the transition
+probabilities can be specified using a one-sided formula as in the
+following example:
 \begin{CodeChunk}
 \begin{CodeInput}
 > set.seed(1)
@@ -595,37 +599,37 @@
 Model of type multinomial, formula: ~scale(Pacc)
 Coefficients: 
      [,1]       [,2]
-[1,]    0 -0.9215182
-[2,]    0  1.8649734
+[1,]    0 -0.9215
+[2,]    0  1.865
 Probalities at zero values of the covariates.
-0.7153513 0.2846487 
+0.7154 0.2846
 
 Transition model for state (component) 2 
 Model of type multinomial, formula: ~scale(Pacc)
 Coefficients: 
      [,1]     [,2]
-[1,]    0 2.471442
-[2,]    0 3.570856
+[1,]    0 2.471
+[2,]    0 3.571
 Probalities at zero values of the covariates.
-0.07788458 0.9221154
+0.0779 0.9221
 ...
 \end{CodeOutput}
 \end{CodeChunk}
-The summary provides all parameters of the model, also the 
-(redundant) zeroes for the base-line category in the multinomial model. 
-The summary also prints the transition probabilities
-at the zero value of the covariate. Note that scaling of the covariate 
-is useful in this regard as it makes interpretation of these intercept probabilities 
-easier. 
+The summary provides all parameters of the model, including the (redundant)
+zeroes for the baseline category in the multinomial model.  The
+summary also prints the transition probabilities at the zero value of
+the covariate.  Note that scaling of the covariate is useful in this
+regard as it makes interpretation of these intercept probabilities
+easier.
 
 \subsection{Multivariate data}
 
-Multivariate data can be modelled by providing a list of formulae as 
-well as a list of family objects for the distributions of the various 
-responses. In above examples we have only used the response times 
-which were modelled with the Gaussian distribution. The accuracy data 
-are in the \code{speed} data are modelled with a multinomial by 
-specifying the following: 
+Multivariate data can be modelled by providing a list of formulae as
+well as a list of family objects for the distributions of the various
+responses.  In the above examples we have only used the response times
+which were modelled with the Gaussian distribution.  The accuracy
+variable in the \code{speed} data can be modelled with a multinomial
+by specifying the following:
 \begin{CodeChunk}
 \begin{CodeInput}
 > set.seed(1)
@@ -635,7 +639,7 @@
 > fm <- fit(mod)
 \end{CodeInput}
 \end{CodeChunk}
-which provides the following fitted model parameters (only the 
+This provides the following fitted model parameters (only the 
 response parameters are given here): 
 \begin{CodeChunk}
 \begin{CodeInput}
@@ -648,56 +652,57 @@
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 5.52169
-sd  0.2028857 
+[1] 5.522
+sd  0.2029 
 
 Response model for response 2 
 Model of type multinomial, formula: corr ~ 1
 Coefficients: 
      [,1]      [,2]
-[1,]    0 0.1030554
+[1,]    0 0.1031
 Probalities at zero values of the covariates.
-0.4742589 0.5257411 
+0.4743 0.5257
 
 Response model(s) for state 2 
 
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 6.39369
-sd  0.2373650 
+[1] 6.394
+sd  0.2374
 
 Response model for response 2 
 Model of type multinomial, formula: corr ~ 1
 Coefficients: 
 	   [,1]     [,2]
-[1,]    0 2.245514
+[1,]    0 2.2455
 Probalities at zero values of the covariates.
-0.09573715 0.9042629 	
+0.0957 0.9043 	
 \end{CodeOutput}
 \end{CodeChunk}
-As can be seen, state 1 has fast response times around chance level, 
-whereas state 2 corresponds with slower responding at higher accuracy 
-levels. 
+As can be seen, state 1 has fast response times and accuracy is
+approximately at chance level (.526), whereas state 2 corresponds with
+slower responding at higher accuracy levels (.904).
 
-Note that by specifying multivariate observations in terms of a list, the 
-variables are considered conditionally independent (given the states). 
-Conditionally \emph{dependent} variables must be handled as a single element in
-the list. Effectively, this means specifying a multivariate response model. The 
-only multivariate response model currently implemented in \pkg{depmix} is for
-multivariate normal variables.
+Note that by specifying multivariate observations in terms of a list,
+the variables are considered conditionally independent (given the
+states).  Conditionally \emph{dependent} variables must be handled as
+a single element in the list.  Effectively, this means specifying a
+multivariate response model.  The only multivariate response model
+currently implemented in \pkg{depmixS4} is for multivariate normal
+variables.
 
 \subsection{Adding covariates on the prior probabilities}
 
-To illustrate the use of covariates on the prior probabilities we have included
-another data set with \pkg{depmixS4}. The \code{balance} data consists 
-of 4 binary items (correct-incorrect) on a balance scale task 
-\citet{Siegler1981}. The data form a subset of the data published in 
-\citet{Jansen2002}. 
+To illustrate the use of covariates on the prior probabilities we have
+included another data set with \pkg{depmixS4}.  The \code{balance}
+data consist of four binary items (correct-incorrect) on a balance scale
+task \citep{Siegler1981}.  The data form a subset of the data
+published in \citet{Jansen2002}.
 
-Similarly to the transition matrix, covariates on the prior 
-probabilities of the latent states (or classes in this case), are 
-defined by using a one-sided formula: 
+Similarly to the transition matrix, covariates on the prior
+probabilities of the latent states (or classes in this case), are
+defined by using a one-sided formula:
 \begin{CodeChunk}
 \begin{CodeInput}
 > balance$age <- balance$age-5
@@ -710,10 +715,15 @@
 \end{CodeInput}
 \end{CodeChunk}
 Note here that we define a \code{mix} model instead of a \code{depmix}
+model, as these data form independent observations.  More formally,
+models as these data form independent observations.  More formally,
+\code{depmix} models extend the class of \code{mix} models by adding
+the transition models.  When fitting \code{mix} models, as can be
+seen in Equation~\ref{eq:Q}, the EM algorithm can be applied by simply
+dropping the second summand containing the transition parameters; this
+is how the EM algorithm for \code{mix} models is implemented in \pkg{depmixS4}.
 
-The summary of the \code{fit}ted model gives (only the prior model is 
-shown here): 
+The summary of the \code{fit}ted model gives the following (only the
+prior model is shown here):
 \begin{CodeChunk}
 \begin{CodeInput}
 > summary(fm)
@@ -722,10 +732,10 @@
 Mixture probabilities model 
 Model of type multinomial, formula: ~age
      [,1]       [,2]
-[1,]    0 -2.5182573
-[2,]    0  0.5512996
+[1,]    0 -2.518
+[2,]    0  0.551
 Probalities at zero values of the covariates.
-0.9254119 0.07458815 
+0.9254 0.0746
 ...
 \end{CodeOutput}
 \end{CodeChunk}	
@@ -736,29 +746,29 @@
 
 \subsection{Fixing and constraining parameters}
 
-Using package \pkg{Rdonlp2} by \citet{Tamura2009}, parameters may be fitted subject to
-general linear (in-)equality constraints.  Constraining and fixing
-parameters is done using the \code{conpat} argument to the
-\code{fit}-function, which specifies for each parameter in the
-model whether it's fixed (0) or free (1 or higher).  Equality
-constraints can be imposed by having two parameters have the same
-number in the \code{conpat} vector.  When only fixed values are
+Using package \pkg{Rdonlp2} by \citet{Tamura2009}, parameters may be
+fitted subject to general linear (in-)equality constraints.
+Constraining and fixing parameters is done using the \code{conpat}
+argument to the \code{fit}-function, which specifies for each
+parameter in the model whether it is fixed (0) or free (1 or higher).
+Equality constraints can be imposed by having two parameters have the
+same number in the \code{conpat} vector.  When only fixed values are
 required, the \code{fixed} argument can be used instead of
 \code{conpat}, with zeroes for fixed parameters and other values (e.g.,
 ones) for non-fixed parameters.  Fitting the models subject to these
-constraints is handled by the optimization routine \code{donlp2}.
-To be able to construct the \code{conpat} and/or \code{fixed} vectors 
-one needs the correct ordering of parameters which is briefly discussed 
-next before proceeding with an example. 
+constraints is handled by the optimization routine \code{donlp2}.  To
+be able to construct the \code{conpat} and/or \code{fixed} vectors one
+needs the correct ordering of parameters which is briefly discussed
+next before proceeding with an example.
 
 \paragraph{Parameter numbering} When using the \code{conpat} and
 \code{fixed} arguments, complete parameter vectors should be supplied,
 i.e., these vectors should have length of the number of parameters of
-the model, which can be obtained by calling \code{npar(object)}. Note that
-this is not the same as the degrees of freedom used e.g.\ in the \code{logLik}
-function because \code{npar} also counts the baseline category zeroes
-from the multinomial logistic models. 
-Parameters are numbered in the following order:
+the model, which can be obtained by calling \code{npar(object)}.  Note
+that this is not the same as the degrees of freedom used e.g.\ in the
+\code{logLik} function because \code{npar} also counts the baseline
+category zeroes from the multinomial logistic models.  Parameters are
+numbered in the following order:
 \begin{enumerate}
 	\item  the prior model parameters
 	\item  the parameters for the transition models
@@ -794,8 +804,9 @@
 > fm1 <- fit(mod)
 \end{CodeInput}
 \end{CodeChunk}
-After this, we use the fitted values from this model to constrain the 
-regression coefficients on the transition matrix (parameters numbers~6 and~10):
+After this, we use the fitted values from this model to constrain the
+regression coefficients on the transition matrix (parameter numbers~6
+and~10):
 \begin{CodeChunk}
 \begin{CodeInput}
 # start with fixed and free parameters
@@ -818,14 +829,19 @@
 
 \section[Extending depmixS4]{Extending \pkg{depmixS4}}
 
-The \pkg{depmixS4} package was designed with the aim of making it relatively
-easy to add new response distributions (as well as possibly new prior and 
-transition models). 
-To make this possible, the EM routine simply calls the \code{fit} methods 
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/depmix -r 310