[Depmix-commits] r309 - papers/jss
noreply@r-forge.r-project.org
Thu Aug 13 14:47:30 CEST 2009
Author: maarten
Date: 2009-08-13 14:47:29 +0200 (Thu, 13 Aug 2009)
New Revision: 309
Modified:
papers/jss/article.tex
Log:
- more changes to jss paper
Modified: papers/jss/article.tex
===================================================================
--- papers/jss/article.tex 2009-08-05 14:03:04 UTC (rev 308)
+++ papers/jss/article.tex 2009-08-13 12:47:29 UTC (rev 309)
@@ -98,26 +98,27 @@
Markov and latent Markov models are frequently used in the social
sciences, in different areas and applications. In psychology, they
-are used for modelling learning processes, see \citet{Wickens1982},
-for an overview, and e.g.\ \citet{Schmittmann2006} for a recent
-application. In economics, latent Markov models are so-called regime
+are used for modelling learning processes (see \citealp{Wickens1982}
+for an overview, and e.g.\ \citealp{Schmittmann2006} for a recent
+application). In economics, latent Markov models are so-called regime
switching models (see e.g., \citealp{Kim1994} and
\citealp{Ghysels1994}). Further applications include speech
recognition \citep{Rabiner1989}, EEG analysis \citep{Rainer2000}, and
-genetics \citep{Krogh1998}. In those latter areas of application,
-latent Markov models are usually referred to as hidden Markov models.
-For more examples of applications, see e.g.\
-\citet[][chapter~1]{Cappe2005}.
+genetics \citep{Krogh1998}. In these latter areas of application,
+latent Markov models are usually referred to as hidden Markov models.
+See, for example, \citet{Fruhwirth2006} for an overview of hidden Markov
+models and their extensions. Further examples of applications can be found
+in e.g.\
+\citet[][chapter~1]{Cappe2005}.
The \pkg{depmixS4} package was motivated by the fact that while Markov
models are used commonly in the social sciences, no comprehensive
-package was available for fitting such models. Common software for
+package was available for fitting such models. Existing software for
estimating Markovian models includes Panmark \citep{Pol1996}, and for
-latent class models Latent Gold \citep{Vermunt2003}. Those programs
-are lacking a number of important features, besides not being freely
-available. There are currently some packages in \proglang{R} that
+latent class models Latent Gold \citep{Vermunt2003}. These programs
+lack a number of important features, besides not being freely
+available. There are currently some packages in \proglang{R} that
handle hidden Markov models but they lack a number of features that we
-needed in our research. In particular, \pkg{depmixS4} was designed to
+needed in our research. In particular, \pkg{depmixS4} was designed to
meet the following goals:
\begin{enumerate}
@@ -129,12 +130,11 @@
to have time-dependent transition matrices
\item to be able to include covariates in the prior or initial
- state probabilities of models
+ state probabilities
\item to be easily extensible, in particular, to allow users to
- easily add new uni- or multivariate response distributions, and
- similarly for the addition of other transition models, e.g.,
- continuous time observation models
+ easily add new uni- or multivariate response distributions and
+ new transition models, e.g., continuous time observation models
\end{enumerate}
@@ -142,26 +142,27 @@
series data, for, say, $T>100$, it can also handle the limiting case when
$T=1$. In this case, there are no time dependencies between
observed data and the model reduces to a finite mixture or
-latent class model. Although there are other specialized packages to
+latent class model. While there are specialized packages to
deal with mixture data, as far as we know these
do not allow the inclusion of covariates on the prior probabilities of
-class membership. The possibility to estimating the effects of covariates on
+class membership. The possibility to estimate the effects of covariates on
prior and transition probabilities is a distinguishing feature of
-\pkg{depmixS4}. In the next section, an outline is provided of the model and
-the likelihood equations.
+\pkg{depmixS4}. In the next section, we provide an outline of the model and
+likelihood equations.
\section{The dependent mixture model}
-The data considered here have the general form $\vc{O}_{1:T}=O_{1}^{1}, \ldots,
+The data considered here have the general form $\vc{O}_{1:T}= (O_{1}^{1}, \ldots,
O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1},
-\ldots, O_{T}^{m}$ for an $m$-variate time series of length $T$. As
+\ldots, O_{T}^{m})$ for an $m$-variate time series of length $T$. As
an example, consider a time series of responses generated by a single
-participant in a response time experiment. The data consists of three
+participant in a psychological response time experiment. The data consist of three
variables: response time, response accuracy, and a covariate which is a pay-off
-variable reflecting the relative reward for speeded and/or accuracy responding.
-These variables are measured on 168, 134 and 137 occasions respectively (in
-Figure~\ref{fig:speed} the first part of this series is plotted). These data
+variable reflecting the relative reward for speeded and/or accurate responding.
+These variables are measured on 168, 134 and 137 occasions, respectively (the
+first part of this series is plotted in
+Figure~\ref{fig:speed}). These data
are more fully described in \citet{Dutilh2009}.
\begin{figure}[htbp]
@@ -173,45 +174,49 @@
\end{center}
\end{figure}
-The latent Markov model is commonly associated with data of this type,
-although usually only multinomial response variables are considered.
-However, common estimation procedures, such as those implemented in
-\citet{Pol1996}, are not suitable for long time series due to
-underflow problems. In contrast, the hidden Markov model is typically
-only used for `long' univariate time series \citep[][,
-chapter~1]{Cappe2005}. In the next paragraphs, the likelihood
-equation and estimation procedures for the dependent mixture model are
-described for data of the above form. We use the term ``dependent
-mixture model'' because one of the authors (Ingmar Visser) thought it
-was time for a new name for these models\footnote{Only later did I
+The latent Markov model is usually associated with data of this type,
+in particular for multinomially distributed responses.
+However, commonly employed estimation procedures
+\citep[e.g.,][]{Pol1996} are not suitable for long time series due to
+underflow problems. In contrast, the hidden Markov model is typically
+only used for `long' univariate time series \citep[][chapter~1]{Cappe2005}.
+We use the term ``dependent mixture model'' because one of the authors
+(Ingmar Visser) thought it
+was time for a new name for these models\footnote{Only later did I
find out that \citet{Leroux1992} already coined the term dependent
mixture models in an application with hidden Markov mixtures of
Poisson count data.}.
+The fundamental assumption of a dependent mixture model is that at any time
+point, the observations are distributed as a mixture with $n$ components
+(or states), and that time-dependencies between the observations are due to
+time-dependencies between the mixture components (i.e., transition probabilities
+between the components). These latter dependencies are assumed to
+follow a first-order Markov process. In the models we are considering here, the
+mixture distributions, the initial mixture probabilities and transition
+probabilities can all depend on covariates $\vc{z}_t$.
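+To make these assumptions concrete, the following minimal sketch (with
+hypothetical parameter values and no covariates) simulates data from a
+2-state dependent mixture with Gaussian responses:
+\begin{CodeChunk}
+\begin{CodeInput}
+> ## first-order Markov chain over 2 states, then state-dependent
+> ## Gaussian observations
+> set.seed(1)
+> A <- matrix(c(.9,.1,.1,.9),2,byrow=TRUE) # transition matrix
+> mu <- c(5,6); sigma <- c(.2,.2)          # state-dependent parameters
+> nt <- 200
+> S <- numeric(nt); S[1] <- sample(1:2,1)
+> for (t in 2:nt) S[t] <- sample(1:2,1,prob=A[S[t-1],])
+> O <- rnorm(nt,mean=mu[S],sd=sigma[S])
+\end{CodeInput}
+\end{CodeChunk}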
-The likelihood of the dependent mixture model conditional on the
-unknown (or hidden) state sequence $S_{1:T}$ and the model is given
-by:
+%transition probability functions $a_{ij}$ and the initial state
+%probability functions $\greekv{\pi}$ may depend on covariates as well
+%as the response distributions $\vc{b_{j}^{k}}$. See for example
+%\citet{Fruhwirth2006} for an overview of hidden Markov models with
+%extensions.
+
+In a dependent mixture model, the joint likelihood of observations $\vc{O}_{1:T}$ and
+latent states $\vc{S}_{1:T} = (S_1,\ldots,S_T)$, given model parameters
+$\greekv{\theta}$ and covariates $\vc{z}_{1:T} = (\vc{z}_1,\ldots,\vc{z}_T)$,
+can be written as
\begin{equation}
- \Prob( \vc{O}_{1:T} | \vc{S}_{1:T}, \greekv{\theta} ) =
- \prod_{t=1}^{T} \Prob( \vc{O}_{t} | S_{t}, \greekv{\theta}),
+ \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\greekv{\theta},\vc{z}_{1:T}) =
+ \pi_{S_1}(\vc{z}_1) \vc{b}_{S_1}(\vc{O}_1|\vc{z}_{1})
+ \prod_{t=1}^{T-1} a_{S_t S_{t+1}}(\vc{z}_{t}) \vc{b}_{S_{t+1}}(\vc{O}_{t+1}|\vc{z}_{t+1}),
\end{equation}
-where $\greekv{\theta}$ is the parameter vector of the model. To arrive
-at the likelihood of the data given the parameter vector
-$\greekv{\theta}$, i.e.\ without the state sequence, we need to sum
-above likelihood over all possible state sequences. This likelihood
-is written as:
-\begin{equation}
- \Prob(\vc{O}_{1:T}|\vc{S}_{1:T},\greekv{\theta})
- \sum_{all Ss} \pi_{i}(\vc{z}_1) \vc{b}_{1}(\vc{O}_{1})
- \prod_{t=1}^{T-1} a_{ij}(z_{t}) \vc{b}_{j}(\vc{O}_{t+1}|\vc{z}_{t+1}),
-\end{equation}
where we have the following elements:
\begin{enumerate}
\item $S_{t}$ is an element of $\mathcal{S}=\{1\ldots n\}$, a set
- of $n$ latent classes or states; we write for short $S_{i}$ to
- denote $S_{t}=i$.
+ of $n$ latent classes or states. %; we write for short $S_{i}$ to
+ %denote $S_{t}=i$.
\item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
giving the probability of class/state $i$ at time $t=1$ with
@@ -221,13 +226,57 @@
provides the probability of a transition from state $i$ to state
$j$ with covariate $\vc{z}_t$,
- \item $\vc{b}_{j}$ is a vector of observation densities
- $b_j^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
+ \item $\vc{b}_{j}$ is a vector of observation densities
+ $b_{j}^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
conditional densities of observations $O_{t}^k$ associated with
latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
$k=1, \ldots, m$.
\end{enumerate}
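+To make the structure of this joint likelihood concrete, a minimal R
+sketch (univariate Gaussian responses, no covariates; all names are
+hypothetical) evaluates it for a given state sequence, reusing the
+quantities from the simulation sketch above:
+\begin{CodeChunk}
+\begin{CodeInput}
+> jointLik <- function(O, S, pi, A, mu, sigma) {
+    nt <- length(O)
+    # pi[S[1]] * b_{S_1}(O_1)
+    lik <- pi[S[1]] * dnorm(O[1], mu[S[1]], sigma[S[1]])
+    # product over a_{S_t,S_{t+1}} * b_{S_{t+1}}(O_{t+1})
+    for (t in 1:(nt-1))
+      lik <- lik * A[S[t],S[t+1]] * dnorm(O[t+1], mu[S[t+1]], sigma[S[t+1]])
+    lik
+  }
+> jointLik(O, S, pi=c(.5,.5), A=A, mu=mu, sigma=sigma)
+\end{CodeInput}
+\end{CodeChunk}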
+%In the next paragraphs, the likelihood
+%equation and estimation procedures for the dependent mixture model are
+%described for data of the above form.
+
+%The likelihood of the dependent mixture model conditional on the
+%unknown (or hidden) state sequence $\vc{S}_{1:T} = (S_1,\ldots,S_T)$
+%and the model is given
+%by:
+%\begin{equation}
+% \Prob( \vc{O}_{1:T} | \vc{S}_{1:T}, \greekv{\theta} ) =
+% \prod_{t=1}^{T} \Prob( \vc{O}_{t} | S_{t}, \greekv{\theta}),
+%\end{equation}
+%where $\greekv{\theta}$ is the parameter vector of the model. To arrive
+%at the marginal likelihood of the data for parameter vector
+%$\greekv{\theta}$, i.e.\ unconditional on the state sequence, we need to sum
+%above likelihood over all possible state sequences. This likelihood
+%is written as:
+%\begin{equation}
+% \Prob(\vc{O}_{1:T}|\greekv{\theta}) =
+% \sum_{all Ss} \pi_{i}(\vc{z}_1) \vc{b}_{1}(\vc{O}_{1})
+% \prod_{t=1}^{T-1} a_{ij}(\vc{z}_{t}) \vc{b}_{j}(\vc{O}_{t+1}|\vc{z}_{t+1}),
+%\end{equation}
+%where we have the following elements:
+%\begin{enumerate}
+%
+% \item $S_{t}$ is an element of $\mathcal{S}=\{1\ldots n\}$, a set
+% of $n$ latent classes or states; we write for short $S_{i}$ to
+% denote $S_{t}=i$.
+%
+% \item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
+% giving the probability of class/state $i$ at time $t=1$ with
+% covariate $\vc{z}_1$.
+%
+% \item $a_{ij}(\vc{z}_t) = \Prob(S_{t+1}=j|S_{t}=i,\vc{z}_t)$,
+% provides the probability of a transition from state $i$ to state
+% $j$ with covariate $\vc{z}_t$,
+%
+% \item $\vc{b}_{j}$ is a vector of observation densities
+% $b_j^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
+% conditional densities of observations $O_{t}^k$ associated with
+% latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
+% $k=1, \ldots, m$.
+%\end{enumerate}
+
% The dependent mixture model is defined by the following equations:
% %\begin{align}
% % S_{t} &= A S_{t-1}, t=2, \ldots, T \\
@@ -241,37 +290,38 @@
%matrix, $O_{t}$ is an (possibly multivariate) observation and $b$ is a
%density function for $O_{t}$ conditional on the hidden state $S_{t}$.
-In the example data above, $b_j^k$ could be a Gaussian distribution
-function for the response time variable, and a Bernoulli for the
-accuracy data. In the models we are considering here, both the
-transition probability functions $a_{ij}$and the initial state
+For the example data above, $b_j^k$ could be a Gaussian distribution
+function for the response time variable, and a Bernoulli distribution for the
+accuracy variable. In the models we are considering here, both the
+transition probability functions $a_{ij}$ and the initial state
probability functions $\greekv{\pi}$ may depend on covariates as well
-as the response distributions $\vc{b_{j}^{k}}$. See for example
+as the response distributions $b_{j}^{k}$. See for example
\citet{Fruhwirth2006} for an overview of hidden Markov models with
extensions.
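+As a concrete illustration for these example data, the joint conditional
+density of one bivariate observation in state $j$ is the product of a
+Gaussian density for the response time and a Bernoulli probability for the
+accuracy; a sketch with hypothetical parameter values, assuming
+conditional independence of the responses given the state:
+\begin{CodeChunk}
+\begin{CodeInput}
+> bj <- function(rt, corr, mu, sigma, p)
+    dnorm(rt, mu, sigma) * dbinom(corr, size=1, prob=p)
+> bj(rt=5.2, corr=1, mu=5.5, sigma=0.3, p=0.9)
+\end{CodeInput}
+\end{CodeChunk}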
\subsection{Likelihood}
-The log-likelihood of hidden Markov models is usually computed by the
+To obtain maximum likelihood estimates of the model parameters, we need the
+marginal likelihood of the observations. For hidden Markov models, this
+marginal (log-)likelihood is usually computed by the
so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or
-rather by the forward part of this algorithm. \cite{Lystig2002}
+rather by the forward part of this algorithm. \cite{Lystig2002}
changed the forward algorithm in such a way as to allow computing the
-gradients of the log-likelihood at the same time. They start by
+gradients of the log-likelihood at the same time. They start by
rewriting the likelihood as follows (for ease of exposition the
-dependence on the model parameters is dropped here):
+dependence on the model parameters and covariates is dropped here):
\begin{equation}
- L_{T} = Pr(\vc{O}_{1:T}) = \prod_{t=1}^{T}
- Pr(\vc{O}_{t}|\vc{O}_{1:t-1}),
+ L_{T} = \Prob(\vc{O}_{1:T}) = \prod_{t=1}^{T}
+ \Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)}),
\label{condLike}
\end{equation}
-where $Pr(\vc{O}_{1}|\vc{O}_{0}):=Pr(\vc{O}_{1})$. Note that for a
+where $\Prob(\vc{O}_{1}|\vc{O}_{0}):=\Prob(\vc{O}_{1})$. Note that for a
simple, i.e.\ observed, Markov chain these probabilities reduce to
-$Pr(\vc{O}_{t}|\vc{O}_{1},\ldots,
-\vc{O}_{t-1})=Pr(\vc{O}_{t}|\vc{O}_{t-1})$.
+$\Prob(\vc{O}_{t}|\vc{O}_{1},\ldots,
+\vc{O}_{t-1})=\Prob(\vc{O}_{t}|\vc{O}_{t-1})$.
The log-likelihood can now be expressed as:
\begin{equation}
- l_{T} = \sum_{t=1}^{T} \log[\Prob(\vc{O}_{t}|\vc{O}_{1}, \ldots,
-\vc{O}_{t-1})].
+ l_{T} = \sum_{t=1}^{T} \log[\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})].
\label{eq:condLogl}
\end{equation}
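+The recursion below computes these conditional probabilities while
+keeping all quantities scaled; as an illustration, here is a minimal R
+sketch of such a scaled forward pass (univariate Gaussian responses, no
+covariates; function and variable names are hypothetical):
+\begin{CodeChunk}
+\begin{CodeInput}
+> fwdLogLik <- function(O, pi, A, mu, sigma) {
+    nt <- length(O); n <- length(pi)
+    # state-conditional densities b_j(O_t), one column per state
+    B <- sapply(1:n, function(j) dnorm(O, mu[j], sigma[j]))
+    phi <- pi * B[1,]; ll <- log(sum(phi)); phi <- phi/sum(phi)
+    for (t in 2:nt) {
+      phi <- (phi %*% A) * B[t,]  # unnormalized filtering probabilities
+      ll <- ll + log(sum(phi))    # accumulates log Phi_t
+      phi <- phi/sum(phi)         # rescale to avoid underflow
+    }
+    ll
+  }
+> fwdLogLik(O, pi=c(.5,.5), A=A, mu=mu, sigma=sigma)
+\end{CodeInput}
+\end{CodeChunk}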
@@ -280,16 +330,15 @@
\begin{align}
\phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
\label{eq:fwd1} \\
-\begin{split}
- \phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1}, \ldots,
-\vc{O}_{t-1}) \\
- &= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+%\begin{split}
+ \phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1:(t-1)}) %\\
+ = \sum_{i=1}^{n} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
(\Phi_{t-1})^{-1},
\label{eq:fwdt}
-\end{split}
+%\end{split}
\end{align}
where $\Phi_{t}=\sum_{i=1}^{n} \phi_{t}(i)$. Combining
-$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1}, \ldots, \vc{O}_{t-1})$, and
+$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})$, and
equation~(\ref{eq:condLogl}) gives the following expression for the
log-likelihood:
\begin{equation}
@@ -307,12 +356,12 @@
Let $\greekv{\theta} = (\greekv{\theta}_1, \greekv{\theta}_2,\greekv{\theta}_3)$
be the general parameter vector consisting of three subvectors with parameters
for the prior model, transition model, and response model respectively. The
-joint log likelihood can be written as
+joint log-likelihood can be written as
\begin{equation}
\log \Prob(\vc{O}_{1:T}, \vc{S}_{1:T}|\vc{z}_{1:T},\greekv{\theta}) = \log
\Prob(S_1|\vc{z}_{1},\greekv{\theta}_1)
+ \sum_{t=2}^{T} \log \Prob(S_t|S_{t-1},\vc{z}_{t-1},\greekv{\theta}_2)
-+ \sum_{t=1}^{T} \log \Prob(O_t|S_t,\vc{z}_t,\greekv{\theta}_3)
++ \sum_{t=1}^{T} \log \Prob(\vc{O}_t|S_t,\vc{z}_t,\greekv{\theta}_3)
\end{equation}
This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the
Expectation step, we replace these with their expected values given a set of
@@ -329,7 +378,7 @@
\label{eq:Q}
Q(\greekv{\theta},\greekv{\theta}') =
\sum_{j=1}^n \gamma_1(j) \log \Prob(S_1=j|\vc{z}_1,\greekv{\theta}_1) \\
-+ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t^i(j,k) \log \Prob(S_t = k|S_{t-1}
++ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1}
= j,\vc{z}_{t-1},\greekv{\theta}_2) \\
+ \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j)
\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
@@ -338,7 +387,7 @@
where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
\vc{z}_{1:T},\greekv{\theta}')$ can be computed effectively by the
-Forward-Backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
+forward-backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the
right hand side of (\ref{eq:Q}) consists of three separate parts, we can
maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
@@ -393,7 +442,8 @@
\subsection{Example data: speed}
-Throughout this article a data set called \code{speed} is used. It
+Throughout this article a data set called \code{speed} is used. As already
+indicated in the Introduction, it
consists of three time series with three variables: response time,
accuracy, and a covariate Pacc which defines the relative pay-off for
speeded and accurate responding. The participant in this experiment
@@ -406,9 +456,10 @@
\subsection{A simple model}
-A dependent mixture model is defined by the number of states, and by
-the response distribution functions, and can be created with the
-\code{depmix}-function as follows:
+A dependent mixture model is defined by the number of states and the initial
+state, state transition, and response distribution functions. A default
+dependent mixture model can be created with the \code{depmix}-function as
+follows:
\begin{CodeChunk}
\begin{CodeInput}
@@ -435,7 +486,7 @@
\code{depmix.fitted} which extends the \code{depmix} class, adding
convergence information (and information about constraints if these
were applied, see below). The \code{print} method provides summary
-information on convergence, the log likelihood and the AIC and BIC
+information on convergence, the log-likelihood and the AIC and BIC
values:
\begin{CodeChunk}
\begin{CodeInput}
@@ -500,32 +551,37 @@
\end{CodeOutput}
\end{CodeChunk}
+Note that, since no further arguments were specified, the initial state,
+state transition and response distributions were set to their defaults
+(multinomial distributions for the first two, and a Gaussian distribution
+for the response).
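+Spelling out these defaults, the model above could equivalently be
+specified as follows (a sketch; the \code{transition} and \code{prior}
+arguments are assumed to default to intercept-only formulae):
+\begin{CodeChunk}
+\begin{CodeInput}
+> mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
+      transition=~1, prior=~1)
+\end{CodeInput}
+\end{CodeChunk}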
\subsection{Covariates on transition parameters}
-The transition probabilities and the initial state probabilities
+By default, the transition probabilities and the initial state probabilities
are parameterized using the multinomial
-logistic model. In particular, each row of the transition matrix is
+logistic model. More precisely, each row of the transition matrix is
parameterized as a baseline-category multinomial logistic model,
meaning that the parameter for the
base category is fixed at zero \citep[see][p.\ 267
ff., for multinomial logistic models and various
-parameterizations]{Agresti2002}. See also \citet{Chung2007} for similar models,
-latent transition models using logistic regression on the transition parameters.
-They fit such models on repeated measurement
-data ($T=2$) using Bayesian methods. The default baseline category is the
-first state.
-Hence, for example, for a 3-state model, the initial state probability
+parameterizations]{Agresti2002}. The default baseline category is the
+first state. Hence, for example, for a 3-state model, the initial state probability
model would have three parameters of which the first is fixed at zero
-and the other two are freely estimated.
+and the other two are freely estimated.
+The multinomial logistic model allows us to include covariates
+on the initial state and transition probabilities. \citet{Chung2007} discuss a
+related latent transition model for repeated measurement
+data ($T=2$) using logistic regression on the transition
+parameters; they rely on Bayesian methods of estimation.
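+To illustrate this parameterization, the following sketch converts
+hypothetical baseline-category logit coefficients for one row of the
+transition matrix into probabilities (compare the coefficients in the
+summary output shown below):
+\begin{CodeChunk}
+\begin{CodeInput}
+> beta0 <- c(0, -0.92)  # intercepts; first state is the baseline
+> beta1 <- c(0, 1.86)   # coefficients for the covariate scale(Pacc)
+> z <- 0                # covariate value
+> exp(beta0 + beta1*z) / sum(exp(beta0 + beta1*z))
+\end{CodeInput}
+\end{CodeChunk}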
Covariates on the transition probabilities can be specified using a
one-sided formula as in the following example:
\begin{CodeChunk}
\begin{CodeInput}
> set.seed(1)
> mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
- transition=~scale(Pacc), instart=runif(2))
+ transition=~scale(Pacc), instart=runif(2))
> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -538,7 +594,7 @@
Transition model for state (component) 1
Model of type multinomial, formula: ~scale(Pacc)
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 -0.9215182
[2,] 0 1.8649734
Probalities at zero values of the covariates.
@@ -547,7 +603,7 @@
Transition model for state (component) 2
Model of type multinomial, formula: ~scale(Pacc)
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 2.471442
[2,] 0 3.570856
Probalities at zero values of the covariates.
@@ -574,8 +630,8 @@
\begin{CodeInput}
> set.seed(1)
> mod <- depmix(list(rt~1,corr~1), data=speed, nstates=2,
- family=list(gaussian(),multinomial()),
- transition=~scale(Pacc),instart=runif(2))
+ family=list(gaussian(),multinomial()),
+ transition=~scale(Pacc),instart=runif(2))
> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -598,7 +654,7 @@
Response model for response 2
Model of type multinomial, formula: corr ~ 1
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 0.1030554
Probalities at zero values of the covariates.
0.4742589 0.5257411
@@ -614,7 +670,7 @@
Response model for response 2
Model of type multinomial, formula: corr ~ 1
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 2.245514
Probalities at zero values of the covariates.
0.09573715 0.9042629
@@ -665,7 +721,7 @@
\begin{CodeOutput}
Mixture probabilities model
Model of type multinomial, formula: ~age
- [,1] [,2]
+ [,1] [,2]
[1,] 0 -2.5182573
[2,] 0 0.5512996
Probalities at zero values of the covariates.
@@ -713,8 +769,7 @@
To see the ordering of parameters use the following:
\begin{CodeChunk}
\begin{CodeInput}
-> mod <- setpars(mod, value=1:npar(mod))
-> mod
+> setpars(mod, value=1:npar(mod))
\end{CodeInput}
\end{CodeChunk}
@@ -723,8 +778,7 @@
initial state probabilities model):
\begin{CodeChunk}
\begin{CodeInput}
-> mod <- setpars(mod, getpars(mod,which="fixed"))
-> mod
+> setpars(mod, getpars(mod,which="fixed"))
\end{CodeInput}
\end{CodeChunk}
@@ -758,17 +812,17 @@
tenability of constraints \citep{Dannemann2007}. Note that these arguments
provide the possibility of arbitrary
constraints, including constraints between, e.g., a multinomial regression coefficient
-for the transition matrix and the mean of a gaussian response model.
+for the transition matrix and the mean of a Gaussian response model.
Whether such constraints make sense is hence the responsibility of
the user.
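+As an illustration, equality constraints might be imposed along the
+following lines (a sketch: the labelling vector is hypothetical and must
+have length \code{npar(mod)}; parameters sharing a label greater than 1
+are constrained to be equal, 1 means freely estimated, 0 means fixed):
+\begin{CodeChunk}
+\begin{CodeInput}
+> pars <- c(1,1,0,1,2,1,1,1,2,1)  # hypothetical labelling
+> fm <- fit(mod, equal=pars)
+\end{CodeInput}
+\end{CodeChunk}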
\section[Extending depmixS4]{Extending \pkg{depmixS4}}
-The \pkg{depmixS4} package was built with the aim of having the
-possibility of adding new response distributions (and possibly also
-other models for the prior and transition probabilities models).
-Therefore, the EM algorithm simply calls \code{fit} functions from the
-separate response models without knowing what they are. As a
+The \pkg{depmixS4} package was designed with the aim of making it relatively
+easy to add new response distributions (as well as possibly new prior and
+transition models).
+To make this possible, the EM routine simply calls the \code{fit} methods
+of the separate response models without needing to know how they work. As a
consequence, adding user-specified response models is straightforward.
User-defined distributions should extend the \code{response}-class
and have the following slots:
@@ -776,7 +830,7 @@
\item y: the response variable
\item x: the design matrix, possibly only an intercept
\item parameters: a named list with the coefficients and possibly
- other parameters, e.g., the standard deviation in the gaussian
+ other parameters, e.g., the standard deviation in the Gaussian
response model
\item fixed: a vector of logicals indicating whether parameters are
fixed
@@ -784,11 +838,12 @@
\end{enumerate}
In the \code{speed} data example, it may be more appropriate to model
-the response times as an exgaus distribution rather than using the
-Gaussian. We can do so as follows, first we define our own
-exgaus-class extending the response-class:
+the response times with an ex-Gaussian rather than a Gaussian distribution.
+To do so, we first define an exgaus-class extending the response-class:
\begin{CodeChunk}
- setClass("exgaus", contains="response")
+\begin{CodeInput}
+setClass("exgaus", contains="response")
+\end{CodeInput}
\end{CodeChunk}
The class thus defined needs a number of methods:
@@ -803,84 +858,86 @@
by the EM algorithm)
\end{enumerate}
-Only the constructor function and the fit function are provided here
-and the complete code is provided in the helpfile for \code{makeDepmix}.
-The \code{exgaus} example uses the \pkg{gamlss} and \pkg{gamlss.distr} packages \cite{Stasinopoulos2009a,Stasinopoulos2009b} for
+Only the constructor and the fit method are provided here; the complete code
+can be found in the helpfile for \code{makeDepmix}.
+The \code{exgaus} example uses the \pkg{gamlss} and \pkg{gamlss.dist}
+packages \citep{Stasinopoulos2009a,Stasinopoulos2009b} for
computing the \code{dens}ity and for \code{fit}ting the parameters.
-The constructor function is defined as:
+The constructor method is defined as:
\begin{CodeChunk}
\begin{CodeInput}
setGeneric("exgaus", function(y, pstart = NULL, fixed = NULL, ...)
- standardGeneric("exgaus"))
+ standardGeneric("exgaus"))
+
setMethod("exgaus",
- signature(y="ANY"),
- function(y,pstart=NULL,fixed=NULL, ...) {
- y <- matrix(y,length(y))
- x <- matrix(1)
- parameters <- list()
- npar <- 3
- if(is.null(fixed)) fixed <- as.logical(rep(0,npar))
- if(!is.null(pstart)) {
- if(length(pstart)!=npar) stop("length of 'pstart' must be ",npar)
- parameters$mu <- pstart[1]
- parameters$sigma <- log(pstart[2])
- parameters$nu <- log(pstart[3])
- }
- mod <- new("exgaus",parameters=parameters,fixed=fixed,x=x,y=y,npar=npar)
- mod
- }
+ signature(y="ANY"),
+ function(y,pstart=NULL,fixed=NULL, ...) {
+ y <- matrix(y,length(y))
+ x <- matrix(1)
+ parameters <- list()
+ npar <- 3
+ if(is.null(fixed)) fixed <- as.logical(rep(0,npar))
+ if(!is.null(pstart)) {
+ if(length(pstart)!=npar) stop("length of 'pstart' must be ",npar)
+ parameters$mu <- pstart[1]
+ parameters$sigma <- log(pstart[2])
+ parameters$nu <- log(pstart[3])
+ }
+ mod <- new("exgaus",parameters=parameters,fixed=fixed,x=x,y=y,npar=npar)
+ mod
+ }
)
\end{CodeInput}
\end{CodeChunk}
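+Using this constructor is then straightforward; for instance (starting
+values are hypothetical):
+\begin{CodeChunk}
+\begin{CodeInput}
+> rt1 <- exgaus(speed$rt, pstart=c(5,.1,.1))
+\end{CodeInput}
+\end{CodeChunk}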
-The \code{fit} function is defined as follows:
+The \code{fit} method is defined as follows:
\begin{CodeChunk}
\begin{CodeInput}
setMethod("fit","exgaus",
- function(object,w) {
- if(missing(w)) w <- NULL
- y <- object@y
- fit <- gamlss(y~1,weights=w,family=exGAUS(),
- control=gamlss.control(n.cyc=100,trace=FALSE),
- mu.start=object@parameters$mu,
- sigma.start=exp(object@parameters$sigma),
- nu.start=exp(object@parameters$nu))
- pars <- c(fit$mu.coefficients,fit$sigma.coefficients,fit$nu.coefficients)
- object <- setpars(object,pars)
- object
- }
+ function(object,w) {
+ if(missing(w)) w <- NULL
+ y <- object@y
+ fit <- gamlss(y~1,weights=w,family=exGAUS(),
+ control=gamlss.control(n.cyc=100,trace=FALSE),
+ mu.start=object@parameters$mu,
+ sigma.start=exp(object@parameters$sigma),
+ nu.start=exp(object@parameters$nu))
+ pars <- c(fit$mu.coefficients,fit$sigma.coefficients,fit$nu.coefficients)
+ object <- setpars(object,pars)
+ object
+ }
)
\end{CodeInput}
\end{CodeChunk}
-The \code{fit} function defines a \code{gamlss} model with
+The \code{fit} method defines a \code{gamlss} model with
only an intercept to be estimated and then sets the fitted parameters
back into their respective slots in the `exgaus' object. See the help
for \code{gamlss.dist} for interpretation of these parameters.
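+A \code{dens} method is also required; a minimal sketch (freely adapted,
+using \code{dexGAUS} from \pkg{gamlss.dist}, and recalling that sigma and
+nu are stored on the log scale in the constructor above) could read:
+\begin{CodeChunk}
+\begin{CodeInput}
+setMethod("dens","exgaus",
+  function(object, log=FALSE) {
+    # evaluate the ex-Gaussian density of the responses in this state
+    dexGAUS(object@y, mu=object@parameters$mu,
+      sigma=exp(object@parameters$sigma),
+      nu=exp(object@parameters$nu), log=log)
+  }
+)
+\end{CodeInput}
+\end{CodeChunk}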
After defining all the necessary methods for the new response model,
-we can now define the dependent mixture model using this reponse.
-The function \code{makeDepmix} is added to \pkg{depmixS4} to have
+we can now define the dependent mixture model using this response model.
+The function \code{makeDepmix} is included in \pkg{depmixS4} to have
full control over model specification, which we need here.
We first create all the response models that we want as a double list:
\begin{CodeChunk}
\begin{CodeInput}
rModels <- list(
- list(
- exgaus(rt,pstart=c(5,.1,.1)),
- GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(0.5,0.5))
- ),
- list(
- exgaus(rt,pstart=c(6,.1,.1)),
- GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(.1,.9))
- )
+ list(
+ exgaus(rt,pstart=c(5,.1,.1)),
+ GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(0.5,0.5))
+ ),
+ list(
+ exgaus(rt,pstart=c(6,.1,.1)),
+ GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(.1,.9))
+ )
)
\end{CodeInput}
\end{CodeChunk}
Next, we define the transition and prior probability models using the
-transInit function (which produces a transInit model, which also extends
+\code{transInit} function (which produces a \code{transInit} model that also extends
the response class):
\begin{CodeChunk}
\begin{CodeInput}
@@ -895,9 +952,9 @@
the model:
\begin{CodeChunk}
\begin{CodeInput}
-mod <- makeDepmix(response=rModels,transition=transition,
-prior=inMod,stat=FALSE)
-fm <- fit(mod)
+> mod <- makeDepmix(response=rModels,transition=transition,
+ prior=inMod,stat=FALSE)
+> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -911,7 +968,7 @@
\pkg{depmixS4} provides a flexible framework for fitting dependent mixture models
for a large variety of response distributions. It can also fit latent class regression
and finite mixture models, although it should be noted that more specialized packages
-are available for doing so, e.g.\ \pkg{FlexMix} \citep{Leisch2004}. The package is intended
+are available for this purpose, such as \pkg{FlexMix} \citep{Leisch2004}. The package is intended
for modeling of (individual) time series data with the aim of characterizing the transition
processes underlying the data. The possibility to use covariates on the transition matrix
greatly enhances the flexibility of the model. The EM algorithm uses a very general