[Depmix-commits] r291 - papers/jss
noreply at r-forge.r-project.org
Thu Jul 9 16:49:29 CEST 2009
Author: ingmarvisser
Date: 2009-07-09 16:49:26 +0200 (Thu, 09 Jul 2009)
New Revision: 291
Modified:
papers/jss/article.tex
Log:
Filled in most sections of the paper, added first 2 examples.
Modified: papers/jss/article.tex
===================================================================
--- papers/jss/article.tex 2009-07-08 15:25:51 UTC (rev 290)
+++ papers/jss/article.tex 2009-07-09 14:49:26 UTC (rev 291)
@@ -54,6 +54,7 @@
E-mail: \email{i.visser at uva.nl} \\
URL: \url{http://www.ingmar.org/}
}
+
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
@@ -64,6 +65,7 @@
%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\batchmode
\begin{document}
@@ -73,41 +75,396 @@
\section[]{Introduction}
%% Note: If there is markup in \(sub)section, then it has to be escape as above.
-Why did we develop depmixS4?
+Markov and latent Markov models are frequently used in the social
+sciences, in various areas and applications. In psychology, they are
+used to model learning processes; see \citet{Wickens1982} for an
+overview and \citet{Schmittmann2006} for a recent application. In
+economics, latent Markov models are commonly used as regime-switching
+models; see e.g.\ \citet{Kim1994} and \citet{Ghysels1994}. Further
+applications include speech recognition \citep{Rabiner1989}, EEG
+analysis \citep{Rainer2000}, and genetics \citep{Krogh1998}. In these
+latter areas of application, latent Markov models are usually
+referred to as hidden Markov models.
+The \pkg{depmixS4} package was motivated by the fact that, although
+Markov models are commonly used in the social sciences, no
+comprehensive package was available for fitting such models. Common
+programs for Markovian models include Panmark \citep{Pol1996} and,
+for latent class models, Latent Gold \citep{Vermunt2003}. Besides not
+being freely available, these programs lack a number of important
+features. There are currently some packages in R that handle hidden
+Markov models, but they lack a number of features that we needed in
+our research. In particular, \pkg{depmixS4} was designed to meet the
+following goals:
+\begin{enumerate}
+
+	\item to handle parameter estimation subject to general linear
+	(in)equality constraints;
+
+	\item to fit transition models with covariates, i.e., to have
+	time-dependent transition matrices;
+
+	\item to include covariates in the prior or initial state
+	probabilities of models;
+
+	\item to allow for easy extensibility; in particular, to make it
+	possible to add new response distributions, both univariate and
+	multivariate, and similarly to allow for the addition of other
+	transition models, e.g., continuous-time observation models.
+
+\end{enumerate}
+
+Although \pkg{depmixS4} is designed to deal with longitudinal or time
+series data, say for $T>100$, it can also handle the limiting case
+$T=1$. In that case, there are no time dependencies between observed
+data, and the model reduces to a finite mixture model, or latent
+class model. Although there are other specialized packages for
+mixture data, one feature that we needed ourselves and that is, to
+the best of our knowledge, not available in other packages, is the
+possibility to include covariates on the prior probabilities of class
+membership. In the next section, an outline is provided of the model
+and the likelihood equations.
+
+
+\subsection*{Acknowledgements}
+
+Ingmar Visser was supported by an EC Framework 6 grant, project 516542
+(NEST). Maarten Speekenbrink was supported by the ESRC Centre for
+Economic Learning and Social Evolution (ELSE). Han van der Maas
+provided the speed-accuracy data \citep{Maas2005b} and thereby
+necessitated implementing models with time-dependent covariates.
+Brenda Jansen provided the balance scale data set \citep{Jansen2001},
+which was the perfect opportunity to test the covariates on the prior
+model parameters. The examples in the help files use both of these
+data sets.
+
+
\section{The dependent mixture model}
-Describe the model formulae
+The data considered here have the general form $O_{1}^{1}, \ldots,
+O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1}, \ldots,
+O_{T}^{m}$ for an $m$-variate time series of length $T$. As an
+example, consider a time series of responses generated by a single
+subject in a reaction time experiment. The data consist of three
+variables: reaction time, accuracy, and a covariate, a pay-off factor
+that determines the reward for speed and accuracy. These variables
+were measured on 168, 134, and 137 occasions, respectively
+(in Figure~\ref{fig:speed} the first part of this series is plotted).
+\begin{figure}[htbp]
+ \begin{center}
+ \scalebox{1}{\includegraphics*{graphs/speed1.pdf}}
+ \caption{Reaction times, accuracy and pay-off values for
+ the first series of responses in dataset \texttt{speed}.}
+ \label{fig:speed}
+ \end{center}
+\end{figure}
+
+The latent Markov model is commonly associated with data of this type,
+although usually only multinomial variables are considered. However,
+common estimation procedures, such as those implemented in
+\citet{Pol1996}, are not suitable for long time series due to
+underflow problems. In contrast, the hidden Markov model is typically
+only used for `long' univariate time series. In the next sections, the
+likelihood and estimation procedure for the hidden Markov model are
+described, given data of the above form. These models are called
+dependent mixture models because one of the authors (Ingmar Visser)
+thought it was time for a new name for these models.\footnote{Only
+later did I find out that \citet{Leroux1992} had already coined the
+term dependent mixture models in an application with hidden Markov
+mixtures of Poisson count data.}
+
+The dependent mixture model is defined by the following elements:
+\begin{enumerate}
+
+ \item a set $\vc{S}$ of latent classes or states $S_{i},\, i=1,
+ \ldots , n$,
+
+ \item matrices $\mat{A}_t$ of transition probabilities $a_{ij,t}$ for
+ the transition from state $S_{i}$ to state $S_{j}$ at time $t$,
+
+ \item a set $\vc{B}_t$ of observation functions $b_j^k(\cdot)$ that
+ provide the conditional probabilities of observations $O_{t}^k$
+ associated with latent state $S_{j}$,
+
+ \item a vector $\pmb{\pi}$ of latent state initial probabilities
+ $\pi_{i}$
+\end{enumerate}
+When transitions are added to the latent class model, it is more
+appropriate to refer to the classes as states. The word class is
+more commonly associated with a stable, trait-like attribute, whereas
+a state can change over time.
+
+Need model equations: maybe from Fruhwirth-Schnatter?
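+As a sketch of the intended equations (this is the standard hidden
+Markov factorization written in the notation above; conditional
+independence of the $m$ responses given the state is an assumption
+here):
+\begin{equation}
+	Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) = \sum_{s_{1}=1}^{n} \cdots
+	\sum_{s_{T}=1}^{n} \pi_{s_{1}} b_{s_{1}}(\vc{O}_{1})
+	\prod_{t=2}^{T} a_{s_{t-1}s_{t},t}\, b_{s_{t}}(\vc{O}_{t}),
+\end{equation}
+where $b_{j}(\vc{O}_{t}) = \prod_{k=1}^{m} b_{j}^{k}(O_{t}^{k})$ and
+the sum runs over all possible latent state sequences.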
+
+
\subsection{Likelihood}
-Give some refs to computing the likelihood
+The log-likelihood of hidden Markov models is usually computed by the
+so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or
+rather by the forward part of this algorithm. \citet{Lystig2002}
+changed the forward algorithm in such a way as to allow computing the
+gradients of the log-likelihood at the same time. They start by
+rewriting the likelihood as follows (for ease of exposition, the
+dependence on the model parameters is dropped here):
+\begin{equation}
+ L_{T} = Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) = \prod_{t=1}^{T}
+Pr(\vc{O}_{t}|\vc{O}_{1},
+ \ldots, \vc{O}_{t-1}),
+ \label{condLike}
+\end{equation}
+where $Pr(\vc{O}_{1}|\vc{O}_{0}):=Pr(\vc{O}_{1})$. Note that for a
+simple, i.e.\ observed, Markov chain these probabilities reduce to
+$Pr(\vc{O}_{t}|\vc{O}_{1},\ldots,
+\vc{O}_{t-1})=Pr(\vc{O}_{t}|\vc{O}_{t-1})$.
+The log-likelihood can now be expressed as:
+\begin{equation}
+ l_{T} = \sum_{t=1}^{T} \log[Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots,
+\vc{O}_{t-1})].
+ \label{eq:condLogl}
+\end{equation}
+To compute the log-likelihood, \citet{Lystig2002} define the following
+(forward) recursion:
+\begin{align}
+ \phi_{1}(j) &:= Pr(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
+ \label{eq:fwd1} \\
+\begin{split}
+ \phi_{t}(j) &:= Pr(\vc{O}_{t}, S_{t}=j|\vc{O}_{1}, \ldots,
+\vc{O}_{t-1}) \\
+ &= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+(\Phi_{t-1})^{-1},
+ \label{eq:fwdt}
+\end{split}
+\end{align}
+where $\Phi_{t}=\sum_{i=1}^{N} \phi_{t}(i)$. Combining
+$\Phi_{t}=Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots, \vc{O}_{t-1})$, and
+equation~(\ref{eq:condLogl}) gives the following expression for the
+log-likelihood:
+\begin{equation}
+ l_{T} = \sum_{t=1}^{T} \log \Phi_{t}.
+ \label{eq:logl}
+\end{equation}
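As an illustration, the scaled forward recursion above can be sketched in a few lines of plain R. This is not the \pkg{depmixS4} implementation; the function and argument names (\code{fwd\_logLik}, \code{init}, \code{trans}, \code{dens}) are hypothetical, and the transition matrix is taken to be time-constant for simplicity:

```r
# Minimal sketch of the scaled forward recursion:
#   phi_1(j) = pi_j b_j(O_1), then
#   phi_t(j) = sum_i phi_{t-1}(i) a_ij b_j(O_t) / Phi_{t-1},
# with l_T = sum_t log Phi_t, where Phi_t = sum_j phi_t(j).
fwd_logLik <- function(init, trans, dens) {
  # init:  n-vector of initial state probabilities (pi)
  # trans: n x n transition matrix A (rows sum to 1)
  # dens:  T x n matrix of observation densities b_j(O_t)
  TT <- nrow(dens)
  phi <- init * dens[1, ]          # phi_1(j)
  logl <- log(sum(phi))            # log Phi_1
  phi <- phi / sum(phi)            # rescale to avoid underflow
  if (TT > 1) for (t in 2:TT) {
    phi <- as.vector(phi %*% trans) * dens[t, ]  # unnormalized phi_t(j)
    logl <- logl + log(sum(phi))   # accumulate log Phi_t
    phi <- phi / sum(phi)
  }
  logl
}
```

The normalization by $\Phi_{t}$ at each step is what makes the recursion numerically stable for long series, in contrast to the unscaled forward algorithm.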
+
+
\subsection{Parameter estimation}
-Give some refs to EM/forward/backbward and possibly others.
+Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
+through the use of a general Newton-Raphson optimizer. The EM
+algorithm, however, has some drawbacks. First, it can be slow to
+converge towards the end of optimization (although it is usually
+faster than direct optimization at the start, so possibly a
+combination of EM and direct optimization is fastest). Second,
+applying constraints to parameters can be problematic; in particular,
+EM can lead to wrong parameter estimates when constraints are
+applied. Hence, in \pkg{depmixS4}, EM is used by default for
+unconstrained models, but otherwise direct optimization is done using
+\pkg{Rdonlp2} \citep{Tamura2007,Spellucci2002}, because it handles
+general linear (in)equality constraints, and optionally also
+non-linear constraints.
+Need some more on EM and how/why it is justified to do separate weighted
+fits of the response models and transition and prior models.
+Also mention use of glm, nnet and possibly other packages that we use.
+
+
\section{Using \pkg{depmixS4}}
-Explain the three steps involved in fitting a model.
+Two steps are involved in using \pkg{depmixS4}, which are illustrated
+below with examples:
+\begin{enumerate}
+ \item model specification with function \code{depmix}
+
+ \item model fitting with function \code{fit}
+\end{enumerate}
+\subsection{Example data: speed}
-\subsection{Start-up example}
+Throughout this paper a data set called \code{speed} is used. It
+consists of three time series with three variables: reaction time,
+accuracy, and a covariate, Pacc, which defines the relative pay-off
+for speeded versus accurate responding. The participant in this
+experiment switches between fast responding at chance level and
+relatively slower responding at a high level of accuracy.
-Just the RT data.
+Interesting hypotheses to test are: is the switching regime symmetric?
+Is there evidence for two states or does one state suffice? Is the
+guessing state actually a guessing state, i.e., is the probability
+of a correct response at chance level (0.5)?
+\subsection{A simple example}
+
+A dependent mixture model is defined by the number of states and by
+the response distribution functions, and can be created with the
+\code{depmix} function as follows (see the help files for other
+options):
+
+\begin{verbatim}
+set.seed(1)
+mod <- depmix(rt~1, data=speed, nstates=2, trstart=runif(4))
+fm <- fit(mod)
+\end{verbatim}
+
+The function returns an object of class \code{depmix.fitted}, which
+extends the \code{depmix} class, adding convergence information (and
+information about constraints if these were applied; see the section
+on constraints below). The \code{print} method provides summary
+information on convergence, the log-likelihood, and the AIC and BIC
+values. These statistics can be extracted using \code{logLik},
+\code{AIC}, and \code{BIC}, respectively.
+
+\begin{verbatim}
+> fm
+Convergence info: Log likelihood converged to within tol.
+'log Lik.' -84.34175 (df=7)
+AIC: 182.6835
+BIC: 211.275
+\end{verbatim}
+
+The \code{summary} method of fitted models provides the parameter
+estimates, first for the prior probabilities model, second for the
+transition models, and third for the response models.
+
+\begin{verbatim}
+> summary(fm)
+Initial state probabilties model
+Model of type multinomial, formula: ~1
+Coefficients:
+ [,1] [,2]
+[1,] 0 -11.25688
+Probalities at zero values of the covariates.
+0.999987 1.291797e-05
+
+Transition model for state (component) 1
+Model of type multinomial, formula: ~1
+Coefficients:
+[1] 0.000000 -2.392455
+Probalities at zero values of the covariates.
+0.9162501 0.08374986
+
+Transition model for state (component) 2
+Model of type multinomial, formula: ~1
+Coefficients:
+[1] 0.000000 2.139255
+Probalities at zero values of the covariates.
+0.1053396 0.8946604
+
+Response model(s) for state 1
+
+Response model for response 1
+Model of type gaussian, formula: rt ~ 1
+Coefficients:
+[1] 6.385492
+sd 0.2439376
+
+Response model(s) for state 2
+
+Response model for response 1
+Model of type gaussian, formula: rt ~ 1
+Coefficients:
+[1] 5.511151
+sd 0.1926063
+\end{verbatim}
+
+
\subsection{Adding covariates on transition parameters}
+The transition matrix is parametrized as a list of multinomial
+logistic models. The initial state probabilities are similarly
+parametrized as a multinomial logistic model. Both models use a
+baseline-category parametrization, meaning that the parameter for the
+base category is fixed at zero. The default base category is the
+first state. Hence, for example, for a 3-state model, the initial
+state probability model would have three parameters, of which the
+first is fixed at zero and the other two are freely estimated.
-Give an example on how to do this.
+See \citet[p.\ 267 ff.]{Agresti2002} for multinomial logistic models
+and various parameterizations.
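As an illustration of this parametrization, the probabilities reported by \code{summary} above can be recovered from the coefficients with a few lines of plain R (the function \code{multinom\_probs} is a hypothetical helper, not part of the package):

```r
# Baseline-category multinomial logit: the coefficient of the base
# category is fixed at 0, and probabilities are the softmax of the
# coefficient vector.
multinom_probs <- function(beta) exp(beta) / sum(exp(beta))

# Coefficients of the transition model for state 1 in the summary
# above, c(0, -2.392455), yield the reported probabilities
# 0.9162501 and 0.08374986 (up to rounding).
multinom_probs(c(0, -2.392455))
```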
+Covariates can be specified using a one-sided formula as in the
+following example:
+\begin{verbatim}
+set.seed(1)
+mod <- depmix(list(rt~1,corr~1), data=speed, nstates=2,
+    family=list(gaussian(), multinomial()),
+    transition=~Pacc, instart=runif(2))
+fm <- fit(mod)
+\end{verbatim}
+The transition probabilities are now dependent on the covariate Pacc,
+an experimenter-controlled variable intended to induce switching
+between guessing and accurate responding.
+
+
+
+
+
\subsection{Add covariates to prior model}
-Give an example with the balance data.
+Covariates on the initial state probabilities can be added by
+specifying a one-sided formula in the \code{prior} argument, e.g.\
+\code{prior=~X1}, where X1 is the desired covariate. The prior
+probabilities of class membership are then dependent on this
+covariate. The balance scale data \citep{Jansen2001} are used in the
+example below:
+
+\begin{verbatim}
+data(balance)
+set.seed(1)
+mod <- mix(list(d1~1,d2~1,d3~1,d4~1), data=balance, nstates=2,
+ family=list(multinomial(),multinomial(),multinomial(),multinomial()),
+ respstart=runif(16))
+fm <- fit(mod)
+\end{verbatim}
+
+
+
+
+
+\subsection{Fixing and constraining parameters}
+
+Using package \pkg{Rdonlp2}, parameters may be fitted subject to
+general linear (in)equality constraints. In particular, parameters
+may be fixed, constrained to be equal to each other, or bounded by
+linear (in)equalities.
+
+Constraining and fixing parameters is done using the \code{conpat}
+argument to the \code{depmix.fit} function, which specifies for each
+parameter in the model whether it is fixed (0) or free (1 or higher).
+Equality constraints can be imposed by giving two parameters the same
+number in the \code{conpat} vector. When only fixed values are
+required, the \code{fixed} argument can be used instead of
+\code{conpat}, with zeroes for fixed parameters and other values
+(e.g., ones) for free parameters. Fitting the model subject to these
+constraints is handled by the optimization routine \code{donlp2}.
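A hypothetical sketch of constructing such a vector (the parameter count and the positions of the fixed and equality-constrained parameters are made up for illustration):

```r
# conpat vector for an imaginary model with 10 parameters:
# 0 = fixed, 1 = free, and equal numbers > 1 mark parameters
# constrained to be equal to each other.
conpat <- rep(1, 10)    # start with all parameters free
conpat[2] <- 0          # fix parameter 2 at its current value
conpat[c(5, 7)] <- 2    # constrain parameters 5 and 7 to be equal
```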
+
+\paragraph{Parameter numbering} When using the \code{conpat} and
+\code{fixed} arguments, complete parameter vectors should be
+supplied, i.e., these vectors should have length equal to the number
+of parameters of the model, which can be obtained by calling
+\code{npar(object)}. Parameters are numbered in the following order:
+\begin{enumerate}
+ \item the prior model parameters
+
+ \item the parameters for the transition models
+
+ \item the response model parameters per state (and subsequently
+ per response in the case of multivariate time series)
+
+\end{enumerate}
+
+To see the ordering of parameters, use the following:
+\begin{verbatim}
+mod <- setpars(mod, value=1:npar(mod))
+mod
+\end{verbatim}
+
+To see which parameters are fixed (by default only the baseline
+parameters in the multinomial logistic models for the transition
+models and the initial state probabilities model):
+\begin{verbatim}
+mod <- setpars(mod, getpars(mod, which="fixed"))
+mod
+\end{verbatim}
+
+
+
\section{Extending \pkg{depmixS4}}
Introduce the exgaus example.
@@ -115,12 +472,16 @@
Explain the use of makeDepmix for having full control of every aspect of the model.
+
+
\section{Conclusion \& future work}
What are our next plans?
+Adding gradients for speed and computation of the Hessian.
+
\bibliography{all,ingmar}