[Depmix-commits] r291 - papers/jss
noreply at r-forge.r-project.org
Thu Jul 9 16:49:29 CEST 2009
Author: ingmarvisser
Date: 2009-07-09 16:49:26 +0200 (Thu, 09 Jul 2009)
New Revision: 291
Modified:
papers/jss/article.tex
Log:
Filled in most sections of the paper, added first 2 examples.
Modified: papers/jss/article.tex
===================================================================
--- papers/jss/article.tex 2009-07-08 15:25:51 UTC (rev 290)
+++ papers/jss/article.tex 2009-07-09 14:49:26 UTC (rev 291)
@@ -54,6 +54,7 @@
E-mail: \email{i.visser at uva.nl} \\
URL: \url{http://www.ingmar.org/}
}
+
%% It is also possible to add a telephone and fax number
%% before the e-mail in the following format:
%% Telephone: +43/1/31336-5053
@@ -64,6 +65,7 @@
%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\batchmode
\begin{document}
@@ -73,41 +75,396 @@
\section[]{Introduction}
%% Note: If there is markup in \(sub)section, then it has to be escape as above.
-Why did we develop depmixS4?
+Markov and latent Markov models are frequently used in the social
+sciences, in various areas and applications. In psychology, they are
+used to model learning processes; see \citet{Wickens1982} for an
+overview and \citet{Schmittmann2006} for a recent application. In
+economics, latent Markov models are commonly used as regime-switching
+models; see e.g.\ \citet{Kim1994} and \citet{Ghysels1994}. Further
+applications include speech recognition \citep{Rabiner1989}, EEG
+analysis \citep{Rainer2000}, and genetics \citep{Krogh1998}. In these
+latter areas of application, latent Markov models are usually
+referred to as hidden Markov models.
+The \pkg{depmixS4} package was motivated by the fact that, although
+Markov models are commonly used in the social sciences, no
+comprehensive package was available for fitting such models. Common
+programs for Markovian models include Panmark \citep{Pol1996} and,
+for latent class models, Latent Gold \citep{Vermunt2003}. Besides not
+being freely available, these programs lack a number of important
+features. There are currently some packages in R that handle hidden
+Markov models, but they lack a number of features that we needed in
+our research. In particular, \pkg{depmixS4} was designed to meet the
+following goals:
+\begin{enumerate}
+
+	\item to handle parameter estimation subject to general linear
+	(in)equality constraints;
+
+	\item to fit transition models with covariates, i.e., to have
+	time-dependent transition matrices;
+
+	\item to include covariates in the prior or initial state
+	probabilities of models;
+
+	\item to allow for easy extensibility; in particular, to make it
+	possible to add new response distributions, both univariate and
+	multivariate, and similarly to allow for the addition of other
+	transition models, e.g., continuous-time observation models.
+
+\end{enumerate}
+
+Although \pkg{depmixS4} is designed to deal with longitudinal or time
+series data, say for $T>100$, it can also handle the limiting case
+$T=1$. In that case, there are no time dependencies between observed
+data, and the model reduces to a finite mixture model, or latent
+class model. Although there are other specialized packages for
+mixture data, one feature that we needed ourselves and that is, to
+the best of our knowledge, not available in other packages, is the
+possibility to include covariates on the prior probabilities of class
+membership. In the next section, an outline is provided of the model
+and the likelihood equations.
+
+
+\subsection*{Acknowledgements}
+
+Ingmar Visser was supported by an EC Framework 6 grant, project 516542
+(NEST). Maarten Speekenbrink was supported by the ESRC Centre for
+Economic Learning and Social Evolution (ELSE). Han van der Maas
+provided the speed-accuracy data \citep{Maas2005b} and thereby
+necessitated implementing models with time-dependent covariates.
+Brenda Jansen provided the balance scale data set \citep{Jansen2001},
+which was the perfect opportunity to test the covariates on the prior
+model parameters. The examples in the help files use both of these
+data sets.
+
+
\section{The dependent mixture model}
-Describe the model formulae
+The data considered here have the general form $O_{1}^{1}, \ldots,
+O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1}, \ldots,
+O_{T}^{m}$ for an $m$-variate time series of length $T$. As an
+example, consider a time series of responses generated by a single
+subject in a reaction time experiment. The data consist of three
+variables: reaction time, accuracy, and a covariate, a pay-off factor
+that determines the reward for speed and accuracy. These variables
+were measured on 168, 134, and 137 occasions, respectively
+(in Figure~\ref{fig:speed} the first part of this series is plotted).
+\begin{figure}[htbp]
+ \begin{center}
+ \scalebox{1}{\includegraphics*{graphs/speed1.pdf}}
+ \caption{Reaction times, accuracy and pay-off values for
+ the first series of responses in dataset \texttt{speed}.}
+ \label{fig:speed}
+ \end{center}
+\end{figure}
+
+The latent Markov model is commonly associated with data of this type,
+although usually only multinomial variables are considered. However,
+common estimation procedures, such as those implemented in
+\citet{Pol1996}, are not suitable for long time series due to
+underflow problems. In contrast, the hidden Markov model is typically
+only used for `long' univariate time series. In the next sections, the
+likelihood and estimation procedure for the hidden Markov model are
+described, given data of the above form. These models are called
+dependent mixture models because one of the authors (Ingmar Visser)
+thought it was time for a new name for these models.\footnote{Only
+later did I find out that \citet{Leroux1992} had already coined the
+term dependent mixture models in an application with hidden Markov
+mixtures of Poisson count data.}
+
+The dependent mixture model is defined by the following elements:
+\begin{enumerate}
+
+ \item a set $\vc{S}$ of latent classes or states $S_{i},\, i=1,
+ \ldots , n$,
+
+ \item matrices $\mat{A}_t$ of transition probabilities $a_{ij,t}$ for
+ the transition from state $S_{i}$ to state $S_{j}$ at time $t$,
+
+ \item a set $\vc{B}_t$ of observation functions $b_j^k(\cdot)$ that
+ provide the conditional probabilities of observations $O_{t}^k$
+ associated with latent state $S_{j}$,
+
+ \item a vector $\pmb{\pi}$ of latent state initial probabilities
+ $\pi_{i}$
+\end{enumerate}
+When transitions are added to the latent class model, it is more
+appropriate to refer to the classes as states. The word class is
+more commonly associated with a stable, trait-like attribute, whereas
+a state can change over time.
+
+Need model equations: maybe from Fruhwirth-Schnatter?
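+As a sketch of the intended equations (this is the standard hidden
+Markov factorization written in the notation above; conditional
+independence of the $m$ responses given the state is an assumption
+here):
+\begin{equation}
+	Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) = \sum_{s_{1}=1}^{n} \cdots
+	\sum_{s_{T}=1}^{n} \pi_{s_{1}} b_{s_{1}}(\vc{O}_{1})
+	\prod_{t=2}^{T} a_{s_{t-1}s_{t},t}\, b_{s_{t}}(\vc{O}_{t}),
+\end{equation}
+where $b_{j}(\vc{O}_{t}) = \prod_{k=1}^{m} b_{j}^{k}(O_{t}^{k})$ and
+the sum runs over all possible latent state sequences.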
+
+
\subsection{Likelihood}
-Give some refs to computing the likelihood
+The log-likelihood of hidden Markov models is usually computed by the
+so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or
+rather by the forward part of this algorithm. \citet{Lystig2002}
+changed the forward algorithm in such a way as to allow computing the
+gradients of the log-likelihood at the same time. They start by
+rewriting the likelihood as follows (for ease of exposition, the
+dependence on the model parameters is dropped here):
+\begin{equation}
+ L_{T} = Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) = \prod_{t=1}^{T}
+Pr(\vc{O}_{t}|\vc{O}_{1},
+ \ldots, \vc{O}_{t-1}),
+ \label{condLike}
+\end{equation}
+where $Pr(\vc{O}_{1}|\vc{O}_{0}):=Pr(\vc{O}_{1})$. Note that for a
+simple, i.e.\ observed, Markov chain these probabilities reduce to
+$Pr(\vc{O}_{t}|\vc{O}_{1},\ldots,
+\vc{O}_{t-1})=Pr(\vc{O}_{t}|\vc{O}_{t-1})$.
+The log-likelihood can now be expressed as:
+\begin{equation}
+ l_{T} = \sum_{t=1}^{T} \log[Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots,
+\vc{O}_{t-1})].
+ \label{eq:condLogl}
+\end{equation}
+To compute the log-likelihood, \citet{Lystig2002} define the following
+(forward) recursion:
+\begin{align}
+ \phi_{1}(j) &:= Pr(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
+ \label{eq:fwd1} \\
+\begin{split}
+ \phi_{t}(j) &:= Pr(\vc{O}_{t}, S_{t}=j|\vc{O}_{1}, \ldots,
+\vc{O}_{t-1}) \\
+ &= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+(\Phi_{t-1})^{-1},
+ \label{eq:fwdt}
+\end{split}
+\end{align}
+where $\Phi_{t}=\sum_{i=1}^{N} \phi_{t}(i)$. Combining
+$\Phi_{t}=Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots, \vc{O}_{t-1})$, and
+equation~(\ref{eq:condLogl}) gives the following expression for the
+log-likelihood:
+\begin{equation}
+ l_{T} = \sum_{t=1}^{T} \log \Phi_{t}.
+ \label{eq:logl}
+\end{equation}
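As an illustration, the scaled forward recursion above can be sketched in a few lines of plain R. This is not the \pkg{depmixS4} implementation; the function and argument names (\code{fwd\_logLik}, \code{init}, \code{trans}, \code{dens}) are hypothetical, and the transition matrix is taken to be time-constant for simplicity:

```r
# Minimal sketch of the scaled forward recursion:
#   phi_1(j) = pi_j b_j(O_1), then
#   phi_t(j) = sum_i phi_{t-1}(i) a_ij b_j(O_t) / Phi_{t-1},
# with l_T = sum_t log Phi_t, where Phi_t = sum_j phi_t(j).
fwd_logLik <- function(init, trans, dens) {
  # init:  n-vector of initial state probabilities (pi)
  # trans: n x n transition matrix A (rows sum to 1)
  # dens:  T x n matrix of observation densities b_j(O_t)
  TT <- nrow(dens)
  phi <- init * dens[1, ]          # phi_1(j)
  logl <- log(sum(phi))            # log Phi_1
  phi <- phi / sum(phi)            # rescale to avoid underflow
  if (TT > 1) for (t in 2:TT) {
    phi <- as.vector(phi %*% trans) * dens[t, ]  # unnormalized phi_t(j)
    logl <- logl + log(sum(phi))   # accumulate log Phi_t
    phi <- phi / sum(phi)
  }
  logl
}
```

The normalization by $\Phi_{t}$ at each step is what makes the recursion numerically stable for long series, in contrast to the unscaled forward algorithm.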
+
+
\subsection{Parameter estimation}
-Give some refs to EM/forward/backbward and possibly others.
+Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
+through the use of a general Newton-Raphson optimizer. The EM
+algorithm, however, has some drawbacks. First, it can be slow to
+converge towards the end of optimization (although it is usually
+faster than direct optimization at the start, so possibly a
+combination of EM and direct optimization is fastest). Second,
+applying constraints to parameters can be problematic; in particular,
+EM can lead to wrong parameter estimates when constraints are
+applied. Hence, in \pkg{depmixS4}, EM is used by default for
+unconstrained models, but otherwise direct optimization is done using
+\pkg{Rdonlp2} \citep{Tamura2007,Spellucci2002}, because it handles
+general linear (in)equality constraints, and optionally also
+non-linear constraints.
+Need some more on EM and how/why it is justified to do separate weighted
+fits of the response models and transition and prior models.
+Also mention use of glm, nnet and possibly other packages that we use.
+
+
\section{Using \pkg{depmixS4}}
-Explain the three steps involved in fitting a model.
+Two steps are involved in using \pkg{depmixS4}, which are illustrated
+below with examples:
+\begin{enumerate}
+ \item model specification with function \code{depmix}
+
+ \item model fitting with function \code{fit}
+\end{enumerate}
+\subsection{Example data: speed}
-\subsection{Start-up example}
+Throughout this paper a data set called \code{speed} is used. It
+consists of three time series with three variables: reaction time,
+accuracy, and a covariate, Pacc, which defines the relative pay-off
+for speeded versus accurate responding. The participant in this
+experiment switches between fast responding at chance level and
+relatively slower responding at a high level of accuracy.
-Just the RT data.
+Interesting hypotheses to test are: is the switching regime symmetric?
+Is there evidence for two states or does one state suffice? Is the
+guessing state actually a guessing state, i.e., is the probability
+of a correct response at chance level (0.5)?
+\subsection{A simple example}
+
+A dependent mixture model is defined by the number of states and by
+the response distribution functions, and can be created with the
+\code{depmix} function as follows (see the help files for other
+options):
+
+\begin{verbatim}
+set.seed(1)
+mod <- depmix(rt~1, data=speed, nstates=2, trstart=runif(4))
+fm <- fit(mod)
+\end{verbatim}
+
+The function returns an object of class \code{depmix.fitted}, which
+extends the \code{depmix} class, adding convergence information (and
+information about constraints if these were applied; see the section
+on constraints below). The \code{print} method provides summary
+information on convergence, the log-likelihood, and the AIC and BIC
+values. These statistics can be extracted using \code{logLik},
+\code{AIC}, and \code{BIC}, respectively.
+
+\begin{verbatim}
+> fm
+Convergence info: Log likelihood converged to within tol.
+'log Lik.' -84.34175 (df=7)
+AIC: 182.6835
+BIC: 211.275
+\end{verbatim}
+
+The \code{summary} method of fitted models provides the parameter
+estimates, first for the prior probabilities model, second for the
+transition models, and third for the response models.
+
+\begin{verbatim}
+> summary(fm)
+Initial state probabilties model
+Model of type multinomial, formula: ~1
+Coefficients:
+ [,1] [,2]
+[1,] 0 -11.25688
+Probalities at zero values of the covariates.
+0.999987 1.291797e-05
+
+Transition model for state (component) 1
+Model of type multinomial, formula: ~1
+Coefficients:
+[1] 0.000000 -2.392455
+Probalities at zero values of the covariates.
+0.9162501 0.08374986
+
+Transition model for state (component) 2
+Model of type multinomial, formula: ~1
+Coefficients:
+[1] 0.000000 2.139255
+Probalities at zero values of the covariates.
+0.1053396 0.8946604
+
+Response model(s) for state 1
+
+Response model for response 1
+Model of type gaussian, formula: rt ~ 1
+Coefficients:
+[1] 6.385492
+sd 0.2439376
+
+Response model(s) for state 2
+
+Response model for response 1
+Model of type gaussian, formula: rt ~ 1
+Coefficients:
+[1] 5.511151
+sd 0.1926063
+\end{verbatim}
+
+
\subsection{Adding covariates on transition parameters}
+The transition matrix is parametrized as a list of multinomial
+logistic models. The initial state probabilities are similarly
+parametrized as a multinomial logistic model. Both models use a
+baseline-category parametrization, meaning that the parameter for the
+base category is fixed at zero. The default base category is the
+first state. Hence, for example, for a 3-state model, the initial
+state probability model would have three parameters, of which the
+first is fixed at zero and the other two are freely estimated.
-Give an example on how to do this.
+See \citet[p.\ 267 ff.]{Agresti2002} for multinomial logistic models
+and various parameterizations.
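As an illustration of this parametrization, the probabilities reported by \code{summary} above can be recovered from the coefficients with a few lines of plain R (the function \code{multinom\_probs} is a hypothetical helper, not part of the package):

```r
# Baseline-category multinomial logit: the coefficient of the base
# category is fixed at 0, and probabilities are the softmax of the
# coefficient vector.
multinom_probs <- function(beta) exp(beta) / sum(exp(beta))

# Coefficients of the transition model for state 1 in the summary
# above, c(0, -2.392455), yield the reported probabilities
# 0.9162501 and 0.08374986 (up to rounding).
multinom_probs(c(0, -2.392455))
```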
+Covariates can be specified using a one-sided formula as in the
+following example:
+\begin{verbatim}
+set.seed(1)
+mod <- depmix(list(rt~1,corr~1), data=speed, nstates=2,
+    family=list(gaussian(), multinomial()),
+    transition=~Pacc, instart=runif(2))
+fm <- fit(mod)
+\end{verbatim}
+The transition probabilities are now dependent on the covariate Pacc,
+an experimenter-controlled variable intended to induce switching
+between guessing and accurate responding.
+
+
+
+
+
\subsection{Add covariates to prior model}
-Give an example with the balance data.
+Covariates on the initial state probabilities can be added by
+specifying a one-sided formula in the \code{prior} argument, e.g.\
+\code{prior=~X1}, where X1 is the desired covariate. The prior
+probabilities of class membership are then dependent on this
+covariate. The balance scale data \citep{Jansen2001} are used in the
+example below:
+
+\begin{verbatim}
+data(balance)
+set.seed(1)
+mod <- mix(list(d1~1,d2~1,d3~1,d4~1), data=balance, nstates=2,
+ family=list(multinomial(),multinomial(),multinomial(),multinomial()),
+ respstart=runif(16))
+fm <- fit(mod)
+\end{verbatim}
+
+
+
+
+
+\subsection{Fixing and constraining parameters}
+
+Using package \pkg{Rdonlp2}, parameters may be fitted subject to
+general linear (in)equality constraints. In particular, parameters
+may be fixed, constrained to be equal to each other, or bounded by
+linear (in)equalities.
+
+Constraining and fixing parameters is done using the \code{conpat}
+argument to the \code{depmix.fit} function, which specifies for each
+parameter in the model whether it is fixed (0) or free (1 or higher).
+Equality constraints can be imposed by giving two parameters the same
+number in the \code{conpat} vector. When only fixed values are
+required, the \code{fixed} argument can be used instead of
+\code{conpat}, with zeroes for fixed parameters and other values
+(e.g., ones) for free parameters. Fitting the model subject to these
+constraints is handled by the optimization routine \code{donlp2}.
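A hypothetical sketch of constructing such a vector (the parameter count and the positions of the fixed and equality-constrained parameters are made up for illustration):

```r
# conpat vector for an imaginary model with 10 parameters:
# 0 = fixed, 1 = free, and equal numbers > 1 mark parameters
# constrained to be equal to each other.
conpat <- rep(1, 10)    # start with all parameters free
conpat[2] <- 0          # fix parameter 2 at its current value
conpat[c(5, 7)] <- 2    # constrain parameters 5 and 7 to be equal
```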
+
+\paragraph{Parameter numbering} When using the \code{conpat} and
+\code{fixed} arguments, complete parameter vectors should be
+supplied, i.e., these vectors should have length equal to the number
+of parameters of the model, which can be obtained by calling
+\code{npar(object)}. Parameters are numbered in the following order:
+\begin{enumerate}
+ \item the prior model parameters
+
+ \item the parameters for the transition models
+
+ \item the response model parameters per state (and subsequently
+ per response in the case of multivariate time series)
+
+\end{enumerate}
+
+To see the ordering of parameters, use the following:
+\begin{verbatim}
+mod <- setpars(mod, value=1:npar(mod))
+mod
+\end{verbatim}
+
+To see which parameters are fixed (by default only the baseline
+parameters in the multinomial logistic models for the transition
+models and the initial state probabilities model):
+\begin{verbatim}
+mod <- setpars(mod, getpars(mod, which="fixed"))
+mod
+\end{verbatim}
+
+
+
\section{Extending \pkg{depmixS4}}
Introduce the exgaus example.
@@ -115,12 +472,16 @@
Explain the use of makeDepmix for having full control of every aspect of the model.
+
+
\section{Conclusion \& future work}
What are our next plans?
+Adding gradients for speed and computation of the Hessian.
+
\bibliography{all,ingmar}