[Depmix-commits] r301 - papers/jss
noreply at r-forge.r-project.org
noreply at r-forge.r-project.org
Tue Jul 14 15:45:52 CEST 2009
Author: ingmarvisser
Date: 2009-07-14 15:45:49 +0200 (Tue, 14 Jul 2009)
New Revision: 301
Modified:
papers/jss/article.tex
Log:
Yet another version of the jss paper, now with conclusions as well.
Modified: papers/jss/article.tex
===================================================================
--- papers/jss/article.tex 2009-07-14 10:29:30 UTC (rev 300)
+++ papers/jss/article.tex 2009-07-14 13:45:49 UTC (rev 301)
@@ -26,17 +26,17 @@
%% an abstract and keywords
\Abstract{
- \pkg{depmixS4} implements a general framework for definining and
- estimating hidden Markov mixture model in the R programming language
- \citep{R2009}. This includes standard Markov models,
- latent/hidden Markov models, and latent class and finite mixture
- distribution models. The models can be fitted on mixed
- multivariate data with multinomial and/or gaussian distributions.
- Parameters can be estimated subject to general linear constraints.
- Parameter estimation is done through an EM algorithm or by a
- direct optimization approach using the \pkg{Rdonlp2} optimization
- routine when contraints are imposed on the parameters. A number
- of illustrative examples are included.
+
+ \pkg{depmixS4} implements a general framework for defining and estimating
+ dependent mixture models in the \proglang{R} programming language
+ \citep{R2009}. This includes standard Markov models, latent/hidden Markov
+ models, and latent class and finite mixture distribution models. The models can
+ be fitted on mixed multivariate data with distributions from the \code{glm}
+ family, the logistic multinomial, or the multivariate normal distribution. Other
+ distributions can be added easily, and an example is provided. Parameter
+ estimation is done through an EM algorithm or by a direct optimization approach using
+ the\pkg{Rdonlp2} optimization routine when contraints are imposed on the parameters.
+ Parameters can be estimated subject to general linear constraints.
}
\Keywords{hidden Markov model, dependent mixture model, mixture model}
@@ -100,7 +100,8 @@
Further applications include speech recognition \citep{Rabiner1989},
EEG analysis \citep{Rainer2000}, and genetics \citep{Krogh1998}. In
those latter areas of application, latent Markov models are usually
-referred to as hidden Markov models.
+referred to as hidden Markov models. See e.g.\ \citet[][chapter~1]{Cappe2005}
+for an overview of different applications.
The \pkg{depmixS4} package was motivated by the fact that Markov
models are used commonly in the social sciences, but no comprehensive
@@ -112,6 +113,7 @@
models but they lack a number of features that we needed in our
research. In particular, \pkg{depmixS4} was designed to meet the
following goals:
+
\begin{enumerate}
\item to be able to handle parameter estimates subject to general
@@ -139,8 +141,8 @@
deal with mixture data, one specific feature that we needed ourselves
which is to the best of our knowledge not available in other packages
is the possibility to include covariates on the prior probabilities of
-class membership. In the next section, an outline is provided of the
-model and the likelihood equations.
+class membership and the transition probabilities. In the next section,
+an outline is provided of the model and the likelihood equations.
\section{The dependent mixture model}
@@ -149,34 +151,37 @@
O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1},
\ldots, O_{T}^{m}$ for an $m$-variate time series of length $T$. As
an example, consider a time series of responses generated by a single
-subject in a reaction time experiment. The data consists of three
-variables, reaction time, accuracy and a covariate which is a pay-off
-factor which determines the reward for speed and accuracy. These
+participant in a response time experiment. The data consists of three
+variables, response time, accuracy and a covariate which is a pay-off
+determining the relative reward for speeded and/or accuracy responding. These
variables are measured on 168, 134 and 137 occasions respectively (in
-Figure~\ref{fig:speed} the first part of this series is plotted).
+Figure~\ref{fig:speed} the first part of this series is plotted). These data
+are more fully described in \citet{Dutilh2009}.
\begin{figure}[htbp]
\begin{center}
\scalebox{1.2}{\includegraphics*{graphs/speed1.pdf}}
- \caption{Response times, accuracy and pay-off values for
- the first series of responses in dataset \texttt{speed}.}
+ \caption{Response times (rt), accuracy (corr) and pay-off values (Pacc) for
+ the first series of responses in dataset \code{speed}.}
\label{fig:speed}
\end{center}
\end{figure}
The latent Markov model is commonly associated with data of this type,
-although usually only multinomial variables are considered. However,
+although usually only multinomial response variables are considered. However,
common estimation procedures, such as those implemented in
\citet{Pol1996} are not suitable for long time series due to underflow
problems. In contrast, the hidden Markov model is typically only used
-for `long' univariate time series. In the next sections, the
-likelihood and estimation procedure for the hidden Markov model is
-described, given data of the above form. These models are called
+for `long' univariate time series
+\cite[see e.g.][, chapter~1 for an overview of examples]{Cappe2005}.
+In the next sections, the
+likelihood and estimation procedures for the dependent mixture model is
+described for data of the above form. These models are called
dependent mixture models because one of the authors (Ingmar Visser)
thought it was time for a new name for these models\footnote{Only
later did I find out that \citet{Leroux1992} already coined the term
dependent mixture models in an application with hidden Markov mixtures
-of Poisson count data.}
+of Poisson count data.}.
The dependent mixture model is defined by the following elements:
\begin{enumerate}
@@ -184,38 +189,33 @@
\item a set $\vc{S}$ of latent classes or states $S_{i},\, i=1,
\ldots , n$,
- \item matrices $\mat{A}_t$ of transition probabilities $a_{ij,t}$ for
- the transition from state $S_{i}$ to state $S_{j}$ at time $t$,
+ \item matrices $\mat{A}$ of transition probabilities $a_{ij}$ for
+ the transition from state $S_{i}$ to state $S_{j}$,
- \item a set $\vc{B}_t$ of observation functions $b_j^k(\cdot)$ that
- provide the conditional probabilities of observations $O_{t}^k$
+ \item a set $\vc{B}$ of observation density functions $b_j^k(\cdot)$ that
+ provide the conditional densities of observations $O_{t}^k$
associated with latent state $S_{j}$,
\item a vector $\pmb{\pi}$ of latent state initial probabilities
$\pi_{i}$
\end{enumerate}
-When transitions are added to the latent class model, it is more
-appropriate to refer to the classes as states. The word class is
-rather more associated with a stable trait-like attribute whereas a
-state can change over time.
-The dependent mixture model is then given by the following equations:
+The dependent mixture model is defined by the following equations:
\begin{align}
- S_{t} = A S_{t-1}, t=2, \ldots, T \\
- O_{t} = f(O_{t}|S_{t}),
+ S_{t} &= A S_{t-1}, t=2, \ldots, T \\
+ O_{t} &= b(O_{t}|S_{t}),
\end{align}
where $S_{t}$ is a sequence of hidden states, $A$ is a transition
-matrix, $O_{t}$ is an (possibly multivariate) observation and $f$ is a
+matrix, $O_{t}$ is an (possibly multivariate) observation and $b$ is a
density function for $O_{t}$ conditional on the hidden state $S_{t}$.
-In the example data above, $f$ could be a gaussian distribution
+In the example data above, $b$ could be a gaussian distribution
function for the response time variable, and a Bernouilli for the
accuracy data. In the models we are considering here, both the
-transition probabilities $A$ and the initial state probabilities $\pi$
-may depend on covariates as well as the response distributions $f$.
+transition probabilities $\mat{A}$ and the initial state probabilities $\pi$
+may depend on covariates as well as the response distributions $\vc{B}$.
See for example \citet{Fruhwirth2006} for an overview of
hidden Markov models with extensions.
-
\subsection{Likelihood}
The log-likelihood of hidden Markov models is usually computed by the
@@ -268,18 +268,17 @@
\subsection{Parameter estimation}
Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
-through the use of a general Newton-Raphson optimizer.
-
-In the EM algorithm, parameters are estimated by iteratively maximising the
+through the use of a general Newton-Raphson optimizer. In the EM algorithm,
+parameters are estimated by iteratively maximising the
expected joint likelihood of the parameters given the observations and states.
Let $\greekv{\theta} = (\greekv{\theta}_1, \greekv{\theta}_2,\greekv{\theta}_3)$
be the general parameter vector consisting of three subvectors with parameters
for the prior model, transition model, and response model respectively. The
joint log likelihood can be written as
\begin{equation}
-\log Pr(O_{1:T}, S_{1:T}|\greekv{\theta}) = \log Pr(S_1|\greekv{\theta}_1)
-+ \sum_{t=2}^{T} \log Pr(S_t|S_{t-1},\greekv{\theta}_2)
-+ \sum_{t=1}^{T} \log Pr(O_t|S_t,\greekv{\theta}_3)
+\log \Prob(O_{1:T}, S_{1:T}|\greekv{\theta}) = \log \Prob(S_1|\greekv{\theta}_1)
++ \sum_{t=2}^{T} \log \Prob(S_t|S_{t-1},\greekv{\theta}_2)
++ \sum_{t=1}^{T} \log \Prob(O_t|S_t,\greekv{\theta}_3)
\end{equation}
This likelihood depends on the unobserved states $S_t$. In the Expectation step,
we replace these with their expected values given a set of (initial) parameters
@@ -287,17 +286,17 @@
and observations $O_{1:T}$. The expected log likelihood
\begin{equation}
Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'}
-(\log Pr(O_{1:T},S_{1:T}|O_{1:T},\greekv{\theta}))
+(\log \Prob(O_{1:T},S_{1:T}|O_{1:T},\greekv{\theta}))
\end{equation}
can be written as
%\begin{equation}
\begin{multline}
\label{eq:Q}
Q(\greekv{\theta},\greekv{\theta}') =
-\sum_{j=1}^n \gamma_1(j) \log Pr(S_1=j|\greekv{\theta}_1) \\
-+ \sum_{t=2}^T \sum_{j=1}^M \sum_{k=1}^n \xi_t^i(j,k) \log Pr(S_t = k|S_{t-1}
+\sum_{j=1}^n \gamma_1(j) \log \Prob(S_1=j|\greekv{\theta}_1) \\
++ \sum_{t=2}^T \sum_{j=1}^M \sum_{k=1}^n \xi_t^i(j,k) \log \Prob(S_t = k|S_{t-1}
= j,\greekv{\theta}_2) \\ + \sum_{t=1}^T \sum_{j=1}^n \gamma_t^i(j)
-\ln Pr(O_t|S_t=j,\greekv{\theta}_3),
+\ln \Prob(O_t|S_t=j,\greekv{\theta}_3),
\end{multline}
%\end{equation}
where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} = j|O_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|O_{1:T},\greekv{\theta}')$ can be computed effectively by the Forward-Backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the r.h.s. of (\ref{eq:Q}) consists of three separate parts, we can maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and $\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and $\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in \pkg{nnet}, and maximisation for $\greekv{\theta}_3$ by the \code{glm} routine.
@@ -315,14 +314,12 @@
The EM algorithm however has some drawbacks. First, it can be slow to
-converge towards the end of optimization (although it is usually
-faster than direct optimization at the start, so possibly a
-combination of EM and direct optimization is fastest). Second,
+converge towards the end of optimization. Second,
applying constraints to parameters can be problematic; in particular,
EM can lead to wrong parameter estimates when applying constraints.
Hence, in \pkg{depmixS4}, EM is used by default in unconstrained
models, but otherwise, direct optimization is done using \pkg{Rdonlp2}
-\cite{Tamura2007,Spellucci2002}, because it handles general linear
+\cite{Tamura2009,Spellucci2002}, because it handles general linear
(in)equality constraints, and optionally also non-linear constraints.
%Need some more on EM and how/why it is justified to do separate weighted
@@ -336,10 +333,15 @@
Two steps are involved in using \pkg{depmixS4} which are illustrated
below with examples:
\begin{enumerate}
- \item model specification with function \code{depmix}
+ \item model specification with function \code{depmix} (or with \code{mix}
+ for latent class and finite mixture models, see example below on adding
+ covariates to prior probabilities)
\item model fitting with function \code{fit}
\end{enumerate}
+We have separated the stages of model specification and model fitting because
+fitting large models can be fairly time-consuming and it is hence useful to be
+able to check the model specification before actually fitting the model.
\subsection{Example data: speed}
@@ -354,7 +356,7 @@
a guessing state, i.e., is the probability of a correct response at
chance level (0.5)?
-\subsection{A simple example}
+\subsection{A simple model}
A dependent mixture model is defined by the number of states, and by
the response distribution functions, and can be created with the
@@ -373,7 +375,8 @@
in this call using the \code{trstart} argument. The package does not
provide automatic starting values.
-The so-defined models needs to be \code{fit}ted with the following:
+The so-defined models needs to be \code{fit}ted with the following
+line of code:
\begin{CodeChunk}
\begin{CodeInput}
> fm <- fit(mod)
@@ -385,9 +388,7 @@
convergence information (and information about constraints if these
were applied, see below). The \code{print} method provides summary
information on convergence, the log likelihood and the AIC and BIC
-values. These statistics may be extracted using \code{logLik},
-\code{AIC} and \code{BIC}, respectively.
-
+values:
\begin{CodeChunk}
\begin{CodeInput}
> fm
@@ -399,6 +400,8 @@
BIC: 211.275
\end{CodeOutput}
\end{CodeChunk}
+These statistics may be extracted using \code{logLik},
+\code{AIC} and \code{BIC}, respectively.
The \code{summary} method of \code{fit}ted models provides the parameter
estimates, first for the prior probabilities model, second for the
@@ -450,15 +453,17 @@
\end{CodeChunk}
-\subsection{Transition parameters}
+\subsection{Covariates on transition parameters}
-The transition matrix is parametrized as a list of multinomial
-logistic models. The initial state probabilities are similarly
-parametrized as a multinomial logistic model. Both models use a
-baseline category parametrization, meaning that the parameter for the
+The transition probabilities and the initial state probabilities
+are parameterized using the multinomial
+logistic model. In particular, each row of the transition matrix is
+parameterized by a baseline category logistic multinomial,
+meaning that the parameter for the
base category is fixed at zero (see \citet[see][p.\ 267
ff.]{Agresti2002} for multinomial logistic models and various
-parameterizations). The default base category is the first state.
+parameterizations). See also \citet{Chung2007} for similar models, latent transition models using logistic regression on the transition parameters. They fit such models on repeated measurement
+data ($T=2$) using Bayesian methods. The default baseline category is the first state.
Hence, for example, for a 3-state model, the initial state probability
model would have three parameters of which the first is fixed at zero
and the other two are freely estimated.
@@ -467,20 +472,18 @@
one-sided formula as in the following example:
\begin{CodeChunk}
\begin{CodeInput}
-> set.seed(1)
-> mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
+set.seed(1)
+mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
transition=~scale(Pacc), instart=runif(2))
-> fm <- fit(mod)
+fm <- fit(mod)
\end{CodeInput}
+\end{CodeChunk}
+
+Applying \code{summary} to the \code{fit}ted object gives (only transition models
+printed here):
+\begin{CodeChunk}
\begin{CodeOutput}
-Initial state probabilties model
-Model of type multinomial, formula: ~1
-Coefficients:
- [,1] [,2]
-[1,] 0 10.71779
-Probalities at zero values of the covariates.
-2.214681e-05 0.9999779
-
+...
Transition model for state (component) 1
Model of type multinomial, formula: ~scale(Pacc)
Coefficients:
@@ -497,31 +500,15 @@
[1,] 0 2.471442
[2,] 0 3.570856
Probalities at zero values of the covariates.
-0.07788458 0.9221154
-
-Response model(s) for state 1
-
-Response model for response 1
-Model of type gaussian, formula: rt ~ 1
-Coefficients:
-[1] 5.512179
-sd 0.1921098
-
-Response model(s) for state 2
-
-Response model for response 1
-Model of type gaussian, formula: rt ~ 1
-Coefficients:
-[1] 6.3885
-sd 0.2402693
+0.07788458 0.9221154
+...
\end{CodeOutput}
\end{CodeChunk}
-
The summary provides all parameters of the model, also the
(redundant) zeroes for the base-line category in the multinomial model.
-The summary also prints the transition probabilities at the zero value
-of the covariate. Note that scaling of the covariate is useful in this
-regard as it makes interpretation of these intercept probabilities
+The summary also prints the transition probabilities
+at the zero value of the covariate. Note that scaling of the covariate
+is useful in this regard as it makes interpretation of these intercept probabilities
easier.
\subsection{Multivariate data}
@@ -535,15 +522,15 @@
\begin{verbatim}
set.seed(1)
mod <- depmix(list(rt~1,corr~1), data=speed, nstates=2,
- family=list(gaussian(),multinomial()), transition=~scale(Pacc),
- instart=runif(2))
+ family=list(gaussian(),multinomial()),
+ transition=~scale(Pacc),instart=runif(2))
fm <- fit(mod)
\end{verbatim}
which provides the following fitted model parameters (only the
response parameters are given here):
\begin{verbatim}
> summary(fm)
- \ldots
+ ...
Response model(s) for state 1
Response model for response 1
@@ -583,9 +570,8 @@
\subsection{Adding covariates on the prior probabilities}
-The \pkg{depmixS4} contains another data set which is used to
-illustrate the use of covariates on the prior probabilities of models,
-in this case a latent class model. The \code{balance} data consists
+To illustrate the use of covariates on the prior probabilities we have included
+another data set with \pkg{depmixS4}. The \code{balance} data consists
of 4 binary items (correct-incorrect) on a balance scale task
\citet{Siegler1981}. The data form a subset of the data published in
\citet{Jansen2002}.
@@ -616,6 +602,7 @@
[2,] 0 0.5512996
Probalities at zero values of the covariates.
0.9254119 0.07458815
+...
\end{verbatim}
Hence at young ages, children have a high probability of incorrect
responding in class~1, whereas the prior probability for class~2
@@ -624,22 +611,28 @@
\subsection{Fixing and constraining parameters}
-Using package \pkg{Rdonlp2}, parameters may be fitted subject to
+Using package \pkg{Rdonlp2} by \citet{Tamura2009}, parameters may be fitted subject to
general linear (in-)equality constraints. Constraining and fixing
parameters is done using the \code{conpat} argument to the
-\code{depmix.fit}-function, which specifies for each parameter in the
+\code{fit}-function, which specifies for each parameter in the
model whether it's fixed (0) or free (1 or higher). Equality
constraints can be imposed by having two parameters have the same
number in the \code{conpat} vector. When only fixed values are
-required the \code{fixed} argument can be used instead of
+required, the \code{fixed} argument can be used instead of
\code{conpat}, with zeroes for fixed parameters and other values (ones
e.g.) for non-fixed parameters. Fitting the models subject to these
constraints is handled by the optimization routine \code{donlp2}.
+To be able to construct the \code{conpat} and/or \code{fixed} vectors
+one needs the correct ordering of parameters which is briefly discussed
+next before proceeding with an example.
\paragraph{Parameter numbering} When using the \code{conpat} and
\code{fixed} arguments, complete parameter vectors should be supplied,
i.e., these vectors should have length of the number of parameters of
-the model, which can be obtained by calling \code{npar(object)}.
+the model, which can be obtained by calling \code{npar(object)}. Note that
+this is not the same as the degrees of freedom used e.g.\ in the \code{logLik}
+function because \code{npar} also counts the baseline category zeroes
+from the multinomial logistic models.
Parameters are numbered in the following order:
\begin{enumerate}
\item the prior model parameters
@@ -663,26 +656,30 @@
\end{verbatim}
When fitting constraints it is useful to have good starting values
-for the parameters and hence we first fit the following model:
+for the parameters and hence we first fit the following model without
+constraints:
\begin{verbatim}
trst=c(0.9,0.1,0,0,0.1,0.9,0,0)
mod <- depmix(list(rt~1,corr~1),data=speed,transition=~Pacc,
nstates=2,family=list(gaussian(),multinomial()),
trstart=trst,inst=c(.999,0.001))
-fm <- fit(mod)
+fm1 <- fit(mod)
\end{verbatim}
After this, we use the fitted values from this model to constrain the
-regression coefficients on the transition matrix:
+regression coefficients on the transition matrix (parameters numbers~6 and~10):
\begin{verbatim}
# start with fixed and free parameters
conpat <- c(0,1,rep(c(0,1),4),1,1,0,1,1,1,0,1)
# constrain the beta's on the transition parameters to be equal
conpat[6] <- conpat[10] <- 2
-fm <- fit(fm,equal=conpat)
+fm2 <- fit(fm,equal=conpat)
\end{verbatim}
Using \code{summary} on the fitted model shows that the regression
-coefficients are now estimated at the same value of 12.66.
-Note that these arguments provide the possibility for arbitrary
+coefficients are now estimated at the same value of 12.66. The function
+\code{llratio} computes the likelihood ratio $\chi^2$-statistic and the
+associated $p$-value with appropriate degrees of freedom for testing the
+tenability of constraints \citep{Dannemann2007}. Note that these arguments
+provide the possibility for arbitrary
constraints, also between, e.g., a multinomial regression coefficient
for the transition matrix and the mean of a gaussian response model.
Whether such constraints make sense is hence the responsibility of
@@ -732,8 +729,8 @@
Only the constructor function and the fit function are provided here
and the complete code is provided in the helpfile for \code{makeDepmix}.
-The \code{exgaus} example uses the \pkg{gamlss.distr} package for
-computing the \code{dens}ity and for fitting the parameters.
+The \code{exgaus} example uses the \pkg{gamlss} and \pkg{gamlss.distr} packages \cite{Stasinopoulos2009a,Stasinopoulos2009b} for
+computing the \code{dens}ity and for \code{fit}ting the parameters.
The constructor function is defined as:
\begin{verbatim}
@@ -777,7 +774,7 @@
)
\end{verbatim}
-The \code{fit} function defines a trivial \code{gamlss} model with
+The \code{fit} function defines a \code{gamlss} model with
only an intercept to be estimated and then sets the fitted parameters
back into their respective slots in the `exgaus' object. See the help
for \code{gamlss.distr} for interpretation of these parameters.
@@ -823,15 +820,24 @@
gaussian in one state and an exgaus distribution in another state.
+\section[Conclusions and future work]{Conclusions \& future work}
-\section[Conclusion and future work]{Conclusion \& future work}
+\pkg{depmixS4} provides a flexible framework for fitting dependent mixture models
+for a large variety of response distributions. It can also fit latent class regression
+and finite mixture models, although it should be noted that more specialized packages
+are available for doing so, e.g.\ \pkg{FlexMix} \cite{Leisch2004}. The package is intended
+for modeling of (individual) time series data with the aim of characterizing the transition
+processes underlying the data. The possibility to use covariates on the transition matrix
+greatly enhances the flexibility of the model. The EM algorithm uses a very general
+interface that allows easy addition of new response models.
+We are currently working on implementing the gradients for response and transition models
+with two goals in mind. First, to speed up (constrained) parameter optimization using
+\pkg{Rdonlp2}. Second, analytic gradients are useful in computing the Hessian of the
+estimated parameters so as to arrive at standard errors for those. We are also planning to
+implement goodness-of-fit statistics \citep{Titman2008}.
-What are our next plans?
-Adding gradients for speed and computation of the Hessian.
-
-
\section*{Acknowledgements}
Ingmar Visser was supported by an EC Framework 6 grant, project 516542
More information about the depmix-commits
mailing list