[Depmix-commits] r309 - papers/jss
noreply@r-forge.r-project.org
Thu Aug 13 14:47:30 CEST 2009
Author: maarten
Date: 2009-08-13 14:47:29 +0200 (Thu, 13 Aug 2009)
New Revision: 309
Modified:
papers/jss/article.tex
Log:
- more changes to jss paper
Modified: papers/jss/article.tex
===================================================================
--- papers/jss/article.tex 2009-08-05 14:03:04 UTC (rev 308)
+++ papers/jss/article.tex 2009-08-13 12:47:29 UTC (rev 309)
@@ -98,26 +98,27 @@
Markov and latent Markov models are frequently used in the social
sciences, in different areas and applications. In psychology, they
-are used for modelling learning processes, see \citet{Wickens1982},
-for an overview, and e.g.\ \citet{Schmittmann2006} for a recent
-application. In economics, latent Markov models are so-called regime
+are used for modelling learning processes (see \citealp{Wickens1982}
+for an overview, and e.g.\ \citealp{Schmittmann2006} for a recent
+application). In economics, latent Markov models are so-called regime
switching models (see e.g., \citealp{Kim1994} and
\citealp{Ghysels1994}). Further applications include speech
recognition \citep{Rabiner1989}, EEG analysis \citep{Rainer2000}, and
-genetics \citep{Krogh1998}. In those latter areas of application,
-latent Markov models are usually referred to as hidden Markov models.
-For more examples of applications, see e.g.\
-\citet[][chapter~1]{Cappe2005}.
+genetics \citep{Krogh1998}. In these latter areas of application,
+latent Markov models are usually referred to as hidden Markov models.
+See, for example, \citet{Fruhwirth2006} for an overview of hidden Markov
+models and their extensions. Further examples of applications can be found
+in e.g.\
+\citet[][chapter~1]{Cappe2005}.
The \pkg{depmixS4} package was motivated by the fact that while Markov
models are used commonly in the social sciences, no comprehensive
-package was available for fitting such models. Common software for
+package was available for fitting such models. Existing software for
estimating Markovian models includes Panmark \citep{Pol1996}, and for
-latent class models Latent Gold \citep{Vermunt2003}. Those programs
-are lacking a number of important features, besides not being freely
-available. There are currently some packages in \proglang{R} that
+latent class models Latent Gold \citep{Vermunt2003}. These programs
+lack a number of important features, besides not being freely
+available. There are currently some packages in \proglang{R} that
handle hidden Markov models but they lack a number of features that we
-needed in our research. In particular, \pkg{depmixS4} was designed to
+needed in our research. In particular, \pkg{depmixS4} was designed to
meet the following goals:
\begin{enumerate}
@@ -129,12 +130,11 @@
to have time-dependent transition matrices
\item to be able to include covariates in the prior or initial
- state probabilities of models
+ state probabilities
\item to be easily extensible, in particular, to allow users to
- easily add new uni- or multivariate response distributions, and
- similarly for the addition of other transition models, e.g.,
- continuous time observation models
+ easily add new uni- or multivariate response distributions and
+ new transition models, e.g., continuous time observation models
\end{enumerate}
@@ -142,26 +142,27 @@
series data, for, say, $T>100$, it can also handle the limiting case when
$T=1$. In this case, there are no time dependencies between
observed data and the model reduces to a finite mixture or
-latent class model. Although there are other specialized packages to
+latent class model. While there are specialized packages to
deal with mixture data, as far as we know these
do not allow the inclusion of covariates on the prior probabilities of
-class membership. The possibility to estimating the effects of covariates on
+class membership. The possibility to estimate the effects of covariates on
prior and transition probabilities is a distinguishing feature of
-\pkg{depmixS4}. In the next section, an outline is provided of the model and
-the likelihood equations.
+\pkg{depmixS4}. In the next section, we provide an outline of the model and
+likelihood equations.
\section{The dependent mixture model}
-The data considered here have the general form $\vc{O}_{1:T}=O_{1}^{1}, \ldots,
+The data considered here have the general form $\vc{O}_{1:T}= (O_{1}^{1}, \ldots,
O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1},
-\ldots, O_{T}^{m}$ for an $m$-variate time series of length $T$. As
+\ldots, O_{T}^{m})$ for an $m$-variate time series of length $T$. As
an example, consider a time series of responses generated by a single
-participant in a response time experiment. The data consists of three
+participant in a psychological response time experiment. The data consist of three
variables: response time, response accuracy, and a covariate which is a pay-off
-variable reflecting the relative reward for speeded and/or accuracy responding.
-These variables are measured on 168, 134 and 137 occasions respectively (in
-Figure~\ref{fig:speed} the first part of this series is plotted). These data
+variable reflecting the relative reward for speeded and/or accurate responding.
+These variables are measured on 168, 134 and 137 occasions, respectively (the
+first part of this series is plotted in
+Figure~\ref{fig:speed}). These data
are more fully described in \citet{Dutilh2009}.
\begin{figure}[htbp]
@@ -173,45 +174,49 @@
\end{center}
\end{figure}
-The latent Markov model is commonly associated with data of this type,
-although usually only multinomial response variables are considered.
-However, common estimation procedures, such as those implemented in
-\citet{Pol1996}, are not suitable for long time series due to
-underflow problems. In contrast, the hidden Markov model is typically
-only used for `long' univariate time series \citep[][,
-chapter~1]{Cappe2005}. In the next paragraphs, the likelihood
-equation and estimation procedures for the dependent mixture model are
-described for data of the above form. We use the term ``dependent
-mixture model'' because one of the authors (Ingmar Visser) thought it
-was time for a new name for these models\footnote{Only later did I
+The latent Markov model is usually associated with data of this type,
+in particular for multinomially distributed responses.
+However, commonly employed estimation procedures
+\citep[e.g.,][]{Pol1996} are not suitable for long time series due to
+underflow problems. In contrast, the hidden Markov model is typically
+only used for `long' univariate time series \citep[][chapter~1]{Cappe2005}.
+We use the term ``dependent mixture model'' because one of the authors
+(Ingmar Visser) thought it
+was time for a new name for these models\footnote{Only later did I
find out that \citet{Leroux1992} already coined the term dependent
mixture models in an application with hidden Markov mixtures of
Poisson count data.}.
+The fundamental assumption of a dependent mixture model is that at any time
+point, the observations are distributed as a mixture with $n$ components
+(or states), and that time-dependencies between the observations are due to
+time-dependencies between the mixture components (i.e., transition probabilities
+between the components). These latter dependencies are assumed to
+follow a first-order Markov process. In the models we are considering here, the
+mixture distributions, the initial mixture probabilities and transition
+probabilities can all depend on covariates $\vc{z}_t$.
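+To make these assumptions concrete, the following minimal sketch (with
+hypothetical parameter values and no covariates) simulates data from a
+2-state dependent mixture with Gaussian responses:
+\begin{CodeChunk}
+\begin{CodeInput}
+> ## first-order Markov chain over 2 states, then state-dependent
+> ## Gaussian observations
+> set.seed(1)
+> A <- matrix(c(.9,.1,.1,.9),2,byrow=TRUE) # transition matrix
+> mu <- c(5,6); sigma <- c(.2,.2)          # state-dependent parameters
+> nt <- 200
+> S <- numeric(nt); S[1] <- sample(1:2,1)
+> for (t in 2:nt) S[t] <- sample(1:2,1,prob=A[S[t-1],])
+> O <- rnorm(nt,mean=mu[S],sd=sigma[S])
+\end{CodeInput}
+\end{CodeChunk}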
-The likelihood of the dependent mixture model conditional on the
-unknown (or hidden) state sequence $S_{1:T}$ and the model is given
-by:
+%transition probability functions $a_{ij}$ and the initial state
+%probability functions $\greekv{\pi}$ may depend on covariates as well
+%as the response distributions $\vc{b_{j}^{k}}$. See for example
+%\citet{Fruhwirth2006} for an overview of hidden Markov models with
+%extensions.
+
+In a dependent mixture model, the joint likelihood of observations $\vc{O}_{1:T}$ and
+latent states $\vc{S}_{1:T} = (S_1,\ldots,S_T)$, given model parameters
+$\greekv{\theta}$ and covariates $\vc{z}_{1:T} = (\vc{z}_1,\ldots,\vc{z}_T)$,
+can be written as
\begin{equation}
- \Prob( \vc{O}_{1:T} | \vc{S}_{1:T}, \greekv{\theta} ) =
- \prod_{t=1}^{T} \Prob( \vc{O}_{t} | S_{t}, \greekv{\theta}),
+ \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\greekv{\theta},\vc{z}_{1:T}) =
+ \pi_{S_1}(\vc{z}_1) \vc{b}_{S_1}(\vc{O}_1|\vc{z}_{1})
+ \prod_{t=1}^{T-1} a_{S_t S_{t+1}}(\vc{z}_{t}) \vc{b}_{S_{t+1}}(\vc{O}_{t+1}|\vc{z}_{t+1}),
\end{equation}
-where $\greekv{\theta}$ is the parameter vector of the model. To arrive
-at the likelihood of the data given the parameter vector
-$\greekv{\theta}$, i.e.\ without the state sequence, we need to sum
-above likelihood over all possible state sequences. This likelihood
-is written as:
-\begin{equation}
- \Prob(\vc{O}_{1:T}|\vc{S}_{1:T},\greekv{\theta})
- \sum_{all Ss} \pi_{i}(\vc{z}_1) \vc{b}_{1}(\vc{O}_{1})
- \prod_{t=1}^{T-1} a_{ij}(z_{t}) \vc{b}_{j}(\vc{O}_{t+1}|\vc{z}_{t+1}),
-\end{equation}
where we have the following elements:
\begin{enumerate}
\item $S_{t}$ is an element of $\mathcal{S}=\{1\ldots n\}$, a set
- of $n$ latent classes or states; we write for short $S_{i}$ to
- denote $S_{t}=i$.
+ of $n$ latent classes or states. %; we write for short $S_{i}$ to
+ %denote $S_{t}=i$.
\item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
giving the probability of class/state $i$ at time $t=1$ with
@@ -221,13 +226,57 @@
provides the probability of a transition from state $i$ to state
$j$ with covariate $\vc{z}_t$,
- \item $\vc{b}_{j}$ is a vector of observation densities
- $b_j^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
+ \item $\vc{b}_{j}$ is a vector of observation densities
+ $b_{j}^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
conditional densities of observations $O_{t}^k$ associated with
latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
$k=1, \ldots, m$.
\end{enumerate}
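+To make the structure of this joint likelihood concrete, a minimal R
+sketch (univariate Gaussian responses, no covariates; all names are
+hypothetical) evaluates it for a given state sequence, reusing the
+quantities from the simulation sketch above:
+\begin{CodeChunk}
+\begin{CodeInput}
+> jointLik <- function(O, S, pi, A, mu, sigma) {
+    nt <- length(O)
+    # pi[S[1]] * b_{S_1}(O_1)
+    lik <- pi[S[1]] * dnorm(O[1], mu[S[1]], sigma[S[1]])
+    # product over a_{S_t,S_{t+1}} * b_{S_{t+1}}(O_{t+1})
+    for (t in 1:(nt-1))
+      lik <- lik * A[S[t],S[t+1]] * dnorm(O[t+1], mu[S[t+1]], sigma[S[t+1]])
+    lik
+  }
+> jointLik(O, S, pi=c(.5,.5), A=A, mu=mu, sigma=sigma)
+\end{CodeInput}
+\end{CodeChunk}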
+%In the next paragraphs, the likelihood
+%equation and estimation procedures for the dependent mixture model are
+%described for data of the above form.
+
+%The likelihood of the dependent mixture model conditional on the
+%unknown (or hidden) state sequence $\vc{S}_{1:T} = (S_1,\ldots,S_T)$
+%and the model is given
+%by:
+%\begin{equation}
+% \Prob( \vc{O}_{1:T} | \vc{S}_{1:T}, \greekv{\theta} ) =
+% \prod_{t=1}^{T} \Prob( \vc{O}_{t} | S_{t}, \greekv{\theta}),
+%\end{equation}
+%where $\greekv{\theta}$ is the parameter vector of the model. To arrive
+%at the marginal likelihood of the data for parameter vector
+%$\greekv{\theta}$, i.e.\ unconditional on the state sequence, we need to sum
+%above likelihood over all possible state sequences. This likelihood
+%is written as:
+%\begin{equation}
+% \Prob(\vc{O}_{1:T}|\greekv{\theta}) =
+% \sum_{all Ss} \pi_{i}(\vc{z}_1) \vc{b}_{1}(\vc{O}_{1})
+% \prod_{t=1}^{T-1} a_{ij}(\vc{z}_{t}) \vc{b}_{j}(\vc{O}_{t+1}|\vc{z}_{t+1}),
+%\end{equation}
+%where we have the following elements:
+%\begin{enumerate}
+%
+% \item $S_{t}$ is an element of $\mathcal{S}=\{1\ldots n\}$, a set
+% of $n$ latent classes or states; we write for short $S_{i}$ to
+% denote $S_{t}=i$.
+%
+% \item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
+% giving the probability of class/state $i$ at time $t=1$ with
+% covariate $\vc{z}_1$.
+%
+% \item $a_{ij}(\vc{z}_t) = \Prob(S_{t+1}=j|S_{t}=i,\vc{z}_t)$,
+% provides the probability of a transition from state $i$ to state
+% $j$ with covariate $\vc{z}_t$,
+%
+% \item $\vc{b}_{j}$ is a vector of observation densities
+% $b_j^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
+% conditional densities of observations $O_{t}^k$ associated with
+% latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
+% $k=1, \ldots, m$.
+%\end{enumerate}
+
% The dependent mixture model is defined by the following equations:
% %\begin{align}
% % S_{t} &= A S_{t-1}, t=2, \ldots, T \\
@@ -241,37 +290,38 @@
%matrix, $O_{t}$ is an (possibly multivariate) observation and $b$ is a
%density function for $O_{t}$ conditional on the hidden state $S_{t}$.
-In the example data above, $b_j^k$ could be a Gaussian distribution
-function for the response time variable, and a Bernoulli for the
-accuracy data. In the models we are considering here, both the
-transition probability functions $a_{ij}$and the initial state
+For the example data above, $b_j^k$ could be a Gaussian distribution
+function for the response time variable, and a Bernoulli distribution for the
+accuracy variable. In the models we are considering here, both the
+transition probability functions $a_{ij}$ and the initial state
probability functions $\greekv{\pi}$ may depend on covariates as well
-as the response distributions $\vc{b_{j}^{k}}$. See for example
+as the response distributions $b_{j}^{k}$. See for example
\citet{Fruhwirth2006} for an overview of hidden Markov models with
extensions.
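+As a concrete illustration for these example data, the joint conditional
+density of one bivariate observation in state $j$ is the product of a
+Gaussian density for the response time and a Bernoulli probability for the
+accuracy; a sketch with hypothetical parameter values, assuming
+conditional independence of the responses given the state:
+\begin{CodeChunk}
+\begin{CodeInput}
+> bj <- function(rt, corr, mu, sigma, p)
+    dnorm(rt, mu, sigma) * dbinom(corr, size=1, prob=p)
+> bj(rt=5.2, corr=1, mu=5.5, sigma=0.3, p=0.9)
+\end{CodeInput}
+\end{CodeChunk}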
\subsection{Likelihood}
-The log-likelihood of hidden Markov models is usually computed by the
+To obtain maximum likelihood estimates of the model parameters, we need the
+marginal likelihood of the observations. For hidden Markov models, this
+marginal (log-)likelihood is usually computed by the
so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or
-rather by the forward part of this algorithm. \cite{Lystig2002}
+rather by the forward part of this algorithm. \cite{Lystig2002}
changed the forward algorithm in such a way as to allow computing the
-gradients of the log-likelihood at the same time. They start by
+gradients of the log-likelihood at the same time. They start by
rewriting the likelihood as follows (for ease of exposition the
-dependence on the model parameters is dropped here):
+dependence on the model parameters and covariates is dropped here):
\begin{equation}
- L_{T} = Pr(\vc{O}_{1:T}) = \prod_{t=1}^{T}
- Pr(\vc{O}_{t}|\vc{O}_{1:t-1}),
+ L_{T} = \Prob(\vc{O}_{1:T}) = \prod_{t=1}^{T}
+ \Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)}),
\label{condLike}
\end{equation}
-where $Pr(\vc{O}_{1}|\vc{O}_{0}):=Pr(\vc{O}_{1})$. Note that for a
+where $\Prob(\vc{O}_{1}|\vc{O}_{0}):=\Prob(\vc{O}_{1})$. Note that for a
simple, i.e.\ observed, Markov chain these probabilities reduce to
-$Pr(\vc{O}_{t}|\vc{O}_{1},\ldots,
-\vc{O}_{t-1})=Pr(\vc{O}_{t}|\vc{O}_{t-1})$.
+$\Prob(\vc{O}_{t}|\vc{O}_{1},\ldots,
+\vc{O}_{t-1})=\Prob(\vc{O}_{t}|\vc{O}_{t-1})$.
The log-likelihood can now be expressed as:
\begin{equation}
- l_{T} = \sum_{t=1}^{T} \log[\Prob(\vc{O}_{t}|\vc{O}_{1}, \ldots,
-\vc{O}_{t-1})].
+ l_{T} = \sum_{t=1}^{T} \log[\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})].
\label{eq:condLogl}
\end{equation}
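+The recursion below computes these conditional probabilities while
+keeping all quantities scaled; as an illustration, here is a minimal R
+sketch of such a scaled forward pass (univariate Gaussian responses, no
+covariates; function and variable names are hypothetical):
+\begin{CodeChunk}
+\begin{CodeInput}
+> fwdLogLik <- function(O, pi, A, mu, sigma) {
+    nt <- length(O); n <- length(pi)
+    # state-conditional densities b_j(O_t), one column per state
+    B <- sapply(1:n, function(j) dnorm(O, mu[j], sigma[j]))
+    phi <- pi * B[1,]; ll <- log(sum(phi)); phi <- phi/sum(phi)
+    for (t in 2:nt) {
+      phi <- (phi %*% A) * B[t,]  # unnormalized filtering probabilities
+      ll <- ll + log(sum(phi))    # accumulates log Phi_t
+      phi <- phi/sum(phi)         # rescale to avoid underflow
+    }
+    ll
+  }
+> fwdLogLik(O, pi=c(.5,.5), A=A, mu=mu, sigma=sigma)
+\end{CodeInput}
+\end{CodeChunk}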
@@ -280,16 +330,15 @@
\begin{align}
\phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
\label{eq:fwd1} \\
-\begin{split}
- \phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1}, \ldots,
-\vc{O}_{t-1}) \\
- &= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+%\begin{split}
+ \phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1:(t-1)}) %\\
+ = \sum_{i=1}^{n} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
(\Phi_{t-1})^{-1},
\label{eq:fwdt}
-\end{split}
+%\end{split}
\end{align}
where $\Phi_{t}=\sum_{i=1}^{n} \phi_{t}(i)$. Combining
-$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1}, \ldots, \vc{O}_{t-1})$, and
+$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})$, and
equation~(\ref{eq:condLogl}) gives the following expression for the
log-likelihood:
\begin{equation}
@@ -307,12 +356,12 @@
Let $\greekv{\theta} = (\greekv{\theta}_1, \greekv{\theta}_2,\greekv{\theta}_3)$
be the general parameter vector consisting of three subvectors with parameters
for the prior model, transition model, and response model respectively. The
-joint log likelihood can be written as
+joint log-likelihood can be written as
\begin{equation}
\log \Prob(\vc{O}_{1:T}, \vc{S}_{1:T}|\vc{z}_{1:T},\greekv{\theta}) = \log
\Prob(S_1|\vc{z}_{1},\greekv{\theta}_1)
+ \sum_{t=2}^{T} \log \Prob(S_t|S_{t-1},\vc{z}_{t-1},\greekv{\theta}_2)
-+ \sum_{t=1}^{T} \log \Prob(O_t|S_t,\vc{z}_t,\greekv{\theta}_3)
++ \sum_{t=1}^{T} \log \Prob(\vc{O}_t|S_t,\vc{z}_t,\greekv{\theta}_3)
\end{equation}
This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the
Expectation step, we replace these with their expected values given a set of
@@ -329,7 +378,7 @@
\label{eq:Q}
Q(\greekv{\theta},\greekv{\theta}') =
\sum_{j=1}^n \gamma_1(j) \log \Prob(S_1=j|\vc{z}_1,\greekv{\theta}_1) \\
-+ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t^i(j,k) \log \Prob(S_t = k|S_{t-1}
++ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1}
= j,\vc{z}_{t-1},\greekv{\theta}_2) \\
+ \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j)
\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
@@ -338,7 +387,7 @@
where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
\vc{z}_{1:T},\greekv{\theta}')$ can be computed effectively by the
-Forward-Backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
+forward-backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the
right hand side of (\ref{eq:Q}) consists of three separate parts, we can
maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
@@ -393,7 +442,8 @@
\subsection{Example data: speed}
-Throughout this article a data set called \code{speed} is used. It
+Throughout this article a data set called \code{speed} is used. As already
+indicated in the Introduction, it
consists of three time series with three variables: response time,
accuracy, and a covariate Pacc which defines the relative pay-off for
speeded and accurate responding. The participant in this experiment
@@ -406,9 +456,10 @@
\subsection{A simple model}
-A dependent mixture model is defined by the number of states, and by
-the response distribution functions, and can be created with the
-\code{depmix}-function as follows:
+A dependent mixture model is defined by the number of states and the initial
+state, state transition, and response distribution functions. A default
+dependent mixture model can be created with the \code{depmix}-function as
+follows:
\begin{CodeChunk}
\begin{CodeInput}
@@ -435,7 +486,7 @@
\code{depmix.fitted} which extends the \code{depmix} class, adding
convergence information (and information about constraints if these
were applied, see below). The \code{print} method provides summary
-information on convergence, the log likelihood and the AIC and BIC
+information on convergence, the log-likelihood and the AIC and BIC
values:
\begin{CodeChunk}
\begin{CodeInput}
@@ -500,32 +551,37 @@
\end{CodeOutput}
\end{CodeChunk}
+Note that, since no further arguments were specified, the initial state,
+state transition and response distributions were set to their defaults
+(multinomial distributions for the first two, and a Gaussian distribution
+for the response).
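+Spelling out these defaults, the model above could equivalently be
+specified as follows (a sketch; the \code{transition} and \code{prior}
+arguments are assumed to default to intercept-only formulae):
+\begin{CodeChunk}
+\begin{CodeInput}
+> mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
+      transition=~1, prior=~1)
+\end{CodeInput}
+\end{CodeChunk}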
\subsection{Covariates on transition parameters}
-The transition probabilities and the initial state probabilities
+By default, the transition probabilities and the initial state probabilities
are parameterized using the multinomial
-logistic model. In particular, each row of the transition matrix is
+logistic model. More precisely, each row of the transition matrix is
parameterized as a baseline-category multinomial logistic model,
meaning that the parameter for the
base category is fixed at zero \citep[see][p.\ 267
ff., for multinomial logistic models and various
-parameterizations]{Agresti2002}. See also \citet{Chung2007} for similar models,
-latent transition models using logistic regression on the transition parameters.
-They fit such models on repeated measurement
-data ($T=2$) using Bayesian methods. The default baseline category is the
-first state.
-Hence, for example, for a 3-state model, the initial state probability
+parameterizations]{Agresti2002}. The default baseline category is the
+first state. Hence, for example, for a 3-state model, the initial state probability
model would have three parameters of which the first is fixed at zero
-and the other two are freely estimated.
+and the other two are freely estimated.
+The multinomial logistic model allows us to include covariates
+on the initial state and transition probabilities. \citet{Chung2007} discuss a
+related latent transition model for repeated measurement
+data ($T=2$) using logistic regression on the transition
+parameters; they rely on Bayesian methods of estimation.
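+To illustrate this parameterization, the following sketch converts
+hypothetical baseline-category logit coefficients for one row of the
+transition matrix into probabilities (compare the coefficients in the
+summary output shown below):
+\begin{CodeChunk}
+\begin{CodeInput}
+> beta0 <- c(0, -0.92)  # intercepts; first state is the baseline
+> beta1 <- c(0, 1.86)   # coefficients for the covariate scale(Pacc)
+> z <- 0                # covariate value
+> exp(beta0 + beta1*z) / sum(exp(beta0 + beta1*z))
+\end{CodeInput}
+\end{CodeChunk}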
Covariates on the transition probabilities can be specified using a
one-sided formula as in the following example:
\begin{CodeChunk}
\begin{CodeInput}
> set.seed(1)
> mod <- depmix(rt~1, data=speed, nstates=2, family=gaussian(),
- transition=~scale(Pacc), instart=runif(2))
+ transition=~scale(Pacc), instart=runif(2))
> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -538,7 +594,7 @@
Transition model for state (component) 1
Model of type multinomial, formula: ~scale(Pacc)
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 -0.9215182
[2,] 0 1.8649734
Probalities at zero values of the covariates.
@@ -547,7 +603,7 @@
Transition model for state (component) 2
Model of type multinomial, formula: ~scale(Pacc)
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 2.471442
[2,] 0 3.570856
Probalities at zero values of the covariates.
@@ -574,8 +630,8 @@
\begin{CodeInput}
> set.seed(1)
> mod <- depmix(list(rt~1,corr~1), data=speed, nstates=2,
- family=list(gaussian(),multinomial()),
- transition=~scale(Pacc),instart=runif(2))
+ family=list(gaussian(),multinomial()),
+ transition=~scale(Pacc),instart=runif(2))
> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -598,7 +654,7 @@
Response model for response 2
Model of type multinomial, formula: corr ~ 1
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 0.1030554
Probalities at zero values of the covariates.
0.4742589 0.5257411
@@ -614,7 +670,7 @@
Response model for response 2
Model of type multinomial, formula: corr ~ 1
Coefficients:
- [,1] [,2]
+ [,1] [,2]
[1,] 0 2.245514
Probalities at zero values of the covariates.
0.09573715 0.9042629
@@ -665,7 +721,7 @@
\begin{CodeOutput}
Mixture probabilities model
Model of type multinomial, formula: ~age
- [,1] [,2]
+ [,1] [,2]
[1,] 0 -2.5182573
[2,] 0 0.5512996
Probalities at zero values of the covariates.
@@ -713,8 +769,7 @@
To see the ordering of parameters use the following:
\begin{CodeChunk}
\begin{CodeInput}
-> mod <- setpars(mod, value=1:npar(mod))
-> mod
+> setpars(mod, value=1:npar(mod))
\end{CodeInput}
\end{CodeChunk}
@@ -723,8 +778,7 @@
initial state probabilities model):
\begin{CodeChunk}
\begin{CodeInput}
-> mod <- setpars(mod, getpars(mod,which="fixed"))
-> mod
+> setpars(mod, getpars(mod,which="fixed"))
\end{CodeInput}
\end{CodeChunk}
@@ -758,17 +812,17 @@
tenability of constraints \citep{Dannemann2007}. Note that these arguments
provide the possibility of arbitrary
constraints, including constraints between, e.g., a multinomial regression coefficient
-for the transition matrix and the mean of a gaussian response model.
+for the transition matrix and the mean of a Gaussian response model.
Whether such constraints make sense is hence the responsibility of
the user.
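+As an illustration, equality constraints might be imposed along the
+following lines (a sketch: the labelling vector is hypothetical and must
+have length \code{npar(mod)}; parameters sharing a label greater than 1
+are constrained to be equal, 1 means freely estimated, 0 means fixed):
+\begin{CodeChunk}
+\begin{CodeInput}
+> pars <- c(1,1,0,1,2,1,1,1,2,1)  # hypothetical labelling
+> fm <- fit(mod, equal=pars)
+\end{CodeInput}
+\end{CodeChunk}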
\section[Extending depmixS4]{Extending \pkg{depmixS4}}
-The \pkg{depmixS4} package was built with the aim of having the
-possibility of adding new response distributions (and possibly also
-other models for the prior and transition probabilities models).
-Therefore, the EM algorithm simply calls \code{fit} functions from the
-separate response models without knowing what they are. As a
+The \pkg{depmixS4} package was designed with the aim of making it relatively
+easy to add new response distributions (as well as possibly new prior and
+transition models).
+To make this possible, the EM routine simply calls the \code{fit} methods
+of the separate response models without needing to know how they work. As a
consequence, adding user-specified response models is straightforward.
User-defined distributions should extend the \code{response}-class
and have the following slots:
@@ -776,7 +830,7 @@
\item y: the response variable
\item x: the design matrix, possibly only an intercept
\item parameters: a named list with the coefficients and possibly
- other parameters, e.g., the standard deviation in the gaussian
+ other parameters, e.g., the standard deviation in the Gaussian
response model
\item fixed: a vector of logicals indicating whether parameters are
fixed
@@ -784,11 +838,12 @@
\end{enumerate}
In the \code{speed} data example, it may be more appropriate to model
-the response times as an exgaus distribution rather than using the
-Gaussian. We can do so as follows, first we define our own
-exgaus-class extending the response-class:
+the response times with an ex-Gaussian rather than a Gaussian distribution.
+To do so, we first define an exgaus-class extending the response-class:
\begin{CodeChunk}
- setClass("exgaus", contains="response")
+\begin{CodeInput}
+setClass("exgaus", contains="response")
+\end{CodeInput}
\end{CodeChunk}
The class thus defined needs a number of methods:
@@ -803,84 +858,86 @@
by the EM algorithm)
\end{enumerate}
-Only the constructor function and the fit function are provided here
-and the complete code is provided in the helpfile for \code{makeDepmix}.
-The \code{exgaus} example uses the \pkg{gamlss} and \pkg{gamlss.distr} packages \cite{Stasinopoulos2009a,Stasinopoulos2009b} for
+Only the constructor and the fit method are provided here; the complete code
+can be found in the helpfile for \code{makeDepmix}.
+The \code{exgaus} example uses the \pkg{gamlss} and \pkg{gamlss.dist}
+packages \citep{Stasinopoulos2009a,Stasinopoulos2009b} for
computing the \code{dens}ity and for \code{fit}ting the parameters.
-The constructor function is defined as:
+The constructor method is defined as:
\begin{CodeChunk}
\begin{CodeInput}
setGeneric("exgaus", function(y, pstart = NULL, fixed = NULL, ...)
- standardGeneric("exgaus"))
+ standardGeneric("exgaus"))
+
setMethod("exgaus",
- signature(y="ANY"),
- function(y,pstart=NULL,fixed=NULL, ...) {
- y <- matrix(y,length(y))
- x <- matrix(1)
- parameters <- list()
- npar <- 3
- if(is.null(fixed)) fixed <- as.logical(rep(0,npar))
- if(!is.null(pstart)) {
- if(length(pstart)!=npar) stop("length of 'pstart' must be ",npar)
- parameters$mu <- pstart[1]
- parameters$sigma <- log(pstart[2])
- parameters$nu <- log(pstart[3])
- }
- mod <- new("exgaus",parameters=parameters,fixed=fixed,x=x,y=y,npar=npar)
- mod
- }
+ signature(y="ANY"),
+ function(y,pstart=NULL,fixed=NULL, ...) {
+ y <- matrix(y,length(y))
+ x <- matrix(1)
+ parameters <- list()
+ npar <- 3
+ if(is.null(fixed)) fixed <- as.logical(rep(0,npar))
+ if(!is.null(pstart)) {
+ if(length(pstart)!=npar) stop("length of 'pstart' must be ",npar)
+ parameters$mu <- pstart[1]
+ parameters$sigma <- log(pstart[2])
+ parameters$nu <- log(pstart[3])
+ }
+ mod <- new("exgaus",parameters=parameters,fixed=fixed,x=x,y=y,npar=npar)
+ mod
+ }
)
\end{CodeInput}
\end{CodeChunk}
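+Using this constructor is then straightforward; for instance (starting
+values are hypothetical):
+\begin{CodeChunk}
+\begin{CodeInput}
+> rt1 <- exgaus(speed$rt, pstart=c(5,.1,.1))
+\end{CodeInput}
+\end{CodeChunk}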
-The \code{fit} function is defined as follows:
+The \code{fit} method is defined as follows:
\begin{CodeChunk}
\begin{CodeInput}
setMethod("fit","exgaus",
- function(object,w) {
- if(missing(w)) w <- NULL
- y <- object@y
- fit <- gamlss(y~1,weights=w,family=exGAUS(),
- control=gamlss.control(n.cyc=100,trace=FALSE),
- mu.start=object@parameters$mu,
- sigma.start=exp(object@parameters$sigma),
- nu.start=exp(object@parameters$nu))
- pars <- c(fit$mu.coefficients,fit$sigma.coefficients,fit$nu.coefficients)
- object <- setpars(object,pars)
- object
- }
+ function(object,w) {
+ if(missing(w)) w <- NULL
+ y <- object@y
+ fit <- gamlss(y~1,weights=w,family=exGAUS(),
+ control=gamlss.control(n.cyc=100,trace=FALSE),
+ mu.start=object@parameters$mu,
+ sigma.start=exp(object@parameters$sigma),
+ nu.start=exp(object@parameters$nu))
+ pars <- c(fit$mu.coefficients,fit$sigma.coefficients,fit$nu.coefficients)
+ object <- setpars(object,pars)
+ object
+ }
)
\end{CodeInput}
\end{CodeChunk}
-The \code{fit} function defines a \code{gamlss} model with
+The \code{fit} method defines a \code{gamlss} model with
only an intercept to be estimated and then sets the fitted parameters
back into their respective slots in the `exgaus' object. See the help
for \code{gamlss.dist} for interpretation of these parameters.
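+A \code{dens} method is also required; a minimal sketch (freely adapted,
+using \code{dexGAUS} from \pkg{gamlss.dist}, and recalling that sigma and
+nu are stored on the log scale in the constructor above) could read:
+\begin{CodeChunk}
+\begin{CodeInput}
+setMethod("dens","exgaus",
+  function(object, log=FALSE) {
+    # evaluate the ex-Gaussian density of the responses in this state
+    dexGAUS(object@y, mu=object@parameters$mu,
+      sigma=exp(object@parameters$sigma),
+      nu=exp(object@parameters$nu), log=log)
+  }
+)
+\end{CodeInput}
+\end{CodeChunk}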
After defining all the necessary methods for the new response model,
-we can now define the dependent mixture model using this reponse.
-The function \code{makeDepmix} is added to \pkg{depmixS4} to have
+we can now define the dependent mixture model using this response model.
+The function \code{makeDepmix} is included in \pkg{depmixS4} to have
full control over model specification, which we need here.
We first create all the response models that we want as a double list:
\begin{CodeChunk}
\begin{CodeInput}
rModels <- list(
- list(
- exgaus(rt,pstart=c(5,.1,.1)),
- GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(0.5,0.5))
- ),
- list(
- exgaus(rt,pstart=c(6,.1,.1)),
- GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(.1,.9))
- )
+ list(
+ exgaus(rt,pstart=c(5,.1,.1)),
+ GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(0.5,0.5))
+ ),
+ list(
+ exgaus(rt,pstart=c(6,.1,.1)),
+ GLMresponse(formula=corr~1,data=speed,family=multinomial(),pstart=c(.1,.9))
+ )
)
\end{CodeInput}
\end{CodeChunk}
Next, we define the transition and prior probability models using the
-transInit function (which produces a transInit model, which also extends
+\code{transInit} function (which produces a \code{transInit} model that also extends
the response class):
\begin{CodeChunk}
\begin{CodeInput}
@@ -895,9 +952,9 @@
the model:
\begin{CodeChunk}
\begin{CodeInput}
-mod <- makeDepmix(response=rModels,transition=transition,
-prior=inMod,stat=FALSE)
-fm <- fit(mod)
+> mod <- makeDepmix(response=rModels,transition=transition,
+ prior=inMod,stat=FALSE)
+> fm <- fit(mod)
\end{CodeInput}
\end{CodeChunk}
@@ -911,7 +968,7 @@
\pkg{depmixS4} provides a flexible framework for fitting dependent mixture models
for a large variety of response distributions. It can also fit latent class regression
and finite mixture models, although it should be noted that more specialized packages
-are available for doing so, e.g.\ \pkg{FlexMix} \citep{Leisch2004}. The package is intended
+are available for this purpose, such as \pkg{FlexMix} \citep{Leisch2004}. The package is intended
for modeling of (individual) time series data with the aim of characterizing the transition
processes underlying the data. The possibility to use covariates on the transition matrix
greatly enhances the flexibility of the model. The EM algorithm uses a very general