[Depmix-commits] r308 - papers/jss

Wed Aug 5 16:03:05 CEST 2009

Author: maarten
Date: 2009-08-05 16:03:04 +0200 (Wed, 05 Aug 2009)
New Revision: 308

Modified:
   papers/jss/article.tex
Log:
- changes to EM description

Modified: papers/jss/article.tex
===================================================================

--- papers/jss/article.tex	2009-08-05 13:35:22 UTC (rev 307)
+++ papers/jss/article.tex	2009-08-05 14:03:04 UTC (rev 308)
@@ -309,31 +309,45 @@
 for the prior model, transition model, and response model respectively. The 
 joint log likelihood can be written as
 \begin{equation}
-\log \Prob(O_{1:T}, S_{1:T}|\greekv{\theta}) = \log \Prob(S_1|\greekv{\theta}_1) 
-+ \sum_{t=2}^{T} \log \Prob(S_t|S_{t-1},\greekv{\theta}_2) 
-+ \sum_{t=1}^{T} \log \Prob(O_t|S_t,\greekv{\theta}_3)
+\log \Prob(\vc{O}_{1:T}, \vc{S}_{1:T}|\vc{z}_{1:T},\greekv{\theta}) = \log 
+\Prob(S_1|\vc{z}_{1},\greekv{\theta}_1) 
++ \sum_{t=2}^{T} \log \Prob(S_t|S_{t-1},\vc{z}_{t-1},\greekv{\theta}_2) 
++ \sum_{t=1}^{T} \log \Prob(O_t|S_t,\vc{z}_t,\greekv{\theta}_3)
 \end{equation}
-This likelihood depends on the unobserved states $S_t$. In the Expectation step,
-we replace these with their expected values given a set of (initial) parameters 
-$\greekv{\theta}' = (\greekv{\theta}'_1, \greekv{\theta}'_2,\greekv{\theta}'_3)$
-and observations $O_{1:T}$. The expected log likelihood 
+This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the 
+Expectation step, we replace these with their expected values given a set of 
+(initial) parameters $\greekv{\theta}' = (\greekv{\theta}'_1, 
+\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. The expected 
+log likelihood 
 \begin{equation}
 Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'} 
-(\log \Prob(O_{1:T},S_{1:T}|O_{1:T},\greekv{\theta}))
+(\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta}))
 \end{equation}
 can be written as
 %\begin{equation}
 \begin{multline}
 \label{eq:Q}
 Q(\greekv{\theta},\greekv{\theta}') = 
-\sum_{j=1}^n \gamma_1(j) \log \Prob(S_1=j|\greekv{\theta}_1) \\ 
+\sum_{j=1}^n \gamma_1(j) \log \Prob(S_1=j|\vc{z}_1,\greekv{\theta}_1) \\ 
 + \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1} 
-= j,\greekv{\theta}_2)  \\
+= j,\vc{z}_{t-1},\greekv{\theta}_2)  \\
  + \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j) 
-\ln \Prob(O^k_t|S_t=j,\greekv{\theta}_3),
+\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
 \end{multline}
 %\end{equation}
-where the expected values $\xi_t(j,k) =  P(S_t = k, S_{t-1} = j|O_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|O_{1:T},\greekv{\theta}')$ can be computed effectively by the Forward-Backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the r.h.s. of (\ref{eq:Q}) consists of three separate parts, we can maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and $\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and $\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in \pkg{MASS}, and maximisation for $\greekv{\theta}_3$ by the \code{glm} routine. 
+where the expected values $\xi_t(j,k) =  P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
+\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
+\vc{z}_{1:T},\greekv{\theta}')$ can be computed efficiently by the 
+Forward-Backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation 
+step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the 
+right-hand side of (\ref{eq:Q}) consists of three separate parts, we can 
+maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and 
+$\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and 
+$\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in the 
+\pkg{nnet} package \citep{Venables2002}, and maximisation for 
+$\greekv{\theta}_3$ by the standard \code{glm} routine. Note that for the latter 
+maximisation, the expected values $\gamma_t(j)$ are used as prior weights of the 
+observations $O^k_t$.
 
 
 
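The Expectation step described in the hunk above can be illustrated with a minimal sketch of a scaled Forward-Backward pass. This is not the \pkg{depmixS4} implementation: the function name `forward_backward` and its arguments are hypothetical, and the sketch assumes a time-homogeneous transition matrix, that is, no covariates $\vc{z}_t$ on the prior or transition model. It returns $\gamma_t(j)$, $\xi_t(j,k)$ and the log likelihood of the observations.

```python
import math


def forward_backward(prior, trans, dens):
    """Scaled forward-backward pass for a discrete-state hidden Markov model.

    prior: n initial state probabilities, prior[j] = P(S_1 = j)
    trans: n x n nested list, trans[j][k] = P(S_t = k | S_{t-1} = j)
           (assumed constant over time; no covariates)
    dens:  T x n nested list, dens[t][j] = P(O_t | S_t = j)
    Returns (gamma, xi, loglik); xi[i] couples time points i and i + 1
    (0-based), so xi has length T - 1.
    """
    T, n = len(dens), len(prior)

    # Forward pass; each step is rescaled to sum to one to avoid underflow.
    alpha = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for k in range(n):
            if t == 0:
                pred = prior[k]
            else:
                pred = sum(alpha[t - 1][j] * trans[j][k] for j in range(n))
            alpha[t][k] = pred * dens[t][k]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, reusing the scaling constants of the forward pass.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(n):
            beta[t][j] = sum(
                trans[j][k] * dens[t + 1][k] * beta[t + 1][k] for k in range(n)
            ) / scale[t + 1]

    # Smoothed state probabilities gamma_t(j) = P(S_t = j | O_{1:T}).
    gamma = [[alpha[t][j] * beta[t][j] for j in range(n)] for t in range(T)]

    # Smoothed transition probabilities P(S_t = j, S_{t+1} = k | O_{1:T}).
    xi = [
        [
            [
                alpha[t][j] * trans[j][k] * dens[t + 1][k] * beta[t + 1][k]
                / scale[t + 1]
                for k in range(n)
            ]
            for j in range(n)
        ]
        for t in range(T - 1)
    ]

    # The log likelihood is the sum of the log scaling constants.
    loglik = sum(math.log(c) for c in scale)
    return gamma, xi, loglik
```

In a Maximisation step, the rows of `gamma` would play the role of the prior weights for the response models, and the entries of `xi` would act as weights for the transition model.
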
@@ -353,7 +367,7 @@
 EM can lead to wrong parameter estimates when applying constraints.
 Hence, in \pkg{depmixS4}, EM is used by default in unconstrained
 models, but otherwise, direct optimization is done using \pkg{Rdonlp2}
-\cite{Tamura2009,Spellucci2002}, because it handles general linear
+\citep{Tamura2009,Spellucci2002}, because it handles general linear
 (in)equality constraints, and optionally also non-linear constraints.
 
 %Need some more on EM and how/why it is justified to do separate weighted
@@ -379,7 +393,7 @@
 
 \subsection{Example data: speed}
 
-Throughout this manual a data set called \code{speed} is used.  It
+Throughout this article a data set called \code{speed} is used.  It
 consists of three time series with three variables: response time,
 accuracy, and a covariate Pacc which defines the relative pay-off for
 speeded and accurate responding.  The participant in this experiment
@@ -406,8 +420,8 @@
 The \code{depmix} function returns an object of class \code{depmix}
 which contains the model specification (and not a fitted model!).
 Note also that start values for the transition parameters are provided
-in this call using the \code{trstart} argument. The package does not 
-provide automatic starting values. 
+in this call using the \code{trstart} argument. At this time, the package does 
+not provide automatic starting values. 
 
 The model so defined needs to be \code{fit}ted with the following
 line of code:
@@ -434,7 +448,7 @@
 BIC:  211.275 
 \end{CodeOutput}
 \end{CodeChunk}
-These statistics may be extracted using \code{logLik},
+These statistics can also be extracted using \code{logLik},
 \code{AIC} and \code{BIC}, respectively.
 
 The \code{summary} method of \code{fit}ted models provides the parameter
@@ -494,10 +508,13 @@
 logistic model.   In particular, each row of the transition matrix is
 parameterized by a baseline category logistic multinomial, 
 meaning that the parameter for the
-base category is fixed at zero (see \citet[see][p.\ 267
-ff.]{Agresti2002} for multinomial logistic models and various
-parameterizations). See also \citet{Chung2007} for similar models, latent transition models using logistic regression on the transition parameters. They fit such models on repeated measurement
-data ($T=2$) using Bayesian methods.  The default baseline category is the first state.
+base category is fixed at zero \citep[see][p.\ 267
+ff., for multinomial logistic models and various
+parameterizations]{Agresti2002}. See also \citet{Chung2007} for similar models: 
+latent transition models that use logistic regression on the transition parameters. 
+They fit such models on repeated measurement
+data ($T=2$) using Bayesian methods.  The default baseline category is the 
+first state.
 Hence, for example, for a 3-state model, the initial state probability
 model would have three parameters of which the first is fixed at zero
 and the other two are freely estimated.
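
The baseline-category parameterization described in the last hunk can be sketched as follows. The helper name `baseline_logit_probs` is hypothetical and is not part of \pkg{depmixS4}; it assumes an intercept-only model (no covariates), so that an $n$-state model has $n-1$ free parameters and the parameter of the first (baseline) state is fixed at zero.

```python
import math


def baseline_logit_probs(free_params):
    """Map n - 1 free parameters to n probabilities.

    The linear predictor of the baseline (first) category is fixed at zero;
    the remaining categories take the supplied parameters.
    """
    eta = [0.0] + list(free_params)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this parameterization each free parameter is the log odds of its state relative to the baseline state, so setting all free parameters to zero yields equal probabilities for all states.
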