[Depmix-commits] r362 - papers/jss
noreply at r-forge.r-project.org
Tue Feb 23 15:05:21 CET 2010
Author: ingmarvisser
Date: 2010-02-23 15:05:21 +0100 (Tue, 23 Feb 2010)
New Revision: 362
Modified:
papers/jss/dpx4Rev.Rnw
papers/jss/dpx4Rev.tex
Log:
Various changes in response to reviewers for jss paper
Modified: papers/jss/dpx4Rev.Rnw
===================================================================
--- papers/jss/dpx4Rev.Rnw 2010-02-23 14:04:48 UTC (rev 361)
+++ papers/jss/dpx4Rev.Rnw 2010-02-23 14:05:21 UTC (rev 362)
@@ -83,7 +83,7 @@
%\batchmode
-\SweaveOpts{echo=FALSE}
+\SweaveOpts{echo=TRUE}
\usepackage{a4wide}
%\usepackage{Sweave}
@@ -166,13 +166,14 @@
The data considered here have the general form $\vc{O}_{1:T}=
(O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
\ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
-of length $T$. As an example, consider a time series of responses
-generated by a single participant in a psychological response time
-experiment. The data consists of three variables, response time,
-response accuracy, and a covariate which is a pay-off variable
-reflecting the relative reward for speeded and/or accurate responding.
-These variables are measured on 168, 134 and 137 occasions
-respectively (the first part of this series is plotted in
+of length $T$. In the following, we use $\vc{O}_{t}$ as shorthand for
+$O_{t}^{1}, \ldots, O_{t}^{m}$. As an example, consider a time series
+of responses generated by a single participant in a psychological
+response time experiment. The data consists of three variables,
+response time, response accuracy, and a covariate which is a pay-off
+variable reflecting the relative reward for speeded and/or accurate
+responding. These variables are measured on 168, 134, and 137
+occasions, respectively (the first part of this series is plotted in
Figure~\ref{fig:speed}). These data are more fully described in
\citet{Dutilh2009}.
@@ -342,14 +343,12 @@
To compute the log-likelihood, \cite{Lystig2002} define the following
(forward) recursion:
\begin{align}
- \phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
+ \phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} \vc{b}_{j}(\vc{O}_{1})
\label{eq:fwd1} \\
-%\begin{split}
\phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1:(t-1)}) %\\
- = \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+ = \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij} \vc{b}_{j}(\vc{O}_{t})] \times
(\Phi_{t-1})^{-1},
\label{eq:fwdt}
-%\end{split}
\end{align}
where $\Phi_{t}=\sum_{i=1}^{N} \phi_{t}(i)$. Combining
$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})$, and
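The scaled recursion in this hunk can be sketched in a few lines. The following is a minimal illustration in Python rather than R, with a 2-state model and binary emissions invented purely for this example (it is not the paper's speed-accuracy data): `phi` holds the rescaled forward variables and the log-likelihood accumulates as $\log L = \sum_t \log \Phi_t$.

```python
import math

# Invented 2-state example; pi_j, a_ij, b_j(o) as in the recursion above.
pi = [0.5, 0.5]                        # initial state probabilities pi_j
A = [[0.9, 0.1], [0.2, 0.8]]           # transition probabilities a_ij
B = [[0.7, 0.3], [0.1, 0.9]]           # emission probabilities b_j(o), o in {0,1}
obs = [0, 0, 1, 1, 1]                  # observation sequence O_{1:T}

# phi_1(j) = pi_j * b_j(O_1)
phi = [pi[j] * B[j][obs[0]] for j in range(2)]
Phi = sum(phi)
loglik = math.log(Phi)
phi = [p / Phi for p in phi]           # rescale to avoid numerical underflow

# phi_t(j) = sum_i phi_{t-1}(i) a_ij b_j(O_t), rescaled by Phi_t each step
for o in obs[1:]:
    phi = [sum(phi[i] * A[i][j] for i in range(2)) * B[j][o] for j in range(2)]
    Phi = sum(phi)
    loglik += math.log(Phi)            # log L accumulates sum_t log Phi_t
    phi = [p / Phi for p in phi]
```

Because each step divides by $\Phi_{t}$, the recursion stays in a numerically safe range for long series while the log-likelihood is recovered exactly from the scaling factors.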
@@ -366,7 +365,7 @@
Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
through the use of a general Newton-Raphson optimizer. In the EM
algorithm, parameters are estimated by iteratively maximising the
-expected joint likelihood of the parameters given the observations and
+expected joint log-likelihood of the parameters given the observations and
states. Let $\greekv{\theta} = (\greekv{\theta}_1,
\greekv{\theta}_2,\greekv{\theta}_3)$ be the general parameter vector
consisting of three subvectors with parameters for the prior model,
@@ -381,14 +380,13 @@
This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the
Expectation step, we replace these with their expected values given a set of
(initial) parameters $\greekv{\theta}' = (\greekv{\theta}'_1,
-\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. The expected
-log likelihood
+\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$.
+The expected log-likelihood:
\begin{equation}
Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'}
(\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta}))
\end{equation}
can be written as
-%\begin{equation}
\begin{multline}
\label{eq:Q}
Q(\greekv{\theta},\greekv{\theta}') =
@@ -396,21 +394,22 @@
+ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1}
= j,\vc{z}_{t-1},\greekv{\theta}_2) \\
+ \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j)
-\ln \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
+\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
\end{multline}
-%\end{equation}
-where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ can be computed effectively by the
-forward-backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
-step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the
-right hand side of (\ref{eq:Q}) consists of three separate parts, we can
-maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
-$\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and
-$\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in the
-\pkg{nnet} package \citep{Venables2002}, and maximisation for
-$\greekv{\theta}_3$ by the standard \code{glm} routine. Note that for the latter
-maximisation, the expected values $\gamma_t(j)$ are used as prior weights of the
+where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} =
+j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) =
+P(S_t = j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ can be
+computed effectively by the forward-backward algorithm \citep[see
+e.g.,][]{Rabiner1989}. The Maximisation step consists of the
+maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the right hand
+side of (\ref{eq:Q}) consists of three separate parts, we can maximise
+separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
+$\greekv{\theta}_3$. In common models, maximisation for
+$\greekv{\theta}_1$ and $\greekv{\theta}_2$ is performed by the
+\code{nnet.default} routine in the \pkg{nnet} package
+\citep{Venables2002}, and maximisation for $\greekv{\theta}_3$ by the
+standard \code{glm} routine. Note that for the latter maximisation,
+the expected values $\gamma_t(j)$ are used as prior weights of the
observations $O^k_t$.
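The remark that the $\gamma_t(j)$ enter as prior weights can be made concrete with a toy sketch (in Python rather than R, with invented numbers): for a Gaussian response model, the M-step for one state reduces to closed-form weighted maximum-likelihood estimates with the smoothed probabilities as case weights, which is what passing them as prior weights to \code{glm} accomplishes.

```python
# Invented responses O_t and smoothed probabilities gamma_t(j) for one state j.
obs = [1.0, 1.2, 0.9, 3.1, 2.8, 3.0]
gamma = [0.95, 0.90, 0.85, 0.10, 0.05, 0.10]

w = sum(gamma)                                                # total weight
mu = sum(g * o for g, o in zip(gamma, obs)) / w               # weighted mean
var = sum(g * (o - mu) ** 2 for g, o in zip(gamma, obs)) / w  # weighted ML variance
```

The state with high $\gamma_t(j)$ on the first three observations pulls its mean toward them; observations mostly attributed to other states contribute almost nothing.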
@@ -430,9 +429,11 @@
constraints to parameters can be problematic; in particular, EM can
lead to wrong parameter estimates when applying constraints. Hence,
in \pkg{depmixS4}, EM is used by default in unconstrained models, but
-otherwise, direct optimization is done using \pkg{Rdonlp2}
-\citep{Tamura2009,Spellucci2002}, because it handles general linear
-(in)equality constraints, and optionally also non-linear constraints.
+otherwise, direct optimization is used. Two options are available for
+direct optimization: the packages \pkg{Rdonlp2}
+\citep{Tamura2009,Spellucci2002} and \pkg{Rsolnp}. Both can handle
+general linear (in)equality constraints and, optionally, non-linear
+constraints.
%Need some more on EM and how/why it is justified to do separate weighted
%fits of the response models and transition and prior models.
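Linear (in)equality constraints of the kind these optimizers handle are commonly expressed as bounds $l \le A\greekv{\theta} \le u$ (an equality being a constraint with $l = u$). A minimal feasibility check, with a constraint matrix and bounds invented for illustration (this is a generic sketch, not the API of \pkg{Rdonlp2}, \pkg{Rsolnp}, or \pkg{depmixS4}):

```python
# Hypothetical example: force two parameters to be equal (theta_1 - theta_2 = 0,
# an equality via l = u = 0) and bound a third to [0, 1].
theta = [0.3, 0.3, 0.7]
A = [[1.0, -1.0, 0.0],   # row computes theta_1 - theta_2
     [0.0,  0.0, 1.0]]   # row picks out theta_3
lower = [0.0, 0.0]
upper = [0.0, 1.0]

def feasible(theta, A, lower, upper, tol=1e-8):
    """Check l <= A @ theta <= u componentwise."""
    for row, lo, up in zip(A, lower, upper):
        v = sum(a * t for a, t in zip(row, theta))
        if v < lo - tol or v > up + tol:
            return False
    return True
```

Constraints of this form cut across the three parameter subvectors, which is precisely why the separate weighted fits of the EM M-step no longer suffice and direct optimization is needed.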
Modified: papers/jss/dpx4Rev.tex
===================================================================
--- papers/jss/dpx4Rev.tex 2010-02-23 14:04:48 UTC (rev 361)
+++ papers/jss/dpx4Rev.tex 2010-02-23 14:05:21 UTC (rev 362)
@@ -166,7 +166,8 @@
The data considered here have the general form $\vc{O}_{1:T}=
(O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
\ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
-of length $T$. As an example, consider a time series of responses
+of length $T$. In the following, we use $\vc{O}_{t}$ as shorthand for
+$O_{t}^{1}, \ldots, O_{t}^{m}$. As an example, consider a time series of responses
generated by a single participant in a psychological response time
experiment. The data consists of three variables, response time,
response accuracy, and a covariate which is a pay-off variable