[Depmix-commits] r362 - papers/jss
noreply at r-forge.r-project.org
Tue Feb 23 15:05:21 CET 2010
Author: ingmarvisser
Date: 2010-02-23 15:05:21 +0100 (Tue, 23 Feb 2010)
New Revision: 362
Modified:
papers/jss/dpx4Rev.Rnw
papers/jss/dpx4Rev.tex
Log:
Various changes in response to reviewers for jss paper
Modified: papers/jss/dpx4Rev.Rnw
===================================================================
--- papers/jss/dpx4Rev.Rnw 2010-02-23 14:04:48 UTC (rev 361)
+++ papers/jss/dpx4Rev.Rnw 2010-02-23 14:05:21 UTC (rev 362)
@@ -83,7 +83,7 @@
%\batchmode
-\SweaveOpts{echo=FALSE}
+\SweaveOpts{echo=TRUE}
\usepackage{a4wide}
%\usepackage{Sweave}
@@ -166,13 +166,14 @@
The data considered here have the general form $\vc{O}_{1:T}=
(O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
\ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
-of length $T$. As an example, consider a time series of responses
-generated by a single participant in a psychological response time
-experiment. The data consists of three variables, response time,
-response accuracy, and a covariate which is a pay-off variable
-reflecting the relative reward for speeded and/or accurate responding.
-These variables are measured on 168, 134 and 137 occasions
-respectively (the first part of this series is plotted in
+of length $T$. In the following, we use $\vc{O}_{t}$ as shorthand for
+$O_{t}^{1}, \ldots, O_{t}^{m}$. As an example, consider a time series
+of responses generated by a single participant in a psychological
+response time experiment. The data consists of three variables,
+response time, response accuracy, and a covariate which is a pay-off
+variable reflecting the relative reward for speeded and/or accurate
+responding. These variables are measured on 168, 134, and 137
+occasions, respectively (the first part of this series is plotted in
Figure~\ref{fig:speed}). These data are more fully described in
\citet{Dutilh2009}.
@@ -342,14 +343,12 @@
To compute the log-likelihood, \cite{Lystig2002} define the following
(forward) recursion:
\begin{align}
- \phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1})
+ \phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} \vc{b}_{j}(\vc{O}_{1})
\label{eq:fwd1} \\
-%\begin{split}
\phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1:(t-1)}) %\\
- = \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times
+ = \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij} \vc{b}_{j}(\vc{O}_{t})] \times
(\Phi_{t-1})^{-1},
\label{eq:fwdt}
-%\end{split}
\end{align}
where $\Phi_{t}=\sum_{i=1}^{N} \phi_{t}(i)$. Combining
$\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})$, and
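The scaled recursion in this hunk can be sketched in a few lines. The following is a minimal illustration in Python rather than R, with a 2-state model and binary emissions invented purely for this example (it is not the paper's speed-accuracy data): `phi` holds the rescaled forward variables and the log-likelihood accumulates as $\log L = \sum_t \log \Phi_t$.

```python
import math

# Invented 2-state example; pi_j, a_ij, b_j(o) as in the recursion above.
pi = [0.5, 0.5]                        # initial state probabilities pi_j
A = [[0.9, 0.1], [0.2, 0.8]]           # transition probabilities a_ij
B = [[0.7, 0.3], [0.1, 0.9]]           # emission probabilities b_j(o), o in {0,1}
obs = [0, 0, 1, 1, 1]                  # observation sequence O_{1:T}

# phi_1(j) = pi_j * b_j(O_1)
phi = [pi[j] * B[j][obs[0]] for j in range(2)]
Phi = sum(phi)
loglik = math.log(Phi)
phi = [p / Phi for p in phi]           # rescale to avoid numerical underflow

# phi_t(j) = sum_i phi_{t-1}(i) a_ij b_j(O_t), rescaled by Phi_t each step
for o in obs[1:]:
    phi = [sum(phi[i] * A[i][j] for i in range(2)) * B[j][o] for j in range(2)]
    Phi = sum(phi)
    loglik += math.log(Phi)            # log L accumulates sum_t log Phi_t
    phi = [p / Phi for p in phi]
```

Because each step divides by $\Phi_{t}$, the recursion stays in a numerically safe range for long series while the log-likelihood is recovered exactly from the scaling factors.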
@@ -366,7 +365,7 @@
Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
through the use of a general Newton-Raphson optimizer. In the EM
algorithm, parameters are estimated by iteratively maximising the
-expected joint likelihood of the parameters given the observations and
+expected joint log-likelihood of the parameters given the observations and
states. Let $\greekv{\theta} = (\greekv{\theta}_1,
\greekv{\theta}_2,\greekv{\theta}_3)$ be the general parameter vector
consisting of three subvectors with parameters for the prior model,
@@ -381,14 +380,13 @@
This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the
Expectation step, we replace these with their expected values given a set of
(initial) parameters $\greekv{\theta}' = (\greekv{\theta}'_1,
-\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. The expected
-log likelihood
+\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$.
+The expected log-likelihood:
\begin{equation}
Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'}
(\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta}))
\end{equation}
can be written as
-%\begin{equation}
\begin{multline}
\label{eq:Q}
Q(\greekv{\theta},\greekv{\theta}') =
@@ -396,21 +394,22 @@
+ \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1}
= j,\vc{z}_{t-1},\greekv{\theta}_2) \\
+ \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j)
-\ln \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
+\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
\end{multline}
-%\end{equation}
-where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ can be computed effectively by the
-forward-backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation
-step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the
-right hand side of (\ref{eq:Q}) consists of three separate parts, we can
-maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
-$\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and
-$\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in the
-\pkg{nnet} package \citep{Venables2002}, and maximisation for
-$\greekv{\theta}_3$ by the standard \code{glm} routine. Note that for the latter
-maximisation, the expected values $\gamma_t(j)$ are used as prior weights of the
+where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} =
+j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) =
+P(S_t = j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ can be
+computed effectively by the forward-backward algorithm \citep[see
+e.g.,][]{Rabiner1989}. The Maximisation step consists of the
+maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the right hand
+side of (\ref{eq:Q}) consists of three separate parts, we can maximise
+separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
+$\greekv{\theta}_3$. In common models, maximisation for
+$\greekv{\theta}_1$ and $\greekv{\theta}_2$ is performed by the
+\code{nnet.default} routine in the \pkg{nnet} package
+\citep{Venables2002}, and maximisation for $\greekv{\theta}_3$ by the
+standard \code{glm} routine. Note that for the latter maximisation,
+the expected values $\gamma_t(j)$ are used as prior weights of the
observations $O^k_t$.
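The remark that the $\gamma_t(j)$ enter as prior weights can be made concrete with a toy sketch (in Python rather than R, with invented numbers): for a Gaussian response model, the M-step for one state reduces to closed-form weighted maximum-likelihood estimates with the smoothed probabilities as case weights, which is what passing them as prior weights to \code{glm} accomplishes.

```python
# Invented responses O_t and smoothed probabilities gamma_t(j) for one state j.
obs = [1.0, 1.2, 0.9, 3.1, 2.8, 3.0]
gamma = [0.95, 0.90, 0.85, 0.10, 0.05, 0.10]

w = sum(gamma)                                                # total weight
mu = sum(g * o for g, o in zip(gamma, obs)) / w               # weighted mean
var = sum(g * (o - mu) ** 2 for g, o in zip(gamma, obs)) / w  # weighted ML variance
```

The state with high $\gamma_t(j)$ on the first three observations pulls its mean toward them; observations mostly attributed to other states contribute almost nothing.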
@@ -430,9 +429,11 @@
constraints to parameters can be problematic; in particular, EM can
lead to wrong parameter estimates when applying constraints. Hence,
in \pkg{depmixS4}, EM is used by default in unconstrained models, but
-otherwise, direct optimization is done using \pkg{Rdonlp2}
-\citep{Tamura2009,Spellucci2002}, because it handles general linear
-(in)equality constraints, and optionally also non-linear constraints.
+otherwise, direct optimization is used. Two options are available for
+direct optimization: the packages \pkg{Rdonlp2}
+\citep{Tamura2009,Spellucci2002} and \pkg{Rsolnp}. Both can handle
+general linear (in)equality constraints and, optionally, non-linear
+constraints.
%Need some more on EM and how/why it is justified to do separate weighted
%fits of the response models and transition and prior models.
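Linear (in)equality constraints of the kind these optimizers handle are commonly expressed as bounds $l \le A\greekv{\theta} \le u$ (an equality being a constraint with $l = u$). A minimal feasibility check, with a constraint matrix and bounds invented for illustration (this is a generic sketch, not the API of \pkg{Rdonlp2}, \pkg{Rsolnp}, or \pkg{depmixS4}):

```python
# Hypothetical example: force two parameters to be equal (theta_1 - theta_2 = 0,
# an equality via l = u = 0) and bound a third to [0, 1].
theta = [0.3, 0.3, 0.7]
A = [[1.0, -1.0, 0.0],   # row computes theta_1 - theta_2
     [0.0,  0.0, 1.0]]   # row picks out theta_3
lower = [0.0, 0.0]
upper = [0.0, 1.0]

def feasible(theta, A, lower, upper, tol=1e-8):
    """Check l <= A @ theta <= u componentwise."""
    for row, lo, up in zip(A, lower, upper):
        v = sum(a * t for a, t in zip(row, theta))
        if v < lo - tol or v > up + tol:
            return False
    return True
```

Constraints of this form cut across the three parameter subvectors, which is precisely why the separate weighted fits of the EM M-step no longer suffice and direct optimization is needed.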
Modified: papers/jss/dpx4Rev.tex
===================================================================
--- papers/jss/dpx4Rev.tex 2010-02-23 14:04:48 UTC (rev 361)
+++ papers/jss/dpx4Rev.tex 2010-02-23 14:05:21 UTC (rev 362)
@@ -166,7 +166,8 @@
The data considered here have the general form $\vc{O}_{1:T}=
(O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
\ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
-of length $T$. As an example, consider a time series of responses
+of length $T$. In the following, we use $\vc{O}_{t}$ as shorthand for
+$O_{t}^{1}, \ldots, O_{t}^{m}$. As an example, consider a time series of responses
generated by a single participant in a psychological response time
experiment. The data consists of three variables, response time,
response accuracy, and a covariate which is a pay-off variable