[Depmix-commits] r372 - papers/jss

noreply@r-forge.r-project.org
Thu Feb 25 22:55:10 CET 2010


Author: ingmarvisser
Date: 2010-02-25 22:55:09 +0100 (Thu, 25 Feb 2010)
New Revision: 372

Modified:
   papers/jss/dpx4Rev.R
   papers/jss/dpx4Rev.Rnw
   papers/jss/dpx4Rev.tex
Log:
Minor updates in jss paper

Modified: papers/jss/dpx4Rev.R
===================================================================
--- papers/jss/dpx4Rev.R	2010-02-25 21:54:31 UTC (rev 371)
+++ papers/jss/dpx4Rev.R	2010-02-25 21:55:09 UTC (rev 372)
@@ -12,7 +12,7 @@
 library(depmixS4)
 data(speed)
 set.seed(1)
-mod <- depmix(rt~1, data=speed, nstates=2, trstart=runif(4))
+mod <- depmix(response=rt~1, data=speed, nstates=2, trstart=runif(4))
 
 
 ###################################################
@@ -24,7 +24,7 @@
 ###################################################
 ### chunk number 4: 
 ###################################################
-fm
+fm 
 
 
 ###################################################
@@ -45,7 +45,7 @@
 ###################################################
 ### chunk number 7: 
 ###################################################
-summary(fm)
+summary(fm, which="transition") 
 
 
 ###################################################
@@ -61,7 +61,7 @@
 ###################################################
 ### chunk number 9: 
 ###################################################
-summary(fm)
+summary(fm,which="response")
 
 
 ###################################################
@@ -80,7 +80,7 @@
 ###################################################
 ### chunk number 11: 
 ###################################################
-summary(fm)
+summary(fm,which="prior")
 
 
 ###################################################
@@ -151,12 +151,16 @@
 )
 
 
+
 ###################################################
 ### chunk number 18: 
 ###################################################
 setMethod("dens","exgaus",
     function(object,log=FALSE) {
-        dexGAUS(object@y, mu = predict(object), sigma = exp(object@parameters$sigma), nu = exp(object@parameters$nu), log = log)
+    	dexGAUS(object@y, mu = predict(object), 
+	sigma = exp(object@parameters$sigma), 
+	nu = exp(object@parameters$nu), 
+	log = log)
     }
 )
 
@@ -191,7 +195,7 @@
 		  "fixed" = {
 		      object@fixed <- as.logical(values)
 		  }
-		  )
+	    )
         names(object@parameters) <- nms
         return(object)
     }

Modified: papers/jss/dpx4Rev.Rnw
===================================================================
--- papers/jss/dpx4Rev.Rnw	2010-02-25 21:54:31 UTC (rev 371)
+++ papers/jss/dpx4Rev.Rnw	2010-02-25 21:55:09 UTC (rev 372)
@@ -516,16 +516,16 @@
 complete description of the experiment and the relevant theoretical
 background.
 
-The central hypothesis about this data is whether indeed it is best 
-described by two modes of responding rather than a single mode of 
-responding with a continuous trade-off between speed and accuracy. 
-The hallmark of having a discontinuity between slow versus speeded 
+The central question about these data is whether they are best
+described by two modes of responding rather than a single mode of
+responding with a continuous trade-off between speed and accuracy.
+The hallmark of a discontinuity between slow versus speeded
 responding is that the switching between the two modes is asymmetric
-\citep[see e.g.][for a theoretical underpinning of this 
-claim]{Maas1992}. The \code{fit} help page of \pkg{depmixS4} provides 
-a number of examples in which the assymetricity of the switching 
-process is tested; those examples are discussed at length in 
-\citet{Visser2009b}. 
+\citep[see e.g.][for a theoretical underpinning of this
+claim]{Maas1992}.  The \code{fit} help page of \pkg{depmixS4} provides
+a number of examples in which the asymmetry of the switching
+process is tested; those examples and other candidate models are
+discussed at length in \citet{Visser2009b}.
 
 
 \subsection{A simple model}
@@ -554,37 +554,37 @@
 the variables from \code{response} are interpreted.  Third, the number
 of states of the model is given by \code{nstates=2}. 
 
-\subsubsection{Starting values}
 
-Note also that start values for the transition parameters are provided
-in this call using the \code{trstart} argument.  At this time, the
-package does not provide automatic starting values.  Starting values
-for parameters can be provided using three arguments: \code{instart}
-for the parameters relating to the prior probabilities,
-\coce{trstart}, relating the transition models, and \code{respstart}
-for the parameters of the response models.  Note that the starting
-values for the initial and transition models as well as of the
-multinomial logit response models are interpreted as {\em probabilities},
-and internally converted to multinomial logit parameters.
+\paragraph{Starting values} Note also that start values for the
+transition parameters are provided in this call using the
+\code{trstart} argument.  At this time, the package does not provide
+automatic starting values.  Starting values for parameters can be
+provided using three arguments: \code{instart} for the parameters
+relating to the prior probabilities, \code{trstart} for those of the
+transition models, and \code{respstart} for the parameters of the
+response models.  Note that the starting values for the initial and
+transition models as well as of the multinomial logit response models
+are interpreted as {\em probabilities}, and internally converted to
+multinomial logit parameters.
 
 
-\subsubsection{Fitting the model and printing results}
-
-The \code{depmix} function returns an object of class \code{depmix}
-which contains the model specification, and not a fitted model. Hence,
-the model needs to be fitted by calling \code{fit}: 
-
+\paragraph{Fitting the model and printing results} The \code{depmix}
+function returns an object of class \code{depmix} which contains the
+model specification, and not a fitted model.  Hence, the model needs
+to be fitted by calling \code{fit}:
 <<>>=
 fm <- fit(mod)
 @
+
 The \code{fit}-function returns an object of class
 \code{depmix.fitted} which extends the \code{depmix} class, adding
 convergence information (and information about constraints if these
-were applied, see below).  The \code{print} method provides summary
-information on convergence, the log-likelihood and the AIC and BIC
-values:
+were applied, see below).  The class has \code{print} and
+\code{summary} methods to see the results.  The \code{print} method
+provides summary information on convergence, the log-likelihood and
+the AIC and BIC values:
 <<>>=
-fm
+fm 
 @
 
 These statistics can also be extracted using \code{logLik}, \code{AIC}
@@ -603,8 +603,6 @@
 summary(fm)
 @
 
-
-
 Since no further arguments were specified, the initial state, state
 transition and response distributions were set to their defaults
 (multinomial distributions for the first two, and a Gaussian
@@ -640,7 +638,7 @@
 Applying \code{summary} to the \code{fit}ted object gives (only
 transition models printed here):
 <<>>=
-summary(fm) 
+summary(fm, which="transition") 
 @
 
 The summary provides all parameters of the model, also the (redundant)
@@ -669,7 +667,7 @@
 This provides the following fitted model parameters (only the 
 response parameters are given here): 
 <<>>=
-summary(fm)
+summary(fm,which="response")
 @
 
 As can be seen, state 1 has fast response times and accuracy is
@@ -710,7 +708,6 @@
 \end{center}
 \end{figure}
 
-
 Children in the lower grades of primary school are known to ignore the
 distance dimension, and base their answer only on the number of
 weights on each side.  Hence, they would typically provide the wrong
@@ -751,9 +748,14 @@
 The summary of the \code{fit}ted model gives the following (only the
 prior model is shown here):
 <<>>=
-summary(fm)
+summary(fm,which="prior")
 @
 
+%
+% add more explanation about the parameters, possibly add a plot of 
+% the class membership as function of age
+%
+
 Hence at young ages, children have a high probability of incorrect 
 responding in class~1, whereas the prior probability for class~2 
 increases with age. 
@@ -761,20 +763,20 @@
 
 \subsection{Fixing and constraining parameters}
 
-Using package \pkg{Rdonlp2} by \citet{Tamura2009}, parameters may be
-fitted subject to general linear (in-)equality constraints.
-Constraining and fixing parameters is done using the \code{conpat}
-argument to the \code{fit}-function, which specifies for each
-parameter in the model whether it's fixed (0) or free (1 or higher).
-Equality constraints can be imposed by having two parameters have the
-same number in the \code{conpat} vector.  When only fixed values are
-required, the \code{fixed} argument can be used instead of
-\code{conpat}, with zeroes for fixed parameters and other values (ones
-e.g.) for non-fixed parameters.  Fitting the models subject to these
-constraints is handled by the optimization routine \code{donlp2}.  To
-be able to construct the \code{conpat} and/or \code{fixed} vectors one
-needs the correct ordering of parameters which is briefly discussed
-next before proceeding with an example.
+Using package \pkg{Rdonlp2} by \citet{Tamura2009} or \pkg{Rsolnp} by
+\citet{ZZZZ???}, parameters may be fitted subject to general linear
+(in-)equality constraints.  Constraining and fixing parameters is done
+using the \code{conpat} argument to the \code{fit}-function, which
+specifies for each parameter in the model whether it is fixed (0) or
+free (1 or higher).  Equality constraints can be imposed by giving two
+parameters the same number in the \code{conpat} vector.  When
+only fixed values are required, the \code{fixed} argument can be used
+instead of \code{conpat}, with zeroes for fixed parameters and other
+values (ones e.g.) for non-fixed parameters.  Fitting the models
+subject to these constraints is handled by the optimization routine
+\code{donlp2}.  To be able to construct the \code{conpat} and/or
+\code{fixed} vectors one needs the correct ordering of parameters
+which is briefly discussed next before proceeding with an example.
 
 \paragraph{Parameter numbering} When using the \code{conpat} and
 \code{fixed} arguments, complete parameter vectors should be supplied,
@@ -916,12 +918,18 @@
     mod
   }
 )
+
 @
 
+
+
 <<echo=FALSE>>=
 setMethod("dens","exgaus",
     function(object,log=FALSE) {
-        dexGAUS(object@y, mu = predict(object), sigma = exp(object@parameters$sigma), nu = exp(object@parameters$nu), log = log)
+    	dexGAUS(object@y, mu = predict(object), 
+	sigma = exp(object@parameters$sigma), 
+	nu = exp(object@parameters$nu), 
+	log = log)
     }
 )
 
@@ -956,7 +964,7 @@
 		  "fixed" = {
 		      object@fixed <- as.logical(values)
 		  }
-		  )
+	    )
         names(object@parameters) <- nms
         return(object)
     }

Modified: papers/jss/dpx4Rev.tex
===================================================================
--- papers/jss/dpx4Rev.tex	2010-02-25 21:54:31 UTC (rev 371)
+++ papers/jss/dpx4Rev.tex	2010-02-25 21:55:09 UTC (rev 372)
@@ -167,29 +167,18 @@
 (O_{1}^{1}, \ldots, O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$,
 \ldots, $O_{T}^{1}, \ldots, O_{T}^{m})$ for an $m$-variate time series
 of length $T$.  In the following, we use $\vc{O}_{t}$ as shorthand for
-$O_{t}^{1}, \ldots, O_{t}^{m}$. As an example, consider a time series of responses
-generated by a single participant in a psychological response time
-experiment.  The data consists of three variables, response time,
-response accuracy, and a covariate which is a pay-off variable
-reflecting the relative reward for speeded and/or accurate responding.
-These variables are measured on 168, 134 and 137 occasions
-respectively (the first part of this series is plotted in
+$O_{t}^{1}, \ldots, O_{t}^{m}$.  As an example, consider a time series
+of responses generated by a single participant in a psychological
+response time experiment.  The data consists of three variables,
+response time, response accuracy, and a covariate which is a pay-off
+variable reflecting the relative reward for speeded and/or accurate
+responding.  These variables are measured on 168, 134 and 137
+occasions respectively (the first part of this series is plotted in
 Figure~\ref{fig:speed}).  These data are more fully described in
 \citet{Dutilh2009}.
 
 \begin{figure}[htbp]
 \begin{center}
-\begin{Schunk}
-\begin{Soutput}
-##
-## Markov Chain Monte Carlo Package (MCMCpack)
-## Copyright (C) 2003-2010 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
-##
-## Support provided by the U.S. National Science Foundation
-## (Grants SES-0350646 and SES-0350613)
-##
-\end{Soutput}
-\end{Schunk}
 \includegraphics{dpx4Rev-001}
 	 \caption{Response times (rt), accuracy (corr) and pay-off values (Pacc) for
 	  the first series of responses in dataset \code{speed}.}
@@ -350,14 +339,12 @@
 To compute the log-likelihood, \cite{Lystig2002} define the following 
 (forward) recursion:
 \begin{align}
-	\phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1}) 
+	\phi_{1}(j) &:= \Prob(\vc{O}_{1}, S_{1}=j) = \pi_{j} \vc{b}_{j}(\vc{O}_{1}) 
 	\label{eq:fwd1} \\
-%\begin{split}
 	\phi_{t}(j) &:= \Prob(\vc{O}_{t}, S_{t}=j|\vc{O}_{1:(t-1)}) %\\
-	= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times 
+	= \sum_{i=1}^{N} [\phi_{t-1}(i)a_{ij} \vc{b}_{j}(\vc{O}_{t})] \times 
 (\Phi_{t-1})^{-1},
 	\label{eq:fwdt} 
-%\end{split} 
 \end{align}
 where $\Phi_{t}=\sum_{i=1}^{N} \phi_{t}(i)$. Combining 
 $\Phi_{t}=\Prob(\vc{O}_{t}|\vc{O}_{1:(t-1)})$, and 
@@ -374,7 +361,7 @@
 Parameters are estimated in \pkg{depmixS4} using the EM algorithm or
 through the use of a general Newton-Raphson optimizer.  In the EM
 algorithm, parameters are estimated by iteratively maximising the
-expected joint likelihood of the parameters given the observations and
+expected joint log-likelihood of the parameters given the observations and
 states.  Let $\greekv{\theta} = (\greekv{\theta}_1,
 \greekv{\theta}_2,\greekv{\theta}_3)$ be the general parameter vector
 consisting of three subvectors with parameters for the prior model,
@@ -389,14 +376,13 @@
 This likelihood depends on the unobserved states $\vc{S}_{1:T}$. In the 
 Expectation step, we replace these with their expected values given a set of 
 (initial) parameters $\greekv{\theta}' = (\greekv{\theta}'_1, 
-\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. The expected 
-log likelihood 
+\greekv{\theta}'_2,\greekv{\theta}'_3)$ and observations $\vc{O}_{1:T}$. 
+The expected log-likelihood
 \begin{equation}
 Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'} 
 (\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta}))
 \end{equation}
 can be written as
-%\begin{equation}
 \begin{multline}
 \label{eq:Q}
 Q(\greekv{\theta},\greekv{\theta}') = 
@@ -404,21 +390,22 @@
 + \sum_{t=2}^T \sum_{j=1}^n \sum_{k=1}^n \xi_t(j,k) \log \Prob(S_t = k|S_{t-1} 
 = j,\vc{z}_{t-1},\greekv{\theta}_2)  \\
  + \sum_{t=1}^T \sum_{j=1}^n \sum_{k=1}^m \gamma_t(j) 
-\ln \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
+\log \Prob(O^k_t|S_t=j,\vc{z}_t,\greekv{\theta}_3),
 \end{multline}
-%\end{equation}
-where the expected values $\xi_t(j,k) =  P(S_t = k, S_{t-1} = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) = P(S_t = j|\vc{O}_{1:T},
-\vc{z}_{1:T},\greekv{\theta}')$ can be computed effectively by the 
-forward-backward algorithm \citep[see e.g.,][]{Rabiner1989}. The Maximisation 
-step consists of the maximisation of (\ref{eq:Q}) for $\greekv{\theta}$. As the 
-right hand side of (\ref{eq:Q}) consists of three separate parts, we can 
-maximise separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and 
-$\greekv{\theta}_3$. In common models, maximisation for $\greekv{\theta}_1$ and 
-$\greekv{\theta}_2$ is performed by the \code{nnet.default} routine in the 
-\pkg{nnet} package \citep{Venables2002}, and maximisation for 
-$\greekv{\theta}_3$ by the standard \code{glm} routine. Note that for the latter 
-maximisation, the expected values $\gamma_t(j)$ are used as prior weights of the 
+where the expected values $\xi_t(j,k) = P(S_t = k, S_{t-1} =
+j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ and $\gamma_t(j) =
+P(S_t = j|\vc{O}_{1:T}, \vc{z}_{1:T},\greekv{\theta}')$ can be
+computed efficiently by the forward-backward algorithm \citep[see
+e.g.,][]{Rabiner1989}.  The Maximisation step consists of the
+maximisation of (\ref{eq:Q}) for $\greekv{\theta}$.  As the right hand
+side of (\ref{eq:Q}) consists of three separate parts, we can maximise
+separately for $\greekv{\theta}_1$, $\greekv{\theta}_2$ and
+$\greekv{\theta}_3$.  In common models, maximisation for
+$\greekv{\theta}_1$ and $\greekv{\theta}_2$ is performed by the
+\code{nnet.default} routine in the \pkg{nnet} package
+\citep{Venables2002}, and maximisation for $\greekv{\theta}_3$ by the
+standard \code{glm} routine.  Note that for the latter maximisation,
+the expected values $\gamma_t(j)$ are used as prior weights of the
 observations $O^k_t$.
 
 
@@ -428,19 +415,43 @@
 %\label{eq:MLA}
 %\hat{a}_{jk} = \frac{1}{N(T-1)} \sum_{i=1}^N \sum_{t=2}^T  \frac{\xi^i_t(j,k)}{\gamma_{t-1}^i(j)}.
 %\end{equation}
-%Fixing certain elements of $\mat{A}$ to 0, as in the DT version, does not affect the estimation of the other elements. When elements $a_{jk}$ are assumed identical, we simply extend the summation in (\ref{eq:MLA}) to include all those elements, changing the denominator $N(T-1)$ accordingly.
 
-%To estimate $\greekv{\lambda}$, we note that the term containing this parameter has the form of a weighted likelihood, where $\gamma_t^i(j)$ can be interpreted as the number of replications of $r^i_t$. Hence, we can rely on standard ML estimates of the logistic regression coefficients $\lambda_i$, using the values $\gamma_t^i(j)$ as ``case weights'' \cite<e.g.,>{Agresti02,McCullagh83}\footnote{To be more specific, we replicate each observation $r^i_t$ a total of 6 times, once for each of the states besides the random state (which offers no information regarding $\lambda_i$). For the $j$-th replication (corresponding to the $j$-th state), we used $v_j(\vec{x}_t)$ as a predictor variable and $\gamma^i_t(j)$ as a case weight. All these replications were used to obtain the maximum likelihood estimate of $\lambda$ from a single GLM, using the ``glm.fit'' function in R \cite{R}.}. The resulting parameter estimates are then used in a new Expectation step, followed by another Maximisation step, until convergence. Under mild regularity conditions, it can be shown that the EM algorithm converges to a (local) maximum of the likelihood \cite{EM}. 
+%Fixing certain elements of $\mat{A}$ to 0, as in the DT version, does
+%not affect the estimation of the other elements.  When elements
+%$a_{jk}$ are assumed identical, we simply extend the summation in
+%(\ref{eq:MLA}) to include all those elements, changing the
+%denominator $N(T-1)$ accordingly.
 
+%To estimate $\greekv{\lambda}$, we note that the term containing this
+%parameter has the form of a weighted likelihood, where
+%$\gamma_t^i(j)$ can be interpreted as the number of replications of
+%$r^i_t$.  Hence, we can rely on standard ML estimates of the logistic
+%regression coefficients $\lambda_i$, using the values $\gamma_t^i(j)$
+%as ``case weights'' \cite<e.g.,>{Agresti02,McCullagh83}\footnote{To
+%be more specific, we replicate each observation $r^i_t$ a total of 6
+%times, once for each of the states besides the random state (which
+%offers no information regarding $\lambda_i$).  For the $j$-th
+%replication (corresponding to the $j$-th state), we used
+%$v_j(\vec{x}_t)$ as a predictor variable and $\gamma^i_t(j)$ as a
+%case weight.  All these replications were used to obtain the maximum
+%likelihood estimate of $\lambda$ from a single GLM, using the
+%``glm.fit'' function in R \cite{R}.}.  The resulting parameter
+%estimates are then used in a new Expectation step, followed by
+%another Maximisation step, until convergence.  Under mild regularity
+%conditions, it can be shown that the EM algorithm converges to a
+%(local) maximum of the likelihood \cite{EM}.
 
+
 The EM algorithm however has some drawbacks.  First, it can be slow to
 converge towards the end of optimization.  Second, applying
 constraints to parameters can be problematic; in particular, EM can
 lead to wrong parameter estimates when applying constraints.  Hence,
 in \pkg{depmixS4}, EM is used by default in unconstrained models, but
-otherwise, direct optimization is done using \pkg{Rdonlp2}
-\citep{Tamura2009,Spellucci2002}, because it handles general linear
-(in)equality constraints, and optionally also non-linear constraints.
+otherwise, direct optimization is used.  Two options are available
+for direct optimization: package \pkg{Rdonlp2}
+\citep{Tamura2009,Spellucci2002} or package \pkg{Rsolnp}.  Both
+packages can handle general linear (in)equality constraints, and
+optionally also non-linear constraints.
 
 %Need some more on EM and how/why it is justified to do separate weighted
 %fits of the response models and transition and prior models. 
@@ -454,7 +465,7 @@
 \begin{enumerate}
 	\item model specification with function \code{depmix} (or with \code{mix}
 	for latent class and finite mixture models, see example below on adding
-	covariates to prior probabilities)
+	covariates to prior probabilities in section~\ref{sec:prior})
 	
 	\item  model fitting with function \code{fit}
 \end{enumerate}
@@ -467,30 +478,102 @@
 
 Throughout this article a data set called \code{speed} is used.  As
 already indicated in the Introduction, it consists of three time
-series with three variables: response time, accuracy, and a covariate
-Pacc which defines the relative pay-off for speeded versus accurate
-responding.  The participant in this experiment switches between fast
-responding at chance level and relatively slower responding at a high
-level of accuracy.  Interesting hypotheses to test are: is the
-switching regime symmetric?  Is there evidence for two states or does
-one state suffice?  Is the guessing state actually a guessing state,
-i.e., is the probability of a correct response at chance level (0.5)?
+series with three variables: response time, accuracy, and a covariate,
+Pacc, which defines the relative pay-off for speeded versus accurate
+responding.  Before describing some of the models that are fitted
+to these data, we provide a brief sketch of the reasons for gathering
+these data in the first place.
 
+Response times are a very common dependent variable in psychological
+experiments and hence form the basis for inference about many
+psychological processes.  A potential threat to such inference is
+the speed-accuracy trade-off: different participants in an experiment
+may respond differently to the typical instruction to `respond as
+fast and as accurately as possible'.  A popular
+model which takes the speed-accuracy trade-off into account is the
+diffusion model \citep{Ratcliff1978}, which has proven to provide
+accurate descriptions of response times in many different settings.
+
+One potential problem with the diffusion model is that it predicts a
+continuous trade-off between speed and accuracy of responding, i.e.,
+when participants are pressed to respond faster and faster, the
+diffusion model predicts that this would lead to a gradual decrease in
+accuracy.  The \code{speed} data set that we analyse below was
+gathered to test this hypothesis versus the alternative hypothesis
+stating that there is a sudden transition from slow and accurate
+responding to fast responding at chance level.  At each trial of the
+experiment, the participant is shown the current setting of the
+relative reward for speed versus accuracy.  The bottom panel of
+Figure~\ref{fig:speed} shows the values of this variable.  The main
+question in this experiment was what would happen when this reward
+variable changed from reward for accuracy only to reward for speed
+only.  The \code{speed} data that we analyse here are from
+participant A in Experiment 1 in \citet{Dutilh2009}, who provide a
+complete description of the experiment and the relevant theoretical
+background.
+
+The central question about these data is whether they are best
+described by two modes of responding rather than a single mode of
+responding with a continuous trade-off between speed and accuracy.
+The hallmark of a discontinuity between slow versus speeded
+responding is that the switching between the two modes is asymmetric
+\citep[see e.g.][for a theoretical underpinning of this
+claim]{Maas1992}.  The \code{fit} help page of \pkg{depmixS4} provides
+a number of examples in which the assymetricity of the switching
+process is tested; those examples and other candidate models are
+discussed at length in \citet{Visser2009b}.
+
+
 \subsection{A simple model}
 
 A dependent mixture model is defined by the number of states and the
 initial state, state transition, and response distribution functions.
 A dependent mixture model can be created with the
 \code{depmix}-function as follows:
+\begin{Schunk}
+\begin{Sinput}
+> library(depmixS4)
+> data(speed)
+> set.seed(1)
+> mod <- depmix(response = rt ~ 1, data = speed, nstates = 2, trstart = runif(4))
+\end{Sinput}
+\end{Schunk}
 
+The first line of code loads the \pkg{depmixS4} package and 
+\code{data(speed)} loads the \code{speed} data set. The line
+\code{set.seed(1)} is necessary to get starting values that will 
+result in the right model; see more on starting values below. 
 
-The \code{depmix} function returns an object of class \code{depmix}
-which contains the model specification (and not a fitted model!).
-Note also that start values for the transition parameters are provided
-in this call using the \code{trstart} argument.  At this time, the
-package does not provide automatic starting values.
+The call to \code{depmix} specifies the model with a number of
+arguments.  The \code{response} argument is used to specify the
+response variable, possibly with covariates, in the familiar format
+using formulae such as in \code{lm} or \code{glm} models.  The second
+argument, \code{data=speed}, provides the \code{data.frame} in which
+the variables from \code{response} are interpreted.  Third, the number
+of states of the model is given by \code{nstates=2}. 
 
+
+\paragraph{Starting values} Note also that start values for the
+transition parameters are provided in this call using the
+\code{trstart} argument.  At this time, the package does not provide
+automatic starting values.  Starting values for parameters can be
+provided using three arguments: \code{instart} for the parameters
+relating to the prior probabilities, \code{trstart} for those of the
+transition models, and \code{respstart} for the parameters of the
+response models.  Note that the starting values for the initial and
+transition models as well as of the multinomial logit response models
+are interpreted as {\em probabilities}, and internally converted to
+multinomial logit parameters.
+
+
+\paragraph{Fitting the model and printing results} The \code{depmix}
+function returns an object of class \code{depmix} which contains the
+model specification, and not a fitted model.  Hence, the model needs
+to be fitted by calling \code{fit}:
 \begin{Schunk}
+\begin{Sinput}
+> fm <- fit(mod)
+\end{Sinput}
 \begin{Soutput}
 iteration 0 logLik: -305.3302 
 iteration 5 logLik: -305.3251 
@@ -506,22 +589,26 @@
 iteration 50 logLik: -84.34175 
 \end{Soutput}
 \end{Schunk}
+
 The \code{fit}-function returns an object of class
 \code{depmix.fitted} which extends the \code{depmix} class, adding
 convergence information (and information about constraints if these
-were applied, see below).  The \code{print} method provides summary
-information on convergence, the log-likelihood and the AIC and BIC
-values:
+were applied, see below).  The class has \code{print} and
+\code{summary} methods to see the results.  The \code{print} method
+provides summary information on convergence, the log-likelihood and
+the AIC and BIC values:
 \begin{Schunk}
+\begin{Sinput}
+> fm
+\end{Sinput}
 \begin{Soutput}
-Convergence info: Log likelihood converged to within tol. 
+Convergence info: Log likelihood converged to within tol. (relative change crit.) 
 'log Lik.' -84.34175 (df=7)
 AIC:  182.6835 
 BIC:  211.275 
 \end{Soutput}
 \end{Schunk}
 
-
 These statistics can also be extracted using \code{logLik}, \code{AIC}
 and \code{BIC}, respectively.  By comparison, a 1-state model for
 these data, i.e. assuming there is no mixture, has a log-likelihood of
@@ -533,8 +620,11 @@
 
 The \code{summary} method of \code{fit}ted models provides the
 parameter estimates, first for the prior probabilities model, second
-for the transition model, and third for the response models.
+for the transition models, and third for the response models.
 \begin{Schunk}
+\begin{Sinput}
+> summary(fm)
+\end{Sinput}
 \begin{Soutput}
 Initial state probabilties model 
 Model of type multinomial, formula: ~1
@@ -578,14 +668,13 @@
 \end{Soutput}
 \end{Schunk}
 
+Since no further arguments were specified, the initial state, state
+transition and response distributions were set to their defaults
+(multinomial distributions for the first two, and a Gaussian
+distribution for the response).
 
-Note that, since no further arguments were specified, the initial
-state, state transition and response distributions were set to their
-defaults (multinomial distributions for the first two, and Gaussian
-distributions for the response distributions).
+\subsection{Covariates on transition parameters\label{sec:trans}}
 
-\subsection{Covariates on transition parameters}
-
 By default, the transition probabilities and the initial state
 probabilities are parameterized using the multinomial logistic model.
 More precisely, each row of the transition matrix is parameterized by
@@ -605,6 +694,12 @@
 probabilities can be specified using a one-sided formula as in the
 following example:
 \begin{Schunk}
+\begin{Sinput}
+> set.seed(1)
+> mod <- depmix(rt ~ 1, data = speed, nstates = 2, family = gaussian(), 
++     transition = ~scale(Pacc), instart = runif(2))
+> fm <- fit(mod)
+\end{Sinput}
 \begin{Soutput}
 iteration 0 logLik: -305.3304 
 iteration 5 logLik: -305.2360 
@@ -617,18 +712,13 @@
 \end{Soutput}
 \end{Schunk}
 
-Applying \code{summary} to the \code{fit}ted object gives (only transition models
-printed here): 
+Applying \code{summary} to the \code{fit}ted object gives (only
+transition models printed here):
 \begin{Schunk}
+\begin{Sinput}
+> summary(fm, which = "transition")
+\end{Sinput}
 \begin{Soutput}
-Initial state probabilties model 
-Model of type multinomial, formula: ~1
-Coefficients: 
-     [,1]     [,2]
-[1,]    0 10.71779
-Probalities at zero values of the covariates.
-2.214681e-05 0.9999779 
-
 Transition model for state (component) 1 
 Model of type multinomial, formula: ~scale(Pacc)
 Coefficients: 
@@ -646,24 +736,6 @@
 [2,]    0 3.570856
 Probalities at zero values of the covariates.
 0.07788458 0.9221154 
-
-
-Response model(s) for state 1 
-
-Response model for response 1 
-Model of type gaussian, formula: rt ~ 1
-Coefficients: 
-[1] 5.512179
-sd  0.1921098 
-
-
-Response model(s) for state 2 
-
-Response model for response 1 
-Model of type gaussian, formula: rt ~ 1
-Coefficients: 
-[1] 6.3885
-sd  0.2402693 
 \end{Soutput}
 \end{Schunk}
 
@@ -683,6 +755,13 @@
 variable in the \code{speed} data can be modelled with a multinomial
 by specifying the following:
 \begin{Schunk}
+\begin{Sinput}
+> set.seed(1)
+> mod <- depmix(list(rt ~ 1, corr ~ 1), data = speed, nstates = 2, 
++     family = list(gaussian(), multinomial()), transition = ~scale(Pacc), 
++     instart = runif(2))
+> fm <- fit(mod)
+\end{Sinput}
 \begin{Soutput}
 iteration 0 logLik: -554.6365 
 iteration 5 logLik: -553.9195 
@@ -691,56 +770,32 @@
 iteration 20 logLik: -248.9757 
 iteration 25 logLik: -248.9724 
 iteration 30 logLik: -248.9723 
-iteration 33 logLik: -248.9723 
+iteration 31 logLik: -248.9723 
 \end{Soutput}
 \end{Schunk}
 
 This provides the following fitted model parameters (only the 
 response parameters are given here): 
 \begin{Schunk}
+\begin{Sinput}
+> summary(fm, which = "response")
+\end{Sinput}
 \begin{Soutput}
-Initial state probabilties model 
-Model of type multinomial, formula: ~1
-Coefficients: 
-     [,1]     [,2]
-[1,]    0 10.24230
-Probalities at zero values of the covariates.
-3.562961e-05 0.9999644 
-
-Transition model for state (component) 1 
-Model of type multinomial, formula: ~scale(Pacc)
-Coefficients: 
-     [,1]       [,2]
-[1,]    0 -0.8924638
-[2,]    0  2.1293542
-Probalities at zero values of the covariates.
-0.7093984 0.2906016 
-
-Transition model for state (component) 2 
-Model of type multinomial, formula: ~scale(Pacc)
-Coefficients: 
-     [,1]     [,2]
-[1,]    0 2.390396
-[2,]    0 3.690785
-Probalities at zero values of the covariates.
-0.08390802 0.916092 
-
-
 Response model(s) for state 1 
 
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 5.52169
-sd  0.2028857 
+[1] 5.521695
+sd  0.2028917 
 
 Response model for response 2 
 Model of type multinomial, formula: corr ~ 1
 Coefficients: 
      [,1]      [,2]
-[1,]    0 0.1030554
+[1,]    0 0.1030452
 Probalities at zero values of the covariates.
-0.4742589 0.5257411 
+0.4742615 0.5257385 
 
 
 Response model(s) for state 2 
@@ -748,16 +803,16 @@
 Response model for response 1 
 Model of type gaussian, formula: rt ~ 1
 Coefficients: 
-[1] 6.39369
-sd  0.2373650 
+[1] 6.393693
+sd  0.2373635 
 
 Response model for response 2 
 Model of type multinomial, formula: corr ~ 1
 Coefficients: 
      [,1]     [,2]
-[1,]    0 2.245514
+[1,]    0 2.245562
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/depmix -r 372
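For readers following along without applying the patch: the main user-visible change in this revision is the use of the \code{which} argument to \code{summary()} on fitted models, restricting output to one submodel at a time. A minimal sketch of the revised calls from dpx4Rev.R, assuming the depmixS4 package (a version supporting \code{which}) and its bundled \code{speed} data are installed:

```r
## Sketch of the revised chunks in dpx4Rev.R; requires depmixS4 from CRAN.
library(depmixS4)
data(speed)
set.seed(1)  # reproducible random starting values for the transition model
mod <- depmix(response = rt ~ 1, data = speed, nstates = 2, trstart = runif(4))
fm <- fit(mod)                     # EM estimation; prints log-likelihood per iteration

summary(fm, which = "transition")  # transition submodels only
summary(fm, which = "response")    # response submodels only
summary(fm, which = "prior")       # prior (initial state) submodel only
```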

