[Depmix-commits] r367 - papers/jss
noreply at r-forge.r-project.org
Wed Feb 24 17:08:41 CET 2010
Author: ingmarvisser
Date: 2010-02-24 17:08:41 +0100 (Wed, 24 Feb 2010)
New Revision: 367
Modified:
papers/jss/dpx4Rev.Rnw
Log:
Added notes on balance and speed data sets.
Modified: papers/jss/dpx4Rev.Rnw
===================================================================
--- papers/jss/dpx4Rev.Rnw 2010-02-24 14:49:50 UTC (rev 366)
+++ papers/jss/dpx4Rev.Rnw 2010-02-24 16:08:41 UTC (rev 367)
@@ -419,11 +419,33 @@
%\label{eq:MLA}
%\hat{a}_{jk} = \frac{1}{N(T-1)} \sum_{i=1}^N \sum_{t=2}^T \frac{\xi^i_t(j,k)}{\gamma_{t-1}^i(j)}.
%\end{equation}
-%Fixing certain elements of $\mat{A}$ to 0, as in the DT version, does not affect the estimation of the other elements. When elements $a_{jk}$ are assumed identical, we simply extend the summation in (\ref{eq:MLA}) to include all those elements, changing the denominator $N(T-1)$ accordingly.
-%To estimate $\greekv{\lambda}$, we note that the term containing this parameter has the form of a weighted likelihood, where $\gamma_t^i(j)$ can be interpreted as the number of replications of $r^i_t$. Hence, we can rely on standard ML estimates of the logistic regression coefficients $\lambda_i$, using the values $\gamma_t^i(j)$ as ``case weights'' \cite<e.g.,>{Agresti02,McCullagh83}\footnote{To be more specific, we replicate each observation $r^i_t$ a total of 6 times, once for each of the states besides the random state (which offers no information regarding $\lambda_i$). For the $j$-th replication (corresponding to the $j$-th state), we used $v_j(\vec{x}_t)$ as a predictor variable and $\gamma^i_t(j)$ as a case weight. All these replications were used to obtain the maximum likelihood estimate of $\lambda$ from a single GLM, using the ``glm.fit'' function in R \cite{R}.}. The resulting parameter estimates are then used in a new Expectation step, followed by another Maximisation step, until convergence. Under mild regularity conditions, it can be shown that the EM algorithm converges to a (local) maximum of the likelihood \cite{EM}.
+%Fixing certain elements of $\mat{A}$ to 0, as in the DT version, does
+%not affect the estimation of the other elements. When elements
+%$a_{jk}$ are assumed identical, we simply extend the summation in
+%(\ref{eq:MLA}) to include all those elements, changing the
+%denominator $N(T-1)$ accordingly.
+%To estimate $\greekv{\lambda}$, we note that the term containing this
+%parameter has the form of a weighted likelihood, where
+%$\gamma_t^i(j)$ can be interpreted as the number of replications of
+%$r^i_t$. Hence, we can rely on standard ML estimates of the logistic
+%regression coefficients $\lambda_i$, using the values $\gamma_t^i(j)$
+%as ``case weights'' \cite<e.g.,>{Agresti02,McCullagh83}\footnote{To
+%be more specific, we replicate each observation $r^i_t$ a total of 6
+%times, once for each of the states besides the random state (which
+%offers no information regarding $\lambda_i$). For the $j$-th
+%replication (corresponding to the $j$-th state), we used
+%$v_j(\vec{x}_t)$ as a predictor variable and $\gamma^i_t(j)$ as a
+%case weight. All these replications were used to obtain the maximum
+%likelihood estimate of $\lambda$ from a single GLM, using the
+%``glm.fit'' function in R \cite{R}.}. The resulting parameter
+%estimates are then used in a new Expectation step, followed by
+%another Maximisation step, until convergence. Under mild regularity
+%conditions, it can be shown that the EM algorithm converges to a
+%(local) maximum of the likelihood \cite{EM}.
+
The EM algorithm, however, has some drawbacks. First, it can be slow to
converge towards the end of optimization. Second, applying
constraints to parameters can be problematic; in particular, EM can
@@ -447,7 +469,7 @@
\begin{enumerate}
\item model specification with function \code{depmix} (or with \code{mix}
for latent class and finite mixture models, see example below on adding
- covariates to prior probabilities)
+ covariates to prior probabilities in section~\ref{sec:prior})
\item model fitting with function \code{fit}
\end{enumerate}
@@ -460,35 +482,98 @@
Throughout this article a data set called \code{speed} is used. As
already indicated in the Introduction, it consists of three time
-series with three variables: response time, accuracy, and a covariate
-Pacc which defines the relative pay-off for speeded versus accurate
-responding. The participant in this experiment switches between fast
-responding at chance level and relatively slower responding at a high
-level of accuracy. Interesting hypotheses to test are: is the
-switching regime symmetric? Is there evidence for two states or does
-one state suffice? Is the guessing state actually a guessing state,
-i.e., is the probability of a correct response at chance level (0.5)?
+series with three variables: response time, accuracy, and a covariate,
+Pacc, which defines the relative pay-off for speeded versus accurate
+responding. Before describing some of the models that are fitted
+to these data, we provide a brief sketch of the reasons for gathering
+these data in the first place.
+
+Response times are a very common dependent variable in psychological
+experiments and hence form the basis for inference about many
+psychological processes. A potential threat to such inference is the
+speed-accuracy trade-off: different participants in an experiment may
+respond differently to the typical instruction to `respond as fast
+and accurately as possible'. A popular
+model which takes the speed-accuracy trade-off into account is the
+diffusion model \citep{Ratcliff1978}, which has proven to provide
+accurate descriptions of response times in many different settings.
+
+One potential problem with the diffusion model is that it predicts a
+continuous trade-off between speed and accuracy of responding, i.e.,
+when participants are pressed to respond faster and faster, the
+diffusion model predicts that this would lead to a gradual decrease in
+accuracy. The \code{speed} data set that we analyze below was
+gathered to test this hypothesis versus the alternative hypothesis
+stating that there is a sudden transition from slow and accurate
+responding to fast responding at chance level. At each trial of the
+experiment, the participant is shown the current setting of the
+relative reward for speed versus accuracy. The bottom panel of
+Figure~\ref{fig:speed} shows the values of this variable. The main
+question in this experiment was what would happen when this reward
+variable changed from rewarding accuracy only to rewarding speed
+only. The \code{speed} data that we analyse here are from
+participant A in Experiment 1 in \citet{Dutilh2009}, who provide a
+complete description of the experiment and the relevant theoretical
+background.
+
+The central question about these data is whether they are best
+described by two modes of responding rather than by a single mode of
+responding with a continuous trade-off between speed and accuracy.
+The hallmark of a discontinuity between slow and speeded responding
+is that the switching between the two modes is asymmetric
+\citep[see e.g.][for a theoretical underpinning of this
+claim]{Maas1992}. The \code{fit} help page of \pkg{depmixS4} provides
+a number of examples in which the asymmetry of the switching process
+is tested; those examples are discussed at length in
+\citet{Visser2009b}.
+
+
\subsection{A simple model}
A dependent mixture model is defined by the number of states and the
initial state, state transition, and response distribution functions.
A dependent mixture model can be created with the
\code{depmix}-function as follows:
-
<<>>=
library(depmixS4)
data(speed)
set.seed(1)
-mod <- depmix(rt~1, data=speed, nstates=2, trstart=runif(4))
+mod <- depmix(response=rt~1, data=speed, nstates=2, trstart=runif(4))
@
-The \code{depmix} function returns an object of class \code{depmix}
-which contains the model specification (and not a fitted model!).
+The first line of code loads the \pkg{depmixS4} package and
+\code{data(speed)} loads the \code{speed} data set. The line
+\code{set.seed(1)} is necessary to obtain starting values that will
+result in the right model; see the section on starting values below.
+
+The call to \code{depmix} specifies the model with a number of
+arguments. The \code{response} argument is used to specify the
+response variable, possibly with covariates, using the familiar
+formula interface of \code{lm} or \code{glm} models. The second
+argument, \code{data=speed}, provides the \code{data.frame} in which
+the variables from \code{response} are interpreted. Third, the number
+of states of the model is given by \code{nstates=2}.
+
+\subsubsection{Starting values}
+
Note also that start values for the transition parameters are provided
in this call using the \code{trstart} argument. At this time, the
-package does not provide automatic starting values.
+package does not provide automatic starting values. Starting values
+for parameters can be provided using three arguments: \code{instart}
+for the parameters relating to the prior probabilities,
+\code{trstart} for the parameters of the transition models, and
+\code{respstart} for the parameters of the response models. Note that
+the starting values for the initial and transition models, as well as
+for the multinomial logit response models, are interpreted as {\em
+probabilities}, and internally converted to multinomial logit
+parameters.
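A sketch of a call supplying all three groups of starting values (the particular probabilities and response parameters below are illustrative values chosen for this example, not taken from the paper):

```r
# Hypothetical example: supply starting values for all three groups.
# instart: prior (initial state) probabilities, one per state;
# trstart: transition probabilities, given row-wise;
# respstart: response model parameters (here the mean and sd of the
# Gaussian response distribution in each state).
mod <- depmix(rt ~ 1, data = speed, nstates = 2,
              instart = c(0.5, 0.5),
              trstart = c(0.9, 0.1, 0.1, 0.9),
              respstart = c(5.5, 0.2, 6.4, 0.2))
```

The probabilities in \code{instart} and \code{trstart} are converted internally to multinomial logit parameters, as noted above.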
+
+\subsubsection{Fitting the model and printing results}
+
+The \code{depmix} function returns an object of class \code{depmix}
+which contains the model specification, and not a fitted model. Hence,
+the model needs to be fitted by calling \code{fit}:
+
<<>>=
fm <- fit(mod)
@
@@ -502,7 +587,6 @@
fm
@
-
These statistics can also be extracted using \code{logLik}, \code{AIC}
and \code{BIC}, respectively. By comparison, a 1-state model for
these data, i.e. assuming there is no mixture, has a log-likelihood of
@@ -514,19 +598,20 @@
The \code{summary} method of \code{fit}ted models provides the
parameter estimates, first for the prior probabilities model, second
-for the transition model, and third for the response models.
+for the transition models, and third for the response models.
<<>>=
summary(fm)
@
-Note that, since no further arguments were specified, the initial
-state, state transition and response distributions were set to their
-defaults (multinomial distributions for the first two, and Gaussian
-distributions for the response distributions).
-\subsection{Covariates on transition parameters}
+Since no further arguments were specified, the initial state, state
+transition and response distributions were set to their defaults
+(multinomial distributions for the first two, and a Gaussian
+distribution for the response).
+\subsection{Covariates on transition parameters\label{sec:trans}}
+
By default, the transition probabilities and the initial state
probabilities are parameterized using the multinomial logistic model.
More precisely, each row of the transition matrix is parameterized by
@@ -552,10 +637,10 @@
fm <- fit(mod)
@
-Applying \code{summary} to the \code{fit}ted object gives (only transition models
-printed here):
+Applying \code{summary} to the \code{fit}ted object gives (only
+transition models printed here):
<<>>=
-summary(fm)
+summary(fm)
@
The summary provides all parameters of the model, including the (redundant)
@@ -599,14 +684,48 @@
currently implemented in \pkg{depmixS4} is for multivariate normal
variables.
-\subsection{Adding covariates on the prior probabilities}
+\subsection{Adding covariates on the prior probabilities\label{sec:prior}}
To illustrate the use of covariates on the prior probabilities we have
included another data set with \pkg{depmixS4}. The \code{balance}
data consists of 4 binary items (correct-incorrect) on a balance scale
-task \citet{Siegler1981}. The data form a subset of the data
-published in \citet{Jansen2002}.
+task \citep{Siegler1981}. The data form a subset of the data
+published in \citet{Jansen2002}. Before specifying a model for these
+data, we briefly describe them.
+
+The balance scale task is a famous task for testing cognitive
+strategies, developed by Piaget \citep{Siegler1981}.
+Figure~\ref{fig:balance} provides an example of a balance scale item.
+Participants' task is to say to which side the balance will tip when
+released, or alternatively, whether it will stay in balance. The item
+shown in Figure~\ref{fig:balance} is a so-called distance item: the
+number of weights placed on each side is equal, and only the distance
+of the weights from the fulcrum differs between the two sides.
+
+\begin{figure}[htbp]
+\begin{center}
+  \includegraphics{graphs:baldist.pdf}
+ \caption{Balance scale item; this is a so-called distance item.}
+ \label{fig:balance}
+\end{center}
+\end{figure}
+
+
+Children in the lower grades of primary school are known to ignore the
+distance dimension, and base their answer only on the number of
+weights on each side. Hence, they would typically provide the wrong
+answer to these distance items. Slightly older children do take
+distance into account when responding to balance scale items, but they
+only do so when the number of weights is equal on each side. These
+two strategies that children employ are known as Rule I and Rule II.
+Other strategies can be teased apart by administering different items.
+The \code{balance} data set that we analyse here consists of 4
+distance items on a balance scale task administered to 779
+participants ranging from 5 to 19 years of age. The full set of items
+consisted of 25 items; other items in the test are used to detect
+other strategies that children (and young adults) employ in solving
+balance scale items \citep[see][for details]{Jansen2002}.
+
Similarly to the transition matrix, covariates on the prior
probabilities of the latent states (or classes in this case) are
defined by using a one-sided formula: