[Depmix-commits] r382 - papers/jss trunk/R

noreply at r-forge.r-project.org
Fri Mar 5 00:03:07 CET 2010


Author: ingmarvisser
Date: 2010-03-05 00:03:06 +0100 (Fri, 05 Mar 2010)
New Revision: 382

Modified:
   papers/jss/dpx4Rev.Rnw
   trunk/R/transInit.R
Log:
Minor changes in paper and getdf for transInit models (not done yet ...)

Modified: papers/jss/dpx4Rev.Rnw
===================================================================
--- papers/jss/dpx4Rev.Rnw	2010-03-04 17:10:32 UTC (rev 381)
+++ papers/jss/dpx4Rev.Rnw	2010-03-04 23:03:06 UTC (rev 382)
@@ -40,11 +40,13 @@
 	provided with the exgaus distribution.  Parameters are estimated by
 	the EM algorithm or, when (linear) constraints are imposed on the
 	parameters, by direct numerical optimization with the
-	\pkg{Rdonlp2} routine.}
+	\pkg{Rdonlp2} or \pkg{Rsolnp} routines.}
 
-\Keywords{hidden Markov model, dependent mixture model, mixture model}
+\Keywords{hidden Markov model, dependent mixture model, mixture 
+model, constraints}
 
-\Plainkeywords{hidden Markov model, dependent mixture model, mixture model} %% without formatting
+\Plainkeywords{hidden Markov model, dependent mixture model, mixture 
+model, constraints} %% without formatting
 %% at least one keyword must be supplied
 
 %% publication information
@@ -167,25 +169,30 @@
 of length $T$.  In the following, we use $\vc{O}_{t}$ as shorthand for
 $O_{t}^{1}, \ldots, O_{t}^{m}$.  As an example, consider a time series
 of responses generated by a single participant in a psychological
-response time experiment.  The data consists of three variables,
+response time experiment.  The data consists of three variables:
 response time, response accuracy, and a covariate which is a pay-off
 variable reflecting the relative reward for speeded and/or accurate
 responding.  These variables are measured on 168, 134 and 137
-occasions respectively (the first part of this series is plotted in
+occasions respectively (the first series of 168 trials is plotted in
 Figure~\ref{fig:speed}).  These data are more fully described in
-\citet{Dutilh2010}.
+\citet{Dutilh2010}, and in the next section a number of example models
+for these data are described.
 
 \begin{figure}[htbp]
 \begin{center}
-<<fig=TRUE,echo=FALSE,height=5,width=7,eps=FALSE>>=
-library(depmixS4)
-data(speed)
-plot(as.ts(speed[1:168,]),main="Speed-accuracy trade-off")
-@
-	 \caption{Response times (rt), accuracy (corr) and pay-off values (Pacc) for
-	  the first series of responses in dataset \code{speed}.}
-	  \label{fig:speed}
-	  
+
+	<<fig=TRUE,echo=FALSE,height=5,width=7,eps=FALSE>>=
+	library(depmixS4)
+	data(speed)
+	plot(as.ts(speed[1:168,]),main="Speed-accuracy trade-off")
+	@
+
+	\caption{Response times (rt), accuracy (corr) and pay-off 
+	values (Pacc) for the first series of responses in dataset 
+	\code{speed}.}
+	
+	\label{fig:speed}
+	
 \end{center}
 \end{figure}
 
@@ -195,7 +202,7 @@
 not suitable for long time series due to underflow problems.  In
 contrast, the hidden Markov model is typically only used for `long'
 univariate time series \citep[][chapter~1]{Cappe2005}.  We use the
-term ``dependent mixture model'' because one of the authors (Ingmar
+term "dependent mixture model" because one of the authors (Ingmar
 Visser) thought it was time for a new name to relate these
 models\footnote{Only later did I find out that \citet{Leroux1992}
 already coined the term dependent mixture models in an application
@@ -248,63 +255,6 @@
 	
 \end{enumerate}
 
-%In the next paragraphs, the likelihood
-%equation and estimation procedures for the dependent mixture model are
-%described for data of the above form.  
-
-%The likelihood of the dependent mixture model conditional on the
-%unknown (or hidden) state sequence $\vc{S}_{1:T} = (S_1,\ldots,S_T)$ 
-%and the model is given
-%by:
-%\begin{equation}
-%	\Prob( \vc{O}_{1:T} | \vc{S}_{1:T}, \greekv{\theta} ) = 
-%	\prod_{t=1}^{T}  \Prob( \vc{O}_{t} | S_{t},  \greekv{\theta}),
-%\end{equation}
-%where $\greekv{\theta}$ is the parameter vector of the model.  To arrive
-%at the marginal likelihood of the data for parameter vector
-%$\greekv{\theta}$, i.e.\ unconditional on the state sequence, we need to sum
-%above likelihood over all possible state sequences. This likelihood 
-%is written as: 
-%\begin{equation}
-%	\Prob(\vc{O}_{1:T}|\greekv{\theta}) =  
-%	\sum_{all Ss} \pi_{i}(\vc{z}_1) \vc{b}_{1}(\vc{O}_{1}) 
-%	\prod_{t=1}^{T-1} a_{ij}(\vc{z}_{t}) \vc{b}_{j}(\vc{O}_{t+1}|\vc{z}_{t+1}),
-%\end{equation}
-%where we have the following elements:
-%\begin{enumerate}
-%	
-%	\item $S_{t}$ is an element of $\mathcal{S}=\{1\ldots n\}$, a set
-%	of $n$ latent classes or states; we write for short $S_{i}$ to
-%	denote $S_{t}=i$.
-%	
-%	\item $\pi_{i}(\vc{z}_1) = \Prob(S_1 = i|\vc{z}_1)$,
-%	giving the probability of class/state $i$ at time $t=1$ with
-%	covariate $\vc{z}_1$.
-%	
-%	\item $a_{ij}(\vc{z}_t) = \Prob(S_{t+1}=j|S_{t}=i,\vc{z}_t)$,
-%	provides the probability of a transition from state $i$ to state
-%	$j$ with covariate $\vc{z}_t$,
-%	
-%	\item $\vc{b}_{j}$ is a vector of observation densities
-%	$b_j^k(\vc{z}_t) = \Prob(O_{t}^k|S_t = j, \vc{z}_t)$ that provide the
-%	conditional densities of observations $O_{t}^k$ associated with
-%	latent class/state $j$ and covariate $\vc{z}_t$, $j=1, \ldots, n$,
-%	$k=1, \ldots, m$. 
-%\end{enumerate}
-
-% The dependent mixture model is defined by the following equations: 
-% %\begin{align}
-% %	S_{t} &= A S_{t-1}, t=2, \ldots, T \\
-% %	O_{t} &= b(O_{t}|S_{t}), 
-% %\end{align}
-% \begin{align}
-% 	Pr(S_{t} = j) &= \sum_{i=1}^n a_{ij}(\vc{x}_t) Pr(S_{t-1} = i), t=2, \ldots, T \\
-% 	p(O_{t}^k|S_t &= j, \vc{x}_t) = b_j^k(\vc{x}_t)
-% \end{align}
-%where $S_{t}$ is a sequence of hidden states, $A$ is a transition
-%matrix, $O_{t}$ is an (possibly multivariate) observation and $b$ is a
-%density function for $O_{t}$ conditional on the hidden state $S_{t}$.
-
 For the example data above, $b_j^k$ could be a Gaussian distribution
 function for the response time variable, and a Bernoulli distribution
 for the accuracy variable.  In the models considered here,
@@ -367,8 +317,8 @@
 states.  Let $\greekv{\theta} = (\greekv{\theta}_1,
 \greekv{\theta}_2,\greekv{\theta}_3)$ be the general parameter vector
 consisting of three subvectors with parameters for the prior model,
-transition model, and response model respectively.  The joint
-log-likelihood can be written as
+transition model, and response models respectively.  The joint
+log-likelihood can be written as:
 \begin{equation}
 \log \Prob(\vc{O}_{1:T}, \vc{S}_{1:T}|\vc{z}_{1:T},\greekv{\theta}) = \log 
 \Prob(S_1|\vc{z}_{1},\greekv{\theta}_1) 
@@ -382,9 +332,9 @@
 The expected log-likelihood:
 \begin{equation}
 Q(\greekv{\theta},\greekv{\theta}') = E_{\greekv{\theta}'} 
-(\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta}))
+(\log \Prob(\vc{O}_{1:T},\vc{S}_{1:T}|\vc{O}_{1:T},\vc{z}_{1:T},\greekv{\theta})),
 \end{equation}
-can be written as
+can be written as:
 \begin{multline}
 \label{eq:Q}
 Q(\greekv{\theta},\greekv{\theta}') = 
@@ -411,39 +361,6 @@
 observations $O^k_t$.
 
 
-
-%The maximising values of $a_{jk}$ are \cite<e.g.,>{Rabiner89}
-%\begin{equation}
-%\label{eq:MLA}
-%\hat{a}_{jk} = \frac{1}{N(T-1)} \sum_{i=1}^N \sum_{t=2}^T  \frac{\xi^i_t(j,k)}{\gamma_{t-1}^i(j)}.
-%\end{equation}
-
-%Fixing certain elements of $\mat{A}$ to 0, as in the DT version, does
-%not affect the estimation of the other elements.  When elements
-%$a_{jk}$ are assumed identical, we simply extend the summation in
-%(\ref{eq:MLA}) to include all those elements, changing the
-%denominator $N(T-1)$ accordingly.
-
-%To estimate $\greekv{\lambda}$, we note that the term containing this
-%parameter has the form of a weighted likelihood, where
-%$\gamma_t^i(j)$ can be interpreted as the number of replications of
-%$r^i_t$.  Hence, we can rely on standard ML estimates of the logistic
-%regression coefficients $\lambda_i$, using the values $\gamma_t^i(j)$
-%as ``case weights'' \cite<e.g.,>{Agresti02,McCullagh83}\footnote{To
-%be more specific, we replicate each observation $r^i_t$ a total of 6
-%times, once for each of the states besides the random state (which
-%offers no information regarding $\lambda_i$).  For the $j$-th
-%replication (corresponding to the $j$-th state), we used
-%$v_j(\vec{x}_t)$ as a predictor variable and $\gamma^i_t(j)$ as a
-%case weight.  All these replications were used to obtain the maximum
-%likelihood estimate of $\lambda$ from a single GLM, using the
-%``glm.fit'' function in R \cite{R}.}.  The resulting parameter
-%estimates are then used in a new Expectation step, followed by
-%another Maximisation step, until convergence.  Under mild regularity
-%conditions, it can be shown that the EM algorithm converges to a
-%(local) maximum of the likelihood \cite{EM}.
-
-
 The EM algorithm however has some drawbacks.  First, it can be slow to
 converge towards the end of optimization.  Second, applying
 constraints to parameters can be problematic; in particular, EM can
@@ -451,15 +368,10 @@
 in \pkg{depmixS4}, EM is used by default in unconstrained models, but
 otherwise, direct optimization is used.  Two options are available for
 direct optimization using package \pkg{Rdonlp2}
-\citep{Tamura2009,Spellucci2002}, or package \pkg{Rsolnp}.  Both
-packages can handle general linear (in)equality constraints, and
-optionally also non-linear constraints.
+\citep{Tamura2009,Spellucci2002}, or package \pkg{Rsolnp}
+\citep{Ghalanos2010}.  Both packages can handle general linear
+(in)equality constraints (and optionally also non-linear constraints).
 
-%Need some more on EM and how/why it is justified to do separate weighted
-%fits of the response models and transition and prior models. 
-
-%Also mention use of glm, nnet and possibly other packages that we use. 
-
 \section[Using depmixS4]{Using \pkg{depmixS4}}
 
 Two steps are involved in using \pkg{depmixS4} which are illustrated
@@ -482,8 +394,8 @@
 already indicated in the Introduction, it consists of three time
 series with three variables: response time, accuracy, and a covariate,
 Pacc, which defines the relative pay-off for speeded versus accurate
-responding.  Before describing some of the models that are fitted
-to these data, we provide a brief sketch of the reasons for gathering
+responding.  Before describing some of the models that are fitted to
+these data, we provide a brief sketch of the reasons for gathering
 these data in the first place.
 
 Response times are a very common dependent variable in psychological
@@ -506,9 +418,9 @@
 responding to fast responding at chance level.  At each trial of the
 experiment, the participant is shown the current setting of the
 relative reward for speed versus accuracy.  The bottom panel of
-figure~\ref{fig:speed} shows the values of this variable.  The 
-experiment was designed to investigate what would happen when this reward
-variable changes from reward for accuracy only to reward for
+figure~\ref{fig:speed} shows the values of this variable.  The
+experiment was designed to investigate what would happen when this
+reward variable changes from reward for accuracy only to reward for
 speed only.  The \code{speed} data that we analyse here are from
 participant A in Experiment 1 in \citet{Dutilh2010}, who provide a
 complete description of the experiment and the relevant theoretical
@@ -517,13 +429,13 @@
 The central question regarding this data is whether it is indeed best
 described by two modes of responding rather than a single mode of
 responding with a continuous trade-off between speed and accuracy.
-The hallmark of a discontinuity between slow versus speeded
-responding is that switching between the two modes is asymmetric
-\citep[see e.g.][for a theoretical underpinning of this
-claim]{Maas1992}.  The \code{fit} help page of \pkg{depmixS4} provides
-a number of examples in which the asymmetry of the switching
-process is tested; those examples and other candidate models are
-discussed at length in \citet{Visser2009b}.
+The hallmark of a discontinuity between slow versus speeded responding
+is that switching between the two modes is asymmetric \citep[see
+e.g.][for a theoretical underpinning of this claim]{Maas1992}.  The
+\code{fit} help page of \pkg{depmixS4} provides a number of examples
+in which the asymmetry of the switching process is tested; those
+examples and other candidate models are discussed at length in
+\citet{Visser2009b}.
 
 
 \subsection{A simple model}
@@ -540,10 +452,10 @@
     trstart=runif(4))
 @
 
-The first line of code loads the \pkg{depmixS4} package and 
-\code{data(speed)} loads the \code{speed} data set. The line
-\code{set.seed(1)} is necessary to get starting values that will 
-result in the right model, see more on starting values below. 
+The first line of code loads the \pkg{depmixS4} package and
+\code{data(speed)} loads the \code{speed} data set.  The line
+\code{set.seed(1)} is necessary to get starting values that will
+result in the right model, see more on starting values below.
 
 The call to \code{depmix} specifies the model with a number of
 arguments.  The \code{response} argument is used to specify the
@@ -551,7 +463,7 @@
 using formulae such as in \code{lm} or \code{glm} models.  The second
 argument, \code{data=speed}, provides the \code{data.frame} in which
 the variables from \code{response} are interpreted.  Third, the number
-of states of the model is given by \code{nstates=2}. 
+of states of the model is given by \code{nstates=2}.
 
 
 \paragraph{Starting values} Note also that start values for the

Modified: trunk/R/transInit.R
===================================================================
--- trunk/R/transInit.R	2010-03-04 17:10:32 UTC (rev 381)
+++ trunk/R/transInit.R	2010-03-04 23:03:06 UTC (rev 382)
@@ -6,14 +6,12 @@
 
 setClass("transInit",contains="GLMresponse")
 
-# FIX ME: data is a necessary argument to determine the dimension of x, even when there
-# are no covariates (and there are by definition no responses ...)
 setMethod("transInit",
 	signature(formula="formula"),
 	function(formula,nstates,data=NULL,family=multinomial(),pstart=NULL,fixed=NULL,prob=TRUE, ...) {
-		call <- match.call()
-		if(formula==formula(~1) & is.null(data)) {
-			x <- matrix(1,ncol=1)
+		call <- match.call()
+		if(formula==formula(~1) & is.null(data)) {
+			x <- matrix(1,ncol=1)
 		} else {
 			mf <- match.call(expand.dots = FALSE)
 			m <- match(c("formula", "data"), names(mf), 0)
@@ -21,36 +19,36 @@
 			mf$drop.unused.levels <- TRUE
 			mf[[1]] <- as.name("model.frame")
 			mf <- eval(mf, parent.frame())
-			x <- model.matrix(attr(mf, "terms"),mf)
+			x <- model.matrix(attr(mf, "terms"),mf)
 		}
 		y <- matrix(1,ncol=1) # y is not needed in the transition and init models
 		parameters <- list()
 		if(is.null(nstates)) stop("'nstates' must be provided in call to trinModel")
-		if(family$family=="multinomial") {
-			if(family$link=="identity") {
-				parameters$coefficients <- t(apply(matrix(1,ncol=nstates,nrow=ncol(x)),1,function(x) x/sum(x)))
+		if(family$family=="multinomial") {
+			if(family$link=="identity") {
+				parameters$coefficients <- t(apply(matrix(1,ncol=nstates,nrow=ncol(x)),1,function(x) x/sum(x)))
 				if(is.null(fixed)) {
 					fixed <- matrix(0,nrow=nrow(parameters$coefficients),ncol=ncol(parameters$coefficients))
 					fixed <- c(as.logical(t(fixed)))
-				}
+				}
 			} else {
-				parameters$coefficients <- matrix(0,ncol=nstates,nrow=ncol(x))
+				parameters$coefficients <- matrix(0,ncol=nstates,nrow=ncol(x))
 				if(is.null(fixed)) {
 					fixed <- parameters$coefficients
 					fixed[,family$base] <- 1 
 					fixed <- c(as.logical(t(fixed)))
-				}
+				}
 			}
 		}
 		npar <- length(unlist(parameters))
 		if(is.null(fixed)) fixed <- rep(0,npar)
 		if(!is.null(pstart)) {
 			if(length(pstart)!=npar) stop("length of 'pstart' must be ",npar)
-			if(family$family=="multinomial") {
-				if(family$link=="identity") {
-					parameters$coefficients[1,] <- pstart[1:ncol(parameters$coefficients)]
+			if(family$family=="multinomial") {
+				if(family$link=="identity") {
+					parameters$coefficients[1,] <- pstart[1:ncol(parameters$coefficients)]
 					pstart <- matrix(pstart,ncol(x),byrow=TRUE)
-					if(ncol(x)>1) parameters$coefficients[2:ncol(x),] <- pstart[2:ncol(x),]
+					if(ncol(x)>1) parameters$coefficients[2:ncol(x),] <- pstart[2:ncol(x),]
 				} else {
 					if(prob) {
 						parameters$coefficients[1,] <- family$linkfun(pstart[1:ncol(parameters$coefficients)],base=family$base)
@@ -58,7 +56,7 @@
 						parameters$coefficients[1,] <- pstart[1:ncol(parameters$coefficients)]
 					}
 					pstart <- matrix(pstart,ncol(x),byrow=TRUE)
-					if(ncol(x)>1) parameters$coefficients[2:ncol(x),] <- pstart[2:ncol(x),]
+					if(ncol(x)>1) parameters$coefficients[2:ncol(x),] <- pstart[2:ncol(x),]
 				}
 			} else {
 				if(family$link=="identity") parameters$coefficients <- family$linkfun(pstart[1:length(parameters$coefficients)])
@@ -120,8 +118,8 @@
 			y <- y[-na,]
 			#y <- round(y) # delete me
 			if(!is.null(w)) w <- w[-na]
-		}
-		switch(object@family$link,
+		}
+		switch(object@family$link,
 		  mlogit = {
     		mask <- matrix(1,nrow=nrow(pars$coefficients),ncol=ncol(pars$coefficients))
     		mask[,base] <- 0 # fix base category coefficients to 0
@@ -143,24 +141,24 @@
     				fit <- nnet.default(x,y,size=0,softmax=TRUE,skip=TRUE,mask=mask,Wts=Wts,trace=FALSE)
     			}
     		}
-    		pars$coefficients <- t(matrix(fit$wts,ncol=ncol(pars$coefficients),nrow=nrow(pars$coefficients)+1)[-1,])
-    		object <- setpars(object,unlist(pars))
-  		},
-  		identity = {
-  		  # object@y = fbo$xi[,,i]/fbo$gamma[,i]
-  		  # should become (see em):
-  		  #for(k in 1:ns) {
-				#		trm[i,k] <- sum(fbo$xi[-c(et),k,i])/sum(fbo$gamma[-c(et),i])
-				#	}
-				if(!is.null(w)) {
-				  sw <- sum(w)
-				  pars <- colSums(w*object@y)/sum(w)
-				} else {
-				  pars <- colMeans(object@y)
+    		pars$coefficients <- t(matrix(fit$wts,ncol=ncol(pars$coefficients),nrow=nrow(pars$coefficients)+1)[-1,])
+    		object <- setpars(object,unlist(pars))
+  		},
+  		identity = {
+  		  # object@y = fbo$xi[,,i]/fbo$gamma[,i]
+  		  # should become (see em):
+  		  #for(k in 1:ns) {
+				#		trm[i,k] <- sum(fbo$xi[-c(et),k,i])/sum(fbo$gamma[-c(et),i])
+				#	}
+				if(!is.null(w)) {
+				  sw <- sum(w)
+				  pars <- colSums(w*object@y)/sum(w)
+				} else {
+				  pars <- colMeans(object@y)
 				}
-			  object <- setpars(object,pars)
-  		},
-  		stop("link function not implemented")
+			  object <- setpars(object,pars)
+  		},
+  		stop("link function not implemented")
   	)
 		object
 	}
@@ -189,14 +187,15 @@
 			return(states)
 		}
 	}
-)
-
+)
+
 setMethod("getdf","transInit",
-	function(object) {
-		npar <- sum(!object@fixed)
-		if(object@family$linkfun == "identity") {
-			fx <- t(matrix(object@fixed,ncol=nrow(object@parameters$coefficients),nrow=ncol(object@parameters$coefficients)))
-			npar <- npar - rowSums(fx)
+	function(object) {
+		npar <- sum(!object@fixed)
+		if(object@family$link == "identity") {
+			npar <- npar-1
+			if(npar<0) npar <- 0
 		}
+		npar
 	}
-)
+)

