[Depmix-commits] r289 - papers/jss

noreply at r-forge.r-project.org
Wed Jul 8 17:05:48 CEST 2009


Author: ingmarvisser
Date: 2009-07-08 17:05:48 +0200 (Wed, 08 Jul 2009)
New Revision: 289

Removed:
   papers/jss/bookreview.tex
   papers/jss/codesnippet.tex
   papers/jss/depmix-intro.Rnw
Log:
More file removals ...

Deleted: papers/jss/bookreview.tex
===================================================================
--- papers/jss/bookreview.tex	2009-07-08 15:05:29 UTC (rev 288)
+++ papers/jss/bookreview.tex	2009-07-08 15:05:48 UTC (rev 289)
@@ -1,51 +0,0 @@
-\documentclass[bookreview]{jss}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-%% reviewer
-\Reviewer{Frederic Udina\\Pompeu Fabra University}
-\Plainreviewer{Frederic Udina}
-
-%% about the book
-\Booktitle{Visualizing Categorical Data}
-\Bookauthor{Michael Friendly}
-\Publisher{SAS Institute Inc.}
-\Pubaddress{Cary, NC}
-\Pubyear{2000}
-\ISBN{1-58025-660-0}
-\Pages{456}
-\Price{USD 69.95 (P)}
-\URL{http://www.math.yorku.ca/SCS/vcd/}
-%% if different from \Booktitle also set
-%% \Plaintitle{Visualizing Categorical Data}
-%% \Shorttitle{Visualizing Categorical Data}
-
-%% publication information
-%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
-%% \Volume{11}
-%% \Issue{7}
-%% \Month{July}
-%% \Year{2004}
-%% \Submitdate{2004-07-26}
-
-%% address of (at least one) author
-\Address{
-  Frederic Udina\\
-  Pompeu Fabra University\\
-  Department of Economics and Business\\
-  Barcelona, Spain 08005\\
-  E-mail: \email{udina@upf.es}\\
-  URL: \url{http://libiya.upf.es/}
-}
-
-%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-\begin{document}
-
-%% include the review as usual
-%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.
-
-\end{document}

Deleted: papers/jss/codesnippet.tex
===================================================================
--- papers/jss/codesnippet.tex	2009-07-08 15:05:29 UTC (rev 288)
+++ papers/jss/codesnippet.tex	2009-07-08 15:05:48 UTC (rev 289)
@@ -1,64 +0,0 @@
-\documentclass[codesnippet]{jss}
-
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% declarations for jss.cls %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-%% almost as usual
-\author{Achim Zeileis\\Wirtschaftsuniversit\"at Wien \And 
-        Second Author\\Plus Affiliation}
-\title{A Capitalized Title: Possibly Containing More Details}
-
-%% for pretty printing and a nice hypersummary also set:
-\Plainauthor{Achim Zeileis, Second Author} %% comma-separated
-\Plaintitle{A Capitalized Title: Possibly Containing More Details} %% without formatting
-\Shorttitle{A Capitalized Title} %% a short title (if necessary)
-
-%% an abstract and keywords
-\Abstract{
-  Here should be the abstract.
-}
-\Keywords{keywords, comma-separated, not capitalized, \proglang{Java}}
-\Plainkeywords{keywords, comma-separated, not capitalized, Java} %% without formatting
-%% at least one keyword must be supplied
-
-%% publication information
-%% NOTE: Typically, this can be left commented and will be filled out by the technical editor
-%% \Volume{13}
-%% \Issue{9}
-%% \Month{September}
-%% \Year{2004}
-%% \Submitdate{2004-09-29}
-%% \Acceptdate{2004-09-29}
-
-%% The address of (at least) one author should be given
-%% in the following format:
-\Address{
-  Achim Zeileis\\
-  Department f\"ur Statistik \& Mathematik\\
-  Wirtschaftsuniversit\"at Wien\\
-  1090 Wien, Austria\\
-  E-mail: \email{Achim.Zeileis@wu-wien.ac.at}\\
-  URL: \url{http://statmath.wu-wien.ac.at/~zeileis/}
-}
-%% It is also possible to add a telephone and fax number
-%% before the e-mail in the following format:
-%% Telephone: +43/1/31336-5053
-%% Fax: +43/1/31336-734
-
-%% for those who use Sweave please include the following line (with % symbols):
-%% need no \usepackage{Sweave.sty}
-
-%% end of declarations %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-
-\begin{document}
-
-%% include your article here, just as usual
-%% Note that you should use the \pkg{}, \proglang{} and \code{} commands.
-
-\section[About Java]{About \proglang{Java}}
-%% Note: If there is markup in \(sub)section, then it has to be escaped as above.
-
-
-\end{document}

Deleted: papers/jss/depmix-intro.Rnw
===================================================================
--- papers/jss/depmix-intro.Rnw	2009-07-08 15:05:29 UTC (rev 288)
+++ papers/jss/depmix-intro.Rnw	2009-07-08 15:05:48 UTC (rev 289)
@@ -1,900 +0,0 @@
-
-% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
-%&LaTeX
-
-\documentclass[]{report}
-
-\usepackage[]{amsmath, amsfonts, amstext, amsthm}
-
-%\batchmode
-
-\usepackage{Sweave}
-
-\usepackage{Rd}
-
-\usepackage[round]{natbib}
-
-\title{depmix: An R-package for fitting mixture models on mixed
-multivariate data with Markov dependencies}
-
-\author{Version: 0.9.1 \\ Ingmar Visser\thanks{Correspondence concerning this manual
-should be addressed to: Ingmar Visser, Department of Psychology,
-University of Amsterdam, Roetersstraat 15, 1018 WB, Amsterdam, The
-Netherlands}\\
-Developmental Processes Research Group\\
-Department of Psychology, University of Amsterdam\\
-i.visser@uva.nl}
-
-\date{\today}
-
-%%\VignetteIndexEntry{depmix: An R-package for fitting mixture models on mixed multivariate data with Markov dependencies}
-%%\VignetteDepends{depmix}
-%%\VignetteKeywords{R, latent/hidden Markov model, latent class model, finite mixture model, direct optimization, gradients}
-%%\VignettePackage{depmix}
-
-\SweaveOpts{prefix.string=graphs/depmix}
-
-\newcommand{\vc}{\mathbf}
-\newcommand{\mat}{\mathbf}
-
-% \newcommand{\pkg}{\texttt}
-% \newcommand{\code}{\texttt}
-
-\begin{document}
-
-\bibliographystyle{plainnat}
-
-\maketitle
-
-\begin{abstract}
-	\pkg{depmix} implements a general class of mixture models with
-	Markovian dependencies over time in the R programming language
-	\citep{R2006}.  This includes standard Markov models,
-	latent/hidden Markov models, and latent class and finite mixture
-	distribution models.  The models can be fitted to mixed
-	multivariate data with variables from a number of distributions,
-	including the binomial, multinomial, gaussian, lognormal, and
-	Weibull distributions.  Parameters can be estimated subject to general
-	linear constraints, and with optional inclusion of regression on
-	(time-dependent) covariates.  Parameter estimation is done
-	through a direct optimization approach with gradients using the
-	\code{nlm} optimization routine.  Optional support for using the NPSOL
-	optimization routines is provided as well.  A number of illustrative
-	examples are included.
-\end{abstract}
-
-\tableofcontents
-
-\chapter{Introduction}
-
-Markov and latent Markov models are frequently used in the social
-sciences, in different areas and applications.  In psychology, they
-are used for modelling learning processes; see \citet{Wickens1982} for an
-overview and \citet{Schmittmann2005} for a recent application.  In
-economics, latent Markov models are commonly used as regime switching
-models, see e.g.\ \citet{Kim1994} and \citet{Ghysels1994}.  Further
-applications include speech recognition \citep{Rabiner1989}, EEG analysis
-\citep{Rainer2000}, and genetics \citep{Krogh1998}.  In those latter areas
-of application, latent Markov models are usually referred to as hidden
-Markov models.
-
-The \pkg{depmix} package was motivated by the fact that Markov models
-are used commonly in the social sciences, but no comprehensive package
-was available for fitting such models.  Common programs for Markovian
-models include Panmark \citep{Pol1996}, and for latent class models
-Latent Gold 
-\citep{Vermunt2003}.  Those programs lack a number of
-important features, besides not being freely available.  In
-particular, \pkg{depmix}: 1) handles multiple case, or multiple group,
-analysis; 2) handles arbitrarily long time series; 3) estimates models
-with general linear constraints between parameters; 4) analyzes mixed
-distributions, i.e., combinations of categorical and continuous
-observed variables; 5) fits mixtures of latent Markov models to deal
-with population heterogeneity; 6) can fit models with covariates.
-Although \pkg{depmix} is specifically meant for dealing with
-longitudinal or time series data, for say $T>100$, it can also handle
-the limit case with $T=1$.  In those cases, there are no time
-dependencies between observed data, and the model reduces to a finite
-mixture model, or a latent class model.  In the next chapter, an outline
-is provided of the model and the likelihood equations.  In the
-chapters after that a number of examples are presented.
-
-
-\section*{Acknowledgements} 
-
-Development of this package was supported by European Commission grant
-51652 (NEST) and by a VENI grant from the Dutch Organization for Scientific
-Research (NWO) to Ingmar Visser.
-
-I am indebted to many people for providing help in writing this package.
-First and foremost Maartje Raijmakers and Verena Schmittmann tested
-countless earlier versions, spotted bugs and suggested many features.
-Moreover, Maartje Raijmakers provided the discrimination data set.  Han van
-der Maas provided the speed-accuracy data and thereby necessitated
-implementing models with time-dependent covariates.  Conor Dolan and Raoul
-Grasman both provided valuable advice on statistics in general and
-optimization in particular.
-
-
-\chapter{Dependent mixture models}
-
-The data considered here have the general form $O_{1}^{1}, \ldots,
-O_{1}^{m}$, $O_{2}^{1}, \ldots, O_{2}^{m}$, \ldots, $O_{T}^{1}, \ldots,
-O_{T}^{m}$ for an $m$-variate time series of length $T$.  As an
-example, consider series of responses generated by a single
-subject in a reaction time experiment.  The data consist of three
-variables: reaction time, accuracy, and a covariate, a pay-off
-factor that determines the relative reward for speed versus accuracy.
-The dataset contains three series of 168, 134, and 137 occasions
-respectively.  Below, a summary is provided for these data, as well as
-a plot of the first time series, which is selected by \code{nind=1}.
-
-<<echo=FALSE, results=hide>>=
-.libPaths(new="/Users/ivisser/Library/R/library/")
-library(depmix)
-load("models.Rda")
-@
-
-<<>>=
-data(speed)
-summary(speed)
-@
-\begin{figure}[htbp]
-  \begin{center}
-<<fig=TRUE>>=
-plot(speed,nind=1) 
-@
-	\caption{Reaction times, accuracy and pay-off values for
-	the first series of responses in dataset \texttt{speed}.}
-  \end{center}
-\end{figure}
-
-The latent Markov model is commonly associated with data of this type,
-albeit usually only multinomial variables are considered.  However,
-common estimation procedures, such as those implemented in
-\citet{Pol1996}, are not suitable for long time series due to
-underflow problems.  In contrast, the hidden Markov model is typically
-only used for `long' univariate time series.  In the next section, the
-likelihood and estimation procedure for the hidden Markov model are
-described, given data of the above form.
-
-The dependent mixture model is defined by the following elements:
-\begin{enumerate}
-	
-	\item a set $\vc{S}$ of latent classes or states $S_{i},\, i=1,
-	\ldots , n$,
-	
-	\item a matrix $\mat{A}$ of transition probabilities $a_{ij}$ for
-	the transition from state $S_{i}$ to state $S_{j}$,
-	
-	\item a set $\vc{B}$ of observation functions $b_{j}(\cdot)$ that
-	provide the conditional probabilities associated with latent state
-	$S_{j}$,
-	
-	\item a vector $\pmb{\pi}$ of latent state initial probabilities
-	$\pi_{i}$.
-\end{enumerate}
-When transitions are added to the latent class model, it is more
-appropriate to refer to the classes as states: the word `class' is
-associated with a stable, trait-like attribute, whereas a state can
-change over time.
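-
-For a two-state model, these elements can be written down concretely
-as follows (an illustrative sketch with made-up numbers, not
-\pkg{depmix} syntax):
-
-<<eval=FALSE>>=
-A     <- matrix(c(0.9, 0.1,      # a11, a12
-                  0.1, 0.9),     # a21, a22
-                2, 2, byrow=TRUE)
-pinit <- c(1, 0)                 # start in state 1 with certainty
-b1    <- function(o) dnorm(o, mean=5.5, sd=0.2)  # state 1 density
-b2    <- function(o) dnorm(o, mean=6.4, sd=0.25) # state 2 density
-@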
-
-
-\section{Likelihood}
-
-The loglikelihood of hidden Markov models is usually computed by the
-so-called forward-backward algorithm \citep{Baum1966,Rabiner1989}, or rather by
-the forward part of this algorithm.  \cite{Lystig2002} changed the forward
-algorithm in such a way as to allow computing the gradients of the
-loglikelihood at the same time.  They start by rewriting the
-likelihood as follows (for ease of exposition the dependence on the
-model parameters is dropped here):
-\begin{equation}
-	L_{T} = Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) = \prod_{t=1}^{T} 
-Pr(\vc{O}_{t}|\vc{O}_{1}, 
-	\ldots, \vc{O}_{t-1}), 
-	\label{condLike}
-\end{equation}
-where $Pr(\vc{O}_{1}|\vc{O}_{0}):=Pr(\vc{O}_{1})$. Note that for a 
-simple, i.e.\ observed, Markov chain these probabilities reduce to 
-$Pr(\vc{O}_{t}|\vc{O}_{1},\ldots, 
-\vc{O}_{t-1})=Pr(\vc{O}_{t}|\vc{O}_{t-1})$.
-The log-likelihood can now be expressed as:
-\begin{equation}
-	l_{T} = \sum_{t=1}^{T} \log[Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots, 
-\vc{O}_{t-1})].
-	\label{eq:condLogl}
-\end{equation}
-
-To compute the log-likelihood, \cite{Lystig2002} define the following 
-(forward) recursion:
-\begin{align}
-	\phi_{1}(j) &:= Pr(\vc{O}_{1}, S_{1}=j) = \pi_{j} b_{j}(\vc{O}_{1}) 
-	\label{eq:fwd1} \\
-\begin{split}
-	\phi_{t}(j) &:= Pr(\vc{O}_{t}, S_{t}=j|\vc{O}_{1}, \ldots, 
-\vc{O}_{t-1}) \\
-	&= \sum_{i=1}^{n} [\phi_{t-1}(i)a_{ij}b_{j}(\vc{O}_{t})] \times 
-(\Phi_{t-1})^{-1},
-	\label{eq:fwdt} 
-\end{split} 
-\end{align}
-where $\Phi_{t}=\sum_{i=1}^{n} \phi_{t}(i)$. Combining 
-$\Phi_{t}=Pr(\vc{O}_{t}|\vc{O}_{1}, \ldots, \vc{O}_{t-1})$, and 
-equation~(\ref{eq:condLogl}) gives the following expression for the 
-log-likelihood:
-\begin{equation}
-	l_{T} = \sum_{t=1}^{T} \log \Phi_{t}.
-	\label{eq:logl}
-\end{equation}
-
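-To make the recursion concrete, the following sketch shows how
-equations~(\ref{eq:fwd1}--\ref{eq:fwdt}) and~(\ref{eq:logl}) could be
-implemented directly in R.  It is illustrative only and not part of
-\pkg{depmix}, which computes the likelihood in compiled code; the
-function name and its inputs are hypothetical.
-
-<<eval=FALSE>>=
-# Illustrative scaled forward recursion.
-# pinit: initial probabilities (length n); A: n x n transition matrix;
-# B: T x n matrix with entries b_j(O_t).
-fwdlogl <- function(pinit, A, B) {
-    phi <- pinit * B[1, ]          # phi_1(j) = pi_j b_j(O_1)
-    Phi <- sum(phi)                # Phi_1
-    logl <- log(Phi)
-    phi <- phi / Phi               # rescaling prevents underflow
-    for (t in 2:nrow(B)) {
-        phi <- as.vector(phi %*% A) * B[t, ]   # eq. (fwdt) numerator
-        Phi <- sum(phi)            # Phi_t = Pr(O_t | O_1, ..., O_{t-1})
-        logl <- logl + log(Phi)
-        phi <- phi / Phi
-    }
-    logl                           # l_T = sum_t log(Phi_t)
-}
-@
-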
-The above forward recursion can readily be generalized to mixture 
-models, in which it is assumed that the data are realizations of a 
-number of different latent Markov models (LMMs) and the goal is to assign posterior 
-probabilities to sequences of observations. This situation occurs, 
-for example, in learning data where different learning strategies may 
-lead to different answer patterns. From an observed sequence of 
-responses, it may not be immediately clear from which learning 
-process they stem. Hence, it is interesting to consider a mixture of 
-latent Markov models which incorporate restrictions that are 
-consistent with each of the learning strategies. 
-
-To compute the likelihood of a mixture of $K$ models, define the 
-forward recursion variables as follows (these variables now have an 
-extra index $k$ indicating that observation and transition 
-probabilities are from latent model $k$):
-\begin{align}
-\begin{split}
-\phi_{1}(j_{k}) &=  Pr(\vc{O}_{1}, 
-S_{1}=j_{k})=p_{k}\pi_{j_{k}}b_{j_{k}}(\vc{O}_{1}).
-\end{split}\label{eq:fwd1mix} \\
-\begin{split}
-\phi_{t}(j_{k})   &=   Pr(\vc{O}_{t}, S_{t}=j_{k}|\vc{O}_{1}, \ldots, 
-\vc{O}_{t-1}) \\
-			&= \left[ \sum_{k=1}^{K} \sum_{i=1}^{n_{k}} \phi_{t-1}(i_{k}) 
-			a_{ij_{k}}b_{j_{k}}(\vc{O}_{t}) \right] \times (\Phi_{t-1})^{-1},
-\end{split}\label{eq:fwdtmix} 
-\end{align}
-where $\Phi_{t} = \sum_{k=1}^{K}\sum_{j=1}^{n_{k}} \phi_{t}(j_{k})$.
-Note that the double sum over $k$ and $i$ is simply an enumeration
-of all the states of the model.  Now, because $a_{ij_{k}}=0$ whenever
-$S_{i}$ is not part of component $k$, the sum over $k$ can be dropped
-and hence equation~\ref{eq:fwdtmix} reduces to:
-\begin{equation}
-	\phi_{t}(j_{k}) = \left[ \sum_{i=1}^{n_{k}} \phi_{t-1}(i_{k}) 
-			a_{ij_{k}}b_{j_{k}}(\vc{O}_{t}) \right] \times (\Phi_{t-1})^{-1}.
-\end{equation}
-The log-likelihood is computed by applying equation~(\ref{eq:logl}) to
-these terms.  For multiple cases, the log-likelihood is simply the sum
-over the individual log-likelihoods. 
-
-\paragraph{Computational considerations} From
-equations~(\ref{eq:fwd1}--\ref{eq:fwdt}), it can be seen that
-computing the forward variables, and hence the log-likelihood, takes
-$O(Tn^{2})$ computations, for an $n$-state model and a time series of
-length $T$.  Consider a mixture of two components, one with two states
-and the other with three states.  Using
-equations~(\ref{eq:fwd1}--\ref{eq:fwdt}) to compute the log-likelihood
-of this model, one needs $O(Tn^{2})=O(T\times 25)$ computations, whereas
-with the mixture equations~(\ref{eq:fwd1mix}--\ref{eq:fwdtmix}),
-$\sum_{n_{i}} O(n_{i}^{2}T)$ computations are needed, in this case
-$O(T \times 13)$.  So, in this simple example, the
-computational cost is almost halved. 
-
-
-\section{Gradients}
-
-\newcommand{\fpp}{\frac{\partial} {\partial \lambda_{1}}}
-
-See equations 10--12  in \cite{Lystig2002} for the score recursion 
-functions of the hidden Markov model for a univariate time series. 
-Here the corresponding score recursion for the multivariate mixture 
-case is provided. The $t=1$ components of this score recursion are 
-defined as (for an arbitrary parameter $\lambda_{1}$):
-\begin{align}
-\psi_{1}(j_{k};\lambda_{1}) &:=  \fpp Pr(\vc{O}_{1}, S_{1}=j_{k}) \\
-\begin{split} 
-	&= \left[  \fpp p_{k} \right] \pi_{j_{k}}b_{j_{k}}(\vc{O}_{1}) + 
-	p_{k}\left[ \fpp \pi_{j_{k}} \right] b_{j_{k}}(\vc{O}_{1}) \\
-	& \qquad  + p_{k}\pi_{j_{k}} \left[ \fpp 
-b_{j_{k}}(\vc{O}_{1})\right],
-\end{split} \label{eq:psi1}
-\end{align}
-and for $t>1$ the definition is:
-\begin{align}
-\psi_{t}(j_{k};\lambda_{1})  & =  \frac{\fpp Pr(\vc{O}_{1}, \ldots, 
-\vc{O}_{t}, S_{t}=j_{k})}
-			{Pr(\vc{O}_{1}, \ldots, \vc{O}_{t-1})}  \\
-\begin{split} 
-	& =  
-			 \sum_{i=1}^{n_{k}} \Bigg\{ \psi_{t-1}(i_{k};\lambda_{1})a_{ij_{k}} 
-			 b_{j_{k}}(\vc{O}_{t}) \\ 
-			 &\qquad +\phi_{t-1}(i_{k}) \left[ \fpp a_{ij_{k}} \right] b_{j_{k}} 
-(\vc{O}_{t}) \\
-			&\qquad +\phi_{t-1}(i_{k})a_{ij_{k}}  \left[ \fpp b_{j_{k}} 
-(\vc{O}_{t}) \right] \Bigg\} 
-			\times (\Phi_{t-1})^{-1}.
-\end{split} \label{eq:psit}
-\end{align}
-
-Using the above equations, \cite{Lystig2002} derive the following 
-expression for the partial derivative of the log-likelihood:
-\begin{equation}
-	\fpp l_{T}= 	
-		\frac{\mathbf{\Psi}_{T}(\lambda_{1})}{\mathbf{\Phi}_{T}},
-\end{equation}
-where $\Psi_{t}=\sum_{k=1}^{K} \sum_{j=1}^{n_{k}} 
-\psi_{t}(j_{k};\lambda_{1})$. 
-Starting from the expression for the logarithm of the likelihood, this 
-is easily seen to be correct: 
-\begin{eqnarray*}
-	\fpp \log Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) &=& Pr(\vc{O}_{1}, 
-\ldots, \vc{O}_{T})^{-1} 
-	\fpp Pr(\vc{O}_{1}, \ldots, \vc{O}_{T}) \\
-	&=&  \frac{Pr(\vc{O}_{1}, \ldots, \vc{O}_{T-1})}{Pr(\vc{O}_{1}, 
-\ldots, \vc{O}_{T})}  \Psi_{T} (\lambda_{1}) \\
-	&=&  \frac{\mathbf{\Psi}_{T}(\lambda_{1})}{\mathbf{\Phi}_{T}}.
-\end{eqnarray*}
-
-Further, to actually compute the gradients, the partial derivatives of
-the parameters and observation distribution functions are necessary,
-i.e., $\fpp p_{k}$, $\fpp \pi_{i}$, $\fpp a_{ij}$, and $\fpp
-\vc{b}_{i}(\vc{O}_{t})$.  Only the latter case requires some
-attention.  We need the following derivatives $\fpp
-\vc{b}_{j}(\vc{O}_{t})=\fpp \vc{b}_{j}(O_{t}^{1}, \ldots, O_{t}^{m})$, for
-arbitrary parameters $\lambda_{1}$. To stress that $\vc{b}_{j}$ is a
-vector of functions, boldface is used here.  First note that, because of local
-independence we can write:
-\begin{equation*}
-	b_{j}(O_{t}^{1}, \ldots, O_{t}^{m}) = b_{j}(O_{t}^{1}) \times 
-	b_{j}(O_{t}^{2}) \times \cdots \times b_{j}(O_{t}^{m}).
-\end{equation*}
-Applying the product rule we get:
-\begin{equation}
-	\fpp [b_{j}(O_{t}^{1}, \ldots, O_{t}^{m})] =
-	\sum_{l=1}^{m} \left[ \prod_{i=1, \ldots, \hat{l}, \ldots, m} 
-b_{j}(O_{t}^{i}) \right] \times
-	\fpp  [b_{j}(O_{t}^{l})],
-	\label{partialProd}
-\end{equation}
-where $\hat{l}$ means that the $l$-th term is left out of the product. 
-These latter terms, $\frac{\partial} {\partial \lambda_{1} }  
-[b_{j}(O_{t}^{l})]$, are easy to compute for either multinomial or 
-gaussian observation densities $b_{j}(\cdot)$.
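-
-As a quick check on equation~(\ref{partialProd}), the analytic
-product-rule derivative can be compared with a finite-difference
-approximation.  The sketch below does this for $m=2$ gaussian items,
-differentiating with respect to the mean of the first item; the
-values are made up for illustration and the code is not part of the
-package.
-
-<<eval=FALSE>>=
-# b_j(O^1,O^2) = dnorm(o1,mu1,s1) * dnorm(o2,mu2,s2), and
-# d/dmu1 dnorm(o1,mu1,s1) = dnorm(o1,mu1,s1) * (o1-mu1)/s1^2
-o1 <- 5.5; mu1 <- 5; s1 <- 0.2
-o2 <- 6.4; mu2 <- 6; s2 <- 0.25
-analytic <- dnorm(o1,mu1,s1) * (o1-mu1)/s1^2 * dnorm(o2,mu2,s2)
-eps <- 1e-6
-fd <- (dnorm(o1,mu1+eps,s1) - dnorm(o1,mu1,s1)) / eps * dnorm(o2,mu2,s2)
-all.equal(analytic, fd, tolerance=1e-4)
-@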
-
-
-\section{Parameter estimation}
-
-Parameters are estimated in \pkg{depmix} using a direct optimization
-approach instead of the EM algorithm which is frequently used for this type
-of model.  The EM algorithm, however, has some drawbacks.  First, it can be
-slow to converge.  Second, applying constraints to parameters can be
-problematic.  The EM algorithm can sometimes lead to incorrect estimates
-when constraints are applied to parameters in the M-step of the algorithm.
-The package was designed to be used with the \code{npsol}-library, the
-main reason being that it handles general linear (in-)equality
-constraints very well. Unfortunately, \code{npsol} is not freeware and
-hence is not distributed with \pkg{depmix}. Two other options are
-available for optimization using  \code{nlm} and \code{optim}
-respectively. Linear equality constraints are fitted through
-reparametrization. Inequality constraints are fitted by adding a 
-penalty to the likelihood depending on the amount by which a
-constraint is violated. The argument \code{vfactor} to the
-fitting function can be used to control this behavior; see the 
-chapter on fitting models for details. 
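-
-The penalty idea can be sketched as follows; this is illustrative
-only, since the actual computation happens inside the fitting
-routines, and the function shown here is hypothetical:
-
-<<eval=FALSE>>=
-# logl: the loglikelihood function; viol: a function returning the
-# nonnegative amounts by which inequality constraints are violated.
-penlogl <- function(theta, logl, viol, vfactor=5) {
-    logl(theta) - vfactor * sum(viol(theta))   # linear penalty
-}
-@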
-
-\chapter{Using depmix}
-
-Three steps are involved in using \pkg{depmix}, which are illustrated
-below with examples:
-\begin{enumerate}
-	\item  data specification with function \code{markovdata}
-	
-	\item  model specification with function \code{dmm}
-	
-	\item  model fitting with function \code{fitdmm}
-\end{enumerate}
-
-To be able to fit models, data need to be in a specific format created
-for this package.  Basically, data should be in the form of a matrix
-with each row corresponding to measures taken at a single measurement
-occasion for a single subject.  The function \code{markovdata} further
-requires only one argument providing the itemtypes, each being one of
-categorical, continuous, or covariate.  A markovdata object is a matrix
-with a number of attributes.  
-
-\section{Creating data sets}
-
-As an example we make a dataset with two variables measured in two
-series of 50 occasions each.
-
-<<>>=
-x=rnorm(100,10,2)
-y=ifelse(runif(100)<0.5,0,1)
-z=matrix(c(x,y),100,2)
-md=markovdata(z,itemtypes=c("cont","cat"),ntimes=c(50,50))
-md[1:10,]
-@
-
-In the example below, we split the dataset \code{speed} into three
-separate datasets, which we later use as an example to do multi-group
-analysis.
-
-<<>>=
-data(speed)
-r1=markovdata(dat=speed[1:168,],itemt=itemtypes(speed))
-r2=markovdata(dat=speed[169:302,],itemt=itemtypes(speed))
-r3=markovdata(dat=speed[303:439,],itemt=itemtypes(speed))
-summary(r2)
-@
-
-
-Here is the full specification of the \code{markovdata} function. 
-
-\input{markovdata}
-
-
-\section{Data set speed}
-
-Throughout this manual we will use a data set called speed, and hence
-we provide some background information on how these data were
-gathered.
-
-\input{speed}
-
-Interesting hypotheses to test are: is the switching regime symmetric?
-Is there evidence for two states or does one state suffice?  Is the
-guessing state actually a guessing state, i.e., is the probability of
-a correct response at the chance level of 0.5?
-
-
-\section{Defining models}
-
-A dependent mixture model is defined by the number of states and by
-the item distribution functions; it can be created with the
-\code{dmm}-function as follows:
-
-<<>>= 
-mod <- dmm(nstates=2,itemtypes=c("gaus",2))
-summary(mod)
-@
-
-Here \code{itemtypes} is a vector, with one entry per item measured
-at each occasion, specifying the desired distributions; in
-this case the first item is to follow a normal distribution, and the
-second item follows a Bernoulli distribution.  Allowable
-distributions are listed in Table~\ref{tbl:dist}, along with their
-internal code, and the parameter names.  The R-internal code is used
-for estimating these parameters.  Specifics of these distributions and
-their estimation can be found in their respective help files.
-Itemtypes can be specified by their name or by their internal code,
-except in the case of multinomial items, which have to be specified by
-a number.
-
-\begin{table}[tbp]
-	\centering
-	\begin{tabular}{lcc}
-		\hline
-		distribution & code & parameters \\
-		\hline
-		\hline
-		multinomial & $2,3,4, \ldots $ & $p_1, p_2, p_3, \ldots $  \\
-		\hline
-		gaussian, normal & 1  & $ \mu, \sigma $ \\
-		\hline
-		lognormal & -21 & $l\mu$ , $l\sigma$  \\
-		\hline
-		weibull & -22 & shape (a), scale (b) \\
-		\hline
-		gamma & -23 & shape (a), scale (s)  \\
-		\hline
-		3lognormal & -31 &  $l\mu$ , $l\sigma$, shift \\
-		\hline
-		3weibull & -32 & shape (a), scale (b), shift \\
-		\hline
-		3gamma & -33 & shape (a), scale (s), shift \\
-		\hline
-	\end{tabular}
-	\caption{Allowable distribution names, internal codes, and 
-	parameter names.}
-	\label{tbl:dist}
-\end{table}
-
-The function \code{dmm} returns an object of class \code{dmm} which
-has its own summary function providing the parameter values of the
-model. See the help files for further details. Except in simple 
-cases, starting values can be a problem in latent Markov models, and
-so in general it's best to provide them if you have a fairly good idea
-of what to expect. Providing starting values is done through the
-\code{stval} argument: 
-
-<<>>=
-st <- c(1,0.9,0.1,0.2,0.8,2,1,0.7,0.3,5,2,0.2,0.8,0.5,0.5)
-mod <- dmm(nsta=2,itemt=c(1,2), stval=st)
-summary(mod)
-@
-
-\input{dmm}
-
-\subsection{Generating data}
-
-The \code{dmm}-class has a \code{generate} method that can be used to 
-generate data according to a specified model. 
-
-<<>>=
-gen<-generate(c(100,50),mod)
-summary(gen)
-@
-\begin{figure}[htbp]
-  \begin{center}
-<<fig=TRUE>>=
-plot(gen)
-@
-	\caption{Two time series generated by a 2-state model with one 
-	gaussian item and one binary item.}
-  \end{center}
-\end{figure}
-
-\input{generate}
-
-\section{Fitting models\label{fitmod}}
-
-Fitting models is done using the function \code{fitdmm}. The standard
-call requires only a dataset and a model, as in:
-
-<<>>= 
-data(speed)
-mod <- dmm(nstates=2,itemtypes=c(1,2))
-fitex <- fitdmm(speed,mod)
-@
-
-Calling \code{fitdmm} produces some output about the progress of
-the optimization which can be controlled with the \code{printlevel}
-argument.  Its default value of 1 just gives the first and the last
-iteration of the optimization; 0 gives no output, and setting it to 5
-or higher values will produce output at each iteration.  These values
-correspond to print levels 0, 1, and 2 of \code{nlm}.  When using
-\code{optim}, the \code{printlevel} argument is used to set the
-\code{REPORT} argument of \code{optim} (see its help page for
-details).  Printlevel 0 gives report 0, printlevel 1 gives report 10,
-printlevels 2--4 give report 5, and printlevels above 4 give report 1,
-producing output at every iteration.  Printlevels starting from 15 and
-higher produce increasingly annoying output from the C-routines that
-compute the loglikelihood.
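-
-For example, a model can be refitted without any optimization output
-as follows (assuming the data and model objects defined above; not
-run here):
-
-<<eval=FALSE>>=
-fitquiet <- fitdmm(speed, mod, printlevel=0)   # suppress all output
-@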
-
-\code{fitdmm} returns an object of class \code{fit}, which has a
-summary method showing the estimated parameter values, along with
-standard errors, and t-ratios whenever those are available.  Along
-with the log-likelihood, the AIC and BIC values are provided.  Apart
-from the printed values (see summary below), a \code{fit}-object has a
-number of other fields.  Most importantly, it contains a copy of the
-fitted model in \code{mod} and it has a field \code{post} containing
-posterior state estimates.  That is, for each group $g$,
-\code{post\$states[[g]]} is a matrix with \code{sum(ntimes(dat))} rows
-and a number of columns equal to the number of states of the model
-plus 2.  The first
-column contains the a posteriori component model, the second column
-has the state number within the component, and the remaining columns
-contain the a posteriori probabilities of each of the states.
-
-<<>>=
-summary(fitex)
-@
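-
-The posterior state estimates can be inspected directly; a sketch,
-assuming that the \code{post} field of the fit object can be accessed
-with \code{\$} (here for the single group $g=1$), is:
-
-<<eval=FALSE>>=
-post <- fitex$post
-# columns: component, state within component, posterior probabilities
-post$states[[1]][1:5, ]
-@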
-
-
-\input{fitdmm}
-
-\chapter{Extending and constraining models}
-
-\section{Fixing and constraining parameters}
-
-Continuing the example from above, it can be seen that in one of the
-states, the probability of a correct answer is about .5, as is the
-probability of an incorrect answer, i.e., the parameters Item2,p1 and
-Item2,p2. This latent state is supposed to be a guessing state, and hence 
-it makes sense to constrain these parameters to their theoretical values of 
-.5. Similarly, the initial state probability for the slow state is one, and
-zero for the other state, and hence it makes sense to fix these parameters.
-The third constraint that we consider here is an equality constraint
-between the transition parameters. Using this constraint, we can test
-whether the switching between states is a symmetric process.
-Hence, we constrain the transition parameters $a_{11}$ and $a_{22}$ to be equal.
-
-Constraining and fixing parameters is done in a similar fashion as the
-\code{pa} command that is used in LISREL \citep{Joreskog1999}. The
-\code{conpat} argument to the \code{fitdmm}-function specifies for
-each parameter in the model whether it's fixed (0) or free (1 or
-higher). Equality constraints can be imposed by giving two parameters 
-the same number in the \code{conpat} vector. When only fixed values are
-required the \code{fixed} argument can be used instead of \code{conpat},
-with zeroes for fixed parameters and other values (ones e.g.) for non-fixed
-parameters. 
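-
-For example, the fixed-versus-free pattern of the constrained model
-shown further below could also be written with the \code{fixed}
-argument (a sketch; note that \code{fixed} cannot express the
-equality constraint):
-
-<<eval=FALSE>>=
-fixed <- rep(1, 15)
-fixed[c(1, 8, 9, 14, 15)] <- 0   # 0 = fixed parameter
-@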
-
-Fitting the models subject to these constraints is mostly done through
-reparametrization.  Inequality constraints are enforced by adding a
-penalty to the loglikelihood when the constraint is not satisfied.
-The penalty is linear in the amount by which the constraint is not
-satisfied, and not logarithmic or something similar which is often
-used (see e.g.\ the documentation for \code{constrOptim} which uses a
-logarithmic boundary for inequality constraints).  This has advantages
-and disadvantages.  There are two marked disadvantages.  First, the
-loglikelihood is not smooth at the boundary of the parameter space.
-Second, it can happen that the constraint is not satisfied.  Whenever
-constraints are not satisfied, \code{fitdmm} exits with a warning
-stating the amount by which they are violated.  This can be remedied
-by increasing the \code{vfactor} argument, which scales the
-penalty (its default value is 5).  An advantage of the linear penalty
-is that a parameter can be estimated on the boundary itself, which is
-impossible with logarithmic boundaries.
-
-
-\paragraph{Parameter numbering} When using the \code{conpat} and
-\code{fixed} arguments, complete vectors should be supplied, i.e., these
-vectors should have length equal to the number of parameters of the model.
-Parameters are numbered in the following order:
-\begin{enumerate}
-	\item  the mixing proportions of a mixture of latent Markov
-	models, i.e., just one parameter for a single component model
-	which has value 1 and is fixed
-
-	\item  the component parameters for each component consisting of
-	the following:
-	\begin{enumerate}
-		\item  transition parameters in row major order, $a_{11},
-		a_{12}, a_{13}, \ldots, a_{21}, a_{22}, a_{23}, \ldots $
-	
-		\item  the observation parameters per state and per item, in
-		the order listed in Table~\ref{tbl:dist}
-	
-		\item  the initial state probabilities per state
-	\end{enumerate}
-
-\end{enumerate}
-
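-For the two-state model used below, with one gaussian item and one
-binary item, this ordering implies the following numbering of the
-fifteen parameters (a sketch; compare the starting values \code{stv}
-in the example):
-
-<<eval=FALSE>>=
-# dmm(nstates=2, itemt=c("n",2)): parameter numbering
-#  1      mixing proportion (fixed at 1 for a single component model)
-#  2-5    transition parameters a11, a12, a21, a22 (row major)
-#  6-7    state 1, item 1 (gaussian): mu, sigma
-#  8-9    state 1, item 2 (binary): p1, p2
-# 10-11   state 2, item 1 (gaussian): mu, sigma
-# 12-13   state 2, item 2 (binary): p1, p2
-# 14-15   initial state probabilities of states 1 and 2
-@
-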
-<<>>=
-conpat=rep(1,15)
-conpat[1]=0 
-conpat[14:15]=0 
-conpat[8:9]=0
-conpat[2]=conpat[5]=2
-stv=c(1,.896,.104,.084,.916,5.52,.20,.5,.5,6.39,.24,.098,.902,0,1)
-mod=dmm(nstates=2,itemt=c("n",2),stval=stv,conpat=conpat)
-@
-
-In the example above, \code{conpat} is used to specify a number of
-constraints.  First, \code{conpat[1]=0} specifies that the mixing
-proportion of the model should be fixed (at its starting value of 1),
-which is always the case for single component models.  Second,
-\code{conpat[14:15]=0} fixes the initial state probabilities to zero
-and one respectively.  Similarly, \code{conpat[8:9]=0} fixes
-the guessing state parameters for the accuracy scores.  They are both
-fixed at 0.5 so as to make the guessing state an actual guessing
-state.  Finally, through \code{conpat[2]=conpat[5]=2}, transition
-parameters $a_{11}$ and $a_{22}$ are set to be equal.  Whenever
-equality constraints are not sufficient, general linear constraints
-can be specified using the \code{conrows} argument. 
-
-The constrained model has the following estimated
-parameters\footnote{Note that in running this example with the
-starting values from the unconstrained model, the initial
-loglikelihood is worse than the final loglikelihood because the
-initial likelihood is based on parameters that do not satisfy the
-constraints.}:
-
-% <<results=hide>>=
-% fitfix <- fitdmm(dat=speed,dmm=mod)
-% @
-% results in models.Rda
-
-<<>>=
-summary(fitfix)
-@
-
-
-\section{Multi group/case analysis}
-
-<<echo=FALSE>>=
-conpat=rep(1,15)
-conpat[1]=0
-conpat[8:9]=0
-conpat[14:15]=0
-stv=c(1,0.9,0.1,0.1,0.9,5.5,0.2,0.5,0.5,6.4,0.25,0.9,0.1,0.5,0.5)
-mod<-dmm(nstates=2,itemt=c("n",2),stval=stv,conpat=conpat)
-@
-
-\pkg{depmix} can handle multiple cases or multiple groups. A
-multigroup model is specified using the function \code{mgdmm} as
-follows:
-
-<<>>=
-mgr <- mgdmm(dmm=mod,ng=3,trans=TRUE,obser=FALSE)
-mgrfree <- mgdmm(dmm=mod,ng=3,trans=FALSE)
-@
-
-The \code{ng} argument specifies the number of groups, and the
-\code{dmm} argument specifies the model for each group.  \code{dmm}
-can be either a single model or a list of \code{ng} models.
-If it is a single model, each group has an identical structural model
-(same fixed and constrained parameters), and else each group has its
-model. Three further arguments can be used to constrain parameters
-between groups: \code{trans}, \code{obser}, and \code{init}
-respectively. By setting any of these to \code{TRUE}, the
-corresponding transition, observation, and initial state parameters
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/depmix -r 289

