[Pomp-commits] r503 - in pkg: . inst/doc
noreply at r-forge.r-project.org
Tue May 31 19:06:00 CEST 2011
Author: kingaa
Date: 2011-05-31 19:06:00 +0200 (Tue, 31 May 2011)
New Revision: 503
Modified:
pkg/DESCRIPTION
pkg/inst/doc/intro_to_pomp.Rnw
pkg/inst/doc/intro_to_pomp.pdf
pkg/inst/doc/pomp.bib
Log:
- add section on NLF to intro vignette (by S. Ellner)
Modified: pkg/DESCRIPTION
===================================================================
--- pkg/DESCRIPTION 2011-05-27 19:48:16 UTC (rev 502)
+++ pkg/DESCRIPTION 2011-05-31 17:06:00 UTC (rev 503)
@@ -2,7 +2,7 @@
Type: Package
Title: Statistical inference for partially observed Markov processes
Version: 0.38-2
-Date: 2011-05-27
+Date: 2011-06-01
Author: Aaron A. King, Edward L. Ionides, Carles Breto, Steve Ellner, Bruce Kendall, Helen Wearing, Matthew J. Ferrari, Michael Lavine, Daniel C. Reuman
Maintainer: Aaron A. King <kingaa@umich.edu>
URL: http://pomp.r-forge.r-project.org
Modified: pkg/inst/doc/intro_to_pomp.Rnw
===================================================================
--- pkg/inst/doc/intro_to_pomp.Rnw 2011-05-27 19:48:16 UTC (rev 502)
+++ pkg/inst/doc/intro_to_pomp.Rnw 2011-05-31 17:06:00 UTC (rev 503)
@@ -1028,37 +1028,292 @@
\clearpage
\section{Nonlinear forecasting: \code{nlf}}
-\citet{Ellner1998,Kendall1999,Kendall2005}. To be added.
+NLF is an indirect inference approach \citep{Gourieroux1996}, meaning that an intermediate statistical model is used to quantify the agreement between the model of interest and the data.
+Specifically, NLF is a \emph{Simulated Quasi-Maximum Likelihood} (SQML) method.
+The quasilikelihood function is defined by fitting a convenient statistical model to a long simulation from the model of interest and then evaluating that statistical model's likelihood function on the data.
+The intermediate statistical model in \code{nlf} is a multivariate generalized additive autoregressive model, with radial basis functions as the ridge functions and multivariate Gaussian process noise.
+\citet{Smith1993} first proposed SQML and developed the underlying statistical theory, \citet{Tidd1993} independently proposed a similar method, and \citet{Kendall2005} describe in detail the methods used by \code{nlf} and use them to fit and compare models of insect population cycles.
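+To make the recipe concrete, here is a bare-bones sketch of the SQML idea in R, under simplifying assumptions: an ordinary linear AR(1) regression stands in for \code{nlf}'s radial-basis-function model, and \code{simulate.model} is a hypothetical simulator for the model of interest.
+<<eval=F>>=
+quasi.loglik <- function (theta, data, nasymp = 5000) {
+  sim <- simulate.model(theta,n=nasymp)  # step 1: long simulation at parameters theta
+  fit <- lm(sim[-1]~sim[-nasymp])        # step 2: fit the intermediate model to the simulation
+  pred <- coef(fit)[1]+coef(fit)[2]*data[-length(data)]
+  sum(dnorm(data[-1],mean=pred,sd=sd(residuals(fit)),log=TRUE)) # step 3: evaluate on the data
+}
+@
+Maximizing this function over \code{theta} yields the SQML estimate; \code{nlf} implements the same scheme with a far more flexible intermediate model.
+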
-<<first-nlf,echo=F,eval=F>>=
-estnames <- c('alpha.2','alpha.3','tau')
+As a simple example, we can use \code{nlf} to estimate the parameters \code{log.K} and \code{log.r} of the Gompertz model.
+An example of a minimal call, accepting the defaults for all optional arguments, is
+<<eval=F>>=
+data(gompertz)
out <- nlf(
gompertz,
start=theta.guess,
- nasymp=2000,
- est=estnames,
- lags=c(4,6),
- seed=5669345L,
- skip.se=TRUE,
- method="Nelder-Mead",
- trace=0,
- maxit=100,
- reltol=1e-8,
- transform=function(x)x,
- eval.only=FALSE
+ est=c("log.K","log.r"),
+ lags=c(1,2)
)
@
-<<first-nlf-results,echo=F,eval=F>>=
-print(
- cbind(
- guess=theta.guess[estnames],
- fit=out$params[estnames],
- truth=theta.true[estnames]
- ),
- digits=3
- )
+where the first argument is the \code{pomp} object, \code{theta.guess} is a vector containing model parameters at which \code{nlf}'s search will begin, \code{est} contains the names of parameters \code{nlf} will estimate, and \code{lags} specifies which past values are to be used in the autoregressive model.
+In the call above, \code{lags=c(1,2)} specifies that the autoregressive model predicts each observation $y_t$ using $y_{t-1}$ and $y_{t-2}$, the two most recent past observations.
+The set of lags need not include the most recent observation, and skips are allowed, so that \code{lags=c(2,3,6)} is also ``legal''.
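+That is, with \code{lags=c(2,3,6)} the intermediate model takes the form $y_t = f(y_{t-2},y_{t-3},y_{t-6}) + \varepsilon_t$, with $f$ the fitted autoregression function and $\varepsilon_t$ the noise process.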
+
+The quasilikelihood is optimized numerically, so the reliability of the optimization should be assessed by doing multiple fits with different starting parameter values.
+Because of the way \code{nlf} controls the random number seed, starting values should all be chosen before the calls to \code{nlf}:
+<<nlf-gompertz-starts,eval=F>>=
+starts <- list()
+for (j in 1:5) { # Pick 5 random starting parameter values
+ sj <- coef(gompertz)
+ sj[c("log.K","log.r")] <- rnorm(n=2,mean=sj[c("log.K","log.r")],sd=0.1)
+ starts[[j]] <- sj
+}
@
+Then to make the results from different starts comparable, use the \code{seed} argument to initialize the random number generator the same way for each fit:
+<<nlf-gompertz-fits,eval=F>>=
+out <- list()
+for (j in 1:5) { # Do fits. method, trace, and nasymp are explained below
+ out[[j]] <- nlf(
+ gompertz,
+ start=starts[[j]],
+ est=c("log.K","log.r"),
+ lags=c(1,2),
+ seed=7639873,
+ method="Nelder-Mead",
+ trace=4,
+ nasymp=5000
+ )
+}
+fits <- t(sapply(out,function(x)c(x$params[c("log.r","log.K")],value=x$value)))
+@
+<<echo=F,eval=T,results=hide>>=
+binary.file <- "nlf-fits.rda"
+if (file.exists(binary.file)) {
+ load(binary.file)
+} else {
+<<nlf-gompertz-starts>>
+<<nlf-gompertz-fits>>
+ save(starts,out,fits,file=binary.file)
+}
+@
+The results in this case are very encouraging,
+<<eval=T>>=
+fits
+@
+so in what follows we trust that repeated optimization is not needed.
+The call above also used the \code{method} argument to specify that the Nelder-Mead method in \code{optim} be used to maximize the quasilikelihood; the \code{trace} argument is passed to \code{optim}, and other \code{optim} arguments can be passed in the same way.
+\code{nasymp} sets the length of the Gompertz model simulation on which the quasilikelihood is based;
+larger values will give less variable parameter estimates, but will slow down the fitting process.
+The slowdown is dominated by the time required to generate the model simulations, so efficient coding of \code{rprocess} is essential.
+The ``Advanced topics in pomp'' vignette gives some advice on this.
+Do \code{vignette("advanced\_topics\_in\_pomp")} to view it.
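+As a rough guide to the cost of increasing \code{nasymp}, one can time a single quasilikelihood evaluation, for example:
+<<eval=F>>=
+system.time(
+  nlf(gompertz,start=coef(gompertz),est=NULL,lags=2,nasymp=5000,eval.only=TRUE)
+)
+@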
+
+The choice of lags affects the accuracy of the intermediate statistical model and therefore the accuracy of parameter estimates, so it is worth putting some effort into choosing good lags.
+Given enough time, a user could generate many artificial data sets, fit them all with several candidate lags, and evaluate the precision and accuracy of estimates.
+A quicker approach is to explore the shape and variability of the quasilikelihood function near pilot parameter estimates for several candidate sets of lags, using \code{nlf} with \code{eval.only=TRUE} to evaluate the quasilikelihood without performing optimization.
+
+For the Gompertz model the complete state vector is observed, so it is plausible that forecasting based on the single most recent observation is optimal, i.e., \code{lags=1}.
+But because of measurement error, prediction based on multiple lags might be more accurate and more sensitive to parameter values, and longer-term forecasting might be beneficial if the effects of parameters are amplified over time.
+Fig.~\ref{fig:nlf-gompertz} shows results for several candidate lags suggested by these considerations.
+To reduce Monte Carlo error in the objective function, we used \code{simulate} to create a very long ``data set'':
+<<nlf-my-pomp,eval=T>>=
+my.pomp <- simulate(gompertz,times=c(1:1000))
+my.params <- coef(my.pomp)
+@
+and then evaluated the quasilikelihood for a range of parameter values:
+<<nlf-lag-test-log.r,eval=F>>=
+lags <- list(1,2,c(1,2),c(2,3))
+log.r.vals <- my.params["log.r"]+seq(-0.69,0.69,length=25)
+fvals <- matrix(nrow=25,ncol=4)
+for (j in 1:25) {
+ cat(j,"\n")
+ pars <- my.params
+ pars["log.r"] <- log.r.vals[j]
+ for(k in 1:4) {
+ fvals[j,k] <- nlf(my.pomp, start=pars, nasymp=5000, est=NULL, lags=lags[[k]], eval.only=TRUE)
+ }
+}
+@
+<<nlf-lag-test-log.K,eval=F,echo=F>>=
+log.K.vals <- my.params["log.K"]+seq(-0.15,0.15,length=25)
+fvals2 <- matrix(nrow=25,ncol=4)
+for (j in 1:25) {
+ cat(j,"\n")
+ pars <- my.params
+ pars["log.K"] <- log.K.vals[j]
+ pars["X.0"] <- exp(log.K.vals[j])
+ for(k in 1:4) {
+ fvals2[j,k] <- nlf(my.pomp, start=pars, nasymp=5000, est=NULL, lags=lags[[k]], eval.only=TRUE)
+ }
+}
+@
+<<nlf-lag-tests,eval=T,echo=F,results=hide>>=
+binary.file <- "nlf-lag-tests.rda"
+if (file.exists(binary.file)) {
+ load(binary.file)
+} else {
+<<nlf-my-pomp>>
+<<nlf-lag-test-log.r>>
+<<nlf-lag-test-log.K>>
+ save(lags,log.r.vals,log.K.vals,fvals,fvals2,file=binary.file)
+}
+@
+
+\begin{figure}[tbp]
+<<nlf-gompertz-plot,height=4,width=6,echo=F,fig=T>>=
+fvals <- scale(fvals,center=apply(fvals,2,max),scale=FALSE)
+fvals2 <- scale(fvals2,center=apply(fvals2,2,max),scale=FALSE)
+op <- par(mfrow=c(1,2),mar=c(5,5,1,1))
+matplot(
+ log.r.vals,
+ fvals,
+ xlab=quote(log~r),
+ lty=1,
+ col=c("black","green3","red","purple"),
+ ylab="NLF objective function",
+ type="o",
+ pch=16
+ )
+abline(v=my.params["log.r"],col="blue")
+legend("bottomleft",legend=c("1","2","1,2","2,3"),col=c("black","green3","red","purple"),lty=1,pch=16,cex=1,bty="n")
+
+matplot(
+ log.K.vals,
+ fvals2,
+ xlab=quote(log~K),
+ lty=1,
+ col=c("black","green3","red","purple"),
+ ylab="NLF objective function",
+ type="o",
+ pch=16
+ )
+abline(v=my.params["log.K"],col="blue")
+par(op)
+@
+\caption{
+  Values of the NLF objective function (the log quasilikelihood) for the Gompertz model as a function of the parameters \code{log.r} and \code{log.K}, for various choices of the \code{lags} argument.
+  All plotted curves were shifted vertically so as to have maximum value zero.
+  The objective function was evaluated on an artificial data set of length 1000 that was created using \code{log.r=-2.3} and \code{log.K=0}; these true values are indicated by the vertical blue lines.
+}
+\label{fig:nlf-gompertz}
+\end{figure}
+
+Based on Fig.~\ref{fig:nlf-gompertz}, \code{lags=2} seems like a good choice.
+Another consideration is the variability of parameter estimates on multiple short data sets:
+<<nlf-multi-short,eval=F>>=
+nreps <- 100
+ndata <- 60
+fvals <- matrix(nrow=nreps,ncol=length(lags))
+new.pomp <- simulate(gompertz,times=1:ndata,nsim=nreps,seed=NULL) # nreps simulated data sets
+for (j in 1:nreps) {
+ for (k in seq_along(lags)) {
+ fvals[j,k] <- nlf(
+ new.pomp[[j]],
+ start=coef(gompertz),
+ nasymp=5000,
+ est=NULL,
+ lags=lags[[k]],
+ eval.only=TRUE
+ )
+ }
+}
+fvals <- exp(fvals/ndata)
+@
+<<nlf-multi-short-eval,echo=F,eval=T,results=hide>>=
+binary.file <- "nlf-multi-short.rda"
+if (file.exists(binary.file)) {
+ load(binary.file)
+} else {
+<<nlf-multi-short>>
+ save(lags,nreps,ndata,fvals,new.pomp,file=binary.file)
+}
+@
+The last line above expresses the objective function as the geometric mean (quasi)likelihood per data point.
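+In symbols, if the log quasilikelihood $\ell$ of a series of $n$ observations is the sum of per-observation contributions $\ell_t$, then
+\[
+\exp(\ell/n)=\biggl(\prod_{t} e^{\ell_t}\biggr)^{1/n},
+\]
+the geometric mean of the per-observation quasilikelihoods $e^{\ell_t}$.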
+The variability across data sets was nearly the same for all lags:
+<<eval=T>>=
+apply(fvals,2,function(x)sd(x)/mean(x))
+@
+so we proceed to fit with \code{lags=2}.
+<<nlf-fit-from-truth,eval=F>>=
+true.fit <- nlf(
+ gompertz,
+ start=coef(gompertz),
+ est=c("log.K","log.r"),
+ lags=2,
+ seed=7639873,
+ method="Nelder-Mead",
+ trace=4,
+ nasymp=5000
+ )
+@
+<<nlf-fit-from-truth-eval,echo=F,eval=T,results=hide>>=
+binary.file <- "nlf-fit-from-truth.rda"
+if (file.exists(binary.file)) {
+ load(binary.file)
+} else {
+<<nlf-fit-from-truth>>
+ save(true.fit,file=binary.file)
+}
+@
+From \verb+true.fit$params+ and \verb+true.fit$se+ we get the estimates ($\pm$ 1 standard error)
+$\log{r}=$~\Sexpr{signif(true.fit$params["log.r"],2)}~$\pm$~\Sexpr{signif(true.fit$se["log.r"],2)}
+and $\log{K}=$~\Sexpr{signif(true.fit$params["log.K"],2)}~$\pm$~\Sexpr{signif(true.fit$se["log.K"],2)}.
+%%$\log K = 0.081 \pm 0.064$, $\log r = -2.2 \pm 0.51$
+
+The standard errors provided by \code{nlf} are based on a Newey-West estimate of the variance-covariance matrix, which is generally somewhat biased downward.
+So when time permits, bootstrap standard errors are preferable.
+When \code{nlf} is called with \code{bootstrap=TRUE}, the quasilikelihood function is evaluated on a bootstrap sample of the time series, specified via the \code{bootsamp} argument.
+The first \code{max(lags)} observations cannot be forecast by the autoregressive model, so the size of the bootstrap sample is the length of the data series minus \code{max(lags)}:
+%%Because of how \code{nlf} controls the random number seed, a unique seed for each draw of a bootstrap sample must be created before the samples are drawn:
+<<echo=F>>=
+set.seed(32329L)
+@
+<<nlf-boot,eval=F>>=
+lags <- 2
+ndata <- length(obs(gompertz))
+nboot <- ndata-max(lags)
+nreps <- 100
+pars <- matrix(0,nreps,2)
+bootsamp <- replicate(n=nreps,sample(nboot,replace=TRUE))
+for (j in seq_len(nreps)) {
+ fit <- nlf(
+ gompertz,
+ start=coef(gompertz),
+ est=c("log.K","log.r"),
+ lags=lags,
+ seed=7639873,
+ bootstrap=TRUE,
+ bootsamp=bootsamp[,j],
+ skip.se=TRUE,
+ method="Nelder-Mead",
+ trace=4,
+ nasymp=5000
+ )
+ pars[j,] <- fit$params[c("log.r","log.K")]
+}
+colnames(pars) <- c("log.r","log.K")
+@
+<<nlf-boot-eval,echo=F,eval=T,results=hide>>=
+binary.file <- "nlf-boot.rda"
+if (file.exists(binary.file)) {
+ load(binary.file)
+} else {
+<<nlf-boot>>
+ save(pars,file=binary.file)
+}
+@
+<<>>=
+apply(pars,2,sd)
+@
+The bootstrap standard errors are, as expected, slightly higher than the asymptotic estimates.
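+For a side-by-side comparison, one can tabulate the two sets of standard errors (a small illustrative snippet, using the objects computed above):
+<<eval=F>>=
+rbind(
+  asymptotic=true.fit$se[c("log.r","log.K")],
+  bootstrap=apply(pars,2,sd)
+)
+@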
+
+The code above implements a ``resampling cases'' approach to bootstrapping the data set to which the intermediate autoregressive model is fitted.
+This is valid when observations are conditionally independent given the past observations, which is only true for a Markov process if the complete state is observed.
+Otherwise serial dependence remains in the forecast errors, and we need to use methods for bootstrapping time series.
+In \code{nlf} it is relatively easy to implement the ``blocks of blocks'' resampling method \citep[][p.~398]{Davison1997}.
+For example, with block length $l=3$ we resample (with replacement) observations in groups of length 3:
+<<eval=F>>=
+bootsamp <- replicate(n=nreps,sample(nboot-2,size=floor(nboot/3),replace=TRUE)) # block starting points
+bootsamp <- rbind(bootsamp,bootsamp+1,bootsamp+2) # expand each starting point into a block of 3
+@
+and otherwise proceed exactly as above.
+
+%%\citet{Ellner1998,Kendall1999}
+
+\clearpage
\section{Bayesian sequential Monte Carlo: \code{bsmc}}
\citet{Liu2001b}. To be added.
Modified: pkg/inst/doc/intro_to_pomp.pdf
===================================================================
(Binary files differ)
Modified: pkg/inst/doc/pomp.bib
===================================================================
--- pkg/inst/doc/pomp.bib 2011-05-27 19:48:16 UTC (rev 502)
+++ pkg/inst/doc/pomp.bib 2011-05-31 17:06:00 UTC (rev 503)
@@ -89,26 +89,14 @@
timestamp = {2009.09.15}
}
-@ARTICLE{Ionides2011,
- author = {E. L. Ionides and A. Bhadra and Y. Atchad{\'e} and A. A. King},
- title = {{I}terated filtering},
- journal = {Annals of Statistics},
- year = {in press},
- abstract = {Inference for partially observed Markov process models has been a
- longstanding methodological challenge with many scientific and engineering
- applications. Iterated filtering algorithms maximize the likelihood
- function for partially observed Markov process models by solving
- a recursive sequence of filtering problems. We present new theoretical
- results pertaining to the convergence of iterated filtering algorithms
- implemented via sequential Monte Carlo filters. This theory complements
- the growing body of empirical evidence that iterated filtering algorithms
- provide an effective inference strategy for scientific models of
- nonlinear dynamic systems. The first step in our theory involves
- studying a new recursive approach for maximizing the likelihood function
- of a latent variable model, when this likelihood is evaluated via
- importance sampling. This leads to the consideration of an iterated
- importance sampling algorithm which serves as a simple special case
- of iterated filtering, and may have applicability in its own right.}
+@BOOK{Davison1997,
+ title = {{B}ootstrap {M}ethods and their {A}pplication},
+ publisher = {Cambridge University Press},
+ year = {1997},
+ author = {A.C. Davison and D.V. Hinkley},
+ address = {New York},
+ owner = {kingaa},
+ timestamp = {2011.05.26}
}
@ARTICLE{Ellner1998,
@@ -160,6 +148,15 @@
timestamp = {2007.03.13}
}
+@BOOK{Gourieroux1996,
+ title = {{S}imulation-based {E}conometric {M}ethods},
+ publisher = {Oxford University Press},
+ year = {1996},
+ author = {C. Gouri\'{e}roux and A. Monfort},
+ owner = {kingaa},
+ timestamp = {2011.05.26}
+}
+
@ARTICLE{He2010,
author = {He, Daihai and Ionides, Edward L. and King, Aaron A.},
title = {{P}lug-and-play inference for disease dynamics: measles in large
@@ -193,6 +190,28 @@
timestamp = {2009.06.26}
}
+@ARTICLE{Ionides2011,
+ author = {E. L. Ionides and A. Bhadra and Y. Atchad{\'e} and A. A. King},
+ title = {{I}terated filtering},
+ journal = {Annals of Statistics},
+ year = {in press},
+ abstract = {Inference for partially observed Markov process models has been a
+ longstanding methodological challenge with many scientific and engineering
+ applications. Iterated filtering algorithms maximize the likelihood
+ function for partially observed Markov process models by solving
+ a recursive sequence of filtering problems. We present new theoretical
+ results pertaining to the convergence of iterated filtering algorithms
+ implemented via sequential Monte Carlo filters. This theory complements
+ the growing body of empirical evidence that iterated filtering algorithms
+ provide an effective inference strategy for scientific models of
+ nonlinear dynamic systems. The first step in our theory involves
+ studying a new recursive approach for maximizing the likelihood function
+ of a latent variable model, when this likelihood is evaluated via
+ importance sampling. This leads to the consideration of an iterated
+ importance sampling algorithm which serves as a simple special case
+ of iterated filtering, and may have applicability in its own right.}
+}
+
@ARTICLE{Ionides2006,
author = {E. L. Ionides and C. Bret{\'o} and Aaron A. King},
title = {{I}nference for nonlinear dynamical systems},
@@ -385,6 +404,55 @@
url = {http://dx.doi.org/10.1073/pnas.0608571103}
}
+@ARTICLE{Smith1993,
+ author = {A. A. Smith},
+ title = {{E}stimating nonlinear time-series models using simulated vector
+ autoregression},
+ journal = {Journal of Applied Econometrics},
+ year = {1993},
+ volume = {8},
+ pages = {S63--S84},
+ owner = {kingaa},
+ timestamp = {2011.05.26}
+}
+
+@ARTICLE{Tidd1993,
+ author = {Tidd, C. W. and Olsen, L. F. and Schaffer, W. M.},
+ title = {{T}he {C}ase for {C}haos in {C}hildhood {E}pidemics. {II}. {P}redicting
+ {H}istorical {E}pidemics from {M}athematical {M}odels},
+ journal = {Proceedings of the Royal Society of London, Series B},
+ year = {1993},
+ volume = {254},
+ pages = {257--273},
+ number = {1341},
+ month = {Dec.},
+ abstract = {The case for chaos in childhood epidemics rests on two observations.
+ The first is that historical epidemics show various `fieldmarks'
+ of chaos, such as positive Lyapunov exponents. Second, phase portraits
+ reconstructed from real-world epidemiological time series bear a
+ striking resemblance to chaotic solutions obtained from certain epidemiological
+ models. Both lines of evidence are subject to dispute: the algorithms
+ used to look for the fieldmarks can be fooled by short, noisy time
+ series, and the same fieldmarks can be generated by stochastic models
+ in which there is demonstrably no chaos at all. In the present paper,
+ we compare the predictive abilities of stochastic models with those
+ of mechanistic scenarios that admit to chaotic solutions. The main
+ results are as follows: (i) the mechanistic models outperform their
+ stochastic counterparts; (ii) forecasting efficacy of the deterministic
+ models is maximized by positing parameter values that induce chaotic
+ behaviour; (iii) simple mechanistic models are equal if not superior
+ to more detailed schemes that include age structure; and (iv) prediction
+ accuracy for monthly notifications declines rapidly with time, so
+ that, from a practical standpoint, the results are of little value.
+ By way of contrast, next amplitude maps can successfully forecast
+ successive changes in maximum incidence one or more years into the
+ future.},
+ file = {Tidd1993.pdf:Tidd1993.pdf:PDF},
+ owner = {kingaa},
+ timestamp = {2009.09.22},
+ url = {http://links.jstor.org/sici?sici=0962-8452%2819931222%29254%3A1341%3C257%3ATCFCIC%3E2.0.CO%3B2-G}
+}
+
@ARTICLE{Wearing2009,
author = {Helen J Wearing and Pejman Rohani},
title = {{E}stimating the duration of pertussis immunity using epidemiological