[Pomp-commits] r499 - in pkg: R inst inst/doc

noreply at r-forge.r-project.org noreply at r-forge.r-project.org
Sun May 22 23:00:06 CEST 2011


Author: kingaa
Date: 2011-05-22 23:00:05 +0200 (Sun, 22 May 2011)
New Revision: 499

Modified:
   pkg/R/pomp.R
   pkg/inst/NEWS
   pkg/inst/doc/intro_to_pomp.Rnw
   pkg/inst/doc/intro_to_pomp.pdf
Log:

- fix bug in pomp-pomp to do with userdata slot not being preserved
- improve the intro to pomp vignette (much more explanation of the SIR model)


Modified: pkg/R/pomp.R
===================================================================
--- pkg/R/pomp.R	2011-05-20 15:43:32 UTC (rev 498)
+++ pkg/R/pomp.R	2011-05-22 21:00:05 UTC (rev 499)
@@ -517,27 +517,34 @@
             if (missing(PACKAGE)) PACKAGE <- data at PACKAGE
             if (missing(skeleton.type)) skeleton.type <- data at skeleton.type
             if (missing(skeleton)) skeleton <- data at skeleton
-            
-            pomp.constructor(
-                             data=data at data,
-                             times=times,
-                             t0=t0,
-                             rprocess=rprocess,
-                             dprocess=dprocess,
-                             rmeasure=rmeasure,
-                             dmeasure=dmeasure,
-                             skeleton=skeleton,
-                             skeleton.type=skeleton.type,
-                             initializer=initializer,
-                             covar=covar,
-                             tcovar=tcovar,
-                             obsnames=obsnames,
-                             statenames=statenames,
-                             paramnames=paramnames,
-                             covarnames=covarnames,
-                             PACKAGE=PACKAGE,
-                             ...
-                             )
+            userdata <- data at userdata
+            added.userdata <- list(...)
+            userdata[names(added.userdata)] <- added.userdata
+            do.call(
+                    pomp.constructor,
+                    c(
+                      list(
+                           data=data at data,
+                           times=times,
+                           t0=t0,
+                           rprocess=rprocess,
+                           dprocess=dprocess,
+                           rmeasure=rmeasure,
+                           dmeasure=dmeasure,
+                           skeleton=skeleton,
+                           skeleton.type=skeleton.type,
+                           initializer=initializer,
+                           covar=covar,
+                           tcovar=tcovar,
+                           obsnames=obsnames,
+                           statenames=statenames,
+                           paramnames=paramnames,
+                           covarnames=covarnames,
+                           PACKAGE=PACKAGE
+                           ),
+                      userdata
+                      )
+                    )
           }
           )
 

Modified: pkg/inst/NEWS
===================================================================
--- pkg/inst/NEWS	2011-05-20 15:43:32 UTC (rev 498)
+++ pkg/inst/NEWS	2011-05-22 21:00:05 UTC (rev 499)
@@ -1,11 +1,16 @@
 NEWS
 
-0.37-2
+0.38-1
+     o  Fixed bug in 'pomp' when 'data' is of class 'pomp': userdata should be preserved.
+
+     o	Improved discussion of SIR model in "Intro to pomp" vignette.
+
      o  In 'pfilterd.pomp' objects, 'saved.states' and 'saved.params' slots are now length-ntimes lists of arrays.
 
      o  It is now possible to use variable numbers of particles in 'pfilter'.
      	The 'Np' argument may be specified as a single number as before, or now as either a vector of numbers or a function.
 
+0.37-2
      o  The "advanced topics" vignette has been much augmented and improved.
 
      o  The 'euler.sir' example (load it with 'data(euler.sir)') has a different measurement model: a discretized normal distribution with the same mean and variance as before.

Modified: pkg/inst/doc/intro_to_pomp.Rnw
===================================================================
--- pkg/inst/doc/intro_to_pomp.Rnw	2011-05-20 15:43:32 UTC (rev 498)
+++ pkg/inst/doc/intro_to_pomp.Rnw	2011-05-22 21:00:05 UTC (rev 499)
@@ -723,13 +723,13 @@
 In \pkg{pomp}, the trajectory is computed using the \code{trajectory} function, which in turn uses the \code{skeleton} slot of the \code{pomp} object.
 The \code{skeleton} slot should be filled with the deterministic skeleton of the process model.
 In the discrete-time case, this is the map
-\begin{equation*}
+\begin{equation}\label{eq:discrete-skeleton}
   x\,\mapsto\,\expect{X_{t+{\Delta}t}\;\vert\;X_{t}=x,\theta}.
-\end{equation*}
+\end{equation}
 In the continuous-time case, this is the vectorfield
-\begin{equation*}
+\begin{equation}\label{eq:continuous-skeleton}
   x\,\mapsto\,\lim_{{\Delta}{t}\,\to\,0}\,\expect{\frac{X_{t+{\Delta}{t}}-x}{{\Delta}{t}}\;\Big{\vert}\;X_{t}=x,\theta}.
-\end{equation*}
+\end{equation}
 Our discrete-time Gompertz has the deterministic skeleton
 \begin{equation}\label{eq:gompertz-skel}
   x\,\mapsto\,K^{1-S}\,x^{S},
@@ -1099,7 +1099,6 @@
 	\end{picture}
     }
   \end{center}
-  \label{fig:SIR}
   \caption{
     Diagram of the SIR model.
     The host population is divided into three classes according to their infection status: 
@@ -1108,162 +1107,330 @@
     R, recovered and immune hosts. 
     Births result in new susceptibles and all individuals have a common death rate $\mu$.
     Since the birth rate equals the death rate, the expected population size, $N=S+I+R$, remains constant.
-    The S$\to$I rate, $\lambda$, called the \emph{force of infection}, depends on the number of infectious individuals according to $\lambda(t)={\beta(t)\,\xi(t)\,I}/{N}$,
-    where $\beta(t)$ is the seasonal (periodic) contact rate and $\xi(t)$ is a white noise.
+    The S$\to$I rate, $\lambda$, called the \emph{force of infection}, depends on the number of infectious individuals according to $\lambda(t)={\beta\,I}/{N}$.
     The I$\to$R, or recovery, rate is $\gamma$.
-    The case reports, $C$, count infected individuals with probability $\rho$. 
+    The case reports, $C$, result from a process by which infections are recorded with probability $\rho$. 
     Since diagnosed cases are treated with bed-rest and hence removed, infections are counted upon transition to R. 
   }
+  \label{fig:SIR}
 \end{figure}
 
-The SIR model is a mainstay of theoretical epidemiology.
-It has the deterministic skeleton
+\subsection{The stochastic SIR model.}
+
+A mainstay of theoretical epidemiology, the SIR model describes the progress of a contagious infection through a population of hosts.
+The hosts are divided into three classes, according to their status vis-a-vis the infection (Fig.~\ref{fig:SIR}).
+The S class contains those that have not yet been infected and are thereby still susceptible to it;
+the I class comprises those who are currently infected and, by assumption, infectious;
+the R class includes those who have recovered from the infection.
+The latter are assumed to be immune against reinfection.
+We let $S(t)$, $I(t)$, and $R(t)$ represent the numbers of individuals within the respective classes at time $t$.
+
+It is natural to formulate this model as a continuous-time Markov process.
+In this process, the numbers of individuals within each class change through time in whole-number increments.
+In particular, individuals move between classes (entering S at birth, moving thence to I, and on to R unless death arrives first) at random times.
+Thus, the numbers of births and class-transitions that occur in any interval of time are random variables.
+The birth rate, death rates, and the rate of transition, $\gamma$, from I to R are frequently assumed to be constants, specific to the infection and the host population.
+Crucially, the I to R transition rate, the so-called \emph{force of infection}, is not constant, but depends on the current number of infectious individuals.
+The assumption that transmission is \emph{frequency dependent}, as for example when each individual realizes a fixed number of contacts per unit time, corresponds to the assumption $\lambda(t) = \beta\,I(t)/N$, where $\beta$ is known as the contact rate and $N=S+I+R$ is the population size.
+This assumption introduces the model's only nonlinearity.
+It is useful sometimes to further assume that birth and death rates are equal and independent of infection status---call the common rate $\mu$---which has the consequence that the expected population size then remains constant.
+
+It is typically impossible to monitor $S$, $I$, and $R$, directly. 
+Relatively commonly, however, records of \emph{cases}, i.e., individual infections, are kept by public health authorities.
+The number of cases, $C(t_1,t_2)$, recorded within a given reporting interval $[t_1,t_2)$ might perhaps be modeled by a binomial process
 \begin{equation*}
+  C(t_1,t_2)\;\sim\;\mathrm{binomial}({\Delta}_{I{\to}R}(t_1,t_2),\rho)
+\end{equation*}
+where $\Delta_{I{\to}R}(t_1,t_2)$ is the accumulated number of recoveries that have occured over the interval in question and $\rho$ is the \emph{reporting rate}, i.e., the probability that a given infection is observed and recorded.
+
+The model's deterministic skeleton is a system of nonlinear ordinary differential equations---a vectorfield---on the space of positive values of $S$, $I$, and $R$ (cf. Eq.~\ref{eq:continuous-skeleton}).
+Specifically, the SIR deterministic skeleton is
+\begin{equation*}
   \begin{aligned}
-    &\frac{dS}{dt}=\mu\,(N-S)-\beta(t)\,\frac{I}{N}\,S\\
-    &\frac{dI}{dt}=\beta(t)\,\frac{I}{N}\,S-\gamma\,I-\mu\,I\\
+    &\frac{dS}{dt}=\mu\,(N-S)-\beta\,\frac{I}{N}\,S\\
+    &\frac{dI}{dt}=\beta\,\frac{I}{N}\,S-\gamma\,I-\mu\,I\\
     &\frac{dR}{dt}=\gamma\,I-\mu\,R\\
   \end{aligned}
 \end{equation*}
-Here $N=S+I+R$ is the (constant) population size and $\beta$ is a time-dependent contact rate.
-We'll assume that the contact rate is periodic and implement it as a covariate.
-As an additional wrinkle, we'll assume that the rate of the infection process $\beta\,I/N$ is perturbed by white noise.
 
-As in the earlier example, we need to write a function that will simulate the process.
-We can use \code{gillespie.sim} to implement this using the exact stochastic simulation algorithm of \citet{Gillespie1977a}.
-This will be quite slow and inefficient, however, so we'll use the so-called ``tau-leap'' algorithm, one version of which is implemented in \pkg{pomp} using Euler-multinomial processes.
-Before we do this, we'll first define the basis functions that will be used for the seasonality.
-It is convenient to use periodic B-splines for this purpose.
-The following codes set up this basis.
-<<seas-basis>>=
-tbasis <- seq(0,25,by=1/52)
-basis <- periodic.bspline.basis(tbasis,nbasis=3)
-colnames(basis) <- paste("seas",1:3,sep='')
-@ 
+\subsection{Implementing the SIR model in \pkg{pomp}.}
 
-Now we'll define the process model simulator.
-Since we have covariates now, our function will have one additional argument, \code{covars}, which will contain the value of each covariate at time \code{t}, established using linear interpolation if necessary.
-<<sir-proc-sim-def>>=
-sir.proc.sim <- function (x, t, params, covars, delta.t, ...) {
-  params <- exp(params)
-  with(
-       as.list(c(x,params,covars)),
-       {
-         beta <- exp(sum(log(c(beta1,beta2,beta3))*c(seas1,seas2,seas3)))
-         beta.var <- beta.sd^2
-         dW <- if (beta.var>0) 
-           rgamma(n=1,shape=delta.t/beta.var,scale=beta.var) 
-         else 
-           delta.t
-         foi <- (iota+beta*I*dW/delta.t)/pop
-         trans <- c(
-                    rpois(n=1,lambda=mu*pop*delta.t),
-                    reulermultinom(n=1,size=S,rate=c(foi,mu),dt=delta.t),
-                    reulermultinom(n=1,size=I,rate=c(gamma,mu),dt=delta.t),
-                    reulermultinom(n=1,size=R,rate=c(mu),dt=delta.t)
-                    )
-         c(
-           S=S+trans[1]-trans[2]-trans[3],
-           I=I+trans[2]-trans[4]-trans[5],
-           R=R+trans[4]-trans[6],
-           cases=cases+trans[4],
-           W=if (beta.sd>0) W+(dW-delta.t)/beta.sd else W
-           )
-       }
-       )
+As before, we'll need to write functions to implement some or all of the SIR model's \code{rprocess}, \code{dprocess}, \code{rmeasure}, \code{dmeasure}, and \code{skeleton} components.
+It turns out to be relatively straightforward to implement all of these but \code{dprocess}.
+
+For the \code{rprocess} portion, we can use \code{gillespie.sim} to implement the continuous-time Markov process exactly using the stochastic simulation algorithm of \citet{Gillespie1977a}.
+For many practical purposes, however, this will prove quite slow and inefficient.
+If we are willing to live with an approximate simulation scheme, we can use the the so-called ``tau-leap'' algorithm, one version of which is implemented in \pkg{pomp} as the \code{euler.sim} plug-in.
+This algorithm holds the transition rates $\lambda$, $\mu$, $\gamma$ constant over a small interval of time ${\delta}t$ and simulates the numbers of births, deaths, and transitions that occur over that interval.
+It then updates the state variables $S$, $I$, $R$ accordingly, increments the time variable by ${\delta}t$, recomputes the transition rates, and repeats.
+Naturally, as ${\delta}t\to 0$, this approximation to the true continuous-time process becomes better and better.
+The critical feature from the inference point of view, however, is that no relationship need be assumed between the Euler simulation interval ${\delta}t$ and the reporting interval, which itself need not even be the same from one observation to the next.
+
+Under this assumption, the number of individuals leaving any of the classes by all available routes over a particular time interval becomes a multinomial process.
+In particular, the probability that an S individual, for example, becomes infected is $p_{S{\to}I}=\frac{\lambda(t)}{\lambda(t)+\mu}\,(1-e^{-(\lambda(t)+\mu)\,{\delta}t})$; 
+the probability that an S individual dies before becoming infected is $p_{S{\to}}=\frac{\mu}{\lambda(t)+\mu}\,(1-e^{-(\lambda(t)+\mu)\,{\delta}t})$;
+and the probability that neither happens is $1-p_{S{\to}I}-p_{S{\to}}=e^{-(\lambda(t)+\mu)\,{\delta}t}$.
+Thus, if $\Delta_{S{\to}I}$ and $\Delta_{S{\to}}$ are the numbers of S individuals acquiring infection and dying, respectively, in the Euler simulation interval $(t,t+{\delta}t)$, then
+\begin{equation*}
+  (\Delta_{S{\to}I},\Delta_{S\to},S-\Delta_{S{\to}I}-\Delta_{S\to})\;\sim\;\mathrm{multinomial}\left(S(t);p_{S{\to}I},p_{S{\to}},1-p_{S{\to}I}-p_{S{\to}}\right),
+\end{equation*}
+Now, the expression on the right arises with sufficient frequency in compartmental models like the SIR that \pkg{pomp} has special functions for it.
+In \pkg{pomp}, the random variable $(\Delta_{S{\to}I},\Delta_{S\to})$ above is said to have an \emph{Euler-multinomial} distribution.
+The \pkg{pomp} functions \code{reulermultinom} and \code{deulermultinom} provide facilities for drawing random deviates from, and computing the p.d.f.\ of, such distributions.
+As the help pages relate, \code{reulermultinom} and \code{deulermultinom} parameterize the Euler-multinomial distributions by the size ($S(t)$ in the example above), rates ($\lambda(t)$ and $\mu$), and time interval ${\delta}t$.
+Obviously, the Euler-multinomial distributions generalize to an arbitrary number of exit routes.
+
+The help (\code{?euler.sim}) informs us that to use \code{euler.sim}, we need to specify a function that advances the states from $t$ to $t+{\delta}t$.
+The function \code{sir.proc.sim}, defined here, does this.
+<<sir-proc-sim-def,keep.source=T>>=
+sir.proc.sim <- function (x, t, params, delta.t, ...) {
+  ## untransform the parameters
+  N <- exp(params["N"])             # population size
+  gamma <- exp(params["gamma"])     # recovery rate
+  mu <- exp(params["mu"])           # birth rate = death rate
+  beta <- exp(params["beta"])       # contact rate
+  foi <- beta*x["I"]/N                       # the force of infection
+  trans <- c(
+             rpois(n=1,lambda=mu*N*delta.t), # births are assumed to be Poisson
+             reulermultinom(n=1,size=x["S"],rate=c(foi,mu),dt=delta.t), # exits from S
+             reulermultinom(n=1,size=x["I"],rate=c(gamma,mu),dt=delta.t), # exits from I
+             reulermultinom(n=1,size=x["R"],rate=c(mu),dt=delta.t)        # exits from R
+             )
+  ## now connect the compartments
+  x+c(
+      trans[1]-trans[2]-trans[3],
+      trans[2]-trans[4]-trans[5],
+      trans[4]-trans[6],
+      trans[4]                          # accumulate the recoveries
+    )
 }
 @ 
-Let's look at this definition in a bit of detail.
-We will be log-transforming the parameters: the first line untransforms them.
-Here, we use \code{with} to make the codes a bit easier to read.
-The variable \code{beta} will be the transmission rate: 
-the time-dependence of this rate is parameterized using the basis functions, the current values have been passed via the \code{covars} argument.
-The next lines make a draw, \code{dW}, from a Gamma random variable which will model environmental stochasticity as white noise in the transmission process \citep{Breto2009,He2010}.
-The next line computes the force of infection, \code{foi}.
-\code{trans} is next filled with random draws of all the transitions between the S, I, and R compartments.
-Births are modeled using a Poisson distribution, the number of births is stored in \code{trans[1]}. 
-Individuals leave the S class through either death or infection: \code{trans[2]} will contain the number infected, \code{trans[3]} the number dead.
-The number leaving the I and R classes are handled similarly.
-See the documentation on \code{reulermultinom} for a more thorough explanation of how this function works.
-Finally, a named vector is returned that contains the new values of the state variables, each of which is the old value, adjusted by the transitions.
-Note that the state variable \code{cases} accumulates the number of I$\to$R transitions and \code{W} accumulates the (standardized) white noise.
+Note that we've assumed here that the parameters have been log-transformed.
 
-Now we're ready to construct the \code{pomp} object.
-<<sir-pomp-def>>=
-pomp(
-     data=data.frame(
-       time=seq(1/52,4,by=1/52),
-       reports=NA
-       ),
-     times="time",
-     t0=0,
-     tcovar=tbasis,
-     covar=basis,
-     rprocess=euler.sim(
-       step.fun=sir.proc.sim,
-       delta.t=1/52/20
-       ),
-     measurement.model=reports~binom(size=cases,prob=exp(rho)),
-     zeronames=c("cases"),
-     initializer=function(params, t0, comp.names, ...){
-       p <- exp(params)
-       snames <- c("S","I","R","cases","W")
-       fracs <- p[paste(comp.names,"0",sep=".")]
-       x0 <- numeric(length(snames))
-       names(x0) <- snames
-       x0[comp.names] <- round(p['pop']*fracs/sum(fracs))
-       x0
-     },
-     comp.names=c("S","I","R")
-     ) -> sir
-@ 
-The specification of \code{data}, \code{times}, and \code{t0} should be familiar.
-The covariates are specified using the arguments \code{tcovar} and \code{covar}.
-We use \code{euler.sim} to specify the process simulator (\code{rprocess}).
-We are approximating the continuous-time process using an Euler simulator with a time-step of 1/20 of a week.
-Both \code{rmeasure} and \code{dmeasure} can be specified at once using the \code{measurement.model} argument.
-Here, we model the observation process using a binomial process, where the \emph{reporting rate}, \code{rho}, is the probability that a case is reported.
+Two significant wrinkles remains to be explained.
+First, notice that in \code{sir.proc.sim}, the state variable \code{cases} accumulates the total number of recoveries.
+Thus, \code{cases} will be a counting process and, in particular, will be nondecreasing with time.
+In fact, the number of recoveries within an interval, ${\Delta}_{I{\to}R}(t_1,t_2)=\mathtt{cases}(t_2)-\mathtt{cases}(t_1)$.
+Clearly, including \code{cases} as a state variable violates the Markov assumption.
 
-As we noted before, the state variable \code{cases} accumulates the number of cases (i.e., the number of I$\to$R transitions).
-However, the data are not the cumulative number of cases, but the number of cases that have occurred since the last report.
-Specifying \code{cases} in the \code{zeronames} argument has the effect of re-setting the state variable \code{cases} to zero after each observation.
+However, this is not an essential violation.
+Because none of the rates $\lambda$, $\mu$, or $\gamma$ depend on \code{cases}, the process remains essentially Markovian.
+We still have a difficulty with the measurement process, however, in that our data are assumed to be records of infections resolving within a given interval while the process model keeps track of the accumulated number of infections that have resolved since the record-keeping began.
+We can get around this difficulty by re-setting \code{cases} to zero immediately after each observation.
+We tell \pkg{pomp} to do this using the \code{pomp}'s \code{zeronames} argument, as we will see in a moment.
 
-Finally, in this example, we do not use the default parameterization of the initial states.
-Instead, we specify a custom \code{initializer} argument.
-We want instead to parameterize the initial states in terms of the fractions of the total population contained in each compartment.
-In particular, as we see in the \code{initializer} argument to \code{pomp}, we normalize so that the sum of \code{S.0}, \code{I.0}, and \code{R.0} is 1, then multiply by the initial population size, and then round to the nearest whole number.
-Note that the initializer we have specified needs an argument \code{comp.names} (the names of the S, I, and R state variables).
-This is set in the last line.
-More generally, one can give any number or kind of additional arguments to \code{pomp}:
-they will be passed to the \code{initializer}, \code{rprocess}, \code{dprocess}, \code{rmeasure}, \code{dmeasure}, and \code{skeleton} functions, if these exist.
-This feature aids in the writing of customized \code{pomp} objects.
-
-Now we'll simulate data using the parameters
-<<sir-params>>=
-theta <- c(
-           gamma=26,mu=1/50,iota=10,
-           beta1=1200,beta2=1500,beta3=900,
-           beta.sd=1e-2,
-           pop=5e5,
-           rho=0.6,
-           S.0=26/1200,I.0=0.001,R.0=1-0.001-26/1200
-           )
+The second wrinkle has to do with the initial conditions, i.e., the states $S(t_0)$, $I(t_0)$, $R(t_0)$.
+By default, \pkg{pomp} will allow us to specify these initial states arbitrarily.
+For the model to be consistent, they should be positive integers that sum to the population size $N$.
+We can enforce this constraint by customizing the parameterization of our initial conditions.
+We do this in by specializing a custom \code{initializer} in the call to \code{pomp} that constructs the \code{pomp} object.
+Let's construct it now and fill it with simulated data.
+<<sir-pomp-def>>=
+sir <- simulate(
+                pomp(
+                     data=data.frame(
+                       time=seq(1/52,15,by=1/52),
+                       reports=NA
+                       ),
+                     times="time",
+                     t0=0,
+                     rprocess=euler.sim(
+                       step.fun=sir.proc.sim,
+                       delta.t=1/52/20
+                       ),
+                    skeleton.type="vectorfield",
+                    skeleton=function(t,x,params,...){
+                       N <- exp(params["N"])             # population size
+                       gamma <- exp(params["gamma"])     # recovery rate
+                       mu <- exp(params["mu"])           # birth rate = death rate
+                       beta <- exp(params["beta"])       # contact rate
+                       foi <- beta*x["I"]/N # the force of infection
+                       trans <- c(
+                                  mu*N,               # births
+                                  x["S"]*c(foi,mu),   # exits from S
+                                  x["I"]*c(gamma,mu), # exits from I
+                                  x["R"]*mu           # exits from R
+                                  )
+                       ## now connect the compartments  
+                       x+c(
+                           trans[1]-trans[2]-trans[3],
+                           trans[2]-trans[4]-trans[5],
+                           trans[4]-trans[6],
+                           (trans[2]-trans[4]-trans[5])*gamma/52 # approximately the number of new infections/week
+                           )
+                     },
+                     measurement.model=reports~binom(size=cases,prob=exp(rho)),
+                     initializer=function(params, t0, ic.pars, comp.names, ...){
+                       x0 <- c(S=0,I=0,R=0,cases=0)
+                       N <- exp(params["N"])
+                       fracs <- exp(params[ic.pars])
+                       x0[comp.names] <- round(N*fracs/sum(fracs))
+                       x0
+                     },
+                     zeronames=c("cases"), # zero out this variable after each observation
+                     ic.pars=c("S0","I0","R0"), # initial condition parameters
+                     comp.names=c("S","I","R")          # names of the compartments
+                     ),
+                params=log(
+                  c(
+                    N=50000,
+                    beta=60,gamma=8,mu=1/50,
+                    rho=0.6,
+                    S0=8/60,I0=0.002,R0=1-8/60-0.001
+                    )
+                  ),
+                seed=677573454L
+                )
 @ 
-<<sir-first-sim>>=
-sir <- simulate(sir,nsim=1,params=log(theta),seed=329348545L)
-@ 
-Figure~\ref{fig:sir-plot} shows the simulated data and state variable trajectories.
+Notice that we are assuming here that the data are collected weekly and use an Euler step-size of 1/20~wk.
+Here, we've assumed an infection with an infectious period of $1/\gamma=1/8$~yr and a basic reproductive number, $R_0$ of $\beta/(\gamma+\mu)\approx 7.5$.
+We've assumed a host population size of 50,000 and 60\% reporting efficiency.
+Fig.~\ref{fig:sir-plot} shows one realization of this process.
 
 \begin{figure}
 <<sir-plot,fig=T,echo=F>>=
-data(euler.sir)
-sir <- simulate(euler.sir,nsim=1,params=c(log(theta),nbasis=3,period=1,degree=3),seed=329348545L)
 plot(sir)
 @   
   \caption{Results of \code{plot(sir)}.}
   \label{fig:sir-plot}
 \end{figure}
 
+\subsection{Complications: seasonality, imported infections, extra-demographic stochasticity.}
+
+Let's add a bit of real-world complexity to the simple SIR model.
+We'll modify the model to take three facts into account:
+\begin{inparaenum}[(i)]
+  \item For many infections, the contact rate is \emph{seasonal}: $\beta=\beta(t)$ is a periodic function of time.
+  \item No host population is truly closed: \emph{imported infections} arise when infected individuals visit the host population and transmit.
+  \item Stochastic fluctuation in the rates $\lambda$, $\mu$, and $\gamma$ can give rise to \emph{extrademographic stochasticity}, i.e., random process variability beyond the purely demographic stochasticity we've included so far.
+\end{inparaenum}
+
+One way to incorporate seasonality into the model is to assume some functional form for $\beta(t)$.
+Alternatively, we can use flexible functions to allow $\beta$ to take a variety of shapes.
+B-splines are useful in this regard and \pkg{pomp} provides some simple facilities for using these.
+If $s_{i}(t)$, $i=1,\dots,k$ is a periodic B-spline basis, as in Fig.~\ref{fig:seas-basis-plot}, then we can for example define
+\begin{equation*}
+  \log\beta(t)=\sum_{i}\!b_{i}\,s_{i}(t)
+\end{equation*}
+and, by varying the coefficients $b_{i}$, we can obtain a wide variety of shapes for $\beta(t)$.
+In \pkg{pomp}, we can define a set of periodic B-spline basis functions by doing:
+<<seas-basis>>=
+tbasis <- seq(0,20,by=1/52)
+basis <- periodic.bspline.basis(tbasis,nbasis=3,degree=2,period=1,names="seas%d")
+@ 
+This results in a data-frame with \Sexpr{ncol(basis)} columns; 
+each column is a quadratic periodic B-spline over the 20~yr domain, with period 1~yr.
+Fig.~\ref{fig:seas-basis-plot} shows these basis functions.
+Effectively, \code{tbasis} and \code{basis} function as a look-up table that can be used by the \code{rprocess} simulator to obtain a seasonal contact rate, $\beta(t)$.
+We accomplish this using the \code{covar} and \code{tcovar} arguments to \code{pomp}, as we will see below.
+
+There are a number of ways to take account of imported infections.
+Here, we'll simply assume that there is some background force of infection, $\iota$, not due to I-class individuals.
+Putting this together with the seasonal contact rate results in a force of infection $\lambda(t)=\beta(t)\,I(t)/N+\iota$.
+
+Finally, we can allow for extrademographic stochasticity by allowing the force of infection to be itself a random variable.
+We'll accomplish this by assuming a multiplicative white noise on the force of infection, i.e.,
+\begin{equation*}
+  \lambda(t) = \left(\beta(t)\,\frac{I(t)}{N}+\iota\right)\,\frac{dW(t)}{dt},
+\end{equation*}
+where $dW/dt$ is a white noise, specifically the ``derivative'' of an integrated Gamma white noise process.
+\citet{He2010} discuss such processes and apply them in an inferential context.
+
+Let's modify the process-model simulator to incorporate these three complexities.
+<<complex-sir-def>>=
+complex.sir.proc.sim <- function (x, t, params, delta.t, covars, ...) {
+  ## untransform the parameters
+  N <- exp(params["N"])                 # population size
+  gamma <- exp(params["gamma"])         # recovery rate
+  mu <- exp(params["mu"])               # birth rate = death rate
+  iota <- exp(params["iota"])           # import rate
+  b <- params[c("b1","b2","b3")] # contact-rate coefficients
+  beta <- exp(b%*%covars)       # flexible seasonality
+  beta.sd <- exp(params["beta.sd"])     # extrademographic noise intensity
+  beta.var <- beta.sd*beta.sd
+  if (beta.var > 0) 
+    dW <- rgamma(n=1,shape=delta.t/beta.var,scale=beta.var)
+  else
+    dW <- delta.t
+  foi <- (beta*x["I"]/N+iota)*dW/delta.t # the force of infection
+  trans <- c(
+             rpois(n=1,lambda=mu*N*delta.t), # births are assumed to be Poisson
+             reulermultinom(n=1,size=x["S"],rate=c(foi,mu),dt=delta.t), # exits from S
+             reulermultinom(n=1,size=x["I"],rate=c(gamma,mu),dt=delta.t), # exits from I
+             reulermultinom(n=1,size=x["R"],rate=c(mu),dt=delta.t)        # exits from R
+             )
+  ## now connect the compartments
+  x+c(
+      trans[1]-trans[2]-trans[3],
+      trans[2]-trans[4]-trans[5],
+      trans[4]-trans[6],
+      trans[4],                         # accumulate the recoveries
+      (dW-delta.t)                      # mean = 0, sd = (beta.sd^2 delta.t)
+    )
+}
+complex.sir <- simulate(
+                        pomp(
+                             sir,
+                             tcovar=tbasis,
+                             covar=basis,
+                             rprocess=euler.sim(
+                               complex.sir.proc.sim,
+                               delta.t=1/52/20
+                               ),
+                             initializer=function(params, t0, ic.pars, comp.names, ...){
+                               x0 <- c(S=0,I=0,R=0,cases=0,W=0)
+                               N <- exp(params["N"])
+                               fracs <- exp(params[ic.pars])
+                               x0[comp.names] <- round(N*fracs/sum(fracs))
+                               x0
+                               }
+                             ),
+                        params=log(
+                          c(
+                            N=50000,
+                            b1=60,b2=10,b3=110,
+                            gamma=8,mu=1/50,
+                            rho=0.6,
+                            iota=0.01,beta.sd=0.1,
+                            S0=8/60,I0=0.002,R0=1-8/60-0.001
+                            )
+                          ),
+                        seed=8274355L
+                        )
+@ 
+Note that the seasonal basis functions are passed to \code{complex.sir.proc.sim} via the \code{covars} argument.
+Whenever \code{complex.sir.proc.sim} is called, this argument will contain values of the covariates obtained from the look-up table.
+
+\begin{figure}
+<<seas-basis-plot,echo=F,height=4,width=6,fig=T>>=
+op <- par(mar=c(5,5,1,5))
+matplot(tbasis,basis,xlim=c(0,2),type='l',lwd=2,bty='u',
+        lty=1,col=c("red","blue","orange"),xlab="time (yr)",ylab="seasonal basis functions")
+bb <- coef(complex.sir,c("b1","b2","b3"))
+plot.window(c(0,2),c(0,1)*exp(max(bb)))
+lines(tbasis,exp(basis%*%bb),col="black",lwd=3,lty=1)
+lines(tbasis,exp(basis%*%bb),col="white",lwd=2.5,lty="23")
+axis(side=4)
+mtext(side=4,line=2,text=bquote(beta(t)==exp(.(b1)*s[1]+.(b2)*s[2]+.(b3)*s[3]),where=as.list(exp(coef(complex.sir)))))
+par(op)
+@   
+  \caption{
+    Periodic B-spline basis functions can be used to construct flexible periodic functions.
+    The colored lines show the three basis functions.
+    The dashed black line shows the seasonality $\beta(t)$ assumed in \code{complex.sir}.
+  }
+  \label{fig:seas-basis-plot}
+\end{figure}
+
+
+\begin{figure}
+<<complex-sir-plot,echo=F,fig=T>>=
+plot(complex.sir)
+@   
[TRUNCATED]

To get the complete diff run:
    svnlook diff /svnroot/pomp -r 499


More information about the pomp-commits mailing list