Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
This document has its origins in the SISMID short course on Simulation-based Inference given by Aaron King and Edward Ionides.
Produced with R version 3.6.3 and pomp version 2.8.
Important Note: These materials have been updated for use with version 2.8. As of version 2, pomp syntax has changed substantially. These changes are documented on the pomp website.
This tutorial develops some classes of dynamic models of relevance in biological systems, especially epidemiology. We have the following goals:
Compartmental models are of great utility in many disciplines and very much so in epidemiology. Let us derive deterministic and stochastic versions of the susceptible-infected-recovered (SIR) model of disease transmission dynamics in a closed population. In so doing, we will use notation that generalizes to more complex systems (Bretó et al. 2009).
Together with initial conditions specifying S(0), I(0) and R(0), we just need to write down ordinary differential equations (ODE) for the flow counting processes. These are, dNSIdt=μSI(t)S(t),dNIRdt=μIRI(t).
Find the expected value of NSI(t+δ)−NSI(t) and NIR(t+δ)−NIR(t) given the current state, S(t), I(t) and R(t). Take the limit as δ→0 and show that this gives the ODE model.
Recall the simple continuous-time Markov chain interpretation of the SIR model without demography: P[NSI(t+δ)=NSI(t)+1]=μSI(t)S(t)δ+o(δ),P[NIR(t+δ)=NIR(t)+1]=μIRI(t)δ+o(δ).
We look for a numerical solution with state variables ˜S(kδ), ˜I(kδ), ˜R(kδ).
The counting processes for the flows between compartments are ˜NSI(t) and ˜NIR(t). The counting processes are related to the numbers of individuals in the compartments by the same flow equations we had before: Δ˜S=−Δ˜NSIΔ˜I=Δ˜NSI−Δ˜NIRΔ˜R=Δ˜NIR,
Let’s focus NSI(t); the same methods can also be applied to NIR(t).
What are the advantages and disadvantages of these different schemes? Conceptually, it is simplest to think of (1) or (2). Numerically, it is usually preferable to implement (3).
The Euler method extends naturally to stochastic differential equations. A natural way to add stochastic variation to an ODE dx/dt=h(x) is dXdt=h(X)+σdBdt where B(t) is Brownian motion and so dB/dt is Gaussian white noise. The so-called Euler-Maruyama approximation ˜X is generated by ˜X((k+1)δ)=˜X(kδ)+δh(˜X(kδ))+σ√δZk where Z1,Z2,… is a sequence of independent standard normal random variables, i.e., Zk∼Normal(0,1). Although SDEs are often considered an advanced topic in probability, the Euler approximation doesn’t demand much more than familiarity with the normal distribution.
Write down the Euler-Maruyama method for an SDE representation of the closed-population SIR model. Consider some difficulties that might arise with non-negativity constraints, and propose some practical way one might deal with that issue.
As an example that we can probe in some depth, let’s look at an isolated outbreak of influenza that occurred in a boarding school for boys in England (Anonymous 1978). Download the data and examine it:
read.table("http://kingaa.github.io/short-course/stochsim/bsflu_data.txt") -> bsflu
head(bsflu)
## B C day
## 1978-01-22 1 0 1
## 1978-01-23 6 0 2
## 1978-01-24 26 0 3
## 1978-01-25 73 1 4
## 1978-01-26 222 8 5
## 1978-01-27 293 16 6
The variable B
refers to boys confined to bed and C
to boys in convalescence. Let’s restrict our attention for the moment to the B
variable.
Let’s assume that B indicates the number of boys confined to bed the preceding day and that the disease follows the simple SIR model. Our tasks will be, first, to estimate the parameters of the SIR and, second, to decide whether or not the SIR model is an adequate description of these data.
Below is a diagram of the SIR model. The host population is divided into three classes according to their infection status: S, susceptible hosts; I, infected (and infectious) hosts; R, recovered and immune hosts. The rate at which individuals move from S to I is the force of infection, λ=μSI=βI/N, while that at which individuals move into the R class is μIR=γ.
Let’s look at how we can view the SIR as a POMP model. The unobserved state variables, in this case, are the numbers of individuals, S, I, R in the S, I, and R compartments, respectively. It’s reasonable in this case to view the population size N=S+I+R, as fixed. The numbers that actually move from one compartment to another over any particular time interval are modeled as stochastic processes. In this case, we’ll assume that the stochasticity is purely demographic, i.e., that each individual in a compartment at any given time faces the same risk of exiting the compartment.
To implement the model in pomp, the first thing we need is a stochastic simulator for the unobserved state process. We’ve seen that there are several ways of approximating the process just described for numerical purposes. An attractive option here is to model the number moving from one compartment to the next over a very short time interval as a binomial random variable. In particular, we model the number, ΔNSI, moving from S to I over interval Δt as ΔNSI∼Binomial(S,1−e−λΔt), and the number moving from I to R as ΔNIR∼Binomial(I,1−e−γΔt).
A Csnippet
that encodes such a simulator is as follows:
sir_step <- Csnippet("
double dN_SI = rbinom(S,1-exp(-Beta*I/N*dt));
double dN_IR = rbinom(I,1-exp(-gamma*dt));
S -= dN_SI;
I += dN_SI - dN_IR;
R += dN_IR;
")
At day zero, we’ll assume that I=1 and R=0, but we don’t know how big the school is, so we treat N as a parameter to be estimated and let S(0)=N−1. Thus an initializer Csnippet
is
sir_init <- Csnippet("
S = N-1;
I = 1;
R = 0;
")
We fold these Csnippets
, with the data, into a pomp
object thus:
pomp(bsflu,time="day",t0=0,rprocess=euler(sir_step,delta.t=1/6),
rinit=sir_init,paramnames=c("N","Beta","gamma"),
statenames=c("S","I","R")) -> sir
Now let’s assume that the case reports, B, result from a process by which new infections result in confinement with probability ρ, which we can think of as the probability that an infection is severe enough to be noticed by the school authorities. Since confined cases have, presumably, a much lower transmission rate, let’s treat B as being a count of the number of boys who have moved from I to R over the course of the past day. We need a variable to track this. Let’s modify our Csnippet
above, adding a variable H to track the incidence. We’ll then replace the rprocess
with the new one.
sir_step <- Csnippet("
double dN_SI = rbinom(S,1-exp(-Beta*I/N*dt));
double dN_IR = rbinom(I,1-exp(-gamma*dt));
S -= dN_SI;
I += dN_SI - dN_IR;
R += dN_IR;
H += dN_IR;
")
sir_init <- Csnippet("
S = N-1;
I = 1;
R = 0;
H = 0;
")
pomp(sir,rprocess=euler(sir_step,delta.t=1/6),rinit=sir_init,
paramnames=c("Beta","gamma","N"),statenames=c("S","I","R","H")) -> sir
Now, we’ll model the data, B, as a binomial process, Bt∼Binomial(H(t)−H(t−1),ρ). But we have a problem, since at time t, the variable H
we’ve defined will contain H(t), not H(t)−H(t−1). We can overcome this by telling pomp
that we want H
to be set to zero immediately following each observation. We do this by setting the accumvars
argument to pomp
:
pomp(sir,accumvars="H") -> sir
Now, to include the observations in the model, we must write an rmeasure
component:
rmeas <- Csnippet("B = rbinom(H,rho);")
and put these into our pomp
object:
sir <- pomp(sir,rmeasure=rmeas,statenames="H",paramnames="rho")
Let’s perform some simulations, just to verify that our codes are working as intended. To do so, we’ll need some parameters. A little thought will get us some ballpark estimates. In the data, it looks like there were a total of 1540 infections, so the population size, N, must be somewhat in excess of this number. In fact, we can use the final-size equation R0=−log(1−f)f, where f=R(∞)/N is the final size of the epidemic, together with the idea that R0 for influenza is typically thought to be around 1.5, to estimate that f≈0.6, whence N≈2600. If the infectious period is roughly 1 da, then 1/γ≈1 da and β=γR0≈1.5 da−1. Let’s simulate the model at these parameters.
sims <- simulate(sir,params=c(Beta=1.5,gamma=1,rho=0.9,N=2600),
nsim=20,format="data.frame",include.data=TRUE)
ggplot(sims,mapping=aes(x=day,y=B,group=.id,color=.id=="data"))+
geom_line()+guides(color=FALSE)
Fiddle with the parameters to see if you can’t find parameters for which the data are a more plausible realization.
Below is a diagram of the so-called SEIR model. This differs from the SIR model in that infected individuals must pass a period of latency before becoming infectious.
Modify the codes above to construct a pomp
object containing the flu data and an SEIR model. Perform simulations as above and adjust parameters to get a sense of whether improvement is possible by including a latent period.
In the preceding, we’ve been assuming that Bt represents the number of boys sent to bed on day t. Actually, this isn’t correct at all. As described in the report (Anonymous 1978), Bt represents the total number of boys in bed on day t. Since boys were potentially confined for more than one day, the data count each infection multiple times. On the other hand, we have information about the total number of boys at risk and the total number who were infected. In fact, we know that N=763 boys were at risk and 512 boys in total spent between 3 and 7 days away from class (either in bed or convalescent). Moreover, we have data on the number of boys, Ct, convalescent at day t. Since 1540 boy-da/512 boy≈3 da, we know that the average duration spent in bed was 3 da and, since ∑tCt=924, we can infer that the average time spent convalescing was 924 boy-da/512 boy≈1.8 da.
Formulate a model with both confinement and convalescent stages. Implement it in pomp using a compartmental model like that diagrammed below.
You will have to give some thought to just how to model the relationship between the data (B and C) and the state variables. How many parameters can reasonably be fixed? How many must be estimated? Obtain some ballpark estimates of the parameters and simulate to see if you can plausibly explain the data as a realization of this model.
Anonymous. 1978. Influenza in a boarding school. British medical journal 1:587.
Bhadra, A., E. L. Ionides, K. Laneri, M. Pascual, M. Bouma, and R. Dhiman. 2011. Malaria in Northwest India: Data analysis via partially observed stochastic differential equation models driven by Lévy noise. Journal of the American Statistical Association 106:440–451.
Bretó, C., D. He, E. L. Ionides, and A. A. King. 2009. Time series analysis via mechanistic models. Annals of Applied Statistics 3:319–348.
Gillespie, D. T. 1977. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 81:2340–2361.
Gillespie, D. T. 2001. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics 115:1716–1733.
He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society, Interface 7:271–283.
Keeling, M., and P. Rohani. 2007. Modeling infectious diseases in humans and animals. Princeton University Press, Princeton.
Laneri, K., A. Bhadra, E. L. Ionides, M. Bouma, R. Yadav, R. Dhiman, and M. Pascual. 2010. Forcing versus feedback: Epidemic malaria and monsoon rains in NW India. PLoS Computational Biology 6:e1000898.