pomp
objectThe R codes in this script are available for download. This work is licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.
This document shows how the results of He, Ionides, and King (2010) can be reproduced. Specifically, it implements the model of that paper and computes the likelihood of the data under that model. The intentions are to:
Whilst He et al. (2010) examined measles in 20 towns, we will focus on London alone. The full 20 city dataset is available on the pomp website and the MLEs obtained by He et al. (2010) for the full 20 towns are included in the R code accompanying this document.
To get started, we must install pomp if it is not already installed. The following commands install pomp from source (on github).
library(devtools)
install_github("kingaa/pomp")
If we were to run into trouble in the above, we would consult the package website.
Now we’ll load the packages we’ll need, and set the random seed, to allow reproducibility.
set.seed(594709947L)
library(tidyverse)
library(pomp)
Now we’ll load the data and covariates. The data are measles reports from 20 cities in England and Wales, digitized from the printed public records and graciously provided by Prof. Bryan Grenfell. We also have information on the population sizes and birth-rates in these cities; we’ll treat these variables as covariates.
load("twentycities.rda")
|>
measles mutate(year=as.integer(format(date,"%Y"))) |>
filter(town==TOWN & year>=1950 & year<1964) |>
mutate(time=(julian(date,origin=as.Date("1950-01-01")))/365.25+1950) |>
filter(time>1950 & time<1964) |>
select(time,cases) -> dat
|> filter(town==TOWN) |>
demog select(-town) -> demog
Let’s plot the data and covariates.
|> ggplot(aes(x=time,y=cases))+geom_line() dat
|>
demog pivot_longer(-year) |>
ggplot(aes(x=year,y=value))+geom_point()+
facet_wrap(~name,ncol=1,scales="free_y")
Now we’ll smooth the covariates, storing the results in a new data frame. Note that, for reasons discussed in He et al. (2010), we lag the birthrate by 4 yr.
|>
demog reframe(
time=seq(from=min(year),to=max(year),by=1/12),
pop=predict(smooth.spline(x=year,y=pop),x=time)$y,
birthrate=predict(smooth.spline(x=year+0.5,y=births),x=time-4)$y
covar ) ->
View the covariates:
plot(pop~time,data=covar,type='l')
points(pop~year,data=demog)
plot(birthrate~time,data=covar,type='l')
points(births~year,data=demog)
plot(birthrate~I(time-4),data=covar,type='l')
points(births~I(year+0.5),data=demog)
He et al. (2010) consider an SEIR model with births and deaths, seasonally varying transmission rates, and a cohort effect. Specifically, the transmission rate is high when school is in session and low otherwise. The cohort effect means that some fraction of a birth cohort enters the susceptible pool all at once, on the first day of school. Another important feature of the model is the presence of extra-demographic stochasticity, modeled as multiplicative white noise in the force of infection.
The following Csnippet
implements a single step in the simulation of this model, from time t
to time t+dt
, conditional on the state at time t
.
Csnippet("
rproc <- double beta, br, seas, foi, dw, births;
double rate[6], trans[6];
// cohort effect
if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt)
br = cohort*birthrate/dt + (1-cohort)*birthrate;
else
br = (1.0-cohort)*birthrate;
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*seas*(1.0-exp(-(gamma+mu)*dt))/dt;
// force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extra-demographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = foi*dw/dt; //infection rate (stochastic)
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(br*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,E,&rate[2],dt,&trans[2]);
reulermultinom(2,I,&rate[4],dt,&trans[4]);
S += births - trans[0] - trans[1];
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
In the above, C
will accumulate the true incidence (modeled as I to R transitions) and W
will accumulate the (standardized) white-noise perturbations.
The following simulates from the initial condition distribution given the parameters S_0
, E_0
, I_0
, and R_0
, which are the fractions of the population in each model compartment at time zero. Notice that these are not assumed to be normalized.
Csnippet("
rinit <- double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
W = 0;
C = 0;
")
He et al. (2010) assume an overdispersed binomial measurement model incorporating both under-reporting and reporting error. The following Csnippet
computes the probability P[cases|C].
Csnippet("
dmeas <- double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
}
if (give_log) lik = log(lik);
")
The following simulates from this model.
Csnippet("
rmeas <- double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
The model parameters are assumed to be positive, and some (rho
, cohort
, amplitude
) are assumed to lie between 0 and 1. Maximization of the likelihood is therefore a constrained optimization problem. It is possible to convert the problem into an unconstrained maximization problem by transforming the parameters. In particular, we can log transform the positive parameters, logit transform the parameters constrained to take values in the unit interval, and use a log-barycentric transformation on the initial-value parameters.
pomp
objectThe following constructs a pomp
object encoding the model and data.
|>
dat pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
rprocess=euler(rproc,delta.t=1/365.25),
dmeasure=dmeas,
rmeasure=rmeas,
rinit=rinit,
partrans=parameter_trans(
log=c("mu","sigma","gamma","alpha","iota","psi","sigmaSE","R0"),
logit=c("rho","cohort","amplitude"),
barycentric=c("S_0","E_0","I_0","R_0")
),covar=covariate_table(covar,times="time"),
accumvars=c("C","W"),
statenames=c("S","E","I","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
m1 ) ->
We plot the various components.
|>
m1 as.data.frame() |>
pivot_longer(-time) |>
ggplot(aes(x=time,y=value))+
geom_line()+
facet_grid(name~.,scales="free_y")
We extract the MLE computed by He et al. (2010). See the R code for the MLEs from all 20 cities.
|> filter(town==TOWN) -> mle
mles c("R0","mu","sigma","gamma","alpha","iota",
paramnames <-"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
unlist(mle[paramnames]) theta <-
loglik | sigma | gamma | rho | R0 | amplitude | alpha | iota | cohort | psi | sigmaSE |
---|---|---|---|---|---|---|---|---|---|---|
-3804.9 | 28.9 | 30.4 | 0.488 | 56.8 | 0.554 | 0.976 | 2.9 | 0.557 | 0.116 | 0.0878 |
The following codes compute the likelihood using the particle filter. We use the circumstance and doFuture packages to parallelize the computations on a multicore machine. Note the use of the L’Ecuyer parallel random number generator.
library(doFuture)
library(doRNG)
library(circumstance)
registerDoFuture()
registerDoRNG(998468235L)
plan(multicore)
|>
m1 pfilter(params=theta,Np=10000,Nrep=8) |>
logLik() |>
logmeanexp(se=TRUE,ess=TRUE)
## est se ess
## -3801.0700998 0.2212405 6.0483439
This compares well with that estimated by He et al. (2010).
The following compute and plot some simulations. First, we plot 9 simulations and compare them with the data.
|>
m1 simulate(params=theta,nsim=9,format="d",include.data=TRUE) |>
ggplot(aes(x=time,y=cases,group=.id,color=(.id=="data")))+
guides(color="none")+
geom_line()+facet_wrap(~.id,ncol=2)
The following computes a larger number of simulations and plots the data together with the 5th and 95th percentiles.
|>
m1 simulate(params=theta,nsim=200,format="d",include.data=TRUE) |>
select(time,.id,cases) |>
mutate(
data=if_else(.id=="data","data","simulation")
|>
) group_by(time,data) |>
reframe(
p=c("lo","med","hi"),
q=quantile(cases,prob=c(0.05,0.5,0.95),names=FALSE)
|>
) pivot_wider(names_from=p,values_from=q) |>
ggplot(aes(x=time,y=med,color=data,fill=data,ymin=lo,ymax=hi))+
geom_ribbon(alpha=0.2)
The “Simulation-based Inference for Epidemiological Dynamics” short course discusses this study of measles in much more detail.
Produced using R version 4.4.1 and pomp version 5.10.
He D, Ionides EL, King AA (2010). “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” J R Soc Interface, 7, 271–283. https://doi.org/10.1098/rsif.2009.0151.