The R codes in this script are available for download. This work is licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC

Introduction

This document shows how the results of He, Ionides, and King (2010) can be reproduced. Specifically, it implements the model of that paper and computes the likelihood of the data under that model. The intentions are to:

  • facilitate new data analyses based on variants of this model by showing how it is implemented in pomp
  • establish baseline likelihoods for these data against which future analyses can be compared
  • illustrate the construction of a complex model in pomp

Whilst He et al. (2010) examined measles in 20 towns, we will focus on London alone. The full 20 city dataset is available on the pomp website and the MLEs obtained by He et al. (2010) for the full 20 towns are included in the R code accompanying this document.

Preliminaries

To get started, we must install pomp if it is not already installed. The following commands install pomp from source (on github).

library(devtools)
install_github("kingaa/pomp")

If we were to run into trouble in the above, we would consult the package website.

Now we’ll load the packages we’ll need, and set the random seed, to allow reproducibility.

set.seed(594709947L)
library(tidyverse)
library(pomp)

Data and covariates

Now we’ll load the data and covariates. The data are measles reports from 20 cities in England and Wales, digitized from the printed public records and graciously provided by Prof. Bryan Grenfell. We also have information on the population sizes and birth-rates in these cities; we’ll treat these variables as covariates.

load("twentycities.rda")
measles |> 
  mutate(year=as.integer(format(date,"%Y"))) |>
  filter(town==TOWN & year>=1950 & year<1964) |>
  mutate(time=(julian(date,origin=as.Date("1950-01-01")))/365.25+1950) |>
  filter(time>1950 & time<1964) |>
  select(time,cases) -> dat
demog |> filter(town==TOWN) |>
  select(-town) -> demog

Let’s plot the data and covariates.

dat |> ggplot(aes(x=time,y=cases))+geom_line()

demog |> 
  pivot_longer(-year) |>
  ggplot(aes(x=year,y=value))+geom_point()+
  facet_wrap(~name,ncol=1,scales="free_y")

Now we’ll smooth the covariates, storing the results in a new data frame. Note that, for reasons discussed in He et al. (2010), we lag the birthrate by 4 yr.

demog |> 
  reframe(
    time=seq(from=min(year),to=max(year),by=1/12),
    pop=predict(smooth.spline(x=year,y=pop),x=time)$y,
    birthrate=predict(smooth.spline(x=year+0.5,y=births),x=time-4)$y
  ) -> covar

View the covariates:

plot(pop~time,data=covar,type='l')
points(pop~year,data=demog)

plot(birthrate~time,data=covar,type='l')
points(births~year,data=demog)

plot(birthrate~I(time-4),data=covar,type='l')
points(births~I(year+0.5),data=demog)

The process model

He et al. (2010) consider an SEIR model with births and deaths, seasonally varying transmission rates, and a cohort effect. Specifically, the transmission rate is high when school is in session and low otherwise. The cohort effect means that some fraction of a birth cohort enters the susceptible pool all at once, on the first day of school. Another important feature of the model is the presence of extra-demographic stochasticity, modeled as multiplicative white noise in the force of infection.

The following Csnippet implements a single step in the simulation of this model, from time t to time t+dt, conditional on the state at time t.

rproc <- Csnippet("
  double beta, br, seas, foi, dw, births;
  double rate[6], trans[6];
  
  // cohort effect
  if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt) 
    br = cohort*birthrate/dt + (1-cohort)*birthrate;
  else 
    br = (1.0-cohort)*birthrate;

  // term-time seasonality
  t = (t-floor(t))*365.25;
  if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
      seas = 1.0+amplitude*0.2411/0.7589;
    else
      seas = 1.0-amplitude;

  // transmission rate
  beta = R0*seas*(1.0-exp(-(gamma+mu)*dt))/dt;
  // force of infection
  foi = beta*pow(I+iota,alpha)/pop;
  // white noise (extra-demographic stochasticity)
  dw = rgammawn(sigmaSE,dt);

  rate[0] = foi*dw/dt;  //infection rate (stochastic)
  rate[1] = mu;             // natural S death
  rate[2] = sigma;        // rate of ending of latent stage
  rate[3] = mu;             // natural E death
  rate[4] = gamma;        // recovery
  rate[5] = mu;             // natural I death

  // Poisson births
  births = rpois(br*dt);
  
  // transitions between classes
  reulermultinom(2,S,&rate[0],dt,&trans[0]);
  reulermultinom(2,E,&rate[2],dt,&trans[2]);
  reulermultinom(2,I,&rate[4],dt,&trans[4]);

  S += births   - trans[0] - trans[1];
  E += trans[0] - trans[2] - trans[3];
  I += trans[2] - trans[4] - trans[5];
  W += (dw - dt)/sigmaSE;  // standardized i.i.d. white noise
  C += trans[4];           // true incidence
")

In the above, C will accumulate the true incidence (modeled as I to R transitions) and W will accumulate the (standardized) white-noise perturbations.

The following simulates from the initial condition distribution given the parameters S_0, E_0, I_0, and R_0, which are the fractions of the population in each model compartment at time zero. Notice that these are not assumed to be normalized.

rinit <- Csnippet("
  double m = pop/(S_0+E_0+I_0+R_0);
  S = nearbyint(m*S_0);
  E = nearbyint(m*E_0);
  I = nearbyint(m*I_0);
  W = 0;
  C = 0;
")

The measurement model

He et al. (2010) assume an overdispersed binomial measurement model incorporating both under-reporting and reporting error. The following Csnippet computes the probability \(\mathbb{P}[\text{cases} | C]\).

dmeas <- Csnippet("
  double m = rho*C;
  double v = m*(1.0-rho+psi*psi*m);
  double tol = 1.0e-18;
  if (cases > 0.0) {
      lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
  } else {
    lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
  }
  if (give_log) lik = log(lik);
")

The following simulates from this model.

rmeas <- Csnippet("
  double m = rho*C;
  double v = m*(1.0-rho+psi*psi*m);
  double tol = 1.0e-18;
  cases = rnorm(m,sqrt(v)+tol);
  if (cases > 0.0) {
    cases = nearbyint(cases);
  } else {
    cases = 0.0;
  }
")

Parameter transformations

The model parameters are assumed to be positive, and some (rho, cohort, amplitude) are assumed to lie between 0 and 1. Maximization of the likelihood is therefore a constrained optimization problem. It is possible to convert the problem into an unconstrained maximization problem by transforming the parameters. In particular, we can log transform the positive parameters, logit transform the parameters constrained to take values in the unit interval, and use a log-barycentric transformation on the initial-value parameters.

Constructing the pomp object

The following constructs a pomp object encoding the model and data.

dat |> 
  pomp(t0=with(dat,2*time[1]-time[2]),
    time="time",
    rprocess=euler(rproc,delta.t=1/365.25),
    dmeasure=dmeas,
    rmeasure=rmeas,
    rinit=rinit,
    partrans=parameter_trans(
      log=c("mu","sigma","gamma","alpha","iota","psi","sigmaSE","R0"),
      logit=c("rho","cohort","amplitude"),
      barycentric=c("S_0","E_0","I_0","R_0")
    ),
    covar=covariate_table(covar,times="time"),
    accumvars=c("C","W"),
    statenames=c("S","E","I","C","W"),
    paramnames=c("R0","mu","sigma","gamma","alpha","iota",
      "rho","sigmaSE","psi","cohort","amplitude",
      "S_0","E_0","I_0","R_0")
  ) -> m1

We plot the various components.

m1 |> 
  as.data.frame() |> 
  pivot_longer(-time) |>
  ggplot(aes(x=time,y=value))+
  geom_line()+
  facet_grid(name~.,scales="free_y")

Likelihood computation

We extract the MLE computed by He et al. (2010). See the R code for the MLEs from all 20 cities.

mles |> filter(town==TOWN) -> mle
paramnames <- c("R0","mu","sigma","gamma","alpha","iota",
  "rho","sigmaSE","psi","cohort","amplitude",
  "S_0","E_0","I_0","R_0")
theta <- unlist(mle[paramnames])
loglik sigma gamma rho R0 amplitude alpha iota cohort psi sigmaSE
-3804.9 28.9 30.4 0.488 56.8 0.554 0.976 2.9 0.557 0.116 0.0878

The following codes compute the likelihood using the particle filter. We use the circumstance and doFuture packages to parallelize the computations on a multicore machine. Note the use of the L’Ecuyer parallel random number generator.

library(doFuture)
library(doRNG)
library(circumstance)
registerDoFuture()
registerDoRNG(998468235L)
plan(multicore)

m1 |>
  pfilter(params=theta,Np=10000,Nrep=8) |>
  logLik() |>
  logmeanexp(se=TRUE,ess=TRUE)
##           est            se           ess 
## -3801.5663773     0.3062983     5.0946501

This compares well with that estimated by He et al. (2010).

Simulations

The following compute and plot some simulations. First, we plot 9 simulations and compare them with the data.

m1 |> 
  simulate(params=theta,nsim=9,format="d",include.data=TRUE) |>
  ggplot(aes(x=time,y=cases,group=.id,color=(.id=="data")))+
  guides(color="none")+
  geom_line()+facet_wrap(~.id,ncol=2)

The following computes a larger number of simulations and plots the data together with the 5th and 95th percentiles.

m1 |> 
  simulate(params=theta,nsim=200,format="d",include.data=TRUE) |>
  select(time,.id,cases) |>
  mutate(
    data=if_else(.id=="data","data","simulation")
  ) |>
  group_by(time,data) |>
  reframe(
    p=c("lo","med","hi"),
    q=quantile(cases,prob=c(0.05,0.5,0.95),names=FALSE)
  ) |>
  pivot_wider(names_from=p,values_from=q) |>
  ggplot(aes(x=time,y=med,color=data,fill=data,ymin=lo,ymax=hi))+
  geom_ribbon(alpha=0.2)

For more…

The “Simulation-based Inference for Epidemiological Dynamics” short course discusses this study of measles in much more detail.


Produced using R version 4.3.3 and pomp version 5.7.1.0.


References

He D, Ionides EL, King AA (2010). “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” J R Soc Interface, 7, 271–283. https://doi.org/10.1098/rsif.2009.0151.


pomp documentation index
pomp manual
pomp homepage
Top  Back  Close