Important Note: These materials have been superseded by materials provided as part of the SISMID short course on Simulation-based Inference.
Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
Produced in R version 3.3.1 using pomp version 1.7.5.1.
Building a pomp object
Load the packages we’ll need, and set the random seed, to allow reproducibility.
set.seed(594709947L)
library(ggplot2)
theme_set(theme_bw())
library(plyr)
library(reshape2)
library(magrittr)
library(pomp)
stopifnot(packageVersion("pomp")>="1.4.5")
Now we’ll load the data and covariates. The data are measles reports from 20 cities in England and Wales. We also have information on the population sizes and birth-rates in these cities; we’ll treat these variables as covariates.
daturl <- "http://kingaa.github.io/pomp/vignettes/twentycities.rda"
datfile <- file.path(tempdir(),"twentycities.rda")
download.file(daturl,destfile=datfile,mode="wb")
load(datfile)
measles %>%
mutate(year=as.integer(format(date,"%Y"))) %>%
subset(town=="London" & year>=1950 & year<1964) %>%
mutate(time=(julian(date,origin=as.Date("1950-01-01")))/365.25+1950) %>%
subset(time>1950 & time<1964, select=c(time,cases)) -> dat
demog %>% subset(town=="London",select=-town) -> demog
demog %>%
summarize(
time=seq(from=min(year),to=max(year),by=1/12),
pop=predict(smooth.spline(x=year,y=pop),x=time)$y,
birthrate=predict(smooth.spline(x=year+0.5,y=births),x=time-4)$y
) -> covar
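For orientation, the case data and the interpolated covariates can be inspected with a couple of quick plots. These plots are illustrative only and use the packages loaded above.
ggplot(dat,aes(x=time,y=cases))+geom_line()
covar %>%
melt(id.vars="time") %>%
ggplot(aes(x=time,y=value))+
geom_line()+
facet_wrap(~variable,ncol=1,scales="free_y")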
The following Csnippet implements a simulator for the process model.
rproc <- Csnippet("
double beta, br, seas, foi, dw, births;
double rate[6], trans[6];
// cohort effect
if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt)
br = cohort*birthrate/dt + (1-cohort)*birthrate;
else
br = (1.0-cohort)*birthrate;
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7&&t<=100) || (t>=115&&t<=199) || (t>=252&&t<=300) || (t>=308&&t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*(gamma+mu)*seas;
// expected force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extrademographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = foi*dw/dt; // stochastic force of infection
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(br*dt);
// transitions between classes
reulermultinom(2,S,&rate[0],dt,&trans[0]);
reulermultinom(2,E,&rate[2],dt,&trans[2]);
reulermultinom(2,I,&rate[4],dt,&trans[4]);
S += births - trans[0] - trans[1];
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
R = pop - S - E - I;
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
We complete the process model definition by specifying the distribution of initial unobserved states. The following code assumes that the fraction of the population in each of the four compartments is known.
initlz <- Csnippet("
double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
W = 0;
C = 0;
")
We’ll model both under-reporting and measurement error. We want E[cases|C] = ρC, where C is the true incidence and 0 < ρ < 1 is the reporting efficiency. We’ll also assume that Var[cases|C] = ρ(1−ρ)C + (ψρC)², where ψ quantifies overdispersion. Note that when ψ = 0, the variance-mean relation is that of the binomial distribution. To be specific, we’ll choose cases|C ∼ f(⋅|ρ,ψ,C), where f(c|ρ,ψ,C) = Φ(c+1/2, ρC, ρ(1−ρ)C+(ψρC)²) − Φ(c−1/2, ρC, ρ(1−ρ)C+(ψρC)²), and Φ(x,μ,σ²) is the c.d.f. of the normal distribution with mean μ and variance σ².
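As a sanity check on the formula above, here is a direct R transcription of f(c|ρ,ψ,C); the function name f_cases is illustrative, and this is not the code pomp will use.
f_cases <- function (c, C, rho, psi) {
m <- rho*C                        # mean of the discretized normal
v <- rho*(1-rho)*C+(psi*rho*C)^2  # variance, as defined above
pnorm(c+0.5,mean=m,sd=sqrt(v))-pnorm(c-0.5,mean=m,sd=sqrt(v))
}
f_cases(c=40,C=100,rho=0.5,psi=0.1)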
The following computes P[cases|C].
dmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)-pnorm(cases-0.5,m,sqrt(v)+tol,1,0)+tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)+tol;
}
")
The following code simulates cases|C.
rmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 1.0e-18;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
The parameters are constrained to be positive, and some of them are constrained to lie between 0 and 1. We can turn the likelihood maximization problem into an unconstrained maximization problem by transforming the parameters. The following Csnippets implement such a transformation and its inverse.
toEst <- Csnippet("
Tsigma = log(sigma);
Tgamma = log(gamma);
Tcohort = logit(cohort);
Tamplitude = logit(amplitude);
TsigmaSE = log(sigmaSE);
Tpsi = log(psi);
TR0 = log(R0);
to_log_barycentric (&TS_0, &S_0, 4);
")
fromEst <- Csnippet("
Tsigma = exp(sigma);
Tgamma = exp(gamma);
Tcohort = expit(cohort);
Tamplitude = expit(amplitude);
TsigmaSE = exp(sigmaSE);
Tpsi = exp(psi);
TR0 = exp(R0);
from_log_barycentric (&TS_0, &S_0, 4);
")
He et al. (2010) estimated the parameters of this model. The full set is included in the R code accompanying this document, where they are read into a data frame called mles.
mles %>% subset(town=="London") -> mle
paramnames <- c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
mle %>% extract(paramnames) %>% unlist() -> theta
We now bring these elements together into a pomp object:
dat %>%
pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
params=theta,
rprocess=euler.sim(rproc,delta.t=1/365.25),
initializer=initlz,
dmeasure=dmeas,
rmeasure=rmeas,
toEstimationScale=toEst,
fromEstimationScale=fromEst,
covar=covar,
tcovar="time",
zeronames=c("C","W"),
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
) -> m1
plot(simulate(m1))
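As a quick check that the model, covariates, and parameters fit together, one can run a particle filter at the He et al. (2010) MLE. The choice Np=1000 below is arbitrary and purely illustrative.
pf <- pfilter(m1,Np=1000)  # estimate the log likelihood at the MLE
logLik(pf)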
To compute a likelihood profile over a given parameter (or set of parameters) across some range, we first construct a grid of points spanning that range. At each point in the grid, we fix the focal parameter (or set of parameters) at that value and maximize the likelihood over the remaining parameters. To ensure that this optimization is global, we initiate multiple optimizers at a variety of points across the space. The pomp function profileDesign is useful in constructing such a set of starting points for the optimization.
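As a toy illustration (the parameter names a, b, and c here are hypothetical), profileDesign pairs each value of the profiled parameter with nprof random points drawn from the box defined by lower and upper:
profileDesign(a=1:3,lower=c(b=0,c=10),upper=c(b=1,c=20),nprof=2)
## a 6-row data frame: each value of a combined with 2 random (b,c) points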
The following code constructs a data frame, each row of which is a starting point for an optimization. We will be profiling over σSE (sigmaSE in the code), fixing μ = 0.02 and α = 1. To simplify the calculation still further, we will hold ρ and ι at their ML point estimates.
estpars <- setdiff(names(theta),c("sigmaSE","mu","alpha","rho","iota"))
theta["alpha"] <- 1
theta.t <- partrans(m1,theta,"toEstimationScale")
theta.t.hi <- theta.t.lo <- theta.t
theta.t.lo[estpars] <- theta.t[estpars]-log(2)
theta.t.hi[estpars] <- theta.t[estpars]+log(2)
profileDesign(
sigmaSE=seq(from=log(0.02),to=log(0.2),length=20),
lower=theta.t.lo,upper=theta.t.hi,nprof=40
) -> pd
dim(pd)
## [1] 800 16
pd <- as.data.frame(t(partrans(m1,t(pd),"fromEstimationScale")))
pairs(~sigmaSE+R0+mu+sigma+gamma+S_0+E_0,data=pd)
pomp provides two functions, bake and stew, that save the results of expensive computations. We’ll run the profile computations on a cluster using MPI. Note that, again, care must be taken with the parallel random number generator.
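The registration of the parallel backend is not shown in this document. One possible setup, assuming an MPI environment and the doMPI package, is sketched below; doParallel would work similarly on a single multicore machine.
library(foreach)
library(doMPI)
cl <- startMPIcluster()  # number of workers determined by the MPI environment
registerDoMPI(cl)        # register the backend used by %dopar% below
## ... run the profile computations below ...
## closeCluster(cl)      # when finished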
bake("sigmaSE-profile1.rds",{
foreach (p=iter(pd,"row"),
.combine=rbind,
.errorhandling="remove",
.inorder=FALSE,
.options.mpi=list(chunkSize=1,seed=1598260027L,info=TRUE)
) %dopar% {
tic <- Sys.time()
library(magrittr)
library(plyr)
library(reshape2)
library(pomp)
options(stringsAsFactors=FALSE)
dat %>%
pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
rprocess=euler.sim(rproc,delta.t=1/365.25),
initializer=initlz,
dmeasure=dmeas,
rmeasure=rmeas,
toEstimationScale=toEst,
fromEstimationScale=fromEst,
covar=covar,
tcovar="time",
zeronames=c("C","W"),
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
) %>%
mif2(start = unlist(p),
Nmif = 50,
rw.sd = rw.sd(
R0=0.02,sigma=0.02,gamma=0.02,psi=0.02,cohort=0.02,amplitude=0.02,
S_0=ivp(0.02),E_0=ivp(0.02),I_0=ivp(0.02),R_0=ivp(0.02)),
Np = 1000,
cooling.type = "geometric",
cooling.fraction.50 = 0.1,
transform = TRUE) %>%
mif2() -> mf
## Runs 10 particle filters to assess Monte Carlo error in likelihood
pf <- replicate(10, pfilter(mf, Np = 2000))
ll <- sapply(pf,logLik)
ll <- logmeanexp(ll, se = TRUE)
nfail <- sapply(pf,getElement,"nfail")
toc <- Sys.time()
etime <- toc-tic
units(etime) <- "hours"
data.frame(as.list(coef(mf)),
loglik = ll[1],
loglik.se = ll[2],
nfail.min = min(nfail),
nfail.max = max(nfail),
etime = as.numeric(etime))
}
}) -> sigmaSE_prof
The preceding calculations took 333.9 cpu hr, or about 13 cpu sec per iteration per 1000 particles. Let’s examine the results.
pairs(~loglik+sigmaSE+R0+I(1/gamma)+I(1/sigma)+psi+log(cohort),
data=sigmaSE_prof,subset=loglik>max(loglik)-100)
summary(sigmaSE_prof)
## sigmaSE R0 mu sigma
## Min. :0.02000 Min. : 14.15 Min. :0.02 Min. : 11.8
## 1st Qu.:0.03561 1st Qu.: 27.94 1st Qu.:0.02 1st Qu.: 24.1
## Median :0.06336 Median : 36.88 Median :0.02 Median : 39.3
## Mean :0.07986 Mean : 41.14 Mean :0.02 Mean : 810.0
## 3rd Qu.:0.11263 3rd Qu.: 50.95 3rd Qu.:0.02 3rd Qu.: 70.1
## Max. :0.20000 Max. :121.01 Max. :0.02 Max. :528898.1
## gamma alpha iota rho
## Min. : 12.53 Min. :1 Min. :2.9 Min. :0.488
## 1st Qu.: 24.22 1st Qu.:1 1st Qu.:2.9 1st Qu.:0.488
## Median : 33.84 Median :1 Median :2.9 Median :0.488
## Mean : 43.93 Mean :1 Mean :2.9 Mean :0.488
## 3rd Qu.: 52.05 3rd Qu.:1 3rd Qu.:2.9 3rd Qu.:0.488
## Max. :383.05 Max. :1 Max. :2.9 Max. :0.488
## sigmaSE.1 psi cohort amplitude
## Min. :0.02000 Min. :0.07468 Min. :0.002423 Min. :0.09874
## 1st Qu.:0.03561 1st Qu.:0.10549 1st Qu.:0.291688 1st Qu.:0.30148
## Median :0.06336 Median :0.11444 Median :0.531590 Median :0.44209
## Mean :0.07986 Mean :0.11583 Mean :0.546576 Mean :0.49508
## 3rd Qu.:0.11263 3rd Qu.:0.12440 3rd Qu.:0.837886 3rd Qu.:0.66996
## Max. :0.20000 Max. :0.18011 Max. :0.999159 Max. :0.99945
## S_0 E_0 I_0
## Min. :0.009043 Min. :1.773e-05 Min. :1.599e-05
## 1st Qu.:0.024561 1st Qu.:3.704e-05 1st Qu.:3.776e-05
## Median :0.030002 Median :5.110e-05 Median :4.925e-05
## Mean :0.032142 Mean :5.384e-05 Mean :5.165e-05
## 3rd Qu.:0.037160 3rd Qu.:6.604e-05 3rd Qu.:6.298e-05
## Max. :0.089919 Max. :1.271e-04 Max. :1.178e-04
## R_0 loglik loglik.se nfail.min
## Min. :0.9100 Min. :-6649 Min. : 0.1567 Min. : 0.000
## 1st Qu.:0.9627 1st Qu.:-3878 1st Qu.: 0.6434 1st Qu.: 0.000
## Median :0.9699 Median :-3843 Median : 1.0945 Median : 0.000
## Mean :0.9678 Mean :-4073 Mean : 4.6511 Mean : 6.436
## 3rd Qu.:0.9753 3rd Qu.:-3825 3rd Qu.: 3.1078 3rd Qu.: 0.000
## Max. :0.9909 Max. :-3804 Max. :115.9717 Max. :82.000
## nfail.max etime
## Min. : 0.000 Min. :0.3796
## 1st Qu.: 0.000 1st Qu.:0.4080
## Median : 0.000 Median :0.4176
## Mean : 6.838 Mean :0.4174
## 3rd Qu.: 0.000 3rd Qu.:0.4277
## Max. :84.000 Max. :0.4508
Next, we’ll skim off the top 20 likelihoods for each value of the σSE parameter. We’ll put these through another round of iterated filtering with mif2.
sigmaSE_prof %>%
mutate(sigmaSE=exp(signif(log(sigmaSE),5))) %>%
ddply(~sigmaSE,subset,rank(-loglik)<=20) %>%
subset(nfail.max==0,select=paramnames) -> pd
bake("sigmaSE-profile2.rds",{
foreach (p=iter(pd,"row"),
.combine=rbind,
.errorhandling="remove",
.inorder=FALSE,
.options.mpi=list(chunkSize=1,seed=1598260027L,info=TRUE)
) %dopar% {
tic <- Sys.time()
library(magrittr)
library(plyr)
library(reshape2)
library(pomp)
options(stringsAsFactors=FALSE)
dat %>%
pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
rprocess=euler.sim(rproc,delta.t=1/365.25),
initializer=initlz,
dmeasure=dmeas,
rmeasure=rmeas,
toEstimationScale=toEst,
fromEstimationScale=fromEst,
covar=covar,
tcovar="time",
zeronames=c("C","W"),
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
) %>%
mif2(start = unlist(p),
Nmif = 50,
rw.sd = rw.sd(
R0=0.02,sigma=0.02,gamma=0.02,psi=0.02,cohort=0.02,amplitude=0.02,
S_0=ivp(0.02),E_0=ivp(0.02),I_0=ivp(0.02),R_0=ivp(0.02)),
Np = 5000,
cooling.type = "geometric",
cooling.fraction.50 = 0.1,
transform = TRUE) %>%
mif2() -> mf
## Runs 10 particle filters to assess Monte Carlo error in likelihood
pf <- replicate(10, pfilter(mf, Np = 5000))
ll <- sapply(pf,logLik)
ll <- logmeanexp(ll, se = TRUE)
nfail <- sapply(pf,getElement,"nfail")
toc <- Sys.time()
etime <- toc-tic
units(etime) <- "hours"
data.frame(as.list(coef(mf)),
loglik = ll[1],
loglik.se = ll[2],
nfail.min = min(nfail),
nfail.max = max(nfail),
etime = as.numeric(etime))
}
}) -> sigmaSE_prof
The preceding calculations took 758.8 cpu hr, or about 12 cpu sec per iteration per 1000 particles.
sigmaSE_prof %<>%
subset(nfail.max==0) %>%
mutate(sigmaSE=exp(signif(log(sigmaSE),5))) %>%
ddply(~sigmaSE,subset,rank(-loglik)<=2)
sigmaSE_prof %>%
ggplot(aes(x=sigmaSE,y=loglik))+
geom_point()+
geom_smooth(method="loess")
Finally, we plot profile traces, which show the trade-off between the intensity of extra-demographic stochasticity and the durations of the latent and infectious periods.
pairs(~loglik+sigmaSE+R0+I(1/gamma)+I(1/sigma),
data=sigmaSE_prof)
He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society, Interface 7:271–283.