We begin by loading the packages we’ll need and setting the random seed, to allow reproducibility.
stopifnot(getRversion()>="4.1")
stopifnot(packageVersion("pomp")>="4.6")
set.seed(594709947L)
library(tidyverse)
library(pomp)
Now we’ll load the data and covariates. The data are measles reports from 20 cities in England and Wales. We also have information on the population sizes and birth-rates in these cities; we’ll treat these variables as covariates.
daturl <- "https://kingaa.github.io/pomp/vignettes/twentycities.rda"
datfile <- file.path(tempdir(),"twentycities.rda")
download.file(daturl,destfile=datfile,mode="wb")
load(datfile)
measles |>
mutate(year=as.integer(format(date,"%Y"))) |>
filter(town=="London" & year>=1950 & year<1964) |>
mutate(
time=(julian(date,origin=as.Date("1950-01-01")))/365.25+1950
) |>
filter(time>1950 & time<1964) |>
select(time,cases) -> dat
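Before building the model, it helps to glance at the data. This quick plot is our addition, using ggplot2 (attached above via the tidyverse):
dat |>
ggplot(aes(x=time,y=cases))+
geom_line()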
demog |>
filter(town=="London") |>
select(-town) -> demogLondon
demogLondon |>
summarize(
time=seq(from=min(year),to=max(year),by=1/12),
pop=predict(smooth.spline(x=year,y=pop),x=time)$y,
birthrate=predict(smooth.spline(x=year+0.5,y=births),x=time-4)$y,
birthrate1=predict(smooth.spline(x=year+0.5,y=births),x=time)$y
) -> covar1
covar1 |>
select(-birthrate1) -> covar
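As a quick check on the covariate construction (our addition), one can plot the interpolated series. Note in the code above that birthrate is the smoothed births lagged by four years (x=time-4), while the unlagged version (birthrate1) is discarded:
covar |>
pivot_longer(-time) |>
ggplot(aes(x=time,y=value))+
geom_line()+
facet_wrap(~name,scales="free_y")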
The following implements a simulator.
rproc <- Csnippet("
double beta, br, seas, foi, dw, births;
double rate[6], trans[6];
// cohort effect
if (fabs(t-floor(t)-251.0/365.0) < 0.5*dt)
br = cohort*birthrate/dt + (1-cohort)*birthrate;
else
br = (1.0-cohort)*birthrate;
// term-time seasonality
t = (t-floor(t))*365.25;
if ((t>=7 && t<=100) ||
(t>=115 && t<=199) ||
(t>=252 && t<=300) ||
(t>=308 && t<=356))
seas = 1.0+amplitude*0.2411/0.7589;
else
seas = 1.0-amplitude;
// transmission rate
beta = R0*(gamma+mu)*seas;
// expected force of infection
foi = beta*pow(I+iota,alpha)/pop;
// white noise (extrademographic stochasticity)
dw = rgammawn(sigmaSE,dt);
rate[0] = foi*dw/dt; // stochastic force of infection
rate[1] = mu; // natural S death
rate[2] = sigma; // rate of ending of latent stage
rate[3] = mu; // natural E death
rate[4] = gamma; // recovery
rate[5] = mu; // natural I death
// Poisson births
births = rpois(br*dt);
// transitions between classes
reulermultinom(2, S, &rate[0], dt, &trans[0]);
reulermultinom(2, E, &rate[2], dt, &trans[2]);
reulermultinom(2, I, &rate[4], dt, &trans[4]);
S += births - trans[0] - trans[1];
E += trans[0] - trans[2] - trans[3];
I += trans[2] - trans[4] - trans[5];
R = pop - S - E - I;
W += (dw - dt)/sigmaSE; // standardized i.i.d. white noise
C += trans[4]; // true incidence
")
We complete the process model definition by specifying the distribution of initial unobserved states. The following codes assume that the fraction of the population in each of the four compartments is known.
rinit <- Csnippet("
double m = pop/(S_0+E_0+I_0+R_0);
S = nearbyint(m*S_0);
E = nearbyint(m*E_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
W = 0;
C = 0;
")
We’ll model both under-reporting and measurement error. We want E[cases|C] = ρC, where C is the true incidence and 0<ρ<1 is the reporting efficiency. We’ll also assume that Var[cases|C] = ρ(1−ρ)C + (ψρC)², where ψ quantifies overdispersion. Note that when ψ=0, the variance-mean relation is that of the binomial distribution. To be specific, we’ll choose cases|C ∼ f(·|ρ,ψ,C), where f(c|ρ,ψ,C) = Φ(c+1/2; ρC, ρ(1−ρ)C+(ψρC)²) − Φ(c−1/2; ρC, ρ(1−ρ)C+(ψρC)²), and Φ(x; μ, σ²) denotes the c.d.f. of the normal distribution with mean μ and variance σ².
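As a numerical illustration (our addition), the mean-variance relation can be checked by simulating from this discretized normal in R, with ρ and ψ set to values like the estimates reported below:
## illustration only: empirical check of E[cases|C] and Var[cases|C]
C <- 300; rho <- 0.488; psi <- 0.116
m <- rho*C
v <- m*(1-rho+psi^2*m)
x <- rnorm(1e5,mean=m,sd=sqrt(v))
x <- ifelse(x>0,round(x),0)   # same rounding and truncation as in rmeas below
c(mean(x),m)
c(var(x),v)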
The following computes P[cases|C].
dmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 0.0;
if (cases > 0.0) {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0)
- pnorm(cases-0.5,m,sqrt(v)+tol,1,0) + tol;
} else {
lik = pnorm(cases+0.5,m,sqrt(v)+tol,1,0) + tol;
}
if (give_log) lik = log(lik);
")
The following codes simulate cases|C.
rmeas <- Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
double tol = 0.0;
cases = rnorm(m,sqrt(v)+tol);
if (cases > 0.0) {
cases = nearbyint(cases);
} else {
cases = 0.0;
}
")
The parameters are constrained to be positive, and some of them are constrained to lie between 0 and 1. We can turn the likelihood maximization problem into an unconstrained maximization problem by transforming the parameters. The following Csnippets implement such a transformation and its inverse.
pt <- parameter_trans(
log=c("sigma","gamma","sigmaSE","psi","R0"),
logit=c("cohort","amplitude"),
barycentric=c("S_0","E_0","I_0","R_0")
)
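Concretely (our illustration, not from the original): log maps positive parameters to the whole real line, logit maps (0,1) to the real line, and barycentric maps the vector of initial fractions, which must sum to one, to an unconstrained space. For instance:
## illustration only: the transforms and their inverses at the R level
exp(log(56.8))           # a positive parameter such as R0 round-trips via log/exp
plogis(qlogis(0.557))    # a (0,1) parameter such as cohort round-trips via logit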
He, Ionides, and King (2010) estimated the parameters of this model. The full set is included in the R code accompanying this document, where they are read into a data frame called mles.
mles |> filter(town=="London") -> mle
paramnames <- c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
mle[paramnames] |> unlist() -> theta
mle |> select(-S_0,-E_0,-I_0,-R_0) |> as.data.frame()
town loglik loglik.sd mu delay sigma gamma rho R0
London -3804.9 0.16 0.02 4 28.9 30.4 0.488 56.8
amplitude alpha iota cohort psi sigmaSE
0.554 0.976 2.9 0.557 0.116 0.0878
We now bring the data, covariates, and model components together in a pomp object.
dat |>
pomp(t0=with(dat,2*time[1]-time[2]),
time="time",
params=theta,
rprocess=euler(rproc,delta.t=1/365.25),
rinit=rinit,
dmeasure=dmeas,
rmeasure=rmeas,
partrans=pt,
covar=covariate_table(covar,times="time"),
accumvars=c("C","W"),
statenames=c("S","E","I","R","C","W"),
paramnames=c("R0","mu","sigma","gamma","alpha","iota",
"rho","sigmaSE","psi","cohort","amplitude",
"S_0","E_0","I_0","R_0")
) -> m1
plot(simulate(m1))
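As a sanity check (our addition, not part of the original workflow), a single particle filter at the stored parameters should return a log likelihood near the value (−3804.9) reported above, up to Monte Carlo error:
## illustration only: estimate the log likelihood at the stored MLE
pf <- pfilter(m1,Np=2000)
logLik(pf)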
To compute a likelihood profile over a given parameter (or set of parameters) across some range, we first construct a grid of points spanning that range. At each point in the grid, we fix the focal parameter (or set of parameters) at that value and maximize the likelihood over the remaining parameters. To ensure that this optimization is global, we initiate multiple optimizers at a variety of points across the space. The pomp function profile_design
is useful in constructing such a set of starting points for the optimization.
The following code constructs a data frame, each row of which is a starting point for an optimization. We will be profiling over σSE (sigmaSE
in the code), fixing μ=0.02 and α=1. To simplify the calculation still further, we will hold ρ and ι at their ML point estimates.
estpars <- setdiff(names(theta),c("sigmaSE","mu","alpha","rho","iota"))
theta["alpha"] <- 1
theta.t <- partrans(m1,theta,"toEst")
theta.t.lo <- theta.t.hi <- theta.t
theta.t.lo[estpars] <- theta.t[estpars]-log(2)
theta.t.hi[estpars] <- theta.t[estpars]+log(2)
profile_design(
sigmaSE=seq(from=log(0.02),to=log(0.2),length=20),
lower=theta.t.lo,
upper=theta.t.hi,
nprof=40
) -> pd
dim(pd)
[1] 800 16
pd <- as.data.frame(t(partrans(m1,t(pd),"fromEst")))
pairs(~sigmaSE+R0+mu+sigma+gamma+S_0+E_0,data=pd)
pomp provides two functions, bake and stew, that save the results of expensive computations. We’ll run the profile computations in parallel. Note again that care must be taken with the parallel random-number generator.
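The loops below use foreach with the %dofuture% operator, which requires a registered parallel backend; that setup is not shown in this excerpt. A minimal sketch, assuming the doFuture and iterators packages:
## assumed parallel setup (not part of this excerpt)
library(doFuture)    # provides %dofuture%; attaches foreach and future
library(iterators)   # provides iter(), used below to iterate over rows
plan(multisession)   # parallel workers on the local machine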
bake("sigmaSE-profile1.rds",{
foreach (
p=iter(pd,"row"),
.combine=bind_rows, .errorhandling="remove",
.options.future=list(seed=1598260027L)
) %dofuture% {
tic <- Sys.time()
m1 |>
mif2(
params=p,
Nmif = 50,
rw.sd = rw_sd(
R0=0.02,sigma=0.02,gamma=0.02,psi=0.02,cohort=0.02,amplitude=0.02,
S_0=ivp(0.02),E_0=ivp(0.02),I_0=ivp(0.02),R_0=ivp(0.02)),
Np = 1000,
cooling.type = "geometric",
cooling.fraction.50 = 0.1
) |>
mif2() -> mf
## Runs 10 particle filters to assess Monte Carlo error in likelihood
pf <- replicate(10, pfilter(mf, Np = 2000))
ll <- sapply(pf,logLik)
ll <- logmeanexp(ll, se = TRUE)
toc <- Sys.time()
etime <- toc-tic
units(etime) <- "hours"
data.frame(
as.list(coef(mf)),
loglik = ll[1],
loglik.se = ll[2],
etime = as.numeric(etime)
)
}
}) |>
filter(is.finite(loglik)) -> sigmaSE_prof
The preceding calculations took 183.6 cpu hr, or about 7.4 cpu sec per iteration per 1000 particles. Let’s examine the results.
pairs(~loglik+sigmaSE+R0+I(1/gamma)+I(1/sigma)+psi+log(cohort),
data=sigmaSE_prof,subset=loglik>max(loglik)-100)
Next, we’ll skim off the top 20 likelihoods for each value of the σSE parameter. We’ll put these through another round of miffing.
sigmaSE_prof |>
mutate(sigmaSE=exp(signif(log(sigmaSE),5))) |>
group_by(sigmaSE) |>
filter(rank(-loglik)<=20) |>
ungroup() -> pd
bake("sigmaSE-profile2.rds",{
foreach (p=iter(pd,"row"),
.combine=rbind, .errorhandling="remove",
.options.future=list(seed=915963734L)
) %dofuture% {
tic <- Sys.time()
m1 |>
mif2(
params = p,
Nmif = 50,
rw.sd = rw_sd(
R0=0.02,sigma=0.02,gamma=0.02,psi=0.02,cohort=0.02,amplitude=0.02,
S_0=ivp(0.02),E_0=ivp(0.02),I_0=ivp(0.02),R_0=ivp(0.02)),
Np = 5000,
cooling.fraction.50 = 0.1
) |>
mif2() -> mf
pf <- replicate(10, pfilter(mf, Np = 5000))
ll <- sapply(pf,logLik)
ll <- logmeanexp(ll, se = TRUE)
toc <- Sys.time()
etime <- toc-tic
units(etime) <- "hours"
data.frame(
as.list(coef(mf)),
loglik = ll[1],
loglik.se = ll[2],
etime = as.numeric(etime))
}
}) -> sigmaSE_prof
The preceding calculations took 98.7 cpu hr, or about 1.6 cpu sec per iteration per 1000 particles.
sigmaSE_prof |>
mutate(sigmaSE=exp(signif(log(sigmaSE),5))) |>
group_by(sigmaSE) |>
filter(rank(-loglik)<=2) |>
ungroup() -> sigmaSE_prof
sigmaSE_prof |>
ggplot(aes(x=sigmaSE,y=loglik))+
geom_point()+
geom_smooth(method="loess")
It is useful to plot profile traces, which show how the other parameters vary along the profile. In this case, they display clear relationships among the intensity of extrademographic stochasticity, R0, and the durations of the infectious and latent periods.
pairs(~loglik+sigmaSE+R0+I(1/gamma)+I(1/sigma),
data=sigmaSE_prof)
Produced in R version 4.3.2 using pomp version 5.6.
He D, Ionides EL, King AA (2010). “Plug-and-Play Inference for Disease Dynamics: Measles in Large and Small Populations as a Case Study.” J R Soc Interface, 7, 271–283. https://doi.org/10.1098/rsif.2009.0151.
Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.