Important Note: These materials have been superseded by materials provided as part of the as part of the SISMID short course on Simulation-based Inference.
Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
This document has its origins in the SISMID short course on Simulation-based Inference given by Aaron King and Edward Ionides.
Produced with R version 3.3.1 and pomp version 1.7.5.1.
This tutorial covers likelihood estimation via the method of iterated filtering. It presupposes familiarity with building partially observed Markov process (POMP) objects in the R package pomp (King et al. 2016). pomp is available from CRAN and github. This tutorial follows on from the topic of carrying out particle filtering (also known as sequential Monte Carlo) via pfilter
in pomp.
We have the following goals:
Many, many statistical methods have been proposed for inference on POMP models (He et al. 2010,King et al. (2016)). The volume of research indicates both the importance and the difficulty of the problem. Let’s start by considering three criteria to categorize inference methods: the plug-and-play property; full-information or feature-based; frequentist or Bayesian.
rprocess
but not dprocess
is said to be plug-and-play. All popular modern Monte Carlo methods fall into this category.dmeasure
. A method that uses only rprocess
and rmeasure
is called “doubly plug-and-play”.Frequentist | Bayesian | ||
---|---|---|---|
Plug-and-play | Full-information | iterated filtering | particle MCMC |
Feature-based | simulated moments | ABC | |
synthetic likelihood | |||
Not-plug-and-play | Full-information | EM algorithm | MCMC |
Kalman filter | |||
Feature-based | Yule-Walker | extended Kalman filter | |
extended Kalman filter |
model input: Simulators for fX0(x0;θ) and fXn|Xn−1(xn|xn−1;θ); evaluator for fYn|Xn(yn|xn;θ); data, y∗1:N
algorithmic parameters: Number of iterations, M; number of particles, J; initial parameter swarm, {Θ0j,j=1,…,J}; perturbation density, hn(θ|φ;σ); perturbation scale, σ1:M
output: Final parameter swarm, {ΘMj,j=1,…,J}
comments:
For a relatively simple epidemiological example of IF2, we consider fitting a stochastic SIR model to an influenza outbreak in a British boarding school (Anonymous 1978). Reports consist of the number of children confined to bed for each of the 14 days of the outbreak. The total number of children at the school was 763, and a total of 512 children spent time away from class. Only one adult developed influenza-like illness, so adults are omitted from the data and model. First, we read in the data:
bsflu_data <- read.table("http://kingaa.github.io/short-course/stochsim/bsflu_data.txt")
Our model is a variation on a basic SIR Markov chain, with state X(t)=(S(t),I(t),R1(t),R2(t),R3(t)) giving the number of individuals in the susceptible and infectious categories, and three stages of recovery. The recovery stages, R1, R2 and R3, are all modeled to be non-contagious. R1 consists of individuals who are bed-confined if they show symptoms; R2 consists of individuals who are convalescent if they showed symptoms; R3 consists of recovered individuals who have returned to school-work if they were symtomatic. The observation on day n of the observed epidemic (with t1 being 22 January) consists of the numbers of children who are bed-confined and convalescent. These measurements are modeled as Yn=(Bn,Cn) with Bn∼Poisson(ρR1(tn)) and Cn∼Poisson(ρR2(tn)). Here, ρ is a reporting rate corresponding to the chance of being symptomatic.
The index case for the epidemic was proposed to be a boy returning to Britain from Hong Kong, who was reported to have a transient febrile illness from 15 to 18 January. It would therefore be reasonable to initialize the epidemic with I(t0)=1 at t0=−6. This is a little tricky to reconcile with the rest of the data; for now, we avoid this issue by instead initializing with I(t0)=1 at t0=0. All other individuals are modeled to be initially susceptible.
Our Markov transmission model is that each individual in S transitions to I at rate βI(t)/N; each individual in I transitions at rate μI to R1. Subsequently, the individual moves from R1 to R2 at rate μR1, and finally from R2 to R3 at rate μR2. Therefore, 1/μI is the mean infectious time prior to bed-confinement; 1/μR1 is the mean duration of bed-confinement for symptomatic cases; 1/μR2 is the mean duration of convalescence for symptomatic cases. All rates have units day−1.
This model has limitations and weaknesses, but writing down and fitting a model is a starting point for data analysis, not an end point. In particular, one should try model variations. For example, one could include a latency period for infections, or one could modify the model to give a better description of the bed-confinement and convalescence processes. Ten individuals received antibiotics for secpondary infections, and they had longer bed-confinement and convalescence times. Partly for this reason, we will initially fit only the bed-confinement data, using Yn=Bn for our dmeasure
.
We do not need a representation of R3 since this variable has consequences neither for the dynamics of the state process nor for the data. If we confine ourselves for the present to fitting only the bed-confinement data, then we need not track R2. For the code, we enumerate the state variables (S, I, R1) and the parameters (β, μI, ρ, μR1) as follows:
statenames <- c("S","I","R1")
paramnames <- c("Beta","mu_I","mu_R1","rho")
In the codes below, we’ll refer to the data variables by their names (B, C), as given in the bsflu_data
data-frame:
colnames(bsflu_data)[1:2]
## [1] "B" "C"
Now, we write the model code:
dmeas <- Csnippet("
lik = dpois(B,rho*R1+1e-6,give_log);
")
rmeas <- Csnippet("
B = rpois(rho*R1+1e-6);
")
rproc <- Csnippet("
double N = 763;
double t1 = rbinom(S,1-exp(-Beta*I/N*dt));
double t2 = rbinom(I,1-exp(-mu_I*dt));
double t3 = rbinom(R1,1-exp(-mu_R1*dt));
S -= t1;
I += t1 - t2;
R1 += t2 - t3;
")
fromEst <- Csnippet("
TBeta = exp(Beta);
Tmu_I = exp(mu_I);
Trho = expit(rho);
")
toEst <- Csnippet("
TBeta = log(Beta);
Tmu_I = log(mu_I);
Trho = logit(rho);
")
init <- Csnippet("
S = 762;
I = 1;
R1 = 0;
")
We build the pomp
object:
library(pomp)
pomp(
data=subset(bsflu_data,select=-C),
times="day",t0=0,
rprocess=euler.sim(rproc,delta.t=1/12),
rmeasure=rmeas,dmeasure=dmeas,
fromEstimationScale=fromEst,toEstimationScale=toEst,
initializer=init,
statenames=statenames,
paramnames=paramnames
) -> bsflu
plot(bsflu,main="")
To develop and debug code, it is useful to have example codes that run quickly. Here, we run some simulations and a particle filter, simply to check that the codes are working correctly. We’ll use the following parameters, derived from our earlier explorations:
params <- c(Beta=2,mu_I=1,rho=0.9,mu_R1=1/3,mu_R2=1/2)
Now to run and plot some simulations:
y <- simulate(bsflu,params=params,nsim=10,as.data.frame=TRUE)
library(ggplot2)
library(reshape2)
ggplot(data=y,mapping=aes(x=time,y=B,group=sim))+
geom_line()+theme_classic()
Before engaging in iterated filtering, it is a good idea to check that the basic particle filter is working since iterated filtering builds on this technique. The simulations above check the rprocess
and rmeasure
codes; the particle filter depends on the rprocess
and dmeasure
codes and so is a check of the latter.
pf <- pfilter(bsflu,params=params,Np=1000)
plot(pf)
These plots show the data along with the effective sample size of the particle filter (ess
) and the log likelihood of each observation conditional on the preceding ones (cond.logLik
).
Let’s treat μR1 and μR2 as known, and fix these parameters at the empirical means of the bed-confinement and convalescence times for symptomatic cases, respectively:
(fixed_params <- with(bsflu_data,c(mu_R1=1/(sum(B)/512),mu_R2=1/(sum(C)/512))))
## mu_R1 mu_R2
## 0.3324675 0.5541126
We will estimate β, μI, and ρ.
It will be helpful to parallelize most of the computations. Most machines nowadays have multiple cores and using this computational capacity involves only:
library(foreach)
library(doParallel)
registerDoParallel()
The first two lines load the foreach and doParallel packages, the latter being a “backend” for the foreach package. The next line tells foreach that we will use the doParallel backend. By default, R will guess how many concurrent R processes to run on this single multicore machine.
We proceed to carry out replicated particle filters at an initial guess of β=2, μI=1, and ρ=0.9.
stew(file="pf.rda",{
t_pf <- system.time(
pf <- foreach(i=1:10,.packages='pomp',
.options.multicore=list(set.seed=TRUE),
.export=c("bsflu","fixed_params")
) %dopar% {
pfilter(bsflu,params=c(Beta=2,mu_I=1,rho=0.9,fixed_params),Np=10000)
}
)
n_pf <- getDoParWorkers()
},seed=625904618,kind="L'Ecuyer")
(L_pf <- logmeanexp(sapply(pf,logLik),se=TRUE))
## se
## -86.9221208 0.7738137
In 1.4 seconds on a 30-core machine, we obtain an unbiased likelihood estimate of -86.9 with a Monte Carlo standard error of 0.77.
Given a model and a set of data, the likelihood surface is well defined, though it may be difficult to visualize. We can develop a progressively more complete picture of this surface by storing likelihood estimates whenever we compute them. In particular, it is very useful to construct a database within which to store the likelihood of every point for which we have an estimated likelihood. This will become progressively more complete as our parameter-space search goes on. At this point, we’ve computed the likelihood at a single point. Let’s store this point, together with the estimated likelihood and our estimate of the standard error on that likelihood, in a CSV file:
results <- as.data.frame(as.list(c(coef(pf[[1]]),loglik=L_pf[1],loglik=L_pf[2])))
write.csv(results,file="bsflu_params.csv",row.names=FALSE)
Let’s carry out a local search using mif2
around this point in parameter space. For that, we need to set the rw.sd
and cooling.fraction.50
algorithmic parameters. Since β and μI will be estimated on the log scale, and we expect that multiplicative perturbations of these parameters will have roughly similar effects on the likelihood, we’ll use a perturbation size of 0.02, which we imagine will have a small but non-negligible effect. For simplicity, we’ll use the same perturbation size on ρ. We fix cooling.fraction.50=0.5
, so that after 50 mif2
iterations, the perturbations are reduced to half their original magnitudes.
stew(file="box_search_local.rda",{
t_local_mif <- system.time({
mifs_local <- foreach(i=1:20,
.packages='pomp',
.combine=c,
.options.multicore=list(set.seed=TRUE),
.export=c("bsflu","fixed_params")
) %dopar%
{
mif2(
bsflu,
start=c(Beta=2,mu_I=1,rho=0.9,fixed_params),
Np=2000,
Nmif=50,
cooling.type="geometric",
cooling.fraction.50=0.5,
transform=TRUE,
rw.sd=rw.sd(Beta=0.02,mu_I=0.02,rho=0.02)
)
}
})
},seed=482947940,kind="L'Ecuyer")
ggplot(data=melt(conv.rec(mifs_local)),
aes(x=iteration,y=value,group=L1,color=factor(L1)))+
geom_line()+
guides(color=FALSE)+
facet_wrap(~variable,scales="free_y")+
theme_bw()
Although the filtering carried out by mif2
in the final filtering iteration generates an approximation to the likelihood at the resulting point estimate, this is not usually good enough for reliable inference. Partly, this is because parameter perturbations are applied in the last filtering iteration. Partly, this is because mif2
is usually carried out with a smaller number of particles than is necessary for a good likelihood evaluation—the errors in mif2
average out over many iterations of the filtering. Therefore, we evaluate the likelihood, together with a standard error, using replicated particle filters at each point estimate:
stew(file="lik_local.rda",{
t_local_eval <- system.time({
results_local <- foreach(mf=mifs_local,
.packages='pomp',
.combine=rbind,
.options.multicore=list(set.seed=TRUE)
) %dopar%
{
evals <- replicate(10, logLik(pfilter(mf,Np=20000)))
ll <- logmeanexp(evals,se=TRUE)
c(coef(mf),loglik=ll[1],loglik=ll[2])
}
})
},seed=900242057,kind="L'Ecuyer")
results_local <- as.data.frame(results_local)
This investigation took 9 sec for the maximization and 16 sec for the likelihood evaluation. These repeated stochastic maximizations can also show us the geometry of the likelihood surface in a neighborhood of this point estimate:
pairs(~loglik+Beta+mu_I+rho,data=results_local,pch=16)
We add these newly explored points to our database:
results <- rbind(results,results_local[names(results)])
write.csv(results,file="bsflu_params.csv",row.names=FALSE)
When carrying out parameter estimation for dynamic systems, we need to specify beginning values for both the dynamic system (in the state space) and the parameters (in the parameter space). By convention, we use initial values for the initialization of the dynamic system and starting values for initialization of the parameter search.
Practical parameter estimation involves trying many starting values for the parameters. One can specify a large box in parameter space that contains all parameter vectors which seem remotely sensible. If an estimation method gives stable conclusions with starting values drawn randomly from this box, this gives some confidence that an adequate global search has been carried out.
For our flu model, a box containing reasonable parameter values might be
params_box <- rbind(
Beta=c(1,5),
mu_I=c(0.5,3),
rho = c(0.5,1)
)
We are now ready to carry out likelihood maximizations from diverse starting points.
stew(file="box_search_global.rda",{
n_global <- getDoParWorkers()
t_global <- system.time({
mf1 <- mifs_local[[1]]
guesses <- as.data.frame(apply(params_box,1,function(x)runif(300,x[1],x[2])))
results_global <- foreach(guess=iter(guesses,"row"),
.packages='pomp',
.combine=rbind,
.options.multicore=list(set.seed=TRUE),
.export=c("mf1","fixed_params")
) %dopar%
{
mf <- mif2(mf1,start=c(unlist(guess),fixed_params))
mf <- mif2(mf,Nmif=100)
ll <- replicate(10,logLik(pfilter(mf,Np=100000)))
ll <- logmeanexp(ll,se=TRUE)
c(coef(mf),loglik=ll[1],loglik=ll[2])
}
})
},seed=1270401374,kind="L'Ecuyer")
results_global <- as.data.frame(results_global)
results <- rbind(results,results_global[names(results)])
write.csv(results,file="bsflu_params.csv",row.names=FALSE)
In the above, we took advantage of a general pomp behavior: re-running a command on an object (i.e., mif2
on mf1
) created by the same command preserves the algorithmic arguments.
Evaluation of the best result of this search gives a likelihood of -72.8 with a standard error of 1.54. This took in 22 minutes altogether on a machine with 30 processors. One can visualize aspects of the global geometry of the likelihood surface with a scatterplot matrix of these diverse parameter estimates:
library(plyr)
all <- ldply(list(guess=guesses,result=subset(results,loglik>max(loglik)-50)))
pairs(~Beta+mu_I+rho,data=all,col=ifelse(all$.id=="guess",grey(0.5),"red"),pch=16)
pairs(~loglik+Beta+mu_I+rho,data=subset(results,loglik>max(loglik)-50),pch=16)
We see that optimization attempts from diverse remote starting points end up with comparable likelihoods, even when the parameter values are quite distinct. This gives us some confidence in our maximization procedure.
Try to estimate β, μI, ρ, and μR1 simultaneously. Does your estimate of μR1 differ from the value we computed from the raw data? How do you interpret the agreement or lack thereof?
The Poisson measurement model used here may not seem easy to interpret. Formulate an alternative measurement model and maximize the likelihood to compare the alternative model.
How strong is the evidence about the contact rate, β, given this model and data? Use mif2
to construct a profile likelihood. Due to time constraints, you may be able to compute only a preliminary version.
It is also possible to profile over the basic reproduction number, R0=β/μI. Is this more or less well determined that β for this model and data?
Check the source code for the bsflu
pomp
object. Does the code implement the model described?
For various reasons, it can be surprisingly hard to make sure that the written equations and the code are perfectly matched. Here are some things to think about:
rprocess
. Suppose that plug-and-play statistical methodology is used to infer parameters. A conscientious researcher carries out a simulation study, using simulate
to generate some realizations from the fitted model and checking that the inference methodology can successfully recover the known parameters for this model, up to some statistical error. Will this procedure help to identify the error in rprocess
? If not, how does one debug rprocess
? What research practices help minimize the risk of errors in simulation code?Develop your own heuristics to try to improve the performance of mif2
in the previous example. Specifically, for a global optimization procedure carried out using random starting values in the specified box, let ˆΘmax be a random Monte Carlo estimate of the resulting MLE, and let ˆθ be the true (unknown) MLE. We can define the maximization error in the log likelihood to be e=ℓ(ˆθ)−E[ℓ(ˆΘmax)]. We cannot directly evaluate e, since there is also Monte Carlo error in our evaluation of ℓ(θ), but we can compute it up to a known precision. Plan some code to estimates e for a search procedure using a computational effort of JM=2×107, comparable to that used for each mif computation in the global search. Discuss the strengths and weaknesses of this quantification of optimization success. See if you can choose J and M subject to this constraint, together with choices of rw.sd
and the cooling rate, cooling.fraction.50
, to arrive at a quantifiably better procedure. Computationally, you may not be readily able to run your full procedure, but you could run a quicker version of it.
Even in this small, 3 parameter example, it takes a considerable amount of computation to find the global maximum (with values of β around 0.004) starting from uniform draws in the specified box. The problem is that, on the scale on which “uniform” is defined, the peak around β≈0.004 is very narrow. Propose and test a more favorable way to draw starting parameters for the global search, with better scale invariance properties.
Modify the model to include a latent period between becoming exposed and becoming infectious. See what effect this has on the maximized likelihood.
Andrieu, C., A. Doucet, and R. Holenstein. 2010. Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B 72:269–342.
Anonymous. 1978. Influenza in a boarding school. British medical journal 1:587.
He, D., E. L. Ionides, and A. A. King. 2010. Plug-and-play inference for disease dynamics: Measles in large and small populations as a case study. Journal of the Royal Society, Interface 7:271–283.
Ionides, E. L. 2011. Discussion on “Feature matching in time series modeling” by Y. Xia and H. Tong. Statistical Science 26:49–52.
Ionides, E. L., C. Bretó, and A. A. King. 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the U.S.A. 103:18438–18443.
Ionides, E. L., D. Nguyen, Y. Atchadé, S. Stoev, and A. A. King. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A. 112:719–724.
King, A. A., D. Nguyen, and E. L. Ionides. 2016. Statistical inference for partially observed Markov processes via the R package pomp. Journal of Statistical Software 69:1–43.
Shrestha, S., A. A. King, and P. Rohani. 2011. Statistical inference for multi-pathogen systems. PLoS Computational Biology 7:e1002135.
Toni, T., D. Welch, N. Strelkowa, A. Ipsen, and M. P. Stumpf. 2009. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface 6:187–202.