**Important Note:** These materials have been superseded by materials provided as part of the as part of the SISMID short course on Simulation-based Inference.

Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.

This document has its origins in the SISMID short course on Simulation-based Inference given by Aaron King and Edward Ionides.

Produced with **R** version 3.3.1 and **pomp** version 1.7.5.1.

This tutorial covers likelihood estimation via the method of iterated filtering. It presupposes familiarity with building partially observed Markov process (POMP) objects in the **R** package **pomp** (King et al. 2016). **pomp** is available from CRAN and github. This tutorial follows on from the topic of carrying out particle filtering (also known as sequential Monte Carlo) via `pfilter`

in **pomp**.

We have the following goals:

- Review the available options for inference on POMP models, to put iterated filtering in context.
- Understand how iterated filtering algorithms carry out repeated particle filtering operations, with randomly perturbed parameter values, in order to maximize the likelihood.
- Gain experience carrying out statistical investigations using iterated filtering in a relatively simple situation (fitting an SIR model to a boarding school flu outbreak).

Many, many statistical methods have been proposed for inference on POMP models (He et al. 2010,King et al. (2016)). The volume of research indicates both the importance and the difficulty of the problem. Let’s start by considering three criteria to categorize inference methods: the plug-and-play property; full-information or feature-based; frequentist or Bayesian.

- Inference methodology that calls
`rprocess`

but not`dprocess`

is said to be*plug-and-play*. All popular modern Monte Carlo methods fall into this category. - Simulation-based is equivalent to plug-and-play.
- Historically, simulation-based meant simulating forward from initial conditions to the end of the time series.
- However, particle filtering methods instead consider each observation interval sequentially. They carry out multiple, carefully selected, simulations over each interval.
- Plug-and-play methods can call
`dmeasure`

. A method that uses only`rprocess`

and`rmeasure`

is called “doubly plug-and-play”.

- Two
*non-plug-and-play*methods (expectation-maximization (EM) algorithms and Markov chain Monte Carlo (MCMC)) have theoretical convergence problems for nonlinear POMP models. The failures of these two workhorses of statistical computation have prompted development of alternative methodology.

*Full-information*methods are defined to be those based on the likelihood function for the full data (i.e., likelihood-based frequentist inference and Bayesian inference).*Feature-based*methods either consider a summary statistic (a function of the data) or work with an an alternative to the likelihood.- Asymptotically, full-information methods are statistically efficient and feature-based methods are not.
- Loss of statistical efficiency could potentially be an acceptable tradeoff for advantages in computational efficiency.
- However, good low-dimensional summary statistics can be hard to find.
- When using statistically inefficient methods, it can be hard to know how much information you are losing.
- Intuition and scientific reasoning can be inadequate tools to derive informative low-dimensional summary statistics (Shrestha et al. 2011,Ionides (2011)).

- Recently, plug-and-play Bayesian methods have been discovered:
- particle Markov chain Monte Carlo (PMCMC) (Andrieu et al. 2010).
- approximate Bayesian computation (ABC) (Toni et al. 2009).

- Prior belief specification is both the strength and weakness of Bayesian methodology:
- The likelihood surface for nonlinear POMP models often contains nonlinear ridges and variations in curvature.
- These situations bring into question the appropriateness of independent priors derived from expert opinion on marginal distributions of parameters.
- They also are problematic for specification of “flat” or “uninformative” prior beliefs.
- Expert opinion can be treated as data for non-Bayesian analysis. However, our primary task is to identify the information in the data under investigation, so it can be helpful to use methods that do not force us to make our conclusions dependent on quantification of prior beliefs.

- Iterated filtering methods (Ionides et al. 2006,Ionides et al. (2015)) are the only currently available, full-information, plug-and-play, frequentist methods for POMP models.
- Iterated filtering methods have been shown to solve likelihood-based inference problems for epidemiological situations which are computationally intractable for available Bayesian methodology (Ionides et al. 2015).

Frequentist |
Bayesian |
||
---|---|---|---|

Plug-and-play |
Full-information |
iterated filtering | particle MCMC |

Feature-based |
simulated moments | ABC | |

synthetic likelihood | |||

Not-plug-and-play |
Full-information |
EM algorithm | MCMC |

Kalman filter | |||

Feature-based |
Yule-Walker | extended Kalman filter | |

extended Kalman filter |

- Yule-Walker is the method of moments for ARMA, a linear Gaussian POMP.
- The Kalman filter gives the exact likelihood for a linear Gaussian POMP. The extended Kalman filter gives an approximation for nonlinear models that can be used for quasi-likelihood or quasi-Bayesian inference.

- We use the IF2 algorithm of Ionides et al. (2015).
- A particle filter is carried out with the parameter vector for each particle doing a random walk.
- At the end of the time series, the collection of parameter vectors is recycled as starting parameters for a new particle filter with a smaller random walk variance.
- Theoretically, this procedure converges toward the region of parameter space maximizing the maximum likelihood.
- Empirically, we can test this claim on examples.

**model input**: Simulators for \(f_{X_0}(x_0;\theta)\) and \(f_{X_n|X_{n-1}}(x_n| x_{n-1}; \theta)\); evaluator for \(f_{Y_n|X_n}(y_n| x_n;\theta)\); data, \(y^*_{1:N}\)

**algorithmic parameters**: Number of iterations, \(M\); number of particles, \(J\); initial parameter swarm, \(\{\Theta^0_j, j=1,\dots,J\}\); perturbation density, \(h_n(\theta|\varphi;\sigma)\); perturbation scale, \(\sigma_{1{:}M}\)

**output**: Final parameter swarm, \(\{\Theta^M_j, j=1,\dots,J\}\)

- \(\quad\) For \(m\) in \(1{:} M\)
- \(\quad\quad\quad\) \(\Theta^{F,m}_{0,j}\sim h_0(\theta|\Theta^{m-1}_{j}; \sigma_m)\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\) \(X_{0,j}^{F,m}\sim f_{X_0}(x_0 ; \Theta^{F,m}_{0,j})\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\) For \(n\) in \(1{:} N\)
- \(\quad\quad\quad\quad\quad\) \(\Theta^{P,m}_{n,j}\sim h_n(\theta|\Theta^{F,m}_{n-1,j},\sigma_m)\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\quad\quad\) \(X_{n,j}^{P,m}\sim f_{X_n|X_{n-1}}(x_n | X^{F,m}_{n-1,j}; \Theta^{P,m}_j)\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\quad\quad\) \(w_{n,j}^m = f_{Y_n|X_n}(y^*_n| X_{n,j}^{P,m} ; \Theta^{P,m}_{n,j})\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\quad\quad\) Draw \(k_{1{:}J}\) with \(P[k_j=i]= w_{n,i}^m\Big/\sum_{u=1}^J w_{n,u}^m\)
- \(\quad\quad\quad\quad\quad\) \(\Theta^{F,m}_{n,j}=\Theta^{P,m}_{n,k_j}\) and \(X^{F,m}_{n,j}=X^{P,m}_{n,k_j}\) for \(j\) in \(1{:} J\)
- \(\quad\quad\quad\) End For
- \(\quad\quad\quad\) Set \(\Theta^{m}_{j}=\Theta^{F,m}_{N,j}\) for \(j\) in \(1{:} J\)
- \(\quad\) End For

**comments**:

- The \(N\) loop (lines 4 through 10) is a basic particle filter applied to a model with stochastic perturbations to the parameters.
- The \(M\) loop repeats this particle filter with decreasing perturbations.
- The superscript \(F\) in \(\Theta^{F,m}_{n,j}\) and \(X^{F,m}_{n,j}\) denote solutions to the
*filtering*problem, with the particles \(j=1,\dots,J\) providing a Monte Carlo representation of the conditional distribution at time \(n\) given data \(y^*_{1:n}\) for filtering iteration \(m\). - The superscript \(P\) in \(\Theta^{P,m}_{n,j}\) and \(X^{P,m}_{n,j}\) denote solutions to the
*prediction*problem, with the particles \(j=1,\dots,J\) providing a Monte Carlo representation of the conditional distribution at time \(n\) given data \(y^*_{1:n-1}\) for filtering iteration \(m\). - The
*weight*\(w^m_{n,j}\) gives the likelihood of the data at time \(n\) for particle \(j\) in filtering iteration \(m\).

For a relatively simple epidemiological example of IF2, we consider fitting a stochastic SIR model to an influenza outbreak in a British boarding school (Anonymous 1978). Reports consist of the number of children confined to bed for each of the 14 days of the outbreak. The total number of children at the school was 763, and a total of 512 children spent time away from class. Only one adult developed influenza-like illness, so adults are omitted from the data and model. First, we read in the data:

`bsflu_data <- read.table("http://kingaa.github.io/short-course/stochsim/bsflu_data.txt")`

Our model is a variation on a basic SIR Markov chain, with state \(X(t)=(S(t),I(t),R_1(t),R_2(t),R_3(t))\) giving the number of individuals in the susceptible and infectious categories, and three stages of recovery. The recovery stages, \(R_1\), \(R_2\) and \(R_3\), are all modeled to be non-contagious. \(R_1\) consists of individuals who are bed-confined if they show symptoms; \(R_2\) consists of individuals who are convalescent if they showed symptoms; \(R_3\) consists of recovered individuals who have returned to school-work if they were symtomatic. The observation on day \(n\) of the observed epidemic (with \(t_1\) being 22 January) consists of the numbers of children who are bed-confined and convalescent. These measurements are modeled as \(Y_n=(B_n,C_n)\) with \(B_n\sim\mathrm{Poisson}(\rho R_1(t_n))\) and \(C_n\sim\mathrm{Poisson}(\rho R_2(t_n))\). Here, \(\rho\) is a reporting rate corresponding to the chance of being symptomatic.

The index case for the epidemic was proposed to be a boy returning to Britain from Hong Kong, who was reported to have a transient febrile illness from 15 to 18 January. It would therefore be reasonable to initialize the epidemic with \(I(t_0)=1\) at \(t_0=-6\). This is a little tricky to reconcile with the rest of the data; for now, we avoid this issue by instead initializing with \(I(t_0)=1\) at \(t_0=0\). All other individuals are modeled to be initially susceptible.

Our Markov transmission model is that each individual in \(S\) transitions to \(I\) at rate \(\beta\,I(t)/N\); each individual in \(I\) transitions at rate \(\mu_I\) to \(R_1\). Subsequently, the individual moves from \(R_1\) to \(R_2\) at rate \(\mu_{R_1}\), and finally from \(R_2\) to \(R_3\) at rate \(\mu_{R_2}\). Therefore, \(1/\mu_I\) is the mean infectious time prior to bed-confinement; \(1/\mu_{R1}\) is the mean duration of bed-confinement for symptomatic cases; \(1/\mu_{R2}\) is the mean duration of convalescence for symptomatic cases. All rates have units \(\mathrm{day}^{-1}\).

This model has limitations and weaknesses, but writing down and fitting a model is a starting point for data analysis, not an end point. In particular, one should try model variations. For example, one could include a latency period for infections, or one could modify the model to give a better description of the bed-confinement and convalescence processes. Ten individuals received antibiotics for secpondary infections, and they had longer bed-confinement and convalescence times. Partly for this reason, we will initially fit only the bed-confinement data, using \(Y_n=B_n\) for our `dmeasure`

.

We do not need a representation of \(R_3\) since this variable has consequences neither for the dynamics of the state process nor for the data. If we confine ourselves for the present to fitting only the bed-confinement data, then we need not track \(R_2\). For the code, we enumerate the state variables (\(S\), \(I\), \(R_1\)) and the parameters (\(\beta\), \(\mu_I\), \(\rho\), \(\mu_{R_1}\)) as follows:

```
statenames <- c("S","I","R1")
paramnames <- c("Beta","mu_I","mu_R1","rho")
```

In the codes below, we’ll refer to the data variables by their names (\(B\), \(C\)), as given in the `bsflu_data`

data-frame:

`colnames(bsflu_data)[1:2]`

`## [1] "B" "C"`

Now, we write the model code:

```
dmeas <- Csnippet("
lik = dpois(B,rho*R1+1e-6,give_log);
")
rmeas <- Csnippet("
B = rpois(rho*R1+1e-6);
")
rproc <- Csnippet("
double N = 763;
double t1 = rbinom(S,1-exp(-Beta*I/N*dt));
double t2 = rbinom(I,1-exp(-mu_I*dt));
double t3 = rbinom(R1,1-exp(-mu_R1*dt));
S -= t1;
I += t1 - t2;
R1 += t2 - t3;
")
fromEst <- Csnippet("
TBeta = exp(Beta);
Tmu_I = exp(mu_I);
Trho = expit(rho);
")
toEst <- Csnippet("
TBeta = log(Beta);
Tmu_I = log(mu_I);
Trho = logit(rho);
")
init <- Csnippet("
S = 762;
I = 1;
R1 = 0;
")
```

We build the `pomp`

object:

```
library(pomp)
pomp(
data=subset(bsflu_data,select=-C),
times="day",t0=0,
rprocess=euler.sim(rproc,delta.t=1/12),
rmeasure=rmeas,dmeasure=dmeas,
fromEstimationScale=fromEst,toEstimationScale=toEst,
initializer=init,
statenames=statenames,
paramnames=paramnames
) -> bsflu
```

`plot(bsflu,main="")`

To develop and debug code, it is useful to have example codes that run quickly. Here, we run some simulations and a particle filter, simply to check that the codes are working correctly. We’ll use the following parameters, derived from our earlier explorations:

`params <- c(Beta=2,mu_I=1,rho=0.9,mu_R1=1/3,mu_R2=1/2)`

Now to run and plot some simulations:

`y <- simulate(bsflu,params=params,nsim=10,as.data.frame=TRUE)`

```
library(ggplot2)
library(reshape2)
ggplot(data=y,mapping=aes(x=time,y=B,group=sim))+
geom_line()+theme_classic()
```

Before engaging in iterated filtering, it is a good idea to check that the basic particle filter is working since iterated filtering builds on this technique. The simulations above check the `rprocess`

and `rmeasure`

codes; the particle filter depends on the `rprocess`

and `dmeasure`

codes and so is a check of the latter.

`pf <- pfilter(bsflu,params=params,Np=1000)`

`plot(pf)`

These plots show the data along with the *effective sample size* of the particle filter (`ess`

) and the log likelihood of each observation conditional on the preceding ones (`cond.logLik`

).

Let’s treat \(\mu_{R_1}\) and \(\mu_{R_2}\) as known, and fix these parameters at the empirical means of the bed-confinement and convalescence times for symptomatic cases, respectively:

`(fixed_params <- with(bsflu_data,c(mu_R1=1/(sum(B)/512),mu_R2=1/(sum(C)/512))))`

```
## mu_R1 mu_R2
## 0.3324675 0.5541126
```

We will estimate \(\beta\), \(\mu_I\), and \(\rho\).

It will be helpful to parallelize most of the computations. Most machines nowadays have multiple cores and using this computational capacity involves only:

- the following lines of code to let R know you plan to use multiple processors;
- the parallel for loop provided by the
**foreach**package; and - proper attention to the use of parallel random number generators.

```
library(foreach)
library(doParallel)
registerDoParallel()
```

The first two lines load the **foreach** and **doParallel** packages, the latter being a “backend” for the **foreach** package. The next line tells **foreach** that we will use the **doParallel** backend. By default, **R** will guess how many concurrent **R** processes to run on this single multicore machine.

We proceed to carry out replicated particle filters at an initial guess of \(\beta=2\), \(\mu_I=1\), and \(\rho=0.9\).

```
stew(file="pf.rda",{
t_pf <- system.time(
pf <- foreach(i=1:10,.packages='pomp',
.options.multicore=list(set.seed=TRUE),
.export=c("bsflu","fixed_params")
) %dopar% {
pfilter(bsflu,params=c(Beta=2,mu_I=1,rho=0.9,fixed_params),Np=10000)
}
)
n_pf <- getDoParWorkers()
},seed=625904618,kind="L'Ecuyer")
(L_pf <- logmeanexp(sapply(pf,logLik),se=TRUE))
```

```
## se
## -86.9221208 0.7738137
```

In 1.4 seconds on a 30-core machine, we obtain an unbiased likelihood estimate of -86.9 with a Monte Carlo standard error of 0.77.

Given a model and a set of data, the likelihood surface is well defined, though it may be difficult to visualize. We can develop a progressively more complete picture of this surface by storing likelihood estimates whenever we compute them. In particular, it is very useful to construct a database within which to store the likelihood of every point for which we have an estimated likelihood. This will become progressively more complete as our parameter-space search goes on. At this point, we’ve computed the likelihood at a single point. Let’s store this point, together with the estimated likelihood and our estimate of the standard error on that likelihood, in a CSV file:

```
results <- as.data.frame(as.list(c(coef(pf[[1]]),loglik=L_pf[1],loglik=L_pf[2])))
write.csv(results,file="bsflu_params.csv",row.names=FALSE)
```

Let’s carry out a local search using `mif2`

around this point in parameter space. For that, we need to set the `rw.sd`

and `cooling.fraction.50`

algorithmic parameters. Since \(\beta\) and \(\mu_I\) will be estimated on the log scale, and we expect that multiplicative perturbations of these parameters will have roughly similar effects on the likelihood, we’ll use a perturbation size of \(0.02\), which we imagine will have a small but non-negligible effect. For simplicity, we’ll use the same perturbation size on \(\rho\). We fix `cooling.fraction.50=0.5`

, so that after 50 `mif2`

iterations, the perturbations are reduced to half their original magnitudes.

```
stew(file="box_search_local.rda",{
t_local_mif <- system.time({
mifs_local <- foreach(i=1:20,
.packages='pomp',
.combine=c,
.options.multicore=list(set.seed=TRUE),
.export=c("bsflu","fixed_params")
) %dopar%
{
mif2(
bsflu,
start=c(Beta=2,mu_I=1,rho=0.9,fixed_params),
Np=2000,
Nmif=50,
cooling.type="geometric",
cooling.fraction.50=0.5,
transform=TRUE,
rw.sd=rw.sd(Beta=0.02,mu_I=0.02,rho=0.02)
)
}
})
},seed=482947940,kind="L'Ecuyer")
```

```
ggplot(data=melt(conv.rec(mifs_local)),
aes(x=iteration,y=value,group=L1,color=factor(L1)))+
geom_line()+
guides(color=FALSE)+
facet_wrap(~variable,scales="free_y")+
theme_bw()
```