Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
Note: This document has been updated for pomp version 4.
Data y∗1,…,y∗N collected at times t1<⋯<tN are modeled as noisy, incomplete, and indirect observations of a Markov process {X(t),t≥t0}.
This is a partially observed Markov process (POMP) model, also known as a hidden Markov model or a state space model.
The POMP class of models can accommodate a variety of complexities that commonly arise, especially in biological models:
The R package pomp provides
The goals of the pomp project are to:
Our goal in this presentation is two-fold:
To demonstrate and explain the package…
Write Xn=X(tn) and X0:N=(X0,…,XN). Let Yn be a random variable modeling the observation at time tn.
The one-step transition density, fXn|Xn−1(xn|xn−1;θ), together with the measurement density, fYn|Xn(yn|xn;θ) and the initial density, fX0(x0;θ), specify the entire joint density via
fX0:N,Y1:N(x0:N,y1:N;θ)=fX0(x0;θ)N∏n=1fXn|Xn−1(xn|xn−1;θ)fYn|Xn(yn|xn;θ).
fY1:N(y∗1:N;θ)=∫fX0:N,Y1:N(x0:N,y∗1:N;θ)dx0:N.
fXn|X0:n−1,Y1:n−1(xn|x0:n−1,y1:n−1)=fXn|Xn−1(xn|xn−1).
Moreover, the measurable random variable, Yn, depends only on the state at that time: fYn|X0:N,Y1:n−1(yn|x0:n,y1:n−1)=fYn|Xn(yn|xn), for all n=1,…,N.
Observations times tn need not be regularly spaced.
To think algorithmically, we define some function calls:
rprocess( )
: a draw from fXn|Xn−1(xn|xn−1;θ)
dprocess( )
: evaluation of fXn|Xn−1(xn|xn−1;θ)
rmeasure( )
: a draw from fYn|Xn(yn|xn;θ)
dmeasure( )
: evaluation of fYn|Xn(yn|xn;θ)
initializer( )
: a draw from fX0(x0;θ)
pomp is an R package for data analysis using partially observed Markov process (POMP) models.
Note the distinction: lower case pomp is a software package; upper case POMP is a class of models.
pomp builds methodology for POMP models in terms of arbitrary user-specified rprocess()
, dprocess()
, rmeasure()
, dmeasure()
, and initializer
functions.
Simulating random processes is often much easier than evaluating their transition probabilities.
In other words, having formulated a scientifically interesting model, we frequently find ourselves able to write rprocess()
but not dprocess()
.
Simulation-based methods require the user to specify rprocess()
but not dprocess()
.
Plug-and-play, likelihood-free and equation-free are alternative terms for “simulation-based” methods.
Much development of simulation-based statistical methodology has occurred in the past decade.
The algorithms in pomp at the present moment are all plug-and-play in this sense. As we will see, there is great scope for inclusion of new general methods within or on top of the package.
As of v. 1.14.
All of the above are plug-and-play methods.
The Ricker (1954) map describes the stochastic dynamics of a simple population, Nt+1=rNtexp(−cNt+εt),εt∼Normal(0,σ),
N is a state variable, r and c are parameters. r is dimensionless and c has units of inverse density.
For simplicity, we will fix c=1 for the remainder of this document.
We can view the initial condition, N0 as a special kind of parameter, an initial-value parameter.
The Ricker model is perhaps the simplest useful nonlinear population dynamics model. As such, it allows us to introduce the basic features of the pomp package with a minimum amount of distracting complexity. We will examine a more complex model below.
Let’s suppose that the Ricker model is our model for the dynamics of a real population.
However, we cannot know the exact population density at any time, but only estimate it through sampling.
Let’s model measurement error by assuming the observed measurement, yt, is modeled as a realization of a random variable Yt that is Poisson with mean ϕNt: Yt∼Poisson(ϕNt)
In this equation,
To map this model onto the general framework discussed above, we have Xt=(Nt)orXt=(Nt,εt),Yt=Yt,θ=(r,c,ϕ,N0)
The basic data-structure provided by pomp is the object of class pomp
, alternatively known as a “pomp object”.
It is a container that holds real or simulated data and a POMP model, possibly together with other information such as model parameters and covariates.
A real pomp data analysis begins with constructing one or more pomp
objects to hold the data and the model or models under consideration. Here, we’ll illustrate this process a dataset of Parus major abundance in Wytham Wood, near Oxford (McCleery and Perrins 1991).
Download and plot the data:
url("https://kingaa.github.io/pomp/vignettes/parus.csv")
loc <- read.csv(loc)
dat <-head(dat)
year <int> | pop <int> | |||
---|---|---|---|---|
1 | 1960 | 148 | ||
2 | 1961 | 258 | ||
3 | 1962 | 185 | ||
4 | 1963 | 170 | ||
5 | 1964 | 267 | ||
6 | 1965 | 239 |
plot(pop~year,data=dat,type='o')
The call to construct a pomp
object is, naturally enough, pomp
. Documentation on this function can be had by doing ?pomp
.
Now, to construct our pomp
object:
library(pomp)
pomp(dat,times="year",t0=1959) parus <-
The times
argument specifies that the column of dat
labelled “year” gives the measurement times; t0
is the “zero-time”, the time at which the state process will be initialized. We’ve set it to one year prior to the beginning of the data. Plot the new pomp
object:
plot(parus)
We can add the stochastic Ricker model to parus
by writing a procedure that simulates one realization of the stochastic process, from an arbitary time t to t+1, given arbitrary states and parameters. We provide this to pomp
in the form of a Csnippet, a little snippet of C code that performs the computation. The following does this.
Csnippet("
stochStep <- N = r*N*exp(-c*N+rnorm(0,sigma));
")
pomp(
parus,rprocess=discrete_time(step.fun=stochStep,delta.t=1),
rinit=Csnippet("N = N_0;"),
paramnames=c("r","c","sigma","N_0"),
statenames=c("N")
parus ) ->
Note that in the above, we use thernorm
function from the R API. In general any C function provided by R is available to you. pomp also provides a number of C functions that are documented in the header file, pomp.h
, that is installed with the package. See the Csnippet documentation (?Csnippet
) to read more about how to write them. Note too that we use discrete.time.sim
here because the model is a stochastic map. We specify that the time step of the discrete-time process is delta.t
, here, 1 yr.
At this point, we have what we need to simulate the state process:
simulate(parus, params=c(N_0=1,r=12,c=1,sigma=0.5),
sim <-format="data.frame")
plot(N~year,data=sim,type='o')
We complete the specification of the POMP by specifying the measurement model. To obtain the Poisson measurement model described above, we write two Csnippets. The first simulates:
Csnippet("pop = rpois(phi*N);") rmeas <-
The second computes the likelihood of observing pop
birds given a true density of N
:
Csnippet("lik = dpois(pop,phi*N,give_log);") dmeas <-
In the above, rpois
and dpois
again come from R’s C API. Note the give_log
argument. When this code is evaluated, give_log
will be set to 1 if the log likelihood is desired, and 0 else.
We fold these into the pomp
object:
pomp(parus,
rmeasure=rmeas,
dmeasure=dmeas,
statenames=c("N"),
paramnames=c("phi")
parus ) ->
Now we can simulate the whole POMP. First, let’s add some parameters:
coef(parus) <- c(N_0=1,r=20,c=1,sigma=0.1,phi=200)
library(ggplot2)
simulate(parus,nsim=3,format="data.frame",include.data=TRUE)
sims <-
ggplot(data=sims,
mapping=aes(x=year,y=pop))+
geom_line()+
facet_wrap(~.id,ncol=1,scales="free_y")
Let’s see what simple things can be done with a pomp
object.
We can plot the data by doing
plot(parus)
We can simulate by doing
simulate(parus) x <-
What kind of object have we created with this call to simulate
?
class(x)
## [1] "pomp"
## attr(,"package")
## [1] "pomp"
plot(x)
Why do we see more time series in the simulated pomp
object?
We can turn a pomp
object into a data frame:
as.data.frame(parus)
y <-head(y)
year <dbl> | pop <dbl> | |||
---|---|---|---|---|
1 | 1960 | 148 | ||
2 | 1961 | 258 | ||
3 | 1962 | 185 | ||
4 | 1963 | 170 | ||
5 | 1964 | 267 | ||
6 | 1965 | 239 |
head(simulate(parus,format="data.frame"))
year <dbl> | .id <ord> | N <dbl> | pop <dbl> | |
---|---|---|---|---|
1 | 1960 | 1 | 8.16694084 | 1614 |
2 | 1961 | 1 | 0.04233592 | 7 |
3 | 1962 | 1 | 0.75949589 | 141 |
4 | 1963 | 1 | 7.04479206 | 1409 |
5 | 1964 | 1 | 0.13100851 | 37 |
6 | 1965 | 1 | 2.07165724 | 370 |
We can also run multiple simulations simultaneously:
simulate(parus,nsim=10)
x <-class(x)
## [1] "pompList"
## attr(,"package")
## [1] "pomp"
sapply(x,class)
## [1] "pomp" "pomp" "pomp" "pomp" "pomp" "pomp" "pomp" "pomp" "pomp" "pomp"
simulate(parus,nsim=10,format="data.frame")
x <-head(x)
year <dbl> | .id <ord> | N <dbl> | pop <dbl> | |
---|---|---|---|---|
1 | 1960 | 1 | 7.609331 | 1535 |
2 | 1960 | 2 | 7.019241 | 1444 |
3 | 1960 | 3 | 7.783268 | 1458 |
4 | 1960 | 4 | 7.350210 | 1439 |
5 | 1960 | 5 | 6.692291 | 1293 |
6 | 1960 | 6 | 7.327808 | 1493 |
str(x)
## 'data.frame': 270 obs. of 4 variables:
## $ year: num 1960 1960 1960 1960 1960 1960 1960 1960 1960 1960 ...
## $ .id : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8 9 10 ...
## $ N : num 7.61 7.02 7.78 7.35 6.69 ...
## $ pop : num 1535 1444 1458 1439 1293 ...
Also,
library(ggplot2)
simulate(parus,nsim=9,format="data.frame",include.data=TRUE)
x <-ggplot(data=x,aes(x=year,y=pop,group=.id,color=(.id=="data")))+
geom_line()+guides(color="none")+
facet_wrap(~.id,ncol=2)
Notice that parus
has parameters associated with it:
coef(parus)
## N_0 r c sigma phi
## 1.0 20.0 1.0 0.1 200.0
These are the parameters at which the simulations and deterministic trajectory computations above were done. We can run these at different parameters:
coef(parus)
theta <-c("r","N_0")] <- c(5,3)
theta[
simulate(parus,params=theta)
x <-
plot(x,var="pop")
We can also change the parameters stored inside of parus
:
coef(parus,c("r","N_0","sigma")) <- c(5,1.5,0.1)
coef(parus)
## N_0 r c sigma phi
## 1.5 5.0 1.0 0.1 200.0
plot(simulate(parus),var=c("pop","N"))
We can obtain a Monte Carlo estimate of the likelihood using the particle filter:
pfilter(parus,Np=1000)
pf <-class(pf)
## [1] "pfilterd_pomp"
## attr(,"package")
## [1] "pomp"
logLik(pf)
## [1] -597.9426
plot(pf,var=c("ess","cond.logLik"))
A mainstay of theoretical epidemiology, the SIR model describes the progress of a contagious, immunizing infection through a population of hosts (Anderson and May 1991; Keeling and Rohani 2009; Kermack and McKendrick 1927). The hosts are divided into three classes, according to their status vis-à-vis the infection. The susceptible class (S) contains those that have not yet been infected and are thereby still susceptible to it; the infected class (I) comprises those who are currently infected and, by assumption, infectious; the removed class (R) includes those who are recovered or quarantined as a result of the infection. Individuals in R are assumed to be immune against reinfection. We let S(t), I(t), and R(t) represent the numbers of individuals within the respective classes at time t.
We formulate this model as a continuous-time Markov process.
The numbers of individuals within each class change through time in whole-number increments as discrete births, deaths, and passages between compartments occur.
NAB is the stochastic counting process whose value at time t is the number of individuals that have passed from compartment A to compartment B during the interval [t0,t), where t0 is an arbitrary starting point not later than the first observation.
N∙S is births and NA∙ is deaths from compartment A.
Per capita birth and death rates, and the rate of transition, γ, from I to R are constants.
The S to I transition rate, the so-called force of infection is λ(t)=βI(t)/P, where β is the transmission rate and P=S+I+R is the population size.
Assume that birth and death rates are equal and independent of infection status: let μ denote the common rate.
A consequence is that the expected population size remains constant.
The continuous-time Markov process is fully specified by the infinitesimal increment probabilities. Specifically, writing ΔN(t)=N(t+h)−N(t), we have P[ΔN∙S(t)=1|S(t),I(t),R(t)]=μP(t)h+o(h),P[ΔNSI(t)=1|S(t),I(t),R(t)]=λ(t)S(t)h+o(h),P[ΔNIR(t)=1|S(t),I(t),R(t)]=γI(t)h+o(h),P[ΔNS∙(t)=1|S(t),I(t),R(t)]=μS(t)h+o(h),P[ΔNI∙(t)=1|S(t),I(t),R(t)]=μI(t)h+o(h),P[ΔNR∙(t)=1|S(t),I(t),R(t)]=μR(t)h+o(h), together with statement that all events of the form {ΔNAB(t)>1}and{ΔNAB(t)=1,ΔNCD(t)=1} for A, B, C, D with (A,B)≠(C,D) have probability o(h). The counting processes are coupled to the state variables (Bretó and Ionides 2011) via the following identities ΔS(t)=ΔN∙S(t)−ΔNSI(t)−ΔNS∙(t),ΔI(t)=ΔNSI(t)−ΔNIR(t)−ΔNI∙(t),ΔR(t)=ΔNIR(t)−ΔNR∙(t).
It is typically impossible to monitor S, I, and R, directly. It sometimes happens, however, that public health authorities keep records of cases, i.e., individual infections. The number of cases, C(t1,t2), recorded within a given reporting interval [t1,t2) might perhaps be modeled by a negative binomial process C(t1,t2)∼NegBin(ρΔNSI(t1,t2),θ), where ΔNSI(t1,t2) is the true incidence (the accumulated number of new infections that have occured over the [t1,t2) interval), ρ is the reporting rate, (the probability that an infection is observed and recorded), θ is the negative binomial “size” parameter, and the notation is meant to indicate that E[C(t1,t2)|ΔNSI(t1,t2)=H]=ρH and Var[C(t1,t2)|ΔNSI(t1,t2)=H]=ρH+ρ2H2θ. The fact that the observed data are linked to an accumulation, as opposed to an instantaneous value, introduces a slight complication, which we discuss below.
As before, we will need to write functions to implement some or all of the SIR model’s rprocess
, rmeasure
, and dmeasure
components. As above, we will write these components using pomp’s Csnippets. Recall that these are snippets of C code that pomp automatically assembles, compiles, and dynamically loads into the running R session.
To start with, we will write snippets that specify the measurement model (rmeasure
and dmeasure
):
"
rmeas <- cases = rnbinom_mu(theta, rho * H);
"
"
dmeas <- lik = dnbinom_mu(cases, theta, rho * H, give_log);
"
Here, we are using cases
to refer to the data (number of reported cases) and H
to refer to the true incidence over the reporting interval.
The negative binomial simulator rnbinom_mu
and density function dnbinom_mu
are provided by R.
The logical flag give_log
requests the likelihood when FALSE
, the log likelihood when TRUE
.
Notice that, in these snippets, we never declare the variables; pomp
will ensure that the state variable (H
), observable (cases
), parameters (theta
, rho
), and likelihood (lik
) are defined in the contexts within which these snippets are executed.
For the rprocess
portion, we could simulate from the continuous-time Markov process exactly (Gillespie 1977); the pomp function gillespie.sim
implements this algorithm. However, for practical purposes, the exact algorithm is often prohibitively slow.
If we are willing to live with an approximate simulation scheme, we can use the so-called “tau-leap” algorithm, one version of which is implemented in pomp via the euler
plug-in.
This algorithm holds the transition rates λ, μ, γ constant over a small interval of time Δt and simulates the numbers of births, deaths, and transitions that occur over that interval.
It then updates the state variables S, I, R accordingly, increments the time variable by Δt, recomputes the transition rates, and repeats.
Naturally, as Δt→0, this approximation to the true continuous-time process becomes better and better.
The critical feature from the inference point of view, however, is that no relationship needs to be assumed between the Euler simulation interval Δt and the reporting interval, which itself does not even need to be the same from one observation to the next.
Under the above assumptions, the number of individuals leaving any of the classes by all available routes over a particular time interval is a multinomial process. For example, if ΔNSI and ΔNS∙ are the numbers of S individuals acquiring infection and dying, respectively, over the Euler simulation interval [t,t+Δt), then (ΔNSI,ΔNS,S−ΔNSI−ΔNS∙)∼Multinom(S(t);pSI,pS∙,1−pSI−pS∙), where pSI=λ(t)λ(t)+μ(1−e−(λ(t)+μ)Δt),pS∙=μλ(t)+μ(1−e−(λ(t)+μ)Δt). By way of shorthand, we say that the random variable (ΔNSI,ΔNS) in has an Euler-multinomial distribution. pomp provides convenience functions for such distributions, which arise with some frequency in compartmental models. Specifically, the functions reulermultinom
and deulermultinom
respectively draw random deviates from, and evaluate the probability mass function of, such distributions. As the help pages relate, reulermultinom
and deulermultinom
parameterize the Euler-multinomial distributions by the size (S(t) in ), rates (λ(t) and μ), and time interval Δt. Obviously, the Euler-multinomial distributions generalize to an arbitrary number of exit routes.
The help page (?euler
) informs us that to use euler
, we need to specify a procedure that advances the states from t to t+Δt. We do this via a Csnippet:
"
sir.step <- double rate[6];
double dN[6];
double P;
P = S + I + R;
rate[0] = mu * P; // birth
rate[1] = Beta * I / P; // transmission
rate[2] = mu; // death from S
rate[3] = gamma; // recovery
rate[4] = mu; // death from I
rate[5] = mu; // death from R
dN[0] = rpois(rate[0] * dt);
reulermultinom(2, S, &rate[1], dt, &dN[1]);
reulermultinom(2, I, &rate[3], dt, &dN[3]);
reulermultinom(1, R, &rate[5], dt, &dN[5]);
S += dN[0] - dN[1] - dN[2];
I += dN[1] - dN[3] - dN[4];
R += dN[3] - dN[5];
H += dN[1];
"
As before, pomp will ensure that the undeclared state variables and parameters are defined in the context within which the snippet is executed.
Note, however, that in the above we do declare certain local variables.
In particular, the rate
and dN
arrays hold the rates and numbers of transition events, respectively.
Note too, that we make use of pomp’s C interface to reulermultinom
, documented in the package help pages (?reulermultinom
).
The package help system (?pomp
) includes instructions for, and examples of, the use of Csnippets.
Two significant wrinkles remain to be explained.
First, notice that in sir.step
, the variable H
simply accumulates the numbers of new infections: H
is a counting process that is nondecreasing with time. In fact, the incidence within an interval [t1,t2) is ΔNSI(t1,t2)=H(t2)−H(t1). This leads to a technical difficulty with the measurement process, however, in that the data are assumed to be records of new infections occurring within the latest reporting interval, while the process model tracks the accumulated number of new infections since time t0. We can get around this difficulty by re-setting H
to zero immediately after each observation. We cause pomp to do this via the pomp
function’s accumvars
argument, as we will see in a moment. The section on “accumulator variables” in the pomp
help page (?pomp
) discusses this in more detail.
The second wrinkle has to do with the initial conditions, i.e., the states S(t0), I(t0), R(t0). By default, pomp will allow us to specify these initial states arbitrarily. For the model to be consistent, they should be positive integers that sum to the population size N. We can enforce this constraint by customizing the parameterization of our initial conditions. We do this by furnishing a custom initializer
in the call to pomp
. Let us construct it now and fill it with simulated data.
simulate(
sir1 <-times = seq(0, 10, by = 1/52),
t0 = -1/52,
dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
rprocess = euler(step.fun = Csnippet(sir.step), delta.t = 1/52/20),
obsnames="cases",
statenames = c("S", "I", "R", "H"),
paramnames = c("gamma", "mu", "theta", "Beta", "popsize",
"rho", "S.0", "I.0", "R.0"),
accumvars = "H",
rinit = Csnippet("
double sum = S_0 + I_0 + R_0;
S = nearbyint(popsize * S_0 / sum);
I = nearbyint(popsize * I_0 / sum);
R = nearbyint(popsize * R_0 / sum);
H = 0;
"),
params = c(popsize = 500000, Beta = 400, gamma = 26,
mu = 1/50, rho = 0.1, theta = 100, S.0 = 26/400,
I.0 = 0.002, R.0 = 1),
seed = 1914679908L) -> sir1
Notice that we are assuming here that the data are collected weekly and use an Euler step-size of 1/20wk. Here, we have assumed an infectious period of 2wk (1/γ=1/26yr) and a basic reproductive number, R0 of β/(γ+μ)≈15. We have assumed a host population size of 500,000 and 10% reporting efficiency. The plot below shows one realization of this process.
To illustrate the flexibility afforded by pomp’s plug-and-play methods, let us add a bit of real-world complexity to the simple SIR model. We will modify the model to take four facts into account:
To incorporate seasonality, we would like to assume a flexible functional form for β(t). Here, we will use a three-coefficient Fourier series: logβ(t)=b0+b1cos2πt+b2sin2πt.
There are a variety of ways to account for imported infections. Here, we will simply assume that there is some constant number, ι, of infected hosts visiting the population. Putting this together with the seasonal contact rate results in a force of infection λ(t)=β(t)I(t)+ιP.
To incorporate birth-rate information, let us suppose we have data on the number of births occurring each month in this population and that these data are in the form of a data frame birthdat
with columns time
and births
.
We can incorporate the varying birth rate into our model by passing it as a covariate to the simulation code.
When we pass birthdat
as the covar
argument to pomp
, we cause a look-up table to be created and made available to the simulator.
The package employs linear interpolation to provide a value of each variable in the covariate table at any requisite time: from the user’s perspective, a variable births
will simply be available for use by the model codes.
Finally, we can allow for extrademographic stochasticity by allowing the force of infection to be itself a random variable. We will accomplish this by assuming a random phase in β: λ(t)=β(Φ(t))I(t)+ιN, where the phase Φ satisfies the stochastic differential equation dΦ=dt+σdWt, where dW(t) is a white noise, specifically an increment of standard Brownian motion. This model assumption attempts to capture variability in the timing of seasonal changes in transmission rates. As σ varies, it can represent anything from a very mild modulation of the timing of the seasonal progression to much more intense variation.
Let us modify the process-model simulator to incorporate these complexities.
"
seas.sir.step <- double rate[6];
double dN[6];
double Beta;
double dW;
Beta = exp(b1 + b2 * cos(M_2PI * Phi) + b3 * sin(M_2PI * Phi));
rate[0] = births; // birth
rate[1] = Beta * (I + iota) / P; // infection
rate[2] = mu; // death from S
rate[3] = gamma; // recovery
rate[4] = mu; // death from I
rate[5] = mu; // death from R
dN[0] = rpois(rate[0] * dt);
reulermultinom(2, S, &rate[1], dt, &dN[1]);
reulermultinom(2, I, &rate[3], dt, &dN[3]);
reulermultinom(1, R, &rate[5], dt, &dN[5]);
dW = rnorm(dt, sigma * sqrt(dt));
S += dN[0] - dN[1] - dN[2];
I += dN[1] - dN[3] - dN[4];
R += dN[3] - dN[5];
P = S + I + R;
Phi += dW;
H += dN[1];
noise += (dW - dt) / sigma;
"
simulate(
sir2 <-
sir1, rprocess = euler(
step.fun = Csnippet(seas.sir.step), delta.t = 1/52/20
),dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
covar=covariate_table(birthdat, times = "time"),
accumvars = c("H", "noise"),
statenames = c("S", "I", "R", "H", "P", "Phi", "noise"),
paramnames = c("gamma", "mu", "popsize", "rho", "theta",
"sigma", "S.0", "I.0", "R.0",
"b1", "b2", "b3", "iota"),
rinit = Csnippet("
double sum = S_0 + I_0 + R_0;
S = nearbyint(popsize * S_0 / sum);
I = nearbyint(popsize * I_0 / sum);
R = nearbyint(popsize * R_0 / sum);
P = S + I + R;
H = 0;
Phi = 0;
noise = 0;
"),
params = c(popsize = 500000, iota = 5,
b1 = 6, b2 = 0.2, b3 = -0.1,
gamma = 26, mu = 1/50, rho = 0.1, theta = 100,
sigma = 0.3, S.0 = 0.055, I.0 = 0.002, R.0 = 0.94),
seed = 619552910L
)
The next plot shows the simulated data and latent states.
The sir2
object we have constructed here contains all the key elements of models used within pomp to ask and answer a wide range of scientific questions in epidemiology and infectious disease ecology. See the pomp website for a bibliography of publications.
The particle filter, among other things, is an efficient algorithm giving unbiased Monte Carlo estimates of the likelihood.
We’ve seen it used above. To obtain an estimate of the Monte Carlo error on the likelihood, it’s most convenient to simply perform a few independent pfilter
computations.
replicate(10, pfilter(parus,Np=5000))
pf <-plot(pf[[1]])
sapply(pf,logLik)
ll <-logmeanexp(ll,se=TRUE)
## est se
## -555.696199 6.543973
Although our filtering is just barely possible, we can use iterated filtering to rapidly improve the likelihood.
mif2(parus, Nmif=30, Np=1000,
cooling.fraction.50=0.8,cooling.type="geometric",
rw.sd=rw_sd(r=0.02,sigma=0.02,phi=0.02,N_0=ivp(0.1))
mf
) ->
plot(mf)
To evaluate the likelihood, we need to run the particle filter at the new parameter values. Can you see why this is necessary?
replicate(5, pfilter(mf,Np=1000))
pf <- sapply(pf,logLik)
ll <-logmeanexp(ll,se=TRUE)
## est se
## -144.9473163 0.1592354
pomp provides a variety of tools to facilitate model criticism.
At the most basic level, plots of simulations are useful.
simulate(mf,nsim=5,format="data.frame",include.data=TRUE) -> sims
library(ggplot2)
ggplot(sims,
mapping=aes(x=year,y=pop,group=.id,color=.id=="data"))+
geom_line()+
guides(color="none")+
facet_wrap(~.id)
The probe
allows one to evaluate one or more summary statistics (probes) on both data and simulations.
probe(mf, nsim=200,
probes=list(
mean=probe_mean("pop"),
sd=probe_sd("pop"),
probe_acf("pop",transform=sqrt,lags=c(1,2)),
probe_quantile("pop",prob=c(0.2,0.8))
pb
)) ->
plot(pb)
A variety of useful probes (many suggested by Wood (2010)) are provided. It is also easy to apply custom probes.
A reduced-information approach to parameter estimation is afforded by the synthetic likelihood of Wood (2010).
One generates the distribution of a set of probes via simulation. Assuming these are multivariate-normally distributed, a likelihood can be defined.
The pomp function probe_objfun
constructs an objective function that can be used with any numerical optimizer to maximize this likelihood.
probe_objfun(pb, nsim=200, est = c("N_0","r"),
seed = 669237763L) -> pm
library(subplex)
subplex(par=coef(pm,c("N_0","r")),fn=pm) -> fit
pm(fit$par)
## [1] 16.65935
summary(pm)
## $coef
## N_0 r c sigma phi
## 1.4465983 2.5807959 1.0000000 0.3183974 201.0255703
##
## $nsim
## [1] 200
##
## $quantiles
## mean sd acf[1] acf[2] 20% 80%
## 0.510 0.130 0.660 0.830 0.885 0.210
##
## $pvals
## mean sd acf[1] acf[2] 20% 80%
## 0.9850746 0.2686567 0.6865672 0.3482587 0.2288557 0.4278607
##
## $synth.loglik
## [1] -17.03416
plot(pm)
The particle Metropolis-Hastings algorithm of Andrieu et al. (2010) exploits the efficiency and unbiasedness of the particle filter’s likelihood estimates to perform MCMC on the parameters.
This is implemented as pmcmc
in pomp.
"
priorDens <- lik = dnorm(sigma,0.2,1,1)+
dnorm(phi,200,100,1)+
dexp(r,0.1,1);
if (!give_log) lik = exp(lik);
"
pmcmc(pomp(mf, dprior=Csnippet(priorDens),
paramnames=c("sigma","phi","r")),
Nmcmc = 500, Np = 1000,
proposal = mvn_diag_rw(
rw.sd=c(N_0=0.1, sigma=0.02, r=0.02, phi=0.02)
pmh
)) ->
plot(pmh,pars=c("loglik","log.prior","N_0","sigma","r","phi"))
The usual coda diagnostics are readily available. For example,
library(coda)
traces(pmh) -> trace
class(trace)
## [1] "mcmc"
autocorr.diag(trace[,c("r","sigma","phi")])
## r sigma phi
## Lag 0 1.0000000 1.00000000 1.0000000
## Lag 1 0.9701803 0.92349706 0.9957196
## Lag 5 0.8641329 0.71747826 0.9789541
## Lag 10 0.7518992 0.54514098 0.9583805
## Lag 50 -0.1172034 0.09378954 0.7631714
As of version 2.1:
The nonlinear forecasting method, an indirect inference approach, is demonstrated in the JSS paper (King, Nguyen, and Ionides 2016). This paper also demonstrates ABC.
Not all the above have been equally well developed. Contributions and improvements from the community are most welcome!
All of the above are plug-and-play methods. Non plug-and-play methods can also be implemented, via the dprocess
slot.
Read an overview of the package as a whole with links to its main features by doing package?pomp
. A complete index of the functions in pomp is returned by the command library(help=pomp)
. Finally, the home page for the pomp
project is https://kingaa.github.io/pomp/; there you have access to the complete source code, tutorials, manuals, a news blog, a bibliography of publications using pomp, a facility for reporting issues and making help requests, etc.
From the developer’s point of view, the key aspect of pomp is its low-level interface. This consists of R and C functions that give access to the basic model components rprocess
, dprocess
, rmeasure
, dmeasure
, initializer
, rprior
, and dprior
.
To see these in action, we examine the following ancient piece of pomp code.
function (object, params, Np,
pfilter <-tol = 1e-17, warn = TRUE,
max.fail = 0,
...) {if (missing(Np))
NCOL(params)
Np <- length(time(object))
ntimes <-if (is.null(dim(params))) {
matrix(params,nrow=length(params),ncol=Np,
params <-dimnames=list(names(params),NULL))
} nrow(params)
npars <- rownames(params)
paramnames <-if (is.null(paramnames))
stop("pfilter error: 'params' must have rownames")
## sample from the X0 distribution
init.state(object,params=params,t0=object@t0)
xstart <- rownames(xstart)
statenames <- nrow(xstart)
nvars <-
rep(NA,ntimes)
loglik <- rep(NA,ntimes)
eff.sample.size <- 0
nfail <-
c(object@t0,time(object))
times <- xstart
x <-
for (nt in seq(length=ntimes)) {
## advance the state variables according to the process model
try(
X <-rprocess(object,x=x,times=times[c(nt,nt+1)],
params=params)[,,2,drop=FALSE],
silent=FALSE
)if (inherits(X,'try-error'))
stop("pfilter error: process simulation error")
X # drop the third dimension
x[,] <-
## determine the weights
try(
weights <-dmeasure(object,y=object@data[,nt,drop=FALSE],x=X,
times=times[nt+1],params=params),
silent=FALSE
)if (inherits(weights,'try-error'))
stop("pfilter error: error in calculation of weights")
if (any(is.na(weights))) {
stop("pfilter error: dmeasure returns NA")
}
## test for failure to filter
dim(weights) <- NULL
weights < tol
failures <- all(failures)
all.fail <-if (all.fail) { # all particles are lost
if (warn)
message("filtering failure at time t = ",times[nt+1])
nfail+1
nfail <-if (nfail > max.fail)
stop('pfilter error: too many filtering failures')
log(tol) # worst log-likelihood
loglik[nt] <- rep(1/Np,Np)
weights <- 0
eff.sample.size[nt] <-else { # not all particles are lost
} ## compute log-likelihood
log(mean(weights))
loglik[nt] <- 0
weights[failures] <- weights/sum(weights)
weights <-## compute effective sample-size
1/(weights%*%weights)
eff.sample.size[nt] <-
}
if (!all.fail) {
.Call(systematic_resampling,weights)
sample <- x[,sample,drop=FALSE]
x <- params[,sample,drop=FALSE]
params <-
}
}
list(
pred.mean=pred.m,
pred.var=pred.v,
filter.mean=filt.m,
eff.sample.size=eff.sample.size,
cond.loglik=loglik,
nfail=nfail,
loglik=sum(loglik)
) }
To go deeper, we’ll go to the source.
This document draws on King et al. (2016) and on materials from the “Simulation-based Inference for Epidemiological Dynamics” course taught by E. L. Ionides and A.A.K.
Produced in R version 4.4.1 using pomp version 5.10.
Anderson JL (2001). “An Ensemble Adjustment Kalman Filter for Data Assimilation.” Mon Wea Rev, 129(12), 2884–2903. https://doi.org/10.1175/1520-0493(2001)129<2884:aeakff>2.0.co;2.
Anderson RM, May RM (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford. https://doi.org/10.1093/oso/9780198545996.001.0001.
Andrieu C, Doucet A, Holenstein R (2010). “Particle Markov Chain Monte Carlo Methods.” J R Stat Soc B, 72(3), 269–342. https://doi.org/10.1111/j.1467-9868.2009.00736.x.
Bretó C, Ionides EL (2011). “Compound Markov Counting Processes and Their Applications to Modeling Infinitesimally over-Dispersed Systems.” Stochastic Process Appl, 121(11), 2571–2591. https://doi.org/10.1016/j.spa.2011.07.005.
Ellner SP, Bailey BA, Bobashev GV, Gallant AR, Grenfell BT, Nychka DW (1998). “Noise and Nonlinearity in Measles Epidemics: Combining Mechanistic and Statistical Approaches to Population Modeling.” Am Nat, 151, 425–440. https://doi.org/10.1086/286130.
Evensen G (1994). “Sequential Data Assimilation with a Nonlinear Quasi-Geostrophic Model Using Monte Carlo Methods to Forecast Error Statistics.” J Geophys Res Oceans, 99(C5), 10143–10162. https://doi.org/10.1029/94JC00572.
Evensen G (2009). Data Assimilation: The Ensemble Kalman Filter. Springer, Dordrecht New York. https://doi.org/10.1007/978-3-642-03711-5.
Gillespie DT (1977). “Exact Stochastic Simulation of Coupled Chemical Reactions.” J Phys Chem, 81, 2340–2361. https://doi.org/10.1021/j100540a008.
Ionides EL, Bretó C, King AA (2006). “Inference for Nonlinear Dynamical Systems.” Proc Natl Acad Sci, 103(49), 18438–18443. https://doi.org/10.1073/pnas.0603181103.
Ionides EL, Nguyen D, Atchadé Y, Stoev S, King AA (2015). “Inference for Dynamic and Latent Variable Models via Iterated, Perturbed Bayes Maps.” Proc Natl Acad Sci, 112(3), 719–724. https://doi.org/10.1073/pnas.1410597112.
Keeling MJ, Rohani P (2009). Modeling Infectious Diseases in Humans and Animals. Princeton University Press, Princeton, NJ. https://doi.org/10.2307/j.ctvcm4gk0.
Kendall BE, Briggs CJ, Murdoch WW, Turchin P, Ellner SP, McCauley E, Nisbet RM, Wood SN (1999). “Why Do Populations Cycle? A Synthesis of Statistical and Mechanistic Modeling Approaches.” Ecology, 80(6), 1789–1805. https://doi.org/10.2307/176658.
Kendall BE, Ellner SP, McCauley E, Wood SN, Briggs CJ, Murdoch WW, Turchin P (2005). “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecol Monogr, 75(2), 259–276. https://doi.org/10.1890/03-4056.
Kermack W, McKendrick A (1927). “A Contribution to the Mathematical Theory of Epidemics.” Proc R Soc Lond A, 115, 700–721. https://doi.org/10.1098/rspa.1927.0118.
King AA, Nguyen D, Ionides EL (2016). “Statistical Inference for Partially Observed Markov Processes via the R Package Pomp.” J Stat Softw, 69(12), 1–43. https://doi.org/10.18637/jss.v069.i12.
Liu J, West M (2001). “Combining Parameter and State Estimation in Simulation-Based Filtering.” In Sequential Monte Carlo Methods in Practice, eds. A Doucet, N de Freitas, and NJ Gordon, 197–224. Springer, New York. https://doi.org/10.1007/978-1-4757-3437-9_10.
McCleery RH, Perrins CM (1991). “Effects of Predation on the Numbers of Great Tits Parus Major.” In Bird population studies. Relevence to conservation and management 129–147. Oxford University Press, Oxford. https://doi.org/10.1093/oso/9780198577300.003.0006.
Reuman DC, Desharnais RA, Costantino RF, Ahmad OS, Cohen JE (2006). “Power Spectra Reveal the Influence of Stochasticity on Nonlinear Population Dynamics.” Proc Natl Acad Sci, 103(49), 18860–18865. https://doi.org/10.1073/pnas.0608571103.
Ricker WE (1954). “Stock and Recruitment.” J Fish Res Board Can, 11, 559–623. https://doi.org/10.1139/f54-039.
Wood SN (2010). “Statistical Inference for Noisy Nonlinear Ecological Dynamic Systems.” Nature, 466(7310), 1102–1104. https://doi.org/10.1038/nature09319.