Produced in R version 3.5.3 using pomp version 1.19.

## Partially observed Markov process (POMP) models

• Data $$y^*_1,\dots,y^*_N$$ collected at times $$t_1<\dots<t_N$$ are modeled as noisy, incomplete, and indirect observations of a Markov process $$\{X(t), t\ge t_0\}$$.

• This is a partially observed Markov process (POMP) model, also known as a hidden Markov model or a state space model.

• The POMP class of models can accommodate a variety of complexities that commonly arise, especially in biological models:
• latent variables
• nonlinear dynamics
• non-Gaussian distributions
• continuous-time models (as well as discrete-time models)
• intractable likelihoods
• non-differentiable models

## Goals

The R package pomp provides

• facilities for modeling POMPs,
• a toolbox of statistical inference methods for analyzing data using POMPs, and
• a development platform for implementing new POMP inference methods.

The goals of the pomp project are to:

1. facilitate scientific progress by providing high quality, general purpose, reproducible algorithms for statistical inference on POMPs.
2. help separate model issues from method issues to allow one to accurately distinguish model failure from method failure, method improvement from model improvement, etc.
3. provide a test-bed for the development and implementation of new inference algorithms by simplifying the model interface and providing a plethora of benchmarks.
4. exploit potential synergies afforded by hybrid approaches and cross-fertilization of ideas.

Our goal in this presentation is two-fold:

To demonstrate and explain the package…

1. …from the point of view of the user interested in application of the included methods to a specific data analysis based on a specific set of models.
2. …from the point of view of a methods developer, who wishes to exploit the package’s structure write new methods in such a way that they can be applied to a POMP models generally.

#### Notation for partially observed Markov process models

• Write $$X_n=X(t_n)$$ and $$X_{0:N}=(X_0,\dots,X_N)$$. Let $$Y_n$$ be a random variable modeling the observation at time $$t_n$$.

• The one-step transition density, $$f_{X_n|X_{n-1}}(x_n|x_{n-1};\theta)$$, together with the measurement density, $$f_{Y_n|X_n}(y_n|x_n;\theta)$$ and the initial density, $$f_{X_0}(x_0;\theta)$$, specify the entire joint density via

$f_{X_{0:N},Y_{1:N}}(x_{0:N},y_{1:N};\theta) = f_{X_0}(x_0;\theta)\,\prod_{n=1}^N\!f_{X_n | X_{n-1}}(x_n|x_{n-1};\theta)\,f_{Y_n|X_n}(y_n|x_n;\theta).$

• The marginal density for sequence of measurements, $$Y_{1:N}$$, evaluated at the data, $$y_{1:N}^*$$, is

$f_{Y_{1:N}}(y^*_{1:N};\theta)=\int f_{X_{0:N},Y_{1:N}}(x_{0:N},y^*_{1:N};\theta)\, dx_{0:N}.$

### A POMP model schematic

• In the following diagram, arrows show direct dependence among model variables:

• The state process, $$X_n$$, is Markovian, i.e.,

$f_{X_n|X_{0:n-1},Y_{1:n-1}}(x_n|x_{0:n-1},y_{1:n-1})=f_{X_n|X_{n-1}}(x_n|x_{n-1}).$

• Moreover, the measurable random variable, $$Y_n$$, depends only on the state at that time: $f_{Y_n|X_{0:N},Y_{1:n-1}}(y_n|x_{0:n},y_{1:n-1})=f_{Y_n|X_{n}}(y_n|x_n),$ for all $$n=1,\dots,N$$.

• Observations times $$t_n$$ need not be regularly spaced.

### POMP models viewed algorithmically

To think algorithmically, we define some function calls:

• rprocess( ): a draw from $$f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)$$

• dprocess( ): evaluation of $$f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)$$

• rmeasure( ): a draw from $$f_{Y_n|X_n}(y_n| x_n;\theta)$$

• dmeasure( ): evaluation of $$f_{Y_n|X_n}(y_n| x_n;\theta)$$

• initializer( ): a draw from $$f_{X_0}(x_0;\theta)$$

#### The pomp package for POMP models

• pomp is an R package for data analysis using partially observed Markov process (POMP) models.

• Note the distinction: lower case pomp is a software package; upper case POMP is a class of models.

• pomp builds methodology for POMP models in terms of arbitrary user-specified rprocess(), dprocess(), rmeasure(), dmeasure(), and initializer functions.

#### What does it mean for methodology to be simulation-based?

• Simulating random processes is often much easier than evaluating their transition probabilities.

• In other words, having formulated a scientifically interesting model, we frequently find ourselves able to write rprocess() but not dprocess().

• Simulation-based methods require the user to specify rprocess() but not dprocess().

• Plug-and-play, likelihood-free and equation-free are alternative terms for “simulation-based” methods.

• Much development of simulation-based statistical methodology has occurred in the past decade.

• The algorithms in pomp at the present moment are all plug-and-play in this sense. As we will see, there is great scope for inclusion of new general methods within or on top of the package.

#### Algorithms currently implemented in pomp

As of v. 1.14.

• classical trajectory matching
• basic particle filtering (AKA sequential importance sampling or sequential Monte Carlo)
• the approximate Bayesian sequential Monte Carlo algorithm of Liu and West (2001)
• the particle Markov chain Monte Carlo method of Andrieu et al. (2010)
• approximate Bayesian computation (ABC) of e.g., Toni et al. (2009)
• the iterated filtering method of Ionides et al. (2006)
• the improved iterated filtering method of Ionides et al. (2015)
• probe-matching methods (Kendall et al. 1999, Wood 2010)
• the nonlinear forecasting method (Ellner et al. 1998, Kendall et al. 2005)
• the ensemble Kalman filter (Evensen 1994, 2009)
• the ensemble adjustment Kalman filter of Anderson (2001)
• power-spectrum-matching methods of Reuman et al. (2006).

All of the above are plug-and-play methods.

## Using pomp

### A first example: the stochastic Ricker map

#### The unobserved state process

• The Ricker (1954) map describes the stochastic dynamics of a simple population, $N_{t+1} = r\,N_{t}\,\exp(-c\,N_{t}+\varepsilon_{t}), \qquad \varepsilon_{t}\;\sim\;{\mathrm{Normal}\left(0,\sigma\right)},$
• $$N_t$$ is the population density at time $$t$$.
• $$r$$ is a fixed value (a parameter) describing the population’s intrinsic capacity to increase in one unit of time.
• $$\sigma$$ is the standard deviation of the noise process $$\varepsilon$$.
• The parameter $$c$$ scales the density-dependent population regulation.
• The equilibrium population is $$N_t=\log(r)/c$$.
• $$N$$ is a state variable, $$r$$ and $$c$$ are parameters. $$r$$ is dimensionless and $$c$$ has units of inverse density.

• For simplicity, we will fix $$c=1$$ for the remainder of this document.

• We can view the initial condition, $$N_0$$ as a special kind of parameter, an initial-value parameter.

The Ricker model is perhaps the simplest useful nonlinear population dynamics model. As such, it allows us to introduce the basic features of the pomp package with a minimum amount of distracting complexity. We will examine a more complex model below.

#### Measurement error

• Let’s suppose that the Ricker model is our model for the dynamics of a real population.

• However, we cannot know the exact population density at any time, but only estimate it through sampling.

• Let’s model measurement error by assuming the observed measurement, $$y_t$$, is modeled as a realization of a random variable $$Y_t$$ that is Poisson with mean $$\phi\,N_t$$: $Y_{t}\;\sim\;{\mathrm{Poisson}\left(\phi\,N_{t}\right)}$

• In this equation,

1. $$N_t$$ models the true population density at time $$t$$,
2. $$Y_t$$ models the number of individuals sampled at time $$t$$,
3. the parameter $$\phi$$ is proportional to our sampling effort.
4. $$Y_t$$ is dimensionless, so $$\phi N_t$$ must also be dimensionless.

To map this model onto the general framework discussed above, we have $\begin{gathered} X_t = (N_t) \qquad \text{or} \qquad X_t = (N_t, \varepsilon_t), \\ Y_t = Y_t, \qquad \theta = (r, c, \phi, N_0) \end{gathered}$

#### Putting the Ricker model into pomp.

• The basic data-structure provided by pomp is the object of class pomp, alternatively known as a “pomp object”.

• It is a container that holds real or simulated data and a POMP model, possibly together with other information such as model parameters and covariates.

A real pomp data analysis begins with constructing one or more pomp objects to hold the data and the model or models under consideration. Here, we’ll illustrate this process a dataset of Parus major abundance in Wytham Wood, near Oxford (McCleery and Perrins 1991).

loc <- url("https://kingaa.github.io/pomp/vignettes/parus.csv")
head(dat)
##   year pop
## 1 1960 148
## 2 1961 258
## 3 1962 185
## 4 1963 170
## 5 1964 267
## 6 1965 239
plot(pop~year,data=dat,type='o')

The call to construct a pomp object is, naturally enough, pomp. Documentation on this function can be had by doing ?pomp.

Now, to construct our pomp object:

library(pomp)
parus <- pomp(dat,times="year",t0=1959)

The times argument specifies that the column of dat labelled “year” gives the measurement times; t0 is the “zero-time”, the time at which the state process will be initialized. We’ve set it to one year prior to the beginning of the data. Plot the new pomp object:

plot(parus)

##### Adding in the process model simulator and initializer

We can add the stochastic Ricker model to parus by writing a procedure that simulates one realization of the stochastic process, from an arbitary time $$t$$ to $$t+1$$, given arbitrary states and parameters. We provide this to pomp in the form of a Csnippet, a little snippet of C code that performs the computation. The following does this.

stochStep <- Csnippet("
N = r*N*exp(-c*N+rnorm(0,sigma));
")

pomp(
parus,
rprocess=discrete.time.sim(step.fun=stochStep,delta.t=1),
initializer=Csnippet("N = N_0;"),
paramnames=c("r","c","sigma","N_0"),
statenames=c("N")
) -> parus

Note that in the above, we use thernorm function from the R API. In general any C function provided by R is available to you. pomp also provides a number of C functions that are documented in the header file, pomp.h, that is installed with the package. See the Csnippet documentation (?Csnippet) to read more about how to write them. Note too that we use discrete.time.sim here because the model is a stochastic map. We specify that the time step of the discrete-time process is delta.t, here, 1 yr.

At this point, we have what we need to simulate the state process:

sim <- simulate(parus, params=c(N_0=1,r=12,c=1,sigma=0.5),
as.data.frame=TRUE, states=TRUE)

plot(N~time,data=sim,type='o')

##### Adding in the measurement model and parameters

We complete the specification of the POMP by specifying the measurement model. To obtain the Poisson measurement model described above, we write two Csnippets. The first simulates:

rmeas <- Csnippet("pop = rpois(phi*N);")

The second computes the likelihood of observing pop birds given a true density of N:

dmeas <- Csnippet("lik = dpois(pop,phi*N,give_log);")

In the above, rpois and dpois again come from R’s C API. Note the give_log argument. When this code is evaluated, give_log will be set to 1 if the log likelihood is desired, and 0 else.

We fold these into the pomp object:

pomp(parus,
rmeasure=rmeas,
dmeasure=dmeas,
statenames=c("N"),
paramnames=c("phi")
) -> parus

Now we can simulate the whole POMP. First, let’s add some parameters:

coef(parus) <- c(N_0=1,r=20,c=1,sigma=0.1,phi=200)
library(ggplot2)

sims <- simulate(parus,nsim=3,as.data.frame=TRUE,include.data=TRUE)

ggplot(data=sims,
mapping=aes(x=time,y=pop))+
geom_line()+
facet_wrap(~sim,ncol=1,scales="free_y")