Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.

**Note:** This document has been updated for **pomp** version 4.

Data \(y^*_1,\dots,y^*_N\) collected at times \(t_1<\dots<t_N\) are modeled as noisy, incomplete, and indirect observations of a Markov process \(\{X(t), t\ge t_0\}\).

This is a

**partially observed Markov process (POMP)**model, also known as a hidden Markov model or a state space model.The POMP class of models can accommodate a variety of complexities that commonly arise, especially in biological models:

- latent variables
- nonlinear dynamics
- non-Gaussian distributions
- continuous-time models (as well as discrete-time models)
- intractable likelihoods
- non-differentiable models

The **R** package **pomp** provides

- facilities for modeling POMPs,
- a toolbox of statistical inference methods for analyzing data using POMPs, and
- a development platform for implementing new POMP inference methods.

The goals of the **pomp** project are to:

- facilitate scientific progress by providing high quality, general purpose, reproducible algorithms for statistical inference on POMPs.
- help separate model issues from method issues to allow one to accurately distinguish model failure from method failure, method improvement from model improvement, etc.
- provide a test-bed for the development and implementation of new inference algorithms by simplifying the model interface and providing a plethora of benchmarks.
- exploit potential synergies afforded by hybrid approaches and cross-fertilization of ideas.

Our goal in this presentation is two-fold:

To demonstrate and explain the package…

- …from the point of view of the user interested in application of the included methods to a specific data analysis based on a specific set of models.
- …from the point of view of a methods developer, who wishes to exploit the package’s structure write new methods in such a way that they can be applied to a POMP models generally.

Write \(X_n=X(t_n)\) and \(X_{0:N}=(X_0,\dots,X_N)\). Let \(Y_n\) be a random variable modeling the observation at time \(t_n\).

The one-step transition density, \(f_{X_n|X_{n-1}}(x_n|x_{n-1};\theta)\), together with the measurement density, \(f_{Y_n|X_n}(y_n|x_n;\theta)\) and the initial density, \(f_{X_0}(x_0;\theta)\), specify the entire joint density via

\[f_{X_{0:N},Y_{1:N}}(x_{0:N},y_{1:N};\theta) = f_{X_0}(x_0;\theta)\,\prod_{n=1}^N\!f_{X_n | X_{n-1}}(x_n|x_{n-1};\theta)\,f_{Y_n|X_n}(y_n|x_n;\theta).\]

- The marginal density for sequence of measurements, \(Y_{1:N}\), evaluated at the data, \(y_{1:N}^*\), is

\[f_{Y_{1:N}}(y^*_{1:N};\theta)=\int f_{X_{0:N},Y_{1:N}}(x_{0:N},y^*_{1:N};\theta)\, dx_{0:N}.\]

- In the following diagram, arrows show direct dependence among model variables:

- The state process, \(X_n\), is Markovian, i.e.,

\[f_{X_n|X_{0:n-1},Y_{1:n-1}}(x_n|x_{0:n-1},y_{1:n-1})=f_{X_n|X_{n-1}}(x_n|x_{n-1}).\]

Moreover, the measurable random variable, \(Y_n\), depends only on the state at that time: \[f_{Y_n|X_{0:N},Y_{1:n-1}}(y_n|x_{0:n},y_{1:n-1})=f_{Y_n|X_{n}}(y_n|x_n),\] for all \(n=1,\dots,N\).

Observations times \(t_n\) need not be regularly spaced.

To think algorithmically, we define some function calls:

`rprocess( )`

: a draw from \(f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)\)`dprocess( )`

: evaluation of \(f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)\)`rmeasure( )`

: a draw from \(f_{Y_n|X_n}(y_n| x_n;\theta)\)`dmeasure( )`

: evaluation of \(f_{Y_n|X_n}(y_n| x_n;\theta)\)`initializer( )`

: a draw from \(f_{X_0}(x_0;\theta)\)

**pomp**is an**R**package for data analysis using partially observed Markov process (POMP) models.Note the distinction: lower case

**pomp**is a software package; upper case POMP is a class of models.**pomp**builds methodology for POMP models in terms of arbitrary user-specified`rprocess()`

,`dprocess()`

,`rmeasure()`

,`dmeasure()`

, and`initializer`

functions.

Simulating random processes is often much easier than evaluating their transition probabilities.

In other words, having formulated a scientifically interesting model, we frequently find ourselves able to write

`rprocess()`

but not`dprocess()`

.**Simulation-based**methods require the user to specify`rprocess()`

but not`dprocess()`

.**Plug-and-play**,**likelihood-free**and**equation-free**are alternative terms for “simulation-based” methods.Much development of simulation-based statistical methodology has occurred in the past decade.

The algorithms in

**pomp**at the present moment are all plug-and-play in this sense. As we will see, there is great scope for inclusion of new general methods within or on top of the package.

As of v. 1.14.

- classical trajectory matching
- basic particle filtering (AKA sequential importance sampling or sequential Monte Carlo)
- the approximate Bayesian sequential Monte Carlo algorithm of Liu and West (2001)
- the particle Markov chain Monte Carlo method of Andrieu, Doucet, and Holenstein (2010)
- approximate Bayesian computation (ABC) of e.g., (
**???**) - the iterated filtering method of Ionides, Bretó, and King (2006)
- the improved iterated filtering method of Ionides, Nguyen, Atchadé, Stoev, and King (2015)
- probe-matching methods (Kendall et al. 1999; Wood 2010)
- the nonlinear forecasting method (Ellner et al. 1998; Kendall et al. 2005)
- the ensemble Kalman filter (Evensen 1994, 2009)
- the ensemble adjustment Kalman filter of Anderson (2001)
- power-spectrum-matching methods of Reuman, Desharnais, Costantino, Ahmad, and Cohen (2006).

All of the above are plug-and-play methods.

The Ricker (1954) map describes the stochastic dynamics of a simple population, \[N_{t+1} = r\,N_{t}\,\exp(-c\,N_{t}+\varepsilon_{t}), \qquad \varepsilon_{t}\;\sim\;\mathrm{Normal}\left(0,\sigma\right),\]

- \(N_t\) is the population density at time \(t\).
- \(r\) is a fixed value (a parameter) describing the population’s intrinsic capacity to increase in one unit of time.
- \(\sigma\) is the standard deviation of the noise process \(\varepsilon\).
- The parameter \(c\) scales the density-dependent population regulation.
- The equilibrium population is \(N_t=\log(r)/c\).

\(N\) is a

*state variable*, \(r\) and \(c\) are*parameters*. \(r\) is dimensionless and \(c\) has units of inverse density.For simplicity, we will fix \(c=1\) for the remainder of this document.

We can view the initial condition, \(N_0\) as a special kind of parameter, an

*initial-value parameter*.

The Ricker model is perhaps the simplest useful nonlinear population dynamics model. As such, it allows us to introduce the basic features of the **pomp** package with a minimum amount of distracting complexity. We will examine a more complex model below.

Let’s suppose that the Ricker model is our model for the dynamics of a real population.

However, we cannot know the exact population density at any time, but only estimate it through sampling.

Let’s model measurement error by assuming the observed measurement, \(y_t\), is modeled as a realization of a random variable \(Y_t\) that is Poisson with mean \(\phi\,N_t\): \[Y_{t}\;\sim\;\mathrm{Poisson}\left(\phi\,N_{t}\right)\]

In this equation,

- \(N_t\) models the true population density at time \(t\),
- \(Y_t\) models the number of individuals sampled at time \(t\),
- the parameter \(\phi\) is proportional to our sampling effort.
- \(Y_t\) is dimensionless, so \(\phi N_t\) must also be dimensionless.

To map this model onto the general framework discussed above, we have \[\begin{gathered} X_t = (N_t) \qquad \text{or} \qquad X_t = (N_t, \varepsilon_t), \\ Y_t = Y_t, \qquad \theta = (r, c, \phi, N_0) \end{gathered}\]

The basic data-structure provided by

**pomp**is the*object of class*`pomp`

, alternatively known as a “pomp object”.It is a container that holds real or simulated data and a POMP model, possibly together with other information such as model parameters and covariates.

A real **pomp** data analysis begins with constructing one or more `pomp`

objects to hold the data and the model or models under consideration. Here, we’ll illustrate this process a dataset of *Parus major* abundance in Wytham Wood, near Oxford (McCleery and Perrins 1991).

Download and plot the data:

```
url("https://kingaa.github.io/pomp/vignettes/parus.csv")
loc <- read.csv(loc)
dat <-head(dat)
```

`plot(pop~year,data=dat,type='o')`

The call to construct a `pomp`

object is, naturally enough, `pomp`

. Documentation on this function can be had by doing `?pomp`

.

Now, to construct our `pomp`

object:

```
library(pomp)
pomp(dat,times="year",t0=1959) parus <-
```

The `times`

argument specifies that the column of `dat`

labelled “year” gives the measurement times; `t0`

is the “zero-time”, the time at which the state process will be initialized. We’ve set it to one year prior to the beginning of the data. Plot the new `pomp`

object:

`plot(parus)`