Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.

Produced in **R** version 3.5.3 using **pomp** version 1.19.

Data \(y^*_1,\dots,y^*_N\) collected at times \(t_1<\dots<t_N\) are modeled as noisy, incomplete, and indirect observations of a Markov process \(\{X(t), t\ge t_0\}\).

This is a

**partially observed Markov process (POMP)**model, also known as a hidden Markov model or a state space model.- The POMP class of models can accommodate a variety of complexities that commonly arise, especially in biological models:
- latent variables
- nonlinear dynamics
- non-Gaussian distributions
- continuous-time models (as well as discrete-time models)
- intractable likelihoods
- non-differentiable models

The **R** package **pomp** provides

- facilities for modeling POMPs,
- a toolbox of statistical inference methods for analyzing data using POMPs, and
- a development platform for implementing new POMP inference methods.

The goals of the **pomp** project are to:

- facilitate scientific progress by providing high quality, general purpose, reproducible algorithms for statistical inference on POMPs.
- help separate model issues from method issues to allow one to accurately distinguish model failure from method failure, method improvement from model improvement, etc.
- provide a test-bed for the development and implementation of new inference algorithms by simplifying the model interface and providing a plethora of benchmarks.
- exploit potential synergies afforded by hybrid approaches and cross-fertilization of ideas.

Our goal in this presentation is two-fold:

To demonstrate and explain the package…

- …from the point of view of the user interested in application of the included methods to a specific data analysis based on a specific set of models.
- …from the point of view of a methods developer, who wishes to exploit the package’s structure write new methods in such a way that they can be applied to a POMP models generally.

Write \(X_n=X(t_n)\) and \(X_{0:N}=(X_0,\dots,X_N)\). Let \(Y_n\) be a random variable modeling the observation at time \(t_n\).

The one-step transition density, \(f_{X_n|X_{n-1}}(x_n|x_{n-1};\theta)\), together with the measurement density, \(f_{Y_n|X_n}(y_n|x_n;\theta)\) and the initial density, \(f_{X_0}(x_0;\theta)\), specify the entire joint density via

\[f_{X_{0:N},Y_{1:N}}(x_{0:N},y_{1:N};\theta) = f_{X_0}(x_0;\theta)\,\prod_{n=1}^N\!f_{X_n | X_{n-1}}(x_n|x_{n-1};\theta)\,f_{Y_n|X_n}(y_n|x_n;\theta).\]

- The marginal density for sequence of measurements, \(Y_{1:N}\), evaluated at the data, \(y_{1:N}^*\), is

\[f_{Y_{1:N}}(y^*_{1:N};\theta)=\int f_{X_{0:N},Y_{1:N}}(x_{0:N},y^*_{1:N};\theta)\, dx_{0:N}.\]

- In the following diagram, arrows show direct dependence among model variables:

- The state process, \(X_n\), is Markovian, i.e.,

\[f_{X_n|X_{0:n-1},Y_{1:n-1}}(x_n|x_{0:n-1},y_{1:n-1})=f_{X_n|X_{n-1}}(x_n|x_{n-1}).\]

Moreover, the measurable random variable, \(Y_n\), depends only on the state at that time: \[f_{Y_n|X_{0:N},Y_{1:n-1}}(y_n|x_{0:n},y_{1:n-1})=f_{Y_n|X_{n}}(y_n|x_n),\] for all \(n=1,\dots,N\).

Observations times \(t_n\) need not be regularly spaced.

To think algorithmically, we define some function calls:

`rprocess( )`

: a draw from \(f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)\)`dprocess( )`

: evaluation of \(f_{X_n|X_{n-1}}(x_n| x_{n-1};\theta)\)`rmeasure( )`

: a draw from \(f_{Y_n|X_n}(y_n| x_n;\theta)\)`dmeasure( )`

: evaluation of \(f_{Y_n|X_n}(y_n| x_n;\theta)\)`initializer( )`

: a draw from \(f_{X_0}(x_0;\theta)\)

**pomp**is an**R**package for data analysis using partially observed Markov process (POMP) models.Note the distinction: lower case

**pomp**is a software package; upper case POMP is a class of models.**pomp**builds methodology for POMP models in terms of arbitrary user-specified`rprocess()`

,`dprocess()`

,`rmeasure()`

,`dmeasure()`

, and`initializer`

functions.

Simulating random processes is often much easier than evaluating their transition probabilities.

In other words, having formulated a scientifically interesting model, we frequently find ourselves able to write

`rprocess()`

but not`dprocess()`

.**Simulation-based**methods require the user to specify`rprocess()`

but not`dprocess()`

.**Plug-and-play**,**likelihood-free**and**equation-free**are alternative terms for “simulation-based” methods.Much development of simulation-based statistical methodology has occurred in the past decade.

The algorithms in

**pomp**at the present moment are all plug-and-play in this sense. As we will see, there is great scope for inclusion of new general methods within or on top of the package.

As of v. 1.14.

- classical trajectory matching
- basic particle filtering (AKA sequential importance sampling or sequential Monte Carlo)
- the approximate Bayesian sequential Monte Carlo algorithm of Liu and West (2001)
- the particle Markov chain Monte Carlo method of Andrieu et al. (2010)
- approximate Bayesian computation (ABC) of e.g., Toni et al. (2009)
- the iterated filtering method of Ionides et al. (2006)
- the improved iterated filtering method of Ionides et al. (2015)
- probe-matching methods (Kendall et al. 1999, Wood 2010)
- the nonlinear forecasting method (Ellner et al. 1998, Kendall et al. 2005)
- the ensemble Kalman filter (Evensen 1994, 2009)
- the ensemble adjustment Kalman filter of Anderson (2001)
- power-spectrum-matching methods of Reuman et al. (2006).

All of the above are plug-and-play methods.

- The Ricker (1954) map describes the stochastic dynamics of a simple population, \[N_{t+1} = r\,N_{t}\,\exp(-c\,N_{t}+\varepsilon_{t}), \qquad \varepsilon_{t}\;\sim\;{\mathrm{Normal}\left(0,\sigma\right)},\]
- \(N_t\) is the population density at time \(t\).
- \(r\) is a fixed value (a parameter) describing the population’s intrinsic capacity to increase in one unit of time.
- \(\sigma\) is the standard deviation of the noise process \(\varepsilon\).
- The parameter \(c\) scales the density-dependent population regulation.
- The equilibrium population is \(N_t=\log(r)/c\).

\(N\) is a

*state variable*, \(r\) and \(c\) are*parameters*. \(r\) is dimensionless and \(c\) has units of inverse density.For simplicity, we will fix \(c=1\) for the remainder of this document.

We can view the initial condition, \(N_0\) as a special kind of parameter, an

*initial-value parameter*.

The Ricker model is perhaps the simplest useful nonlinear population dynamics model. As such, it allows us to introduce the basic features of the **pomp** package with a minimum amount of distracting complexity. We will examine a more complex model below.

Let’s suppose that the Ricker model is our model for the dynamics of a real population.

However, we cannot know the exact population density at any time, but only estimate it through sampling.

Let’s model measurement error by assuming the observed measurement, \(y_t\), is modeled as a realization of a random variable \(Y_t\) that is Poisson with mean \(\phi\,N_t\): \[Y_{t}\;\sim\;{\mathrm{Poisson}\left(\phi\,N_{t}\right)}\]

In this equation,

- \(N_t\) models the true population density at time \(t\),
- \(Y_t\) models the number of individuals sampled at time \(t\),
- the parameter \(\phi\) is proportional to our sampling effort.
- \(Y_t\) is dimensionless, so \(\phi N_t\) must also be dimensionless.

To map this model onto the general framework discussed above, we have \[\begin{gathered} X_t = (N_t) \qquad \text{or} \qquad X_t = (N_t, \varepsilon_t), \\ Y_t = Y_t, \qquad \theta = (r, c, \phi, N_0) \end{gathered}\]

The basic data-structure provided by

**pomp**is the*object of class*`pomp`

, alternatively known as a “pomp object”.It is a container that holds real or simulated data and a POMP model, possibly together with other information such as model parameters and covariates.

A real **pomp** data analysis begins with constructing one or more `pomp`

objects to hold the data and the model or models under consideration. Here, we’ll illustrate this process a dataset of *Parus major* abundance in Wytham Wood, near Oxford (McCleery and Perrins 1991).

Download and plot the data:

```
loc <- url("https://kingaa.github.io/pomp/vignettes/parus.csv")
dat <- read.csv(loc)
head(dat)
```

```
## year pop
## 1 1960 148
## 2 1961 258
## 3 1962 185
## 4 1963 170
## 5 1964 267
## 6 1965 239
```

`plot(pop~year,data=dat,type='o')`

The call to construct a `pomp`

object is, naturally enough, `pomp`

. Documentation on this function can be had by doing `?pomp`

.

Now, to construct our `pomp`

object:

```
library(pomp)
parus <- pomp(dat,times="year",t0=1959)
```

The `times`

argument specifies that the column of `dat`

labelled “year” gives the measurement times; `t0`

is the “zero-time”, the time at which the state process will be initialized. We’ve set it to one year prior to the beginning of the data. Plot the new `pomp`

object:

`plot(parus)`

We can add the stochastic Ricker model to `parus`

by writing a procedure that simulates one realization of the stochastic process, from an arbitary time \(t\) to \(t+1\), given arbitrary states and parameters. We provide this to `pomp`

in the form of a *Csnippet*, a little snippet of C code that performs the computation. The following does this.

```
stochStep <- Csnippet("
N = r*N*exp(-c*N+rnorm(0,sigma));
")
pomp(
parus,
rprocess=discrete.time.sim(step.fun=stochStep,delta.t=1),
initializer=Csnippet("N = N_0;"),
paramnames=c("r","c","sigma","N_0"),
statenames=c("N")
) -> parus
```

Note that in the above, we use the`rnorm`

function from the **R** API. In general any C function provided by **R** is available to you. **pomp** also provides a number of C functions that are documented in the header file, `pomp.h`

, that is installed with the package. See the Csnippet documentation (`?Csnippet`

) to read more about how to write them. Note too that we use `discrete.time.sim`

here because the model is a stochastic map. We specify that the time step of the discrete-time process is `delta.t`

, here, 1 yr.

At this point, we have what we need to simulate the state process:

```
sim <- simulate(parus, params=c(N_0=1,r=12,c=1,sigma=0.5),
as.data.frame=TRUE, states=TRUE)
plot(N~time,data=sim,type='o')
```

We complete the specification of the POMP by specifying the measurement model. To obtain the Poisson measurement model described above, we write two Csnippets. The first simulates:

`rmeas <- Csnippet("pop = rpois(phi*N);")`

The second computes the likelihood of observing `pop`

birds given a true density of `N`

:

`dmeas <- Csnippet("lik = dpois(pop,phi*N,give_log);")`

In the above, `rpois`

and `dpois`

again come from **R**’s C API. Note the `give_log`

argument. When this code is evaluated, `give_log`

will be set to 1 if the log likelihood is desired, and 0 else.

We fold these into the `pomp`

object:

```
pomp(parus,
rmeasure=rmeas,
dmeasure=dmeas,
statenames=c("N"),
paramnames=c("phi")
) -> parus
```

Now we can simulate the whole POMP. First, let’s add some parameters:

`coef(parus) <- c(N_0=1,r=20,c=1,sigma=0.1,phi=200)`

```
library(ggplot2)
sims <- simulate(parus,nsim=3,as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sims,
mapping=aes(x=time,y=pop))+
geom_line()+
facet_wrap(~sim,ncol=1,scales="free_y")
```