The **R** codes in this script are available for download. This work is licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.

This tutorial aims to help you get started using **pomp** as a suite of tools for analysis of time series data based on stochastic dynamical systems models. First, we give some conceptual background regarding the class of models—partially observed Markov processes (POMPs)—that **pomp** handles. We then discuss some preliminaries: installing the package and so on. Next, we show how to simulate a POMP using **pomp**. We then analyze some data using a few different tools. In so doing, we illustrate some of the package’s capabilities by using its algorithms to fit, compare, and criticize the models using **pomp**’s algorithms. From time to time, exercises for the reader are given.

As its name implies **pomp** is focused on partially observed Markov process models. These are also known as state-space models, hidden Markov models, and stochastic dynamical systems models. Such models consist of an unobserved Markov state process, connected to the data via an explicit model of the observation process. We refer to the former as the *latent process model* (or process model for short) and the latter as the *measurement model*.

Mathematically, each model is a probability distribution. Let \(Y_n\) denote the measurement process and \(X_n\) the latent state process, then by definition, the process model is determined by the density \(f_{X_n|X_{n-1}}\) and the initial density \(f_{X_0}\). The measurement process is determined by the density \(f_{Y_n|X_n}\). These two submodels determine the full POMP model, i.e., the joint distribution \(f_{X_{0:N},Y_{1:N}}\). If we have a sequence of measurements, \(y^*_{n}\), made at times \(t_n\), \(n=1,\dots,N\), then we think of these data, collectively, as a single realization of the \(Y\) process.

The following schematic shows shows causal relations among the process model, the measurement model, and the data. *From the statistical point of view, the key perspective is that the model is, at least hypothetically, the process that generated the data.*

Mathematically, a POMP is characterized by two conditions.

- The state process, \(X_n\), is Markovian, i.e., \[\mathrm{Prob}[X_n|X_0,\dots,X_{n-1},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[X_n|X_{n-1}].\]
- The measurements, \(Y_n\), depend only on the state at that time: \[\mathrm{Prob}[Y_n|X_0,\dots,X_{n},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[Y_n|X_{n}],\] for all \(n=1,\dots,N\).

These conditions can be represented schematically by the following diagram, which indicates the direct dependencies among model variables.

To implement a POMP model in **pomp**, we have to specify the measurement and process distributions. Note however, that, for each of the process and the measurement models there are two distinct operations that we might desire to perform:

- we might wish to
*simulate*, i.e., to draw a random sample from the distribution, or - we might wish to
*evaluate the density*itself at given values of \(X_n\) and/or \(Y_n\).

Following the **R** convention, we refer to the simulation of \(f_{X_n|X_{n-1}}\) as the *rprocess* component of the POMP model and the evaluation of \(f_{X_n|X_{n-1}}\) as the *dprocess* component. Similarly, the simulation of \(f_{Y_n|X_n}\) is the *rmeasure* component while the evaluation of \(f_{Y_n|X_n}\) is the *dmeasure* component. Finally, we’ll call a simulator of \(f_{X_0}\) the *rinit* component. Collectively, we’ll refer to these, and other, similarly basic elements of the model, as the model’s *basic components*.

A method that makes no use of the *dprocess* component is said to be “plug-and-play” or to have the “plug-and-play property”. At present, **pomp** is focused on such methods, so there is no reason to discuss the dprocess component further in this document. In the following, we will illustrate and explain how one specifies the rprocess, rmeasure, and dmeasure components of a model in **pomp**. We will illustrate this using some simple examples.

To get started, we must install **pomp**, if it is not already installed. This package cannot yet be downloaded from CRAN (though that will change in the near future). However, the latest version is always available at the package homepage on Github. See the package website for installation instructions.

In this document, we will ultimately learn to implement POMP models using the package’s “C snippet” facility. This allows the user to write model components using snippets of C code, which is then compiled and linked into a running **R** session. This typically affords a manyfold increase in computation time. It is possible to avoid C snippets entirely by writing model components as **R** functions, and we will begin by doing so, but the speed-ups afforded by C snippets are typically too good to pass up. To use C snippets, you must be able to compile C codes. Compilers are not by default installed on Windows or Mac systems, so users of such systems must do a bit more work to make use of **pomp**’s facilities. The installation instructions on the package website give details.

Having dispensed with the preliminaries, we now explore some of the functionality provided by **pomp**. To assist the reader in following this exploration, the **R** codes for this document are available.

Let us see how to implement a very simple POMP model. In particular, let’s begin by focusing on the famous Ricker model (Ricker 1954), which posits a nonlinear relationship between the size, \(N(t)\), of a population in year \(t\) and its size, \(N(t+1)\), one year later: \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}\right).\tag{1}\] Here, \(r\) and \(K\) are constant parameters, usually termed the *intrinsic growth rate* and the *carrying capacity*, respectively. As written, this is a deterministic model: it does not allow for any variability in the population dynamics. Let’s modify the Ricker equation by assuming that \(r\) is not constant, but instead has random variation from year to year. If we assume that the intrinsic growth rate varies from year to year as a lognormal random variable, we obtain \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}+\varepsilon(t)\right),\tag{2}\] where \(\varepsilon(t)\sim\mathrm{Normal}(0,\sigma)\). Note that we’ve introduced a new parameter, \(\sigma\), which quantifies the intensity of the noise in the population dynamics. Ecologically speaking, Eq. 2 is a model with *environmental stochasticity*.

Typically, it is relatively straightforward to simulate a POMP model. To accomplish this in **pomp**, as we’ve already discussed, we specify the rprocess component of the model. We’ll also need to choose values for the model parameters, \(r\), \(K\), and \(\sigma\). We’ll also need to make a choice regarding the initial condition, \(N(0)\). The simplest choice is to treat \(N(0)=N_0\) as a parameter.

`library(pomp)`

```
simulate(t0=0, times=1:20,
params=c(r=1.2,K=200,sigma=0.1,N_0=50),
rinit=function (N_0, ...) {
c(N=N_0)
},rprocess=discrete_time(
function (N, r, K, sigma, ...) {
rnorm(n=1,mean=0,sd=sigma)
eps <-c(N=r*N*exp(1-N/K+eps))
},delta.t=1
) sim1 ) ->
```

`## Warning: 'rmeasure' unspecified: NAs generated.`

Notice that we’ve specified the rinit and rprocess components of the model as functions. These functions take as arguments the relevant variables (whether these are state variables or parameters). Importantly, they return *named numeric vectors*. Names of variables and parameters are very important in **pomp**. Notice too that we’ve used the `discrete_time`

function, which encodes the fact that our Ricker model is a discrete-time stochastic process (a Markov chain). The first argument of `discrete_time`

is an **R** function encoding Eq. 2; the second argument specifies the discrete time-step.

Note also that the parameters are furnished in the form of a named vector, and that we’ve specified both `t0`

and `times`

. The former is the *initial time*, \(t_0\), i.e., the time at which the initial conditions apply. Since our initial condition is that \(N(0)=N_0\), our initial time is \(t_0=0\). The `times`

argument specifies the observation times \(t_1,\dots,t_N\).

Finally, note that we received a warning about `NA`

values being generated. We will soon see what this is about.

What sort of an object is `sim1`

? If we print it, we see

` sim1`

`## <object of class 'pomp'>`

`sim1`

is evidently an object of class ‘pomp’. We refer to these as ‘pomp’ objects.

To get more insight into the structure of `sim1`

, we can use `spy`

:

`spy(sim1)`

**pomp** provides methods for plotting ‘pomp’ objects. For example,

`plot(sim1)`