The R codes in this script are available for download. This work is licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.
CC-BY_NC

Introduction

This tutorial aims to help you get started using pomp as a suite of tools for analysis of time series data based on stochastic dynamical systems models. First, we give some conceptual background regarding the class of models—partially observed Markov processes (POMPs)—that pomp handles. We then discuss some preliminaries: installing the package and so on. Next, we show how to simulate a POMP using pomp. We then analyze some data using a few different tools. In so doing, we illustrate some of the package’s capabilities by using its algorithms to fit, compare, and criticize the models using pomp’s algorithms. From time to time, exercises for the reader are given.

Partially observed Markov process (POMP) models

Structure of a POMP

As its name implies pomp is focused on partially observed Markov process models. These are also known as state-space models, hidden Markov models, and stochastic dynamical systems models. Such models consist of an unobserved Markov state process, connected to the data via an explicit model of the observation process. We refer to the former as the latent process model (or process model for short) and the latter as the measurement model.

Mathematically, each model is a probability distribution. Let \(Y_n\) denote the measurement process and \(X_n\) the latent state process, then by definition, the process model is determined by the density \(f_{X_n|X_{n-1}}\) and the initial density \(f_{X_0}\). The measurement process is determined by the density \(f_{Y_n|X_n}\). These two submodels determine the full POMP model, i.e., the joint distribution \(f_{X_{0:N},Y_{1:N}}\). If we have a sequence of measurements, \(y^*_{n}\), made at times \(t_n\), \(n=1,\dots,N\), then we think of these data, collectively, as a single realization of the \(Y\) process.


The following schematic shows shows causal relations among the process model, the measurement model, and the data. From the statistical point of view, the key perspective is that the model is, at least hypothetically, the process that generated the data.

Structure of a POMP. Arrows show the direction of causality. The closed loop from the state process to itself indicates the dynamic nature of this Markovian process. Information flow runs in the opposite direction.


Mathematically, a POMP is characterized by two conditions.

  1. The state process, \(X_n\), is Markovian, i.e., \[\mathrm{Prob}[X_n|X_0,\dots,X_{n-1},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[X_n|X_{n-1}].\]
  2. The measurements, \(Y_n\), depend only on the state at that time: \[\mathrm{Prob}[Y_n|X_0,\dots,X_{n},Y_1,\dots,Y_{n-1}]=\mathrm{Prob}[Y_n|X_{n}],\] for all \(n=1,\dots,N\).

These conditions can be represented schematically by the following diagram, which indicates the direct dependencies among model variables.


**Conditional independence graph of a POMP.**  The latent state $X_n$ at time $t_n$ is conditionally independent of its history given $X_{n-1}$.  The observation $Y_n$ is conditionally independent of all other variables given $X_n$.  The underlying time scale can be taken to be either discrete or continuous and the observation times need not be equally spaced.

Conditional independence graph of a POMP. The latent state \(X_n\) at time \(t_n\) is conditionally independent of its history given \(X_{n-1}\). The observation \(Y_n\) is conditionally independent of all other variables given \(X_n\). The underlying time scale can be taken to be either discrete or continuous and the observation times need not be equally spaced.


Basic model components

To implement a POMP model in pomp, we have to specify the measurement and process distributions. Note however, that, for each of the process and the measurement models there are two distinct operations that we might desire to perform:

  1. we might wish to simulate, i.e., to draw a random sample from the distribution, or
  2. we might wish to evaluate the density itself at given values of \(X_n\) and/or \(Y_n\).

Following the R convention, we refer to the simulation of \(f_{X_n|X_{n-1}}\) as the rprocess component of the POMP model and the evaluation of \(f_{X_n|X_{n-1}}\) as the dprocess component. Similarly, the simulation of \(f_{Y_n|X_n}\) is the rmeasure component while the evaluation of \(f_{Y_n|X_n}\) is the dmeasure component. Finally, we’ll call a simulator of \(f_{X_0}\) the rinit component. Collectively, we’ll refer to these, and other, similarly basic elements of the model, as the model’s basic components.

The plug-and-play property

A method that makes no use of the dprocess component is said to be “plug-and-play” or to have the “plug-and-play property”. At present, pomp is focused on such methods, so there is no reason to discuss the dprocess component further in this document. In the following, we will illustrate and explain how one specifies the rprocess, rmeasure, and dmeasure components of a model in pomp. We will illustrate this using some simple examples.

Preliminaries

Installing the package

To get started, we must install pomp, if it is not already installed. This package cannot yet be downloaded from CRAN (though that will change in the near future). However, the latest version is always available at the package homepage on Github. See the package website for installation instructions.

Important information for Windows and Mac users.

In this document, we will ultimately learn to implement POMP models using the package’s “C snippet” facility. This allows the user to write model components using snippets of C code, which is then compiled and linked into a running R session. This typically affords a manyfold increase in computation time. It is possible to avoid C snippets entirely by writing model components as R functions, and we will begin by doing so, but the speed-ups afforded by C snippets are typically too good to pass up. To use C snippets, you must be able to compile C codes. Compilers are not by default installed on Windows or Mac systems, so users of such systems must do a bit more work to make use of pomp’s facilities. The installation instructions on the package website give details.

Simulation of a POMP

Having dispensed with the preliminaries, we now explore some of the functionality provided by pomp. To assist the reader in following this exploration, the R codes for this document are available.

The latent state process

Let us see how to implement a very simple POMP model. In particular, let’s begin by focusing on the famous Ricker model (Ricker 1954), which posits a nonlinear relationship between the size, \(N(t)\), of a population in year \(t\) and its size, \(N(t+1)\), one year later: \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}\right).\tag{1}\] Here, \(r\) and \(K\) are constant parameters, usually termed the intrinsic growth rate and the carrying capacity, respectively. As written, this is a deterministic model: it does not allow for any variability in the population dynamics. Let’s modify the Ricker equation by assuming that \(r\) is not constant, but instead has random variation from year to year. If we assume that the intrinsic growth rate varies from year to year as a lognormal random variable, we obtain \[N(t+1)=r\,N(t)\,\exp\left(1-\frac{N(t)}{K}+\varepsilon(t)\right),\tag{2}\] where \(\varepsilon(t)\sim\mathrm{Normal}(0,\sigma)\). Note that we’ve introduced a new parameter, \(\sigma\), which quantifies the intensity of the noise in the population dynamics. Ecologically speaking, Eq. 2 is a model with environmental stochasticity.

Typically, it is relatively straightforward to simulate a POMP model. To accomplish this in pomp, as we’ve already discussed, we specify the rprocess component of the model. We’ll also need to choose values for the model parameters, \(r\), \(K\), and \(\sigma\). We’ll also need to make a choice regarding the initial condition, \(N(0)\). The simplest choice is to treat \(N(0)=N_0\) as a parameter.

library(pomp)
simulate(t0=0, times=1:20,
  params=c(r=1.2,K=200,sigma=0.1,N_0=50),
  rinit=function (N_0, ...) {
    c(N=N_0)
  },
  rprocess=discrete_time(
    function (N, r, K, sigma, ...) {
      eps <- rnorm(n=1,mean=0,sd=sigma)
      c(N=r*N*exp(1-N/K+eps))
    },
    delta.t=1
  )
) -> sim1
## Warning: 'rmeasure' unspecified: NAs generated.

Notice that we’ve specified the rinit and rprocess components of the model as functions. These functions take as arguments the relevant variables (whether these are state variables or parameters). Importantly, they return named numeric vectors. Names of variables and parameters are very important in pomp. Notice too that we’ve used the discrete_time function, which encodes the fact that our Ricker model is a discrete-time stochastic process (a Markov chain). The first argument of discrete_time is an R function encoding Eq. 2; the second argument specifies the discrete time-step.

Note also that the parameters are furnished in the form of a named vector, and that we’ve specified both t0 and times. The former is the initial time, \(t_0\), i.e., the time at which the initial conditions apply. Since our initial condition is that \(N(0)=N_0\), our initial time is \(t_0=0\). The times argument specifies the observation times \(t_1,\dots,t_N\).

Finally, note that we received a warning about NA values being generated. We will soon see what this is about.

What sort of an object is sim1? If we print it, we see

sim1
## <object of class 'pomp'>

sim1 is evidently an object of class ‘pomp’. We refer to these as ‘pomp’ objects.

To get more insight into the structure of sim1, we can use spy:

spy(sim1)

pomp provides methods for plotting ‘pomp’ objects. For example,

plot(sim1)