version 2 upgrade guide


Overview

In pomp version 2, a number of backward incompatible changes have been made. These have been designed to increase the usability of the package by making the interfaces more uniform, by increasing code stability, reducing the number of special cases, and removing many of the idiosyncrasies that were present in earlier versions. The goal has been to keep backward-incompatible changes to a minimum. However, to achieve some improvements, it has been necessary to make some that will break existing code. This guide is intended to explain the new structure and point out the changes that are needed to make existing codes that use pomp work with the new version.

The main novelty of pomp version 2 is that one will rarely, if ever, need to interact directly with the low-level pomp constructor. Instead, one can supply new or modify existing model components at (almost) any stage in a chain of pomp computations. In particular, there are now data-frame methods for all of the pomp inference algorithms, as well as for the so-called elementary POMP algorithms: simulate, pfilter, probe, and spect. One can pass data directly to these methods, along with the requisite model components, and achieve the same effect as one would by first constructing a object and then performing the operation.

pomp version 2 represents a thorough reworking of almost all the package codes. A consequence is that these codes have been streamlined, with increases in flexibility and ease of use, at no cost to performance. The following table shows the number of lines in the latest pre-2 pomp compared to the current pomp version 2.

category 1.19 2.5 fraction
R 12150 9847 -0.190
man 4069 6908 0.698
src 5238 5567 0.063
tests 4846 3677 -0.241
total 26303 25999 -0.012

Terminology

It is useful to divide the pomp package functionality into different levels.

The term basic model component is meant to refer to the procedure itself, as distinct from the execution of the procedure. The user specifies the procedure (in one of several forms); the package decides when and where to execute the procedure.

Changes in the way models are specified

Basic component specification

The manner in which one writes R functions to specify basic model components has been totally changed. Before, one wrote functions that took specific arguments such as x, params, and covars. Now, one writes such functions with any or all state variables, observables, covariates, and/or time as arguments. Thus for example, in versions <2, one might have specified a measurement model density evaluator (‘dmeasure’) so:

...,
dmeasure = function (y, x, t, params, covars, ..., log) {
  dnbinom(x=y["count"],mu=x["s"],size=params["theta"],log=log)
},
...

Here, the state variable “s”, passed via the argument x is the expected value of the observable “count”, which is assumed to be negative-binomially distributed with size parameter “theta”. The observables are passed via the vector y and the parameters via the vector params and so the variables of interest must be extracted by name from these vectors. Note that the time variable t is not used but must nevertheless be named as a formal argument.

In pomp version 2, the corresponding ‘dmeasure’ specification would be

...,
dmeasure = function (count, s, theta, ..., log) {
  dnbinom(x=count,mu=s,size=theta,log=log)
},
...

Note that there is no longer a need to extract the relevant variables from vectors. Moreover, the only required argument is .... The available arguments are taken from the set of observables, state variables, parameters, covariates, and time, as before.

These remarks apply whenever one specifies a basic model component using an R function. For the most part, C snippets that worked with pomp version <2 will continue to work. The exception is that parameter transformation C snippets may need to be rewritten (see below).

Process model specification

Prior to version 2, one specified the rprocess component using one of the plugins onestep.sim, discrete.time.sim, euler.sim, gillespie.sim, or gillespie.hl.sim. These have been renamed onestep, discrete_time, euler, gillespie, and gillespie_hl, respectively, but are otherwise unchanged.

Note that, if one uses an R function to specify the process model simulator, the remarks above under “Basic component specification” apply.

Initialization of the latent state process

In pomp versions <2, one simulated from the distribution of the latent state process via the “initializer” component. As of version 2, this has been renamed “rinit”. The “rinit” component is supplied to pomp functions using the argument of the same name.

If rinit is not explicitly specified, the default behavior is to treat any parameters ending with suffices _0 or .0 as the values of correspondingly named latent state variables at time t0.

Covariates

Prior to version 2, time-varying covariates upon which basic model components depended were supplied via the two arguments covar and tcovar to pomp. For example, in one of the examples, we see

cbind(
  time=seq(from=1928,to=1934,by=0.01),
  as.data.frame(
    periodic_bspline_basis(
      x=seq(from=1928,to=1934,by=0.01),
      nbasis=3,
      degree=3,
      period=1,
      names="seas%d"
    )
  ),
  pop=dat$population
) -> covar

pomp(
  ...,
  covar=covar,
  tcovar="time",
  ...
)

Note that, as this example illustrates, it was often necessary to first construct the covariate table as a data frame before passing it as argument to pomp.

As of version 2, one includes such time-varying covariates as specially constructed table via the single argument covar, which can be furnished to any POMP elementary or estimation algorithm. One constructs a covariate table using the covariate_table command. The syntax for specifying a covariate table is quite flexible. In particular, the arguments to covariate_table are evaluated sequentially, so that later ones can depend on earlier ones. Once evaluated, the covariates are bound column-wise into a single data frame. One can also provide a data frame to covariate_table, along with the name of the time variable, which matches the old usage very closely. Thus, one might use the following in place of the above:

pomp(
  ...,
  covar=covariate_table(
    t=seq(from=1928,to=1934,by=0.01),
    seas=periodic_bspline_basis(t,nbasis=3,degree=3,period=1),
    pop=dat$population,
    times="t"
  ),
  ...
)

Prior to version 2, covariates were always linearly interpolated when furnished to any of the basic model components. As of version 2, although linear interpolation remains the default, one can also use the order argument to direct that the covariates be treated as piecewise-constant, right-continuous functions.

Parameter transformations

In pomp version <2, one specified parameter transformations by means to two arguments to pomp: fromEstimationScale and toEstimationScale. As of version 2, both forward and inverse transformations are encapsulated in a single object passed via the partrans argument to any of the pomp elementary or inference algorithms. The parameter transformation object is constructed by means of a call to parameter_trans. One can specify general forward and inverse transformations via the fromEst and toEst arguments to this function. If these are specified using R functions, the remarks above (“Basic component specification”) apply. If one specifies these using C snippets, the syntax has changed from that of versions <2. In particular, when writing C snippets for parameter transformations, for a parameter “p”, the notation p always refers to p on the natural scale and T_p refers to its value of p on the estimation scale. Thus, if parameter “alpha” is to be log-transformed for estimation, the toEst snippet would contain the line

T_alpha = log(alpha);

and the fromEst snippet would include

alpha = exp(T_alpha);

Because many use cases involve simply log, logit, or log-barycentric transformations, one can handle these cases without needing to write toEst and fromEst C snippets, by naming the parameters to be so transformed in the log, logit, and barycentric arguments of parameter_trans. When these arguments are supplied, parameter_trans internally writes a pair of C snippets to implement the transformations. As with other cases when one refers to parameters in C snippets, one must include all parameters named in the paramnames argument.

At any point, if one sets partrans = NULL in an elementary or inference algorithm, the parameter transformations are reset to the identity transformation.

Accumulator variables

In pomp versions <2, one named accumulator variables in the zeronames argument of pomp. This argument has been renamed accumvars: it can be supplied to any pomp elementary or estimation algorithm.

Changes in elementary POMP algorithms

simulate

In pomp versions <2, if one wanted to simulate a POMP model, one had to construct a ‘pomp’ first, via a call to pomp. The package structure necessitated that one provide a dummy data frame to pomp, a fairly artificial proceeding. As of version 2, one can simulate a POMP model without reference to any data at all. One simply calls simulate with the necessary arguments supplied. The needed arguments include times, t0, and rprocess. One can optionally provide rinit and rmeasure.

The format of the expressions returned by simulate is determined by the new format argument. The old obs, states, and as.data.frame arguments have been eliminated. See ?simulate for more information.

simulate now returns more informative results when simulations from multiple parameter sets are simultaneously computed. Specifically, if on the call to simulate, params has column names, these are used to identify the resulting simulations. Thus when format = "pomps" (the default), the names of the resulting list of pomps will be constructed from the column names of params. Likewise, when format = "arrays", the resulting arrays will have informative column names and when format = "data.frame", the identifier variable will make use of the column names.

pfilter and bsmc2

The basic particle filter, pfilter, has a simpler mode of operation. In calls to pfilter, params should be a single parameter set only. That is, one can no longer possible to pass a matrix of parameters to pfilter.

Similarly, the Liu-West algorithm, bsmc2, has a simpler mode of operation. In calls to bsmc2, params should be a single parameter set only. The requisite Np particles are drawn from the distribution specified by the “rprior” basic model component.

Change in pomp estimation algorithms

Parameter transformations in estimation

In pomp versions <2, several of the estimation algorithms had a logical transform option. Setting transform = TRUE caused estimation to be performed on the transformed scale, if parameter transformations had been supplied. As of version 2, transformations are automatically applied when it is appropriate to do so and when they exist. One can remove transformations by simply setting partrans = NULL in the call to any pomp estimation algorithm.

mif2 defaults

The default cooling schedule (cooling.type) in mif2 is now “geometric”, in contrast to “hyperbolic”, as before.

params not start

To specify the parameters used to start an iterative estimation algorithm, such as mif2, pmcmc, or abc, use the params argument. In pomp versions < 2, these functions had an argument named start for this purpose.

Changes in optimization-based methods: probe matching, spectrum matching, trajectory matching, nonlinear forecasting

In pomp versions <2, the functions probe.match, spect.match, traj.match, and nlf called numerical optimizers (e.g., optim or optimizers from the subplex or nloptr packages) to estimate model parameters by minimizing their respective objective functions. As of version 2, these functions are no longer part of the package. Their functionality has been replaced by a new set of objective function methods.

The functions probe_objfun, spect_objfun, traj_objfun, and nlf_objfun construct stateful objective functions. These functions can be passed to any optim-like numerical minimization routine. This is a large increase in the flexibility of the package, since one is free to choose essentially any minimization routine.

The fact that the objective functions are stateful means that each such function stores in memory the results of the last time it was called. This makes it very easy to extract information about the fitted model regardless of the optimization routine. See the help documentation for examples.

Changes in pomp workhorses

Default model components

In pomp version <2, attempts to use a basic model component that had not been defined by the user resulted in an error message. The exception to this was the initial latent state sampler initializer which had a default setting. As of version 2, all the basic model components now have defaults.

From the user’s point of view, this means, for example, that a call to simulate when rmeasure has not been specified will result in either empty or missing observables. If rprocess has not been specified, then a call to simulate will likewise return either zero latent state variables, or latent state variables with NA values.

skeleton

In skeleton, the t argument has been replaced by times, to make this uniform with the other workhorse functions.

trajectory

The as.data.frame argument to trajectory has been removed in favor of a new format argument that allows one to choose between receiving the results in the form of an array or a data frame.

When trajectory calls on deSolve routines to numerically integrate a model vectorfield, more informative error messages are generated, and diagnostics are printed when verbose = TRUE.

flow

A new workhorse has been introduced, similar to trajectory but at a lower level. The flow function iterates or integrates the deterministic skeleton to return trajectories. See ?flow for details.

rprocess

The call to rprocess has changed. One now retrieves simulations of the latent state process by doing

rprocess(object, x0, t0, times, params)

where object, as usual, is the pomp, x0 is the named vector or matrix with rownames giving the state of the process at time t0, times are the times at which one desires simulated states, and params are the parameters.

Included pomp examples

The pompExample function has been eliminated. The suite of toy examples has been enlarged, but these are accessed via calls to ordinary package functions. Thus, for example, one creates the Ricker toy model via a call to ricker() and the SIR model simulated via Euler’s method via a call to sir(). See the documentation for a full list of toy examples

Included datasets

A number of new datasets are provided with the package. See data(package="pomp") for a full list.

Facilities removed in version 2

Recoding your models for pomp version 2: examples

The following two examples are taken from the pomp Journal of Statistical Software paper. In each case, we first look at the pomp version <2 implementation, then at the version 2 implementation.

Gompertz model

First, the Gompertz model, which has a single latent state variable, \(X\), and a single observable, \(Y\). The model is defined by the relations \[\begin{gathered} S = e^{r \delta}\\ X(t+\delta)\;\sim\;\mathrm{Lognormal}\left(\log(K^{1-S}\,X^S),\sigma\right)\\ Y(t)\;\sim\;\mathrm{Lognormal}\left(\log{X(t)},\tau\right) \end{gathered}\] where \(r\), \(K\), \(\sigma\), and \(\tau\) are parameters and \(\delta\) is the discrete time-step.

Here is how the model was implemented in pomp version <2. We write all the basic model components using R functions assuming \(\delta=1\). We also do one simulation.

## rprocess
gompertz.proc.sim <- function (x, t, params, delta.t, ...) {
  eps <- exp(rnorm(n=1,mean=0,sd=params["sigma"]))
  S <- exp(-params["r"]*delta.t)
  setNames(params["K"]^(1-S)*x["X"]^S*eps,"X")
}

## rmeasure
gompertz.meas.sim <- function (x, t, params, ...) {
  setNames(rlnorm(n=1,meanlog=log(x["X"]),sd=params["tau"]),"Y")
}

## dmeasure
gompertz.meas.dens <- function (y, x, t, params, log, ...) {
  dlnorm(x=y["Y"],meanlog=log(x["X"]),sdlog=params["tau"],log=log)
}

## initializer
gompertz.init <- function (t0, params, ...) {
  setNames(params["X_0"],"X")
}

## Parameter transformations
gompertz.log.tf <- function (params, ...) log(params)
gompertz.exp.tf <- function (params, ...) exp(params)

## pomp construction
pomp(
  data=data.frame(time=1:100, Y=NA), 
  times="time", t0=0,
  rprocess=discrete.time.sim(step.fun=gompertz.proc.sim,delta.t=1), 
  rmeasure=gompertz.meas.sim,
  dmeasure=gompertz.meas.dens,
  initializer=gompertz.init,
  toEstimationScale=gompertz.log.tf,
  fromEstimationScale=gompertz.exp.tf,
  params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp1R

simulate(gomp1R) -> gomp1R

Now, we implement the same model using pomp version 2. Note that initializer becomes rinit and that discrete.time.sim becomes discrete_time. Also, the parameter transformation syntax is different.

library(pomp)

## rprocess
gompertz.proc.sim <- function (X, r, K, sigma, delta.t, ...) {
  eps <- exp(rnorm(n=1,mean=0,sd=sigma))
  S <- exp(-r*delta.t)
  c(X=K^(1-S)*X^S*eps)
}

## rmeasure
gompertz.meas.sim <- function (X, tau, ...) {
  c(Y=rlnorm(n=1,meanlog=log(X),sd=tau))
}

## dmeasure
gompertz.meas.dens <- function (X, tau, Y, log, ...) {
  dlnorm(x=Y,meanlog=log(X),sdlog=tau,log=log)
}

## rinit
gompertz.init <- function (X_0, ...) {
  c(X=X_0)
}

## pomp construction via simulation
simulate(
  times=1:100, t0=0,
  rprocess=discrete_time(gompertz.proc.sim,delta.t=1), 
  rmeasure=gompertz.meas.sim,
  dmeasure=gompertz.meas.dens,
  rinit=gompertz.init,
  partrans=parameter_trans(log=c("K","r","sigma","tau")),
  paramnames=c("K","r","sigma","tau"),
  params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp2R

To implement the model using C snippets, in pomp versions <2 we might do the following.

pomp(
  data=data.frame(time=1:100, Y=NA), 
  times="time", t0=0,
  rprocess=discrete.time.sim(
    step.fun=Csnippet("
      double e = rnorm(0,sigma);
      double S = exp(-r*dt);
      X = exp((1-S)*log(K)+S*log(X)+e);
  "),
    delta.t=1
  ), 
  rmeasure=Csnippet("
      Y = exp(rnorm(log(X),tau));
    "),
  dmeasure=Csnippet("
      lik = dlnorm(Y,log(X),tau,give_log);
    "),
  initializer=Csnippet("
      X = X_0;
    "),
  toEstimationScale=Csnippet("
      TK = log(K);
      Tr = log(r);
      Tsigma = log(sigma);
      Ttau = log(tau);
      TX_0 = log(X_0);
    "),
  fromEstimationScale=Csnippet("
      TK = exp(K);
      Tr = exp(r);
      Tsigma = exp(sigma);
      Ttau = exp(tau);
      TX_0 = exp(X_0);
    "),
  paramnames=c("r","K","sigma","tau","X_0"),
  statenames="X",
  params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp1C

simulate(gomp1C) -> gomp1C

In version 2, we would do something like the following. In addition to the differences noted above, notice that we must specify obsnames here, since the name of the observed variable is not provided in a dummy data frame as in versions <2. Note also that no changes to the C snippets themselves are needed.

simulate(
  times=1:100, t0=0,
  rprocess=discrete_time(
    step.fun=Csnippet("
      double e = rnorm(0,sigma);
      double S = exp(-r*dt);
      X = exp((1-S)*log(K)+S*log(X)+e);
  "),
    delta.t=1
  ), 
  rmeasure=Csnippet("
      Y = exp(rnorm(log(X),tau));
    "),
  dmeasure=Csnippet("
      lik = dlnorm(Y,log(X),tau,give_log);
    "),
  rinit=Csnippet("
      X = X_0;
    "),
  partrans=parameter_trans(log=c("K","r","sigma","tau")),
  paramnames=c("r","K","sigma","tau","X_0"),
  statenames="X",
  obsnames="Y",
  params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp2C

SIR model

This example, also drawn from the Journal of Statistical Software paper, involves covariates and accumulator variables as well as more complex parameter transformations.

## Construct some fake birthrate data.
library(tidyverse)

data.frame(time=seq(-1,11,by=1/12)) %>%
  mutate(
    birthrate=5e5*bspline_basis(time,nbasis=5)%*%c(0.018,0.019,0.021,0.019,0.015)
  ) -> birthdat
## measurement model C snippets
rmeas <- "
  cases = rnbinom_mu(theta, rho * H);
"

dmeas <- "
  lik = dnbinom_mu(cases, theta, rho * H, give_log);
"

## initializer
rinit <- "
  double m = popsize/(S_0 + I_0 + R_0);
  S = nearbyint(m*S_0);
  I = nearbyint(m*I_0);
  R = nearbyint(m*R_0);
  H = 0;
  Phi = 0;
  noise = 0;
"

## rprocess
seas.sir.step <- "
  double rate[6];
  double dN[6];
  double Beta;
  double dW;
  Beta = exp(b1 + b2 * cos(M_2PI * Phi) + b3 * sin(M_2PI * Phi));
  rate[0] = birthrate;
  rate[1] = Beta * (I + iota) / popsize;
  rate[2] = mu;
  rate[3] = gamma;
  rate[4] = mu;
  rate[5] = mu;
  dN[0] = rpois(rate[0] * dt);
  reulermultinom(2, S, &rate[1], dt, &dN[1]);
  reulermultinom(2, I, &rate[3], dt, &dN[3]);
  reulermultinom(1, R, &rate[5], dt, &dN[5]);
  dW = rnorm(dt, sigma * sqrt(dt));
  S += dN[0] - dN[1] - dN[2];
  I += dN[1] - dN[3] - dN[4];
  R += dN[3] - dN[5];
  Phi += dW;
  H += dN[1];
  noise += (dW - dt) / sigma;
"

toest <- "
  to_log_barycentric(&TS_0,&S_0,3);
  Tsigma = log(sigma);
  Tiota = log(iota);
"

fromest <- "
  from_log_barycentric(&TS_0,&S_0,3);
  Tsigma = exp(sigma);
  Tiota = exp(iota);
"
data.frame(time=seq(0,10,by=1/52),cases=NA) %>%
  pomp(
    times="time", t0=-1/52, 
    covar = birthdat, tcovar = "time",
    dmeasure = Csnippet(dmeas),
    rmeasure = Csnippet(rmeas),
    initializer=Csnippet(rinit),
    rprocess = euler.sim(
      step.fun = Csnippet(seas.sir.step), 
      delta.t = 1/52/20
    ),
    toEstimationScale=Csnippet(toest),
    fromEstimationScale=Csnippet(fromest),
    statenames = c("S", "I", "R", "H", "Phi", "noise"),
    paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
      "popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"), 
    zeronames = c("H", "noise"),
    params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
      b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100, 
      sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
  ) %>% 
  simulate() -> sir1C

When implementing the same model in pomp version 2, we can use almost all the same C snippets without modification. The exception is the C snippets for the parameter transformations. The syntax for the inclusion of the covariates is different as is that used to specify the names of the accumulator variables. For example:

simulate(
  times=seq(0,10,by=1/52), t0=-1/52,
  covar=covariate_table(birthdat,times="time"),
  dmeasure = Csnippet(dmeas),
  rmeasure = Csnippet(rmeas),
  rinit=Csnippet(rinit),
  rprocess = euler(
    step.fun = Csnippet(seas.sir.step), 
    delta.t = 1/52/20
  ),
  partrans=parameter_trans(
    toEst=Csnippet("
      to_log_barycentric(&T_S_0,&S_0,3);
      T_sigma = log(sigma);
      T_iota = log(iota);"),
    fromEst=Csnippet("
      from_log_barycentric(&S_0,&T_S_0,3);
      sigma = exp(T_sigma);
      iota = exp(T_iota);")
  ),
  statenames = c("S", "I", "R", "H", "Phi", "noise"),
  obsnames=c("cases"),
  paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
    "popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"), 
  accumvars = c("H", "noise"),
  params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
    b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100, 
    sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
) -> sir2C

Note also that, for specifying the above parameter transformations, pomp version 2 provides the alternate form

simulate(
  ...,
  partrans=parameter_trans(
    log=c("sigma","iota"),
    barycentric=c("S_0","I_0","R_0")
  ),
  ...
) -> sir2C

This document was produced using pomp version 5.10 and R version 4.4.1.