In pomp version 2, a number of backward incompatible changes have been made. These have been designed to increase the usability of the package by making the interfaces more uniform, by increasing code stability, reducing the number of special cases, and removing many of the idiosyncrasies that were present in earlier versions. The goal has been to keep backward-incompatible changes to a minimum. However, to achieve some improvements, it has been necessary to make some that will break existing code. This guide is intended to explain the new structure and point out the changes that are needed to make existing codes that use pomp work with the new version.
The main novelty of pomp version 2 is that one will rarely, if ever, need to interact directly with the low-level pomp
constructor. Instead, one can supply new or modify existing model components at (almost) any stage in a chain of pomp computations. In particular, there are now data-frame methods for all of the pomp inference algorithms, as well as for the so-called elementary POMP algorithms: simulate
, pfilter
, probe
, and spect
. One can pass data directly to these methods, along with the requisite model components, and achieve the same effect as one would by first constructing a object and then performing the operation.
pomp version 2 represents a thorough reworking of almost all the package codes. A consequence is that these codes have been streamlined, with increases in flexibility and ease of use, at no cost to performance. The following table shows the number of lines in the latest pre-2 pomp compared to the current pomp version 2.
category | 1.19 | 2.5 | fraction |
---|---|---|---|
R | 12150 | 9847 | -0.190 |
man | 4069 | 6908 | 0.698 |
src | 5238 | 5567 | 0.063 |
tests | 4846 | 3677 | -0.241 |
total | 26303 | 25999 | -0.012 |
It is useful to divide the pomp package functionality into different levels.
rinit
: simulator for the initial-state distribution, i.e., the distribution of the latent state at time t0
.rprocess
and dprocess
: simulator and density evaluation procedure, respectively, for the process model.rmeasure
and dmeasure
: simulator and density evaluation procedure, respectively, for the measurement model.rprior
and dprior
: simulator and density evaluation procedure, respectively, for the prior distribution.skeleton
: evaluation of the deterministic skeleton.partrans
: parameter transformations.The term basic model component is meant to refer to the procedure itself, as distinct from the execution of the procedure. The user specifies the procedure (in one of several forms); the package decides when and where to execute the procedure.
Workhorses: R functions, built into the package, that cause the basic model component procedures to be executed. Each workhorse has a name that matches that of the corresponding basic model component. In addition, there is the trajectory
workhorse, which iterates or integrates the deterministic skeleton (according to whether it is a map or a vectorfield, respectively) to obtain state trajectories.
Elementary POMP algorithms: algorithms that interrogate the model or the model/data confrontation without attempting to estimate parameters. There are currently four of these:
simulate
performs simulations of the POMP model, i.e., it samples from the joint distribution of latent states and observables.pfilter
runs a sequential Monte Carlo (particle filter) algorithm to compute the likelihood and (optionally) estimate the prediction and filtering distributions of the latent state process.probe
computes one or more uni- or multi-variate summary statistics on both actual and simulated data.spect
estimates the power spectral density functions for the actual and simulated data.POMP estimation algorithms: procedures that build on the elementary algorithms and are used for estimation of parameters and other inferential tasks. There are currently ten of these:
abc
: approximate Bayesian computationbsmc2
: Liu-West algorithm for Bayesian SMCpmcmc
: a particle MCMC algorithmmif2
: iterated filtering (IF2)enkf
: ensemble Kalman filtereakf
: ensemble adjusted Kalman filtertraj_objfun
: trajectory matchingspect_objfun
: power spectrum matchingprobe_objfun
: probe matchingnlf_objfun
: nonlinear forecastingObjective function methods: among the estimation algorithms just listed, four are methods that construct stateful objective functions that can be optimized using general-purpose numerical optimization algorithms such as optim
, subplex
, or the optimizers in the nloptr package. These have certain new features that will be described below.
The manner in which one writes R functions to specify basic model components has been totally changed. Before, one wrote functions that took specific arguments such as x
, params
, and covars
. Now, one writes such functions with any or all state variables, observables, covariates, and/or time as arguments. Thus for example, in versions <2, one might have specified a measurement model density evaluator (‘dmeasure’) so:
..., function (y, x, t, params, covars, ..., log) {
dmeasure =dnbinom(x=y["count"],mu=x["s"],size=params["theta"],log=log)
}, ...
Here, the state variable “s”, passed via the argument x
is the expected value of the observable “count”, which is assumed to be negative-binomially distributed with size parameter “theta”. The observables are passed via the vector y
and the parameters via the vector params
and so the variables of interest must be extracted by name from these vectors. Note that the time variable t
is not used but must nevertheless be named as a formal argument.
In pomp version 2, the corresponding ‘dmeasure’ specification would be
..., function (count, s, theta, ..., log) {
dmeasure =dnbinom(x=count,mu=s,size=theta,log=log)
}, ...
Note that there is no longer a need to extract the relevant variables from vectors. Moreover, the only required argument is ...
. The available arguments are taken from the set of observables, state variables, parameters, covariates, and time, as before.
These remarks apply whenever one specifies a basic model component using an R function. For the most part, C snippets that worked with pomp version <2 will continue to work. The exception is that parameter transformation C snippets may need to be rewritten (see below).
Prior to version 2, one specified the rprocess
component using one of the plugins onestep.sim
, discrete.time.sim
, euler.sim
, gillespie.sim
, or gillespie.hl.sim
. These have been renamed onestep
, discrete_time
, euler
, gillespie
, and gillespie_hl
, respectively, but are otherwise unchanged.
Note that, if one uses an R function to specify the process model simulator, the remarks above under “Basic component specification” apply.
In pomp versions <2, one simulated from the distribution of the latent state process via the “initializer” component. As of version 2, this has been renamed “rinit”. The “rinit” component is supplied to pomp functions using the argument of the same name.
If rinit
is not explicitly specified, the default behavior is to treat any parameters ending with suffices _0
or .0
as the values of correspondingly named latent state variables at time t0
.
Prior to version 2, time-varying covariates upon which basic model components depended were supplied via the two arguments covar
and tcovar
to pomp
. For example, in one of the examples, we see
cbind(
time=seq(from=1928,to=1934,by=0.01),
as.data.frame(
periodic_bspline_basis(
x=seq(from=1928,to=1934,by=0.01),
nbasis=3,
degree=3,
period=1,
names="seas%d"
)
),pop=dat$population
covar
) ->
pomp(
...,covar=covar,
tcovar="time",
... )
Note that, as this example illustrates, it was often necessary to first construct the covariate table as a data frame before passing it as argument to pomp
.
As of version 2, one includes such time-varying covariates as specially constructed table via the single argument covar
, which can be furnished to any POMP elementary or estimation algorithm. One constructs a covariate table using the covariate_table
command. The syntax for specifying a covariate table is quite flexible. In particular, the arguments to covariate_table
are evaluated sequentially, so that later ones can depend on earlier ones. Once evaluated, the covariates are bound column-wise into a single data frame. One can also provide a data frame to covariate_table
, along with the name of the time variable, which matches the old usage very closely. Thus, one might use the following in place of the above:
pomp(
...,covar=covariate_table(
t=seq(from=1928,to=1934,by=0.01),
seas=periodic_bspline_basis(t,nbasis=3,degree=3,period=1),
pop=dat$population,
times="t"
),
... )
Prior to version 2, covariates were always linearly interpolated when furnished to any of the basic model components. As of version 2, although linear interpolation remains the default, one can also use the order
argument to direct that the covariates be treated as piecewise-constant, right-continuous functions.
In pomp version <2, one specified parameter transformations by means to two arguments to pomp
: fromEstimationScale
and toEstimationScale
. As of version 2, both forward and inverse transformations are encapsulated in a single object passed via the partrans
argument to any of the pomp elementary or inference algorithms. The parameter transformation object is constructed by means of a call to parameter_trans
. One can specify general forward and inverse transformations via the fromEst
and toEst
arguments to this function. If these are specified using R functions, the remarks above (“Basic component specification”) apply. If one specifies these using C snippets, the syntax has changed from that of versions <2. In particular, when writing C snippets for parameter transformations, for a parameter “p”, the notation p
always refers to p
on the natural scale and T_p
refers to its value of p
on the estimation scale. Thus, if parameter “alpha” is to be log-transformed for estimation, the toEst
snippet would contain the line
T_alpha = log(alpha);
and the fromEst
snippet would include
alpha = exp(T_alpha);
Because many use cases involve simply log, logit, or log-barycentric transformations, one can handle these cases without needing to write toEst
and fromEst
C snippets, by naming the parameters to be so transformed in the log
, logit
, and barycentric
arguments of parameter_trans
. When these arguments are supplied, parameter_trans
internally writes a pair of C snippets to implement the transformations. As with other cases when one refers to parameters in C snippets, one must include all parameters named in the paramnames
argument.
At any point, if one sets partrans = NULL
in an elementary or inference algorithm, the parameter transformations are reset to the identity transformation.
In pomp versions <2, one named accumulator variables in the zeronames
argument of pomp
. This argument has been renamed accumvars
: it can be supplied to any pomp elementary or estimation algorithm.
simulate
In pomp versions <2, if one wanted to simulate a POMP model, one had to construct a ‘pomp’ first, via a call to pomp
. The package structure necessitated that one provide a dummy data frame to pomp
, a fairly artificial proceeding. As of version 2, one can simulate a POMP model without reference to any data at all. One simply calls simulate
with the necessary arguments supplied. The needed arguments include times
, t0
, and rprocess
. One can optionally provide rinit
and rmeasure
.
The format of the expressions returned by simulate
is determined by the new format
argument. The old obs
, states
, and as.data.frame
arguments have been eliminated. See ?simulate
for more information.
simulate
now returns more informative results when simulations from multiple parameter sets are simultaneously computed. Specifically, if on the call to simulate
, params
has column names, these are used to identify the resulting simulations. Thus when format = "pomps"
(the default), the names of the resulting list of pomps will be constructed from the column names of params
. Likewise, when format = "arrays"
, the resulting arrays will have informative column names and when format = "data.frame"
, the identifier variable will make use of the column names.
pfilter
and bsmc2
The basic particle filter, pfilter
, has a simpler mode of operation. In calls to pfilter
, params
should be a single parameter set only. That is, one can no longer possible to pass a matrix of parameters to pfilter
.
Similarly, the Liu-West algorithm, bsmc2
, has a simpler mode of operation. In calls to bsmc2
, params
should be a single parameter set only. The requisite Np
particles are drawn from the distribution specified by the “rprior” basic model component.
In pomp versions <2, several of the estimation algorithms had a logical transform
option. Setting transform = TRUE
caused estimation to be performed on the transformed scale, if parameter transformations had been supplied. As of version 2, transformations are automatically applied when it is appropriate to do so and when they exist. One can remove transformations by simply setting partrans = NULL
in the call to any pomp estimation algorithm.
mif2
defaultsThe default cooling schedule (cooling.type
) in mif2
is now “geometric”, in contrast to “hyperbolic”, as before.
params
not start
To specify the parameters used to start an iterative estimation algorithm, such as mif2
, pmcmc
, or abc
, use the params
argument. In pomp versions < 2, these functions had an argument named start
for this purpose.
In pomp versions <2, the functions probe.match
, spect.match
, traj.match
, and nlf
called numerical optimizers (e.g., optim
or optimizers from the subplex or nloptr packages) to estimate model parameters by minimizing their respective objective functions. As of version 2, these functions are no longer part of the package. Their functionality has been replaced by a new set of objective function methods.
The functions probe_objfun
, spect_objfun
, traj_objfun
, and nlf_objfun
construct stateful objective functions. These functions can be passed to any optim
-like numerical minimization routine. This is a large increase in the flexibility of the package, since one is free to choose essentially any minimization routine.
The fact that the objective functions are stateful means that each such function stores in memory the results of the last time it was called. This makes it very easy to extract information about the fitted model regardless of the optimization routine. See the help documentation for examples.
In pomp version <2, attempts to use a basic model component that had not been defined by the user resulted in an error message. The exception to this was the initial latent state sampler initializer
which had a default setting. As of version 2, all the basic model components now have defaults.
rinit
behavior remains as it was: it assumes the initial state distribution is concentrated at a point mass determined by parameters with .0
or _0
suffices.dprocess
and rprocess
will result in missing values (NA
).dprior
result in 1 (0 if log = TRUE
), and all calls to rprior
result in NA
.From the user’s point of view, this means, for example, that a call to simulate
when rmeasure
has not been specified will result in either empty or missing observables. If rprocess
has not been specified, then a call to simulate
will likewise return either zero latent state variables, or latent state variables with NA
values.
skeleton
In skeleton
, the t
argument has been replaced by times
, to make this uniform with the other workhorse functions.
trajectory
The as.data.frame
argument to trajectory
has been removed in favor of a new format
argument that allows one to choose between receiving the results in the form of an array or a data frame.
When trajectory
calls on deSolve
routines to numerically integrate a model vectorfield, more informative error messages are generated, and diagnostics are printed when verbose = TRUE
.
flow
A new workhorse has been introduced, similar to trajectory
but at a lower level. The flow
function iterates or integrates the deterministic skeleton to return trajectories. See ?flow
for details.
rprocess
The call to rprocess
has changed. One now retrieves simulations of the latent state process by doing
rprocess(object, x0, t0, times, params)
where object
, as usual, is the pomp, x0
is the named vector or matrix with rownames giving the state of the process at time t0
, times
are the times at which one desires simulated states, and params
are the parameters.
The pompExample
function has been eliminated. The suite of toy examples has been enlarged, but these are accessed via calls to ordinary package functions. Thus, for example, one creates the Ricker toy model via a call to ricker()
and the SIR model simulated via Euler’s method via a call to sir()
. See the documentation for a full list of toy examples
A number of new datasets are provided with the package. See data(package="pomp")
for a full list.
probe.match
, spect.match
, traj.match
, and nlf
functions have been removed entirely. The new approach to parameter estimation based on numerical optimization involves constructing stateful objective functions. See above.mif
and bsmc
methods have been removed.measurement.model
argument to pomp
has been removed. It is now necessary to specify the measurement model directly using rmeasure
and/or dmeasure
.conv.rec
method has been deprecated, replaced by the new traces
method.$
methods to access the slots of pomp
S4 objects has now been removed.The following two examples are taken from the pomp Journal of Statistical Software paper. In each case, we first look at the pomp version <2 implementation, then at the version 2 implementation.
First, the Gompertz model, which has a single latent state variable, \(X\), and a single observable, \(Y\). The model is defined by the relations \[\begin{gathered} S = e^{r \delta}\\ X(t+\delta)\;\sim\;\mathrm{Lognormal}\left(\log(K^{1-S}\,X^S),\sigma\right)\\ Y(t)\;\sim\;\mathrm{Lognormal}\left(\log{X(t)},\tau\right) \end{gathered}\] where \(r\), \(K\), \(\sigma\), and \(\tau\) are parameters and \(\delta\) is the discrete time-step.
Here is how the model was implemented in pomp version <2. We write all the basic model components using R functions assuming \(\delta=1\). We also do one simulation.
## rprocess
function (x, t, params, delta.t, ...) {
gompertz.proc.sim <- exp(rnorm(n=1,mean=0,sd=params["sigma"]))
eps <- exp(-params["r"]*delta.t)
S <-setNames(params["K"]^(1-S)*x["X"]^S*eps,"X")
}
## rmeasure
function (x, t, params, ...) {
gompertz.meas.sim <-setNames(rlnorm(n=1,meanlog=log(x["X"]),sd=params["tau"]),"Y")
}
## dmeasure
function (y, x, t, params, log, ...) {
gompertz.meas.dens <-dlnorm(x=y["Y"],meanlog=log(x["X"]),sdlog=params["tau"],log=log)
}
## initializer
function (t0, params, ...) {
gompertz.init <-setNames(params["X_0"],"X")
}
## Parameter transformations
function (params, ...) log(params)
gompertz.log.tf <- function (params, ...) exp(params)
gompertz.exp.tf <-
## pomp construction
pomp(
data=data.frame(time=1:100, Y=NA),
times="time", t0=0,
rprocess=discrete.time.sim(step.fun=gompertz.proc.sim,delta.t=1),
rmeasure=gompertz.meas.sim,
dmeasure=gompertz.meas.dens,
initializer=gompertz.init,
toEstimationScale=gompertz.log.tf,
fromEstimationScale=gompertz.exp.tf,
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
gomp1R
) ->
simulate(gomp1R) -> gomp1R
Now, we implement the same model using pomp version 2. Note that initializer
becomes rinit
and that discrete.time.sim
becomes discrete_time
. Also, the parameter transformation syntax is different.
library(pomp)
## rprocess
function (X, r, K, sigma, delta.t, ...) {
gompertz.proc.sim <- exp(rnorm(n=1,mean=0,sd=sigma))
eps <- exp(-r*delta.t)
S <-c(X=K^(1-S)*X^S*eps)
}
## rmeasure
function (X, tau, ...) {
gompertz.meas.sim <-c(Y=rlnorm(n=1,meanlog=log(X),sd=tau))
}
## dmeasure
function (X, tau, Y, log, ...) {
gompertz.meas.dens <-dlnorm(x=Y,meanlog=log(X),sdlog=tau,log=log)
}
## rinit
function (X_0, ...) {
gompertz.init <-c(X=X_0)
}
## pomp construction via simulation
simulate(
times=1:100, t0=0,
rprocess=discrete_time(gompertz.proc.sim,delta.t=1),
rmeasure=gompertz.meas.sim,
dmeasure=gompertz.meas.dens,
rinit=gompertz.init,
partrans=parameter_trans(log=c("K","r","sigma","tau")),
paramnames=c("K","r","sigma","tau"),
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
gomp2R ) ->
To implement the model using C snippets, in pomp versions <2 we might do the following.
pomp(
data=data.frame(time=1:100, Y=NA),
times="time", t0=0,
rprocess=discrete.time.sim(
step.fun=Csnippet("
double e = rnorm(0,sigma);
double S = exp(-r*dt);
X = exp((1-S)*log(K)+S*log(X)+e);
"),
delta.t=1
), rmeasure=Csnippet("
Y = exp(rnorm(log(X),tau));
"),
dmeasure=Csnippet("
lik = dlnorm(Y,log(X),tau,give_log);
"),
initializer=Csnippet("
X = X_0;
"),
toEstimationScale=Csnippet("
TK = log(K);
Tr = log(r);
Tsigma = log(sigma);
Ttau = log(tau);
TX_0 = log(X_0);
"),
fromEstimationScale=Csnippet("
TK = exp(K);
Tr = exp(r);
Tsigma = exp(sigma);
Ttau = exp(tau);
TX_0 = exp(X_0);
"),
paramnames=c("r","K","sigma","tau","X_0"),
statenames="X",
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
gomp1C
) ->
simulate(gomp1C) -> gomp1C
In version 2, we would do something like the following. In addition to the differences noted above, notice that we must specify obsnames
here, since the name of the observed variable is not provided in a dummy data frame as in versions <2. Note also that no changes to the C snippets themselves are needed.
simulate(
times=1:100, t0=0,
rprocess=discrete_time(
step.fun=Csnippet("
double e = rnorm(0,sigma);
double S = exp(-r*dt);
X = exp((1-S)*log(K)+S*log(X)+e);
"),
delta.t=1
), rmeasure=Csnippet("
Y = exp(rnorm(log(X),tau));
"),
dmeasure=Csnippet("
lik = dlnorm(Y,log(X),tau,give_log);
"),
rinit=Csnippet("
X = X_0;
"),
partrans=parameter_trans(log=c("K","r","sigma","tau")),
paramnames=c("r","K","sigma","tau","X_0"),
statenames="X",
obsnames="Y",
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
gomp2C ) ->
This example, also drawn from the Journal of Statistical Software paper, involves covariates and accumulator variables as well as more complex parameter transformations.
## Construct some fake birthrate data.
library(tidyverse)
data.frame(time=seq(-1,11,by=1/12)) %>%
mutate(
birthrate=5e5*bspline_basis(time,nbasis=5)%*%c(0.018,0.019,0.021,0.019,0.015)
birthdat ) ->
## measurement model C snippets
"
rmeas <- cases = rnbinom_mu(theta, rho * H);
"
"
dmeas <- lik = dnbinom_mu(cases, theta, rho * H, give_log);
"
## initializer
"
rinit <- double m = popsize/(S_0 + I_0 + R_0);
S = nearbyint(m*S_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
H = 0;
Phi = 0;
noise = 0;
"
## rprocess
"
seas.sir.step <- double rate[6];
double dN[6];
double Beta;
double dW;
Beta = exp(b1 + b2 * cos(M_2PI * Phi) + b3 * sin(M_2PI * Phi));
rate[0] = birthrate;
rate[1] = Beta * (I + iota) / popsize;
rate[2] = mu;
rate[3] = gamma;
rate[4] = mu;
rate[5] = mu;
dN[0] = rpois(rate[0] * dt);
reulermultinom(2, S, &rate[1], dt, &dN[1]);
reulermultinom(2, I, &rate[3], dt, &dN[3]);
reulermultinom(1, R, &rate[5], dt, &dN[5]);
dW = rnorm(dt, sigma * sqrt(dt));
S += dN[0] - dN[1] - dN[2];
I += dN[1] - dN[3] - dN[4];
R += dN[3] - dN[5];
Phi += dW;
H += dN[1];
noise += (dW - dt) / sigma;
"
"
toest <- to_log_barycentric(&TS_0,&S_0,3);
Tsigma = log(sigma);
Tiota = log(iota);
"
"
fromest <- from_log_barycentric(&TS_0,&S_0,3);
Tsigma = exp(sigma);
Tiota = exp(iota);
"
data.frame(time=seq(0,10,by=1/52),cases=NA) %>%
pomp(
times="time", t0=-1/52,
covar = birthdat, tcovar = "time",
dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
initializer=Csnippet(rinit),
rprocess = euler.sim(
step.fun = Csnippet(seas.sir.step),
delta.t = 1/52/20
),toEstimationScale=Csnippet(toest),
fromEstimationScale=Csnippet(fromest),
statenames = c("S", "I", "R", "H", "Phi", "noise"),
paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
"popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"),
zeronames = c("H", "noise"),
params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100,
sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
%>%
) simulate() -> sir1C
When implementing the same model in pomp version 2, we can use almost all the same C snippets without modification. The exception is the C snippets for the parameter transformations. The syntax for the inclusion of the covariates is different as is that used to specify the names of the accumulator variables. For example:
simulate(
times=seq(0,10,by=1/52), t0=-1/52,
covar=covariate_table(birthdat,times="time"),
dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
rinit=Csnippet(rinit),
rprocess = euler(
step.fun = Csnippet(seas.sir.step),
delta.t = 1/52/20
),partrans=parameter_trans(
toEst=Csnippet("
to_log_barycentric(&T_S_0,&S_0,3);
T_sigma = log(sigma);
T_iota = log(iota);"),
fromEst=Csnippet("
from_log_barycentric(&S_0,&T_S_0,3);
sigma = exp(T_sigma);
iota = exp(T_iota);")
),statenames = c("S", "I", "R", "H", "Phi", "noise"),
obsnames=c("cases"),
paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
"popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"),
accumvars = c("H", "noise"),
params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100,
sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
sir2C ) ->
Note also that, for specifying the above parameter transformations, pomp version 2 provides the alternate form
simulate(
...,partrans=parameter_trans(
log=c("sigma","iota"),
barycentric=c("S_0","I_0","R_0")
),
... sir2C ) ->
This document was produced using pomp version 5.10 and R version 4.4.1.