In pomp version 2, a number of backward incompatible changes have been made. These have been designed to increase the usability of the package by making the interfaces more uniform, by increasing code stability, reducing the number of special cases, and removing many of the idiosyncrasies that were present in earlier versions. The goal has been to keep backward-incompatible changes to a minimum. However, to achieve some improvements, it has been necessary to make some that will break existing code. This guide is intended to explain the new structure and point out the changes that are needed to make existing codes that use pomp work with the new version.
The main novelty of pomp version 2 is that one will
rarely, if ever, need to interact directly with the low-level
pomp constructor. Instead, one can supply new or modify
existing model components at (almost) any stage in a chain of
pomp computations. In particular, there are now
data-frame methods for all of the pomp inference
algorithms, as well as for the so-called elementary POMP
algorithms: simulate, pfilter,
probe, and spect. One can pass data directly
to these methods, along with the requisite model components, and achieve
the same effect as one would by first constructing a object and then
performing the operation.
pomp version 2 represents a thorough reworking of almost all the package codes. A consequence is that these codes have been streamlined, with increases in flexibility and ease of use, at no cost to performance. The following table shows the number of lines in the latest pre-2 pomp compared to the current pomp version 2.
| category | 1.19 | 2.5 | fraction |
|---|---|---|---|
| R | 12150 | 9847 | -0.190 |
| man | 4069 | 6908 | 0.698 |
| src | 5238 | 5567 | 0.063 |
| tests | 4846 | 3677 | -0.241 |
| total | 26303 | 25999 | -0.012 |
It is useful to divide the pomp package functionality into different levels.
rinit: simulator for the initial-state distribution,
i.e., the distribution of the latent state at time t0.rprocess and dprocess: simulator and
density evaluation procedure, respectively, for the process model.rmeasure and dmeasure: simulator and
density evaluation procedure, respectively, for the measurement
model.rprior and dprior: simulator and density
evaluation procedure, respectively, for the prior distribution.skeleton: evaluation of the deterministic
skeleton.partrans: parameter transformations.The term basic model component is meant to refer to the procedure itself, as distinct from the execution of the procedure. The user specifies the procedure (in one of several forms); the package decides when and where to execute the procedure.
Workhorses: R functions, built into the
package, that cause the basic model component procedures to be executed.
Each workhorse has a name that matches that of the corresponding basic
model component. In addition, there is the trajectory
workhorse, which iterates or integrates the deterministic skeleton
(according to whether it is a map or a vectorfield, respectively) to
obtain state trajectories.
Elementary POMP algorithms: algorithms that interrogate the model or the model/data confrontation without attempting to estimate parameters. There are currently four of these:
simulate performs simulations of the POMP model, i.e.,
it samples from the joint distribution of latent states and
observables.pfilter runs a sequential Monte Carlo (particle filter)
algorithm to compute the likelihood and (optionally) estimate the
prediction and filtering distributions of the latent state process.probe computes one or more uni- or multi-variate
summary statistics on both actual and simulated data.spect estimates the power spectral density functions
for the actual and simulated data.POMP estimation algorithms: procedures that build on the elementary algorithms and are used for estimation of parameters and other inferential tasks. There are currently ten of these:
abc: approximate Bayesian computationbsmc2: Liu-West algorithm for Bayesian SMCpmcmc: a particle MCMC algorithmmif2: iterated filtering (IF2)enkf: ensemble Kalman filtereakf: ensemble adjusted Kalman filtertraj_objfun: trajectory matchingspect_objfun: power spectrum matchingprobe_objfun: probe matchingnlf_objfun: nonlinear forecastingObjective function methods: among the estimation
algorithms just listed, four are methods that construct stateful
objective functions that can be optimized using general-purpose
numerical optimization algorithms such as optim,
subplex, or the optimizers in the nloptr
package. These have certain new features that will be described
below.
The manner in which one writes R functions to
specify basic model components has been totally changed. Before, one
wrote functions that took specific arguments such as x,
params, and covars. Now, one writes such
functions with any or all state variables, observables, covariates,
and/or time as arguments. Thus for example, in versions <2, one might
have specified a measurement model density evaluator (‘dmeasure’)
so:
...,
dmeasure = function (y, x, t, params, covars, ..., log) {
dnbinom(x=y["count"],mu=x["s"],size=params["theta"],log=log)
},
...Here, the state variable “s”, passed via the argument x
is the expected value of the observable “count”, which is assumed to be
negative-binomially distributed with size parameter “theta”. The
observables are passed via the vector y and the parameters
via the vector params and so the variables of interest must
be extracted by name from these vectors. Note that the time variable
t is not used but must nevertheless be named as a formal
argument.
In pomp version 2, the corresponding ‘dmeasure’ specification would be
...,
dmeasure = function (count, s, theta, ..., log) {
dnbinom(x=count,mu=s,size=theta,log=log)
},
...Note that there is no longer a need to extract the relevant variables
from vectors. Moreover, the only required argument is ....
The available arguments are taken from the set of observables, state
variables, parameters, covariates, and time, as before.
These remarks apply whenever one specifies a basic model component using an R function. For the most part, C snippets that worked with pomp version <2 will continue to work. The exception is that parameter transformation C snippets may need to be rewritten (see below).
Prior to version 2, one specified the rprocess component
using one of the plugins onestep.sim,
discrete.time.sim, euler.sim,
gillespie.sim, or gillespie.hl.sim. These have
been renamed onestep, discrete_time,
euler, gillespie, and
gillespie_hl, respectively, but are otherwise
unchanged.
Note that, if one uses an R function to specify the process model simulator, the remarks above under “Basic component specification” apply.
In pomp versions <2, one simulated from the distribution of the latent state process via the “initializer” component. As of version 2, this has been renamed “rinit”. The “rinit” component is supplied to pomp functions using the argument of the same name.
If rinit is not explicitly specified, the default
behavior is to treat any parameters ending with suffices _0
or .0 as the values of correspondingly named latent state
variables at time t0.
Prior to version 2, time-varying covariates upon which basic model
components depended were supplied via the two arguments
covar and tcovar to pomp. For
example, in one of the examples, we see
cbind(
time=seq(from=1928,to=1934,by=0.01),
as.data.frame(
periodic_bspline_basis(
x=seq(from=1928,to=1934,by=0.01),
nbasis=3,
degree=3,
period=1,
names="seas%d"
)
),
pop=dat$population
) -> covar
pomp(
...,
covar=covar,
tcovar="time",
...
)Note that, as this example illustrates, it was often necessary to
first construct the covariate table as a data frame before passing it as
argument to pomp.
As of version 2, one includes such time-varying covariates as
specially constructed table via the single argument covar,
which can be furnished to any POMP elementary or estimation algorithm.
One constructs a covariate table using the covariate_table
command. The syntax for specifying a covariate table is quite flexible.
In particular, the arguments to covariate_table are
evaluated sequentially, so that later ones can depend on earlier ones.
Once evaluated, the covariates are bound column-wise into a single data
frame. One can also provide a data frame to
covariate_table, along with the name of the time variable,
which matches the old usage very closely. Thus, one might use the
following in place of the above:
pomp(
...,
covar=covariate_table(
t=seq(from=1928,to=1934,by=0.01),
seas=periodic_bspline_basis(t,nbasis=3,degree=3,period=1),
pop=dat$population,
times="t"
),
...
)Prior to version 2, covariates were always linearly interpolated when
furnished to any of the basic model components. As of version 2,
although linear interpolation remains the default, one can also use the
order argument to direct that the covariates be treated as
piecewise-constant, right-continuous functions.
In pomp version <2, one specified parameter
transformations by means to two arguments to pomp:
fromEstimationScale and toEstimationScale. As
of version 2, both forward and inverse transformations are encapsulated
in a single object passed via the partrans argument to any
of the pomp elementary or inference algorithms. The
parameter transformation object is constructed by means of a call to
parameter_trans. One can specify general forward and
inverse transformations via the fromEst and
toEst arguments to this function. If these are specified
using R functions, the remarks above (“Basic component
specification”) apply. If one specifies these using C snippets, the
syntax has changed from that of versions <2. In particular, when
writing C snippets for parameter transformations, for a parameter “p”,
the notation p always refers to p on the
natural scale and T_p refers to its value of p
on the estimation scale. Thus, if parameter “alpha” is to be
log-transformed for estimation, the toEst snippet would
contain the line
and the fromEst snippet would include
Because many use cases involve simply log, logit, or log-barycentric
transformations, one can handle these cases without needing to write
toEst and fromEst C snippets, by naming the
parameters to be so transformed in the log,
logit, and barycentric arguments of
parameter_trans. When these arguments are supplied,
parameter_trans internally writes a pair of C snippets to
implement the transformations. As with other cases when one refers to
parameters in C snippets, one must include all parameters named in the
paramnames argument.
At any point, if one sets partrans = NULL in an
elementary or inference algorithm, the parameter transformations are
reset to the identity transformation.
In pomp versions <2, one named accumulator
variables in the zeronames argument of pomp.
This argument has been renamed accumvars: it can be
supplied to any pomp elementary or estimation
algorithm.
simulateIn pomp versions <2, if one wanted to simulate a
POMP model, one had to construct a ‘pomp’ first, via a call to
pomp. The package structure necessitated that one provide a
dummy data frame to pomp, a fairly artificial proceeding.
As of version 2, one can simulate a POMP model without reference to any
data at all. One simply calls simulate with the necessary
arguments supplied. The needed arguments include times,
t0, and rprocess. One can optionally provide
rinit and rmeasure.
The format of the expressions returned by simulate is
determined by the new format argument. The old
obs, states, and as.data.frame
arguments have been eliminated. See ?simulate for more
information.
simulate now returns more informative results when
simulations from multiple parameter sets are simultaneously computed.
Specifically, if on the call to simulate,
params has column names, these are used to identify the
resulting simulations. Thus when format = "pomps" (the
default), the names of the resulting list of pomps will be constructed
from the column names of params. Likewise, when
format = "arrays", the resulting arrays will have
informative column names and when format = "data.frame",
the identifier variable will make use of the column names.
pfilter and bsmc2The basic particle filter, pfilter, has a simpler mode
of operation. In calls to pfilter, params
should be a single parameter set only. That is, one can no longer
possible to pass a matrix of parameters to pfilter.
Similarly, the Liu-West algorithm, bsmc2, has a simpler
mode of operation. In calls to bsmc2, params
should be a single parameter set only. The requisite Np
particles are drawn from the distribution specified by the “rprior”
basic model component.
In pomp versions <2, several of the estimation
algorithms had a logical transform option. Setting
transform = TRUE caused estimation to be performed on the
transformed scale, if parameter transformations had been supplied. As of
version 2, transformations are automatically applied when it is
appropriate to do so and when they exist. One can remove transformations
by simply setting partrans = NULL in the call to any
pomp estimation algorithm.
mif2 defaultsThe default cooling schedule (cooling.type) in
mif2 is now “geometric”, in contrast to “hyperbolic”, as
before.
params not startTo specify the parameters used to start an iterative estimation
algorithm, such as mif2, pmcmc, or
abc, use the params argument. In
pomp versions < 2, these functions had an argument
named start for this purpose.
In pomp versions <2, the functions
probe.match, spect.match,
traj.match, and nlf called numerical
optimizers (e.g., optim or optimizers from the
subplex or nloptr packages) to
estimate model parameters by minimizing their respective objective
functions. As of version 2, these functions are no longer part of the
package. Their functionality has been replaced by a new set of
objective function methods.
The functions probe_objfun, spect_objfun,
traj_objfun, and nlf_objfun construct
stateful objective functions. These functions can be passed to
any optim-like numerical minimization routine. This is a
large increase in the flexibility of the package, since one is free to
choose essentially any minimization routine.
The fact that the objective functions are stateful means that each such function stores in memory the results of the last time it was called. This makes it very easy to extract information about the fitted model regardless of the optimization routine. See the help documentation for examples.
In pomp version <2, attempts to use a basic model
component that had not been defined by the user resulted in an error
message. The exception to this was the initial latent state sampler
initializer which had a default setting. As of version 2,
all the basic model components now have defaults.
rinit behavior remains as it was: it
assumes the initial state distribution is concentrated at a point mass
determined by parameters with .0 or _0
suffices.dprocess and rprocess will result in missing
values (NA).dprior result in 1 (0 if log = TRUE), and all
calls to rprior result in NA.From the user’s point of view, this means, for example, that a call
to simulate when rmeasure has not been
specified will result in either empty or missing observables. If
rprocess has not been specified, then a call to
simulate will likewise return either zero latent state
variables, or latent state variables with NA values.
skeletonIn skeleton, the t argument has been
replaced by times, to make this uniform with the other
workhorse functions.
trajectoryThe as.data.frame argument to trajectory
has been removed in favor of a new format argument that
allows one to choose between receiving the results in the form of an
array or a data frame.
When trajectory calls on deSolve routines
to numerically integrate a model vectorfield, more informative error
messages are generated, and diagnostics are printed when
verbose = TRUE.
flowA new workhorse has been introduced, similar to
trajectory but at a lower level. The flow
function iterates or integrates the deterministic skeleton to return
trajectories. See ?flow for details.
rprocessThe call to rprocess has changed. One now retrieves
simulations of the latent state process by doing
rprocess(object, x0, t0, times, params)
where object, as usual, is the pomp, x0 is
the named vector or matrix with rownames giving the state of the process
at time t0, times are the times at which one
desires simulated states, and params are the
parameters.
The pompExample function has been eliminated. The suite
of toy examples has been enlarged, but these are accessed via calls to
ordinary package functions. Thus, for example, one creates the Ricker
toy model via a call to ricker() and the SIR model
simulated via Euler’s method via a call to sir(). See the
documentation for a full list of toy examples
A number of new datasets are provided with the package. See
data(package="pomp") for a full list.
probe.match, spect.match,
traj.match, and nlf functions have been
removed entirely. The new approach to parameter estimation based on
numerical optimization involves constructing stateful objective
functions. See above.mif and bsmc
methods have been removed.measurement.model argument to pomp has
been removed. It is now necessary to specify the measurement model
directly using rmeasure and/or dmeasure.conv.rec method has been deprecated, replaced by
the new traces method.$ methods to access the slots of
pomp S4 objects has now been removed.The following two examples are taken from the pomp Journal of Statistical Software paper. In each case, we first look at the pomp version <2 implementation, then at the version 2 implementation.
First, the Gompertz model, which has a single latent state variable, \(X\), and a single observable, \(Y\). The model is defined by the relations \[\begin{gathered} S = e^{r \delta}\\ X(t+\delta)\;\sim\;\mathrm{Lognormal}\left(\log(K^{1-S}\,X^S),\sigma\right)\\ Y(t)\;\sim\;\mathrm{Lognormal}\left(\log{X(t)},\tau\right) \end{gathered}\] where \(r\), \(K\), \(\sigma\), and \(\tau\) are parameters and \(\delta\) is the discrete time-step.
Here is how the model was implemented in pomp version <2. We write all the basic model components using R functions assuming \(\delta=1\). We also do one simulation.
## rprocess
gompertz.proc.sim <- function (x, t, params, delta.t, ...) {
eps <- exp(rnorm(n=1,mean=0,sd=params["sigma"]))
S <- exp(-params["r"]*delta.t)
setNames(params["K"]^(1-S)*x["X"]^S*eps,"X")
}
## rmeasure
gompertz.meas.sim <- function (x, t, params, ...) {
setNames(rlnorm(n=1,meanlog=log(x["X"]),sd=params["tau"]),"Y")
}
## dmeasure
gompertz.meas.dens <- function (y, x, t, params, log, ...) {
dlnorm(x=y["Y"],meanlog=log(x["X"]),sdlog=params["tau"],log=log)
}
## initializer
gompertz.init <- function (t0, params, ...) {
setNames(params["X_0"],"X")
}
## Parameter transformations
gompertz.log.tf <- function (params, ...) log(params)
gompertz.exp.tf <- function (params, ...) exp(params)
## pomp construction
pomp(
data=data.frame(time=1:100, Y=NA),
times="time", t0=0,
rprocess=discrete.time.sim(step.fun=gompertz.proc.sim,delta.t=1),
rmeasure=gompertz.meas.sim,
dmeasure=gompertz.meas.dens,
initializer=gompertz.init,
toEstimationScale=gompertz.log.tf,
fromEstimationScale=gompertz.exp.tf,
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp1R
simulate(gomp1R) -> gomp1RNow, we implement the same model using pomp version
2. Note that initializer becomes rinit and
that discrete.time.sim becomes discrete_time.
Also, the parameter transformation syntax is different.
library(pomp)
## rprocess
gompertz.proc.sim <- function (X, r, K, sigma, delta.t, ...) {
eps <- exp(rnorm(n=1,mean=0,sd=sigma))
S <- exp(-r*delta.t)
c(X=K^(1-S)*X^S*eps)
}
## rmeasure
gompertz.meas.sim <- function (X, tau, ...) {
c(Y=rlnorm(n=1,meanlog=log(X),sd=tau))
}
## dmeasure
gompertz.meas.dens <- function (X, tau, Y, log, ...) {
dlnorm(x=Y,meanlog=log(X),sdlog=tau,log=log)
}
## rinit
gompertz.init <- function (X_0, ...) {
c(X=X_0)
}
## pomp construction via simulation
simulate(
times=1:100, t0=0,
rprocess=discrete_time(gompertz.proc.sim,delta.t=1),
rmeasure=gompertz.meas.sim,
dmeasure=gompertz.meas.dens,
rinit=gompertz.init,
partrans=parameter_trans(log=c("K","r","sigma","tau")),
paramnames=c("K","r","sigma","tau"),
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp2RTo implement the model using C snippets, in pomp versions <2 we might do the following.
pomp(
data=data.frame(time=1:100, Y=NA),
times="time", t0=0,
rprocess=discrete.time.sim(
step.fun=Csnippet("
double e = rnorm(0,sigma);
double S = exp(-r*dt);
X = exp((1-S)*log(K)+S*log(X)+e);
"),
delta.t=1
),
rmeasure=Csnippet("
Y = exp(rnorm(log(X),tau));
"),
dmeasure=Csnippet("
lik = dlnorm(Y,log(X),tau,give_log);
"),
initializer=Csnippet("
X = X_0;
"),
toEstimationScale=Csnippet("
TK = log(K);
Tr = log(r);
Tsigma = log(sigma);
Ttau = log(tau);
TX_0 = log(X_0);
"),
fromEstimationScale=Csnippet("
TK = exp(K);
Tr = exp(r);
Tsigma = exp(sigma);
Ttau = exp(tau);
TX_0 = exp(X_0);
"),
paramnames=c("r","K","sigma","tau","X_0"),
statenames="X",
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp1C
simulate(gomp1C) -> gomp1CIn version 2, we would do something like the following. In addition
to the differences noted above, notice that we must specify
obsnames here, since the name of the observed variable is
not provided in a dummy data frame as in versions <2. Note also that
no changes to the C snippets themselves are needed.
simulate(
times=1:100, t0=0,
rprocess=discrete_time(
step.fun=Csnippet("
double e = rnorm(0,sigma);
double S = exp(-r*dt);
X = exp((1-S)*log(K)+S*log(X)+e);
"),
delta.t=1
),
rmeasure=Csnippet("
Y = exp(rnorm(log(X),tau));
"),
dmeasure=Csnippet("
lik = dlnorm(Y,log(X),tau,give_log);
"),
rinit=Csnippet("
X = X_0;
"),
partrans=parameter_trans(log=c("K","r","sigma","tau")),
paramnames=c("r","K","sigma","tau","X_0"),
statenames="X",
obsnames="Y",
params=c(r=0.1,K=1,sigma=0.1,tau=0.1,X_0=1)
) -> gomp2CThis example, also drawn from the Journal of Statistical Software paper, involves covariates and accumulator variables as well as more complex parameter transformations.
## Construct some fake birthrate data.
library(tidyverse)
data.frame(time=seq(-1,11,by=1/12)) %>%
mutate(
birthrate=5e5*bspline_basis(time,nbasis=5)%*%c(0.018,0.019,0.021,0.019,0.015)
) -> birthdat## measurement model C snippets
rmeas <- "
cases = rnbinom_mu(theta, rho * H);
"
dmeas <- "
lik = dnbinom_mu(cases, theta, rho * H, give_log);
"
## initializer
rinit <- "
double m = popsize/(S_0 + I_0 + R_0);
S = nearbyint(m*S_0);
I = nearbyint(m*I_0);
R = nearbyint(m*R_0);
H = 0;
Phi = 0;
noise = 0;
"
## rprocess
seas.sir.step <- "
double rate[6];
double dN[6];
double Beta;
double dW;
Beta = exp(b1 + b2 * cos(M_2PI * Phi) + b3 * sin(M_2PI * Phi));
rate[0] = birthrate;
rate[1] = Beta * (I + iota) / popsize;
rate[2] = mu;
rate[3] = gamma;
rate[4] = mu;
rate[5] = mu;
dN[0] = rpois(rate[0] * dt);
reulermultinom(2, S, &rate[1], dt, &dN[1]);
reulermultinom(2, I, &rate[3], dt, &dN[3]);
reulermultinom(1, R, &rate[5], dt, &dN[5]);
dW = rnorm(dt, sigma * sqrt(dt));
S += dN[0] - dN[1] - dN[2];
I += dN[1] - dN[3] - dN[4];
R += dN[3] - dN[5];
Phi += dW;
H += dN[1];
noise += (dW - dt) / sigma;
"
toest <- "
to_log_barycentric(&TS_0,&S_0,3);
Tsigma = log(sigma);
Tiota = log(iota);
"
fromest <- "
from_log_barycentric(&TS_0,&S_0,3);
Tsigma = exp(sigma);
Tiota = exp(iota);
"data.frame(time=seq(0,10,by=1/52),cases=NA) %>%
pomp(
times="time", t0=-1/52,
covar = birthdat, tcovar = "time",
dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
initializer=Csnippet(rinit),
rprocess = euler.sim(
step.fun = Csnippet(seas.sir.step),
delta.t = 1/52/20
),
toEstimationScale=Csnippet(toest),
fromEstimationScale=Csnippet(fromest),
statenames = c("S", "I", "R", "H", "Phi", "noise"),
paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
"popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"),
zeronames = c("H", "noise"),
params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100,
sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
) %>%
simulate() -> sir1CWhen implementing the same model in pomp version 2, we can use almost all the same C snippets without modification. The exception is the C snippets for the parameter transformations. The syntax for the inclusion of the covariates is different as is that used to specify the names of the accumulator variables. For example:
simulate(
times=seq(0,10,by=1/52), t0=-1/52,
covar=covariate_table(birthdat,times="time"),
dmeasure = Csnippet(dmeas),
rmeasure = Csnippet(rmeas),
rinit=Csnippet(rinit),
rprocess = euler(
step.fun = Csnippet(seas.sir.step),
delta.t = 1/52/20
),
partrans=parameter_trans(
toEst=Csnippet("
to_log_barycentric(&T_S_0,&S_0,3);
T_sigma = log(sigma);
T_iota = log(iota);"),
fromEst=Csnippet("
from_log_barycentric(&S_0,&T_S_0,3);
sigma = exp(T_sigma);
iota = exp(T_iota);")
),
statenames = c("S", "I", "R", "H", "Phi", "noise"),
obsnames=c("cases"),
paramnames = c("gamma", "mu", "theta", "b1", "b2", "b3",
"popsize","rho", "iota", "sigma", "S_0", "I_0", "R_0"),
accumvars = c("H", "noise"),
params = c(popsize = 500000, iota = 5, gamma = 26, mu = 1/50,
b1 = 6, b2 = 0.2, b3 = -0.1, rho = 0.1, theta = 100,
sigma = 0.3, S_0 = 0.055, I_0 = 0.002, R_0 = 0.94)
) -> sir2CNote also that, for specifying the above parameter transformations, pomp version 2 provides the alternate form
simulate(
...,
partrans=parameter_trans(
log=c("sigma","iota"),
barycentric=c("S_0","I_0","R_0")
),
...
) -> sir2CThis document was produced using pomp version 6.3.0.0 and R version 4.5.0.