pmcmc
chainsThe R codes in this script are available for download. This work is licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.
This tutorial aims to help you get started using pomp as a suite of tools for analysis of time series data based on stochastic dynamical systems models. First, we give some conceptual background regarding the class of models—partially observed Markov processes (POMPs)—that pomp handles. We then discuss some preliminaries: installing the package and so on. Next, we show how to simulate a POMP using pomp. We then analyze some data using a few different tools. In so doing, we illustrate some of the package’s capabilities by using its algorithms to fit, compare, and criticize the models using pomp’s algorithms. From time to time, exercises for the reader are given.
As its name implies pomp is focused on partially observed Markov process models. These are also known as state-space models, hidden Markov models, and stochastic dynamical systems models. Such models consist of an unobserved Markov state process, connected to the data via an explicit model of the observation process. We refer to the former as the latent process model (or process model for short) and the latter as the measurement model.
Mathematically, each model is a probability distribution. Let Yn denote the measurement process and Xn the latent state process, then by definition, the process model is determined by the density fXn|Xn−1 and the initial density fX0. The measurement process is determined by the density fYn|Xn. These two submodels determine the full POMP model, i.e., the joint distribution fX0:N,Y1:N. If we have a sequence of measurements, y∗n, made at times tn, n=1,…,N, then we think of these data, collectively, as a single realization of the Y process.
The following schematic shows shows causal relations among the process model, the measurement model, and the data. From the statistical point of view, the key perspective is that the model is, at least hypothetically, the process that generated the data.
Mathematically, a POMP is characterized by two conditions.
These conditions can be represented schematically by the following diagram, which indicates the direct dependencies among model variables.
To implement a POMP model in pomp, we have to specify the measurement and process distributions. Note however, that, for each of the process and the measurement models there are two distinct operations that we might desire to perform:
Following the R convention, we refer to the simulation of fXn|Xn−1 as the rprocess component of the POMP model and the evaluation of fXn|Xn−1 as the dprocess component. Similarly, the simulation of fYn|Xn is the rmeasure component while the evaluation of fYn|Xn is the dmeasure component. Finally, we’ll call a simulator of fX0 the rinit component. Collectively, we’ll refer to these, and other, similarly basic elements of the model, as the model’s basic components.
A method that makes no use of the dprocess component is said to be “plug-and-play” or to have the “plug-and-play property”. At present, pomp is focused on such methods, so there is no reason to discuss the dprocess component further in this document. In the following, we will illustrate and explain how one specifies the rprocess, rmeasure, and dmeasure components of a model in pomp. We will illustrate this using some simple examples.
To get started, we must install pomp, if it is not already installed. This package cannot yet be downloaded from CRAN (though that will change in the near future). However, the latest version is always available at the package homepage on Github. See the package website for installation instructions.
In this document, we will ultimately learn to implement POMP models using the package’s “C snippet” facility. This allows the user to write model components using snippets of C code, which is then compiled and linked into a running R session. This typically affords a manyfold increase in computation time. It is possible to avoid C snippets entirely by writing model components as R functions, and we will begin by doing so, but the speed-ups afforded by C snippets are typically too good to pass up. To use C snippets, you must be able to compile C codes. Compilers are not by default installed on Windows or Mac systems, so users of such systems must do a bit more work to make use of pomp’s facilities. The installation instructions on the package website give details.
Having dispensed with the preliminaries, we now explore some of the functionality provided by pomp. To assist the reader in following this exploration, the R codes for this document are available.
Let us see how to implement a very simple POMP model. In particular, let’s begin by focusing on the famous Ricker model (Ricker 1954), which posits a nonlinear relationship between the size, N(t), of a population in year t and its size, N(t+1), one year later: N(t+1)=rN(t)exp(1−N(t)K). Here, r and K are constant parameters, usually termed the intrinsic growth rate and the carrying capacity, respectively. As written, this is a deterministic model: it does not allow for any variability in the population dynamics. Let’s modify the Ricker equation by assuming that r is not constant, but instead has random variation from year to year. If we assume that the intrinsic growth rate varies from year to year as a lognormal random variable, we obtain N(t+1)=rN(t)exp(1−N(t)K+ε(t)), where ε(t)∼Normal(0,σ). Note that we’ve introduced a new parameter, σ, which quantifies the intensity of the noise in the population dynamics. Ecologically speaking, Eq. 2 is a model with environmental stochasticity.
Typically, it is relatively straightforward to simulate a POMP model. To accomplish this in pomp, as we’ve already discussed, we specify the rprocess component of the model. We’ll also need to choose values for the model parameters, r, K, and σ. We’ll also need to make a choice regarding the initial condition, N(0). The simplest choice is to treat N(0)=N0 as a parameter.
library(pomp)
simulate(t0=0, times=1:20,
params=c(r=1.2,K=200,sigma=0.1,N_0=50),
rinit=function (N_0, ...) {
c(N=N_0)
},rprocess=discrete_time(
function (N, r, K, sigma, ...) {
rnorm(n=1,mean=0,sd=sigma)
eps <-c(N=r*N*exp(1-N/K+eps))
},delta.t=1
) sim1 ) ->
## Warning: 'rmeasure' unspecified: NAs generated.
Notice that we’ve specified the rinit and rprocess components of the model as functions. These functions take as arguments the relevant variables (whether these are state variables or parameters). Importantly, they return named numeric vectors. Names of variables and parameters are very important in pomp. Notice too that we’ve used the discrete_time
function, which encodes the fact that our Ricker model is a discrete-time stochastic process (a Markov chain). The first argument of discrete_time
is an R function encoding Eq. 2; the second argument specifies the discrete time-step.
Note also that the parameters are furnished in the form of a named vector, and that we’ve specified both t0
and times
. The former is the initial time, t0, i.e., the time at which the initial conditions apply. Since our initial condition is that N(0)=N0, our initial time is t0=0. The times
argument specifies the observation times t1,…,tN.
Finally, note that we received a warning about NA
values being generated. We will soon see what this is about.
What sort of an object is sim1
? If we print it, we see
sim1
## <object of class 'pomp'>
sim1
is evidently an object of class ‘pomp’. We refer to these as ‘pomp’ objects.
To get more insight into the structure of sim1
, we can use spy
:
spy(sim1)
pomp provides methods for plotting ‘pomp’ objects. For example,
plot(sim1)
We can also recast a ‘pomp’ object as a data frame:
as(sim1,"data.frame")
time | N |
---|---|
1 | 148.3456 |
2 | 274.8027 |
3 | 225.5831 |
4 | 209.7361 |
5 | 248.0677 |
6 | 276.0464 |
7 | 196.4977 |
8 | 248.0970 |
9 | 214.0114 |
10 | 207.6063 |
11 | 258.5272 |
12 | 245.7282 |
13 | 180.2086 |
14 | 234.7697 |
15 | 244.2032 |
16 | 268.4648 |
17 | 199.6789 |
18 | 211.7116 |
19 | 254.1603 |
20 | 222.1691 |
Casting the ‘pomp’ object as a data frame allows us to use ggplot2 graphics:
ggplot(data=as.data.frame(sim1),aes(x=time,y=N))+
geom_line()
Tp return to the warning we got when we ran simulate
: it was telling us that simulate
could not make a random draw from the measurement process because we had not supplied it with any information about this process. In particular, we had not supplied a measurement-model simulator. Let’s now see how to specify the measurement-model simulator, or rmeasure.
Let’s suppose that non-lethal traps are used to sample the population to determine its size. Each year, some number of traps are set out and Yt is the number of animals captured. We might posit Yt∼Poisson(bN(t)), where the parameter b is proportional to sampling effort, e.g., the number of traps. This is a measurement model, and we can implement a simulator for it by specifying another function:
simulate(t0=0, times=1:20,
params=c(r=1.2,K=200,sigma=0.1,N_0=50,b=0.05),
rinit=function (N_0, ...) {
c(N=N_0)
},rprocess=discrete_time(
function (N, r, K, sigma, ...) {
rnorm(n=1,mean=0,sd=sigma)
eps <-c(N=r*N*exp(1-N/K+eps))
},delta.t=1
),rmeasure=function (N, b, ...) {
c(Y=rpois(n=1,lambda=b*N))
} sim2 ) ->
Note that, again, the rmeasure function need take only the necessary arguments (and ...
) and must return a named numeric vector.
Now, in the preceding chunk of code where we construct sim2
, there was some redundancy with our earlier construction of sim1
. In particular, we specified the same values of t0
, times
, rinit
, and rprocess
as before. Since these specifications were stored in sim1
, however, we could have simply added in just the new pieces. For example,
simulate(
sim1,params=c(r=1.2,K=200,sigma=0.1,N_0=50,b=0.05),
rmeasure=function (N, b, ...) {
c(Y=rpois(n=1,lambda=b*N))
} sim2 ) ->
As before, we can examine our handiwork:
spy(sim2)
as(sim2,"data.frame")
time | Y | N |
---|---|---|
1 | 11 | 124.0831 |
2 | 13 | 200.2960 |
3 | 7 | 267.4025 |
4 | 8 | 190.2190 |
5 | 12 | 195.8285 |
6 | 15 | 293.0790 |
7 | 14 | 216.3693 |
8 | 10 | 217.2981 |
9 | 12 | 269.7404 |
10 | 16 | 223.0512 |
11 | 11 | 200.4050 |
12 | 18 | 255.2297 |
13 | 12 | 231.3550 |
14 | 12 | 197.5631 |
15 | 14 | 263.8953 |
16 | 14 | 277.9194 |
17 | 15 | 197.6269 |
18 | 14 | 292.5811 |
19 | 13 | 250.8353 |
20 | 12 | 206.4404 |
plot(sim2)
ggplot(data=pivot_longer(as(sim2,"data.frame"),-time),
aes(x=time,y=value,color=name))+
geom_line()
Notice that now our simulate
call produces samples from both the N and the Y distributions. simulate
will try to sample from the joint distribution of latent states and observables, but will sample from just the latent state process if the rmeasure component is undefined.
We can run multiple simulations using the same parameters:
simulate(sim2,nsim=20) -> sims
ggplot(data=pivot_longer(
as.data.frame(sims),
c(Y,N)
),aes(x=time,y=value,color=name,
group=interaction(.L1,name)))+
geom_line()+
facet_grid(name~.,scales="free_y")+
labs(y="",color="")
We can also run a single simulation from multiple parameters:
parmat(coef(sim2),3)
p <-"sigma",] <- c(0.05,0.25,1)
p[colnames(p) <- LETTERS[1:3]
simulate(sim2,params=p,format="data.frame") -> sims
pivot_longer(sims,c(Y,N))
sims <-ggplot(data=sims,aes(x=time,y=value,color=name,
group=interaction(.id,name)))+
geom_line()+
scale_y_log10()+
expand_limits(y=1)+
facet_grid(name~.id,scales="free_y")+
labs(y="",color="")
In the above, coef
extracts the parameter vector stored in sim2
. parmat
makes a matrix with columns that are copies this vector. The second line modifies this matrix so that the columns are a set of points in parameter space that lie along a line with increasing values of the σ parameter.
We can even run multiple simulations at each one of a set of parameters:
simulate(sim2,params=p,
times=seq(0,3),
nsim=500,format="data.frame") -> sims
ggplot(data=separate(sims,.id,c("parset","rep")),
aes(x=N,fill=parset,group=parset,color=parset))+
geom_density(alpha=0.5)+
# geom_histogram(aes(y=..density..),position="dodge")+
facet_grid(time~.,labeller=label_both,scales="free_y")+
lims(x=c(NA,1000))
pomp uses certain syntactic conventions. For example, if x
is an object and f
is a pomp function, then typically f(x,...) -> y
yields an object, y
, that contains x
plus the results of the f
operation, together with additional information pertinent to what f
did. Here, ...
stands for additional arguments to f
. Then, if we call f(y)
, this operation will perform some kind of repeat of the original operation: though the details will vary with f
and y
, choices made in the original computation (on x
) will typically be respected. To change some of these choices, one can do f(y,...)
, where the ...
stands for options of f
that correspond to the new choices.
Morever, since y
contains so much information, this information will typically be reused when we apply a different function, g
, to y
, when it makes sense to do so.
As of version 4.1 R provides a new pipe syntax that is especially convenient for us. It allows us to construct pipelines of R commands, which leads (once one is used to it) to code that is easier to read. In brief, a command in ordinary R style, such as
h(g(f(x,...),...),...)
where f
, g
, h
are R functions, x
is some R object, and ...
represents additional arguments to each of the functions, must be read from the inside out, which is at odds with the sequential nature of the f
→ g
→ h
computation. An alternative, of course, is to write something like
y <- f(x,...)
z <- g(y,...)
w <- h(z,...)
which multiplies the entities (y
, z
, w
) one must name and keep track of. In contrast, the new pipeline syntax of R allows us to write
x |> f(...) |> g(...) |> h(...)
The object-oriented structure of pomp makes this kind of programming quite natural.
Some experienced R programmers find the pipeline syntax uncomfortable or unnecessary at first, and debugging pipelined code requires a somewhat different approach. Of course, it is not necessary to adopt this style of programming to use pomp, but it is quite natural and, experience shows, quite addictive!
We will use pipelines freely for the remainder of the document.
Important note: If you opt to load the tidyverse (or individual packages therein), be sure to load pomp after loading these packages. There are some name conflicts between the packages that would otherwise cause pomp functions to be masked.
Modify the sim2
object constructed above to change the measurement model. In particular, assume that Yt∼NegBin(bN(t),θ), where b and θ are parameters. (By this notation, we understand that Yt is negative binomially distributed with mean bN(t) and variance bN(t)+(bN(t))2/θ.) Make and plot some simulations for t=20,…,50 with different values of the θ parameter. [Hint: use the rnbinom
function with the mu
, size
parameterization.]
We will illustrate some of the pomp data-analysis algorithms by performing a limited analysis on a set of bird abundance data. The data are from annual censuses of a population of Parus major (Great Tit) in Wytham Wood, Oxfordshire. They were retrieved as dataset #10163 from the Global Population Dynamics Database version 2 (NERC Centre for Population Biology, Imperial College, 2010). The original source is McCleery and Perrins (1991). They are provided as part of the package, in the data frame parus
. Here we display the data graphically and in a table:
|>
parus ggplot(aes(x=year,y=pop))+
geom_line()+geom_point()+
expand_limits(y=0)
parus
year | pop |
---|---|
1960 | 148 |
1961 | 258 |
1962 | 185 |
1963 | 170 |
1964 | 267 |
1965 | 239 |
1966 | 196 |
1967 | 132 |
1968 | 167 |
1969 | 186 |
1970 | 128 |
1971 | 227 |
1972 | 174 |
1973 | 177 |
1974 | 137 |
1975 | 172 |
1976 | 119 |
1977 | 226 |
1978 | 166 |
1979 | 161 |
1980 | 199 |
1981 | 306 |
1982 | 206 |
1983 | 350 |
1984 | 214 |
1985 | 175 |
1986 | 211 |
The basic data-type provided by pomp—the ‘pomp’ object—is designed as a container for a model and data. Let’s construct such an object with these data, together with the Ricker model we’ve already implemented. We do this with a call to the constructor function pomp
:
|>
parus pomp(
times="year", t0=1960,
rinit=function (N_0, ...) {
c(N=N_0)
},rprocess=discrete_time(
function (N, r, K, sigma, ...) {
rnorm(n=1,mean=0,sd=sigma)
eps <-c(N=r*N*exp(1-N/K+eps))
},delta.t=1
),rmeasure=function (N, b, ...) {
c(pop=rpois(n=1,lambda=b*N))
} rick ) ->
Notice that the measurement model simulator (rmeasure) is not the same as before, reflecting the fact that the data are named pop
, not Y
as before. Also notice that we specify the time variable by name, thus allowing the data to dictate the observation times. The t0
argument specifies that the stochastic latent state process is initialized in year 1960.
We’ve already shown how to implement a discrete-time model; let’s now see how to implement a continuous-time model. We’ll implement a very simple model of stable but stochastic population dynamics, the logistic, or Verhulst-Pearl, equation with environmental stochasticity. We’ll write this model as a stochastic differential equation (SDE), specifically an Itô diffusion: dN=rN(1−NK)dt+σNdW(t), where N is the population size, r is a fixed rate, the so-called “Malthusian parameter”, K is the population’s “carrying capacity”, σ describes the intensity of extrinsic stochastic forcing on the system, and dW is an increment of a standard Wiener process. [Those unfamiliar with Wiener processes and Itô diffusions will not go far wrong thinking of dW(t), for each time t, as a normal random variable with mean zero and standard deviation √dt.] To implement this model in pomp, we must tell the package how to simulate this model. The easiest way to simulate such an SDE is via the Euler-Maruyama method. In this approximation, we take a large number of very small steps, each of duration Δt. At each step, we hold the right-hand side of the above equation constant, compute ΔN using that equation, and increment N accordingly. pomp gives us the euler
function to assist us in implementing the Euler-Maruyama method. To use it, we must encode the computations that take a single step. As before, we can do so by writing a function.
function (N, r, K, sigma, delta.t, ...) {
vpstep <- rnorm(n=1,mean=0,sd=sqrt(delta.t))
dW <-c(N = N + r*N*(1-N/K)*delta.t + sigma*N*dW)
}
This function computes the value of N(t+Δt) given the value of N(t), Δt, and the parameters.
We fold this with the data into a ‘pomp’ object via a call to the ‘pomp’-object constructor function, pomp
. We’ll also include the same measurement model we used before. Because everything but the ‘rprocess’ component is the same as with the Ricker model, to accomplish this, we can simply do
|> pomp(rprocess=euler(vpstep,delta.t=1/365)) -> vp rick
Notice that we’ve specified an Euler-Maruyama step of about 1 day: it will take 365 of these steps to get us from one observation to the next.
As before, we can plot this ‘pomp’ object, recast it as a data frame, and simulate it for any given choice of the parameters.
The following codes produce several simulations for parameters r=0.5, K=2000, σ=0.1 and b=0.1, and plot them on the same axes as the data. Notice that the format="data.frame"
and include.data=TRUE
options facilitate this. Vary the parameters to try to achieve a better fit to the data, as judged purely “by eye”.
|>
vp simulate(
params=c(r=0.5,K=2000,sigma=0.1,b=0.1,N_0=2000),
format="data.frame", include.data=TRUE, nsim=5) |>
mutate(ds=case_when(.id=="data"~"data",TRUE~"simulation")) |>
ggplot(aes(x=year,y=pop,group=.id,color=ds))+
geom_line()+
labs(color="")
To this point, we’ve implemented two models by specifying their basic components as R functions. While this has the virtue of transparency, it puts severe constraints on computational performance due to intrinsic limits in the speed with which R codes can be interpreted. If all we want to do is run a few simulations, this is not a problem. As we’ll see, however, in attempting to perform parameter estimation and other inferences, we will need all the speed we can readily get. We achieve potentially massive speed-ups by implementing our basic model components in a language that can be compiled. pomp makes this easy. The key innovation is the C snippet. Let’s see how to code up the Ricker and Verhulst-Pearl models using C snippets. First, we’ll code up the rinit and rmeasure components, which the two models share.
Csnippet("
pop = rpois(b*N);
") -> rmeas
Csnippet("
N = N_0;
") -> rinit
These are about as simple as one can get. As the name suggests, each is just a snippet of C code: not all variables are declared (in fact, in these examples, none are) and the context of the snippet is not specified. In particular, these snippets are not actually complete C functions, so by themselves, they cannot be compiled. They must not actually violate the rules of C syntax however. Among other things, lines must end with a semicolon (;
), variable names must respect C restrictions, etc. Although one can enclose essentially arbitrarily complex C code in a C snippet, one can do quite a lot with very simple snippets. If you are new (even very new) to C, don’t worry: once you master a few simple rules, you’ll be able to code up C snippets just as easily, or even more easily, than you code up R functions and you will come to value the resulting speed-up extremely.
Now we’ll code the Ricker and Verhulst-Pearl simulation steps as C snippets.
Csnippet("
double eps = rnorm(0,sigma);
N = r*N*exp(1-N/K+eps);
") -> rickstepC
Csnippet("
double dW = rnorm(0,sqrt(dt));
N += r*N*(1-N/K)*dt+sigma*N*dW;
") -> vpstepC
A few observations are in order. First, note that the local variables eps
and dW
are declared double
, the standard C data-type for (double-precision) floating point numbers. Observe that dt
is a variable that is defined in the context of the vpstepC
snippet; when this snippet is executed, dt
will be provided by pomp
and will be equal to the size of the Euler step actually being taken. Note also that in each of these snippets, the value of N
gets over-written by its new value at the next time-step. This is the goal of the C snippets we supply to specify the ‘rprocess’ component of a model. Finally, notice that neither the state variable N
nor any of the parameters are declared. The declarations will be handled in a different way, as we’ll see in a moment.
When furnished with one or more C snippets, pomp
will provide the necessary declarations and context, compile the resulting C code, dynamically link to it, and use it whenever the corresponding basic model component is needed. We cause all this to happen when we construct an object of class pomp
via a call to the constructor function. Let’s do this for the two models now.
|>
parus pomp(
times="year", t0=1960,
rinit=rinit,
rmeasure=rmeas,
rprocess=discrete_time(rickstepC,delta.t=1),
statenames="N",
paramnames=c("r","K","sigma","b","N_0")
rickC ) ->
|>
parus pomp(
times="year", t0=1960,
rinit=rinit,
rmeasure=rmeas,
rprocess=euler(vpstepC,delta.t=1/365),
statenames="N",
paramnames=c("r","K","sigma","b","N_0")
vpC ) ->
In these calls, we use the statenames
and paramnames
arguments to indicate which of the undeclared variables in the C snippets rickstepC
and vpstepC
are state variables and which are fixed parameters. Since dW
and eps
are declared as local variables within the C snippets themselves, we don’t need to mention them here. The rnorm
and rpois
functions are part of the R API: see the manual on “Writing R Extensions” for a description of these and the other distribution functions provided as part of the R API. A full set of rules for writing pomp C snippets is given in the package help (?Csnippet
).
Using the ‘pomp’ objects just constructed, explore model simulations at a variety of different parameters. As before, plot simulations and data on the same axes for comparison purposes.
Write a C snippet implementing the negative binomial measurement model you explored previously. Fold it into the ‘pomp’ objects just constructed. Remember, there is no need to re-specify components you have already specified: by calling pomp
on a ‘pomp’ object you can modify some or all of the basic model components. Plot simulations and data on the same axes, as in the immediately preceding exercise.
As mentioned in the introduction, the dmeasure component is the other side of the rmeasure component. The latter simulates the measurement model whereas the former evaluates the measurement model’s probability density function. The following C snippet encodes the dmeasure component.
Csnippet("
lik = dpois(pop,b*N,give_log);
") -> dmeas
Here, dpois
again comes from the R API. It takes three arguments, the datum (pop
), the Poisson parameter (b*N
), and give_log
. When give_log=0
, dpois
returns the Poisson likelihood; when give_log=1
, dpois
returns the log of this likelihood. When this snippet is executed, pomp
will provide the value of give_log
according to its needs. It is the user’s responsibility to make sure that the correct value is returned for both possible values of give_log
. This is one of the most common places where newbies make mistakes!
Write the dmeasure component for your negative binomial model both as an R function and as a C snippet.
We are now in a position to be able to compute the likelihood of the data given any set of parameters for either of our models. For this purpose, we use the particle filter. This powerful algorithm is at the heart of several of pomp’s inference methods. We won’t describe the theory of the particle filter here. The tutorial by Arulampalam, Maskell, Gordon, and Clapp (2002) explains the theory in an accessible way. The pomp Journal of Statistical Software paper gives pseudocode and some examples. The pomp documentation page lists several other tutorial documents that go into more detail.
In pomp, the simplest version of the particle filter is implemented in the function pfilter
. Its only required arguments are the ‘pomp’ object and the number of particles, i.e., the Monte Carlo sample size.
|>
rickC pfilter(Np=1000,
params=c(r=1.2,K=2000,sigma=0.3,N_0=1600,b=0.1),
dmeasure=dmeas,
paramnames="b",statenames="N") -> pfrick
Notice that, in this call, we specified the dmeasure component using the C snippet we wrote above. What would have happened had we not specified this?
Notice that, because we here introduced a new C snippet, we again had to indicate which of the undeclared variables in dmeas
are parameters and which are latent state variables.
What kind of object is pfrick
?
pfrick
## <object of class 'pfilterd_pomp'>
As a ‘pfilterd_pomp’ object, pfrick
contains rickC
plus a wealth of information regarding the particle filter operation that created it. For example, we can extract the estimated log likelihood at these (arbitrarily chosen) parameters:
logLik(pfrick)
## [1] -148.5748
There is also a plot
method for ‘pfilterd_pomp’ objects and one for coercing them to data frames.
plot(pfrick)
as(pfrick,"data.frame")
year | pop | ess | cond.logLik |
---|---|---|---|
1960 | 148 | 1000.00000 | -3.879800 |
1961 | 258 | 276.91478 | -5.329129 |
1962 | 185 | 245.07346 | -5.244873 |
1963 | 170 | 183.89624 | -5.551510 |
1964 | 267 | 245.35009 | -5.443870 |
1965 | 239 | 291.78302 | -5.235775 |
1966 | 196 | 274.13378 | -5.152608 |
1967 | 132 | 66.23966 | -6.545454 |
1968 | 167 | 225.27473 | -5.331978 |
1969 | 186 | 249.96622 | -5.263931 |
1970 | 128 | 56.39030 | -6.692253 |
1971 | 227 | 302.62524 | -5.174603 |
1972 | 174 | 202.36717 | -5.441043 |
1973 | 177 | 211.26047 | -5.369764 |
1974 | 137 | 83.00157 | -6.269213 |
1975 | 172 | 238.82351 | -5.239950 |
1976 | 119 | 45.15378 | -6.853124 |
1977 | 226 | 299.48172 | -5.172145 |
1978 | 166 | 184.60553 | -5.521961 |
1979 | 161 | 146.71626 | -5.714835 |
1980 | 199 | 269.30266 | -5.230779 |
1981 | 306 | 193.94797 | -5.754881 |
1982 | 206 | 313.84250 | -5.057105 |
1983 | 350 | 117.75799 | -6.321934 |
1984 | 214 | 296.00247 | -5.170972 |
1985 | 175 | 203.08438 | -5.449372 |
1986 | 211 | 293.20228 | -5.161943 |
Both of these reveal that pfrick
contains information about the effective sample size of the particle filter (ess
) and the conditional log likelihood, cond.logLik
which, in the notation introduced above, is logfYn|Y1:n−1(y∗n|y∗1:n−1).
The particle filter is a Monte Carlo algorithm. Accordingly, it gives us only a noisy estimate of the likelihood. We can reduce this noise by increasing the number of particles, Np
, and we can estimate the magnitude of the Monte Carlo error by running a few independent particle filters. For example:
|> pfilter() |> logLik() |> replicate(n=10) -> lls
pfrick lls
## [1] -148.4803 -148.5014 -148.8655 -148.6632 -149.2521 -148.1796 -148.0358
## [8] -148.5363 -148.4578 -148.5291
logmeanexp(lls,se=TRUE,ess=TRUE) -> ll_rick1
ll_rick1
## est se ess
## -148.5019709 0.1004082 9.1782054
In the first line, notice that we did not need to specify Np
, despite the fact that there is no default value of this parameter. Indeed, because pfrick
is a ‘pfilterd_pomp’ object, it knows the value of Np
that was used in its own creation. By default, this same value is used again when it is passed to pfilter
. We could, of course, have used a different value simply by specifying Np
in this call to pfilter
.
The last line of the preceding code chunk computes the log of the mean of the estimated likelihoods and the standard error on this mean using a jack-knife method. Since the particle filter gives an unbiased estimate of the likelihood (not the log likelihood), this operation is sensible, provided the Monte Carlo error is not too large. It also estimates the “effective sample size”. The facts that this is close to the nominal sample size (10) and that the standard error is small relative to 1 log unit are both indications that the log likelihood is well estimated.
Let’s repeat the operation for the Verhulst-Pearl model, again at arbitrary parameters.
|>
vpC pfilter(Np=1000,dmeasure=dmeas,
params=c(r=0.5,K=2000,sigma=0.1,b=0.1,N_0=2000),
paramnames="b",statenames="N") |>
logLik() |>
replicate(n=10) |>
logmeanexp(se=TRUE,ess=TRUE) -> ll_vp1
ll_vp1
Compute the likelihood for the parameters you found in your attempt to estimate parameters “by eye”.
Trajectory matching is the method of estimating the parameters of a deterministic model by fitting the model to data assuming independent measurement errors. Although pomp’s main focus is on stochastic models, it does provide facilities for trajectory matching. pomp makes a conceptual distinction between the stochastic process and a deterministic skeleton, which we can view as a deterministic model related to the stochastic process’ central tendency. We’ll not go into mathematical details here: instead, we’ll illustrate with two examples.
A deterministic skeleton of the stochastic Ricker model is the Ricker map, Eq. 1. We implement this for pomp, again either as an R function or a C snippet and pass it to pomp functions via the skeleton
argument. For example:
|>
rickC pomp(
skeleton=map(
Csnippet("DN = r*N*exp(1-N/K);"),
delta.t=1
),paramnames=c("r","K"), statenames="N"
rickC ) ->
Here, the left-hand side of Eq. 1 is indicated by the D
prefix: in skeleton snippets, we don’t over-write the state variable N
. We indicate that the skeleton is a discrete-time map using the map
function.
A deterministic skeleton of the Verhulst-Pearl model is the vectorfield (or ordinary differential equation), dNdt=rN(1−NK). We fold this into the vpC
‘pomp’ object so:
|>
vpC pomp(
skeleton=vectorfield(Csnippet("DN = r*N*(1-N/K);")),
paramnames=c("r","K"), statenames="N"
vpC ) ->
Since the skeleton here is a vectorfield, in this C snippet, DN
is filled with the value of the time-derivative of N
. See the package help (?Csnippet
) for a complete set of rules for writing C snippets.
With the deterministic skeleton in place we can generate trajectories of the skeleton using trajectory
. For example:
parmat(c(r=0.5,K=2000,sigma=0.1,b=0.1,N_0=2000),10)
p <-"N_0",] <- seq(10,3000,length=10)
p[|>
vpC trajectory(params=p,format="data.frame") |>
ggplot(mapping=aes(x=year,y=N,color=.id,group=.id))+
guides(color="none")+
geom_line()+
theme_bw()
As with simulate
, one can use trajectory
to compute multiple trajectories at once, for varying values of the parameters.
In pomp, the function traj_objfun
constructs an objective function quantifying the mismatch between model predictions and data. For this purpose, it uses the dmeasure component of the model. This function can be given to any of the large variety of numerical optimizers available in R and R packages. These optimizers search parameter space to find parameters under which the likelihood of the data, given a trajectory of the deterministic skeleton, is maximized.
We’ll demonstrate using the Verhulst-Pearl model.
|>
vpC traj_objfun(
est=c("K","N_0"),
params=c(r=0.5,K=2000,sigma=0.1,b=0.1,N_0=2000),
dmeasure=dmeas, statenames="N", paramnames="b"
ofun ) ->
This invocation of traj_objfun
creates an objective function, ofun
, that can be used to estimate the three parameters K and N0. It will hold the other three parameters, r, σ, and b, fixed at the values they are given in params
.
Notice that, in this code chunk, we had to specify dmeasure
once again. Why? What would have happened had we not done so?
What kind of object is ofun
?
ofun
## <object of class 'traj_match_objfun'>
‘traj_match_objfun’ objects, like the objective functions created by the pomp functions spect_objfun
, probe_objfun
, and nlf_objfun
, is an R function, but it is also more than an R function. It contains not only the vpC
‘pomp’ object, but it additionally saves information each time it is evaluated. Let’s see how such stateful objective functions make it easy to use a wide range of numerical optimization routines.
We can evaluate ofun
at any point in the 2-dimensional K-N0 space. For example:
ofun(c(1000,3000))
## [1] 1161.543
The value returned by ofun
is the negative log likelihood, as returned by the model’s dmeasure component. We can estimate the parameters using, for example, the subplex algorithm implemented in the subplex package:
library(subplex)
subplex(c(2000,1500),fn=ofun) -> fit
fit
## $par
## [1] 1968.474 1895.245
##
## $value
## [1] 276.2893
##
## $counts
## [1] 318
##
## $convergence
## [1] 0
##
## $message
## [1] "success! tolerance satisfied"
##
## $hessian
## NULL
Note that fit
contains, among other things, the estimated parameters (element par
) and the minimized value of the negative log likelihood (value
).
To make absolutely certain that ofun
remembers these estimates, we evaluate it once at fit$par
:
ofun(fit$par)
## [1] 276.2893
coef(ofun)
## r K sigma b N_0
## 0.500 1968.474 0.100 0.100 1895.245
logLik(ofun)
## [1] -276.2893
Then we can, for example, extract the fitted trajectory thus:
|>
ofun trajectory(format="data.frame") |>
ggplot(mapping=aes(x=year,y=N,color=.id,group=.id))+
guides(color="none")+
geom_line()+
theme_bw()
We can superimpose the model predictions on the data as follows.
|>
ofun trajectory(format="data.frame") |>
mutate(
pop=coef(ofun,"b")*N,
.id="prediction"
|>
) select(-N) |>
rbind(
|>
parus mutate(
.id="data",
pop=as.double(pop)
)|>
) pivot_wider(names_from=.id,values_from=pop) |>
ggplot(aes(x=year))+
geom_line(aes(y=prediction))+
geom_point(aes(y=data))+
expand_limits(y=0)+
labs(y="pop")
Very commonly, model parameters must obey certain constraints. For example, the parameters in the two models we’ve looked at so far are all constrained to be positive. In estimating parameters, however, one frequently wants to employ a numerical optimization method that does not respect constraints. One way of accomodating such unconstrained optimizers is to transform the parameter space so that the constraints disappear. For example, by log-transforming the r parameter in the Verhulst-Pearl model (Eq. 3), one obtains the superficially different equation dN=eρN(1−NK)dt+σNdW(t), where ρ=logr. In this model, ρ can take any (positive or negative) values while r remains positive.
We incorporate parameter transformations using the partrans
argument to many pomp functions, specifying them using the parameter_trans
function. In general, the parameter transformations, like other basic model components, can be supplied using R functions or C snippets. If we are merely log-transforming, logit-transforming, or log-barycentric-transforming parameters, however, it is even easier. The following code chunk implements log transformation of all of the parameters of the Verhulst-Pearl and Ricker models.
parameter_trans(log=c("r","K","sigma","b","N_0")) vpr_partrans <-
We could then provide vpr_partrans
as needed to any of the various pomp inference methods, via the partrans
argument. For example, to estimate parameters for the Verhulst-Pearl model on the transformed scale, we do
|>
vpC traj_objfun(
est=c("b","K","N_0"),
params=c(r=0.5,K=2000,sigma=0.1,b=0.1,N_0=2000),
partrans=parameter_trans(log=c("K","N_0","b")),
dmeasure=dmeas, statenames="N", paramnames=c("b","K","N_0")
ofun2
) ->
subplex(log(c(0.1,2000,1500)),fn=ofun2) -> fit
ofun2(fit$par)
## [1] 276.2893
coef(ofun2)
## r K sigma b N_0
## 0.50000000 2506.85237865 0.10000000 0.07852373 2413.59563164
Estimate the parameters for the Verhulst-Pearl model assuming negative binomial errors. Do not attempt, at first, to estimate all parameters simultaneously. Focus on estimating K, N0 and the parameters of the measurement model. It will probably be helpful to make use of parameter transformations to enforce the model constraints.
Estimate some or all of the parameters of the Ricker model using trajectory matching. It is probably a good idea to use parameter transformations.
Let us now turn to the main focus of the pomp package: parameter estimation for fully stochastic models. Iterated filtering is a method for maximizing the likelihood. The method was introduced in its original form by Ionides, Bretó, and King (2006), and was subsequently much improved by Ionides, Nguyen, Atchadé, Stoev, and King (2015). The latter paper rigorously expounds the theory. An updated version of the pomp Journal of Statistical Software paper provides pseudocode and a simple example. Finally, a tutorial on the theory and practice is linked from the pomp documentation index. Here, we confine ourselves to demonstrating how the IF2 algorithm (Ionides et al. 2015) is applied to the toy examples we have been discussing.
The search for the maximum likelihood parameter estimates (MLE) is inherently a global one: the likelihood surface can (and often enough does) possess multiple local maxima, any of which might represent a plausible explanation of the data in hand. IF2, however, is a local search algorithm: essentially, it starts at a user-provided “guess” and attempts to move “uphill” toward the nearest local maximum it can find. To solve the global problem, therefore, it is advisable to run IF2 multiple times, starting from different “guesses”. By studying the results of such a calculation, one hopes to obtain not only the global MLE but, more importantly, an understanding of the shape of the likelihood surface itself.
To illustrate this, let us attempt to determine at which values of the four parameters r, K, σ, and N0 our Verhulst-Pearl model best explains the Parus major data. Because this is only a toy problem, we will restrict ourselves to the assumption that b=1, despite the fact that it might plausibly be otherwise. In a real scientific study, of course, we would be much less cavalier about fixing parameters arbitrarily in this way.
We begin by constructing a large number of “guesses”. Our design is to distribute these around some multidimensional “box” that, we hope, contains the MLE. Importantly, although our hope is that the box we construct here embraces the MLE, our eventual success does not necessarily depend on it doing so: the IF2 algorithm will be free to search parameter space outside the box. We use the sobol_design
function to construct an overdispersed sequence of points within the selected box.
sobol_design(
lower=c(r=0,K=100,sigma=0,N_0=150,b=1),
upper=c(r=5,K=600,sigma=2,N_0=150,b=1),
nseq=100
guesses
) ->
plot(guesses,pch=16)
In the above call, lower
and upper
give the bounds of the box and nseq
specifies the number of guesses we desire. Now, from each of these starting guesses, we will run the IF2 algorithm (implemented as mif2
in pomp). As its name suggests, this algorithm works by iteratively applying the particle filter: Crucially, IF2 adds an extra ingredient to the particle filter. As the filter is applied, IF2 adds random perturbations to the parameters. This serves several purposes, including smoothing the likelihood surface, combating particle depletion, and allowing IF2 to obtain information about the local structure of the surface. However, this augmented stochastic variability represents an alteration of the model. IF2 therefore gradually reduces the intensity of the random perturbations.
Let us examine a single call to mif2
.
|>
vpC mif2(
params=guesses[1,],
Np=1000,
Nmif=20,
dmeasure=dmeas,
partrans=parameter_trans(log=c("r","K","sigma","N_0")),
rw.sd=rw_sd(r=0.02,K=0.02,sigma=0.02,N_0=ivp(0.02)),
cooling.fraction.50=0.5,
paramnames=c("r","K","sigma","N_0","b"),
statenames=c("N")
vpM ) ->
In this call, we specify the starting point with params
: here, we’ve just taken this to be the first of the guesses
. As in pfilter
, Np
sets the number of particles to be used. Nmif
fixes the number of IF2 iterations that will be performed. We furnish the necessary dmeasure component via the dmeasure
argument, and provide parameter transformations via partrans
to enforce positivity of the model parameters. The rw.sd
argument specifies the intensity of the perturbations that will be applied at each observation time. The perturbations are normally distributed on the transformed (here, the log) scale, so these represent roughly 2% perturbations per observation. The ivp
syntax indicates that N_0
is an initial value parameter and accordingly should be treated slightly differently than the regular parameters. The cooling.fraction.50
argument controls the speed at which the magnitude of the perturbations is drawn down. Finally, the paramnames
and statenames
arguments are needed since dmeas
and the parameter transformations are built using C snippets.
What kind of object is vpM
?
vpM
## <object of class 'mif2d_pomp'>
pomp provides various methods for ‘mif2d_pomp’ objects. The plot
method, for example, produces a diagnostic plot.
plot(vpM)
The first plot (“filter diagnostics”) shows the results of the last iteration of the particle filter, comparable to the plot produced by calling plot
on a ‘pfilterd_pomp’ object. The second plot (“convergence diagnostics”) summarizes the results of the sequence of IF2 iterations. We observe that the log likelihood has increased, concomitant with movement in the r, K, and σ parameters and, to a lesser extent, the N0 parameter. We observe that, as intended, the b parameter has remained fixed.
Now, although we have observed the intended improvement in the log likelihood, we should be careful to note that the log likelihood displayed in this plot is the log likelihood of the perturbed model. This model differs from the one we are interested in. To compute the likelihood of our focal model at the parameter returned by mif2
, we need to perform a few particle filter operations:
|> pfilter() |> logLik() |> replicate(n=5) |> logmeanexp(se=TRUE,ess=TRUE) vpM
## est se ess
## -143.9880041 0.1035849 4.7923226
Modern computers have multiple processors (cores). To take advantage of these, we can parallelize the above particle-filter computation using the circumstance and doFuture packages:
library(circumstance)
library(doFuture)
plan(multicore)
|> pfilter(Nrep=5) |> logLik() |> logmeanexp(se=TRUE,ess=TRUE) vpM
## est se ess
## -144.0370140 0.1467573 4.6030137
In the above, the pfilter
function is provided by circumstance. It causes the particle filters to be run in parallel, each using a different core. Internally, circumstance uses the doFuture package to parallelize computations. doFuture provides a number of other ways of parallelizing computations, inclusing multisession
(needed on Windows computers) and cluster
(for parallelizing across multiple machines). These are set using the plan()
function. doFuture also provides parallel random-number generators, so that we can consider each of the parallel computations to be independent.
Use circumstance::pfilter()
to parallelize the estimation of likelihood for your negative-binomial model.
Now we repeat this calculation from each of our 100 guesses. Because these computations are each independent of the others, it is both easy and worthwhile to parallelize them. The circumstance package is also useful for this purpose. In particular, it defines a mif2
method that takes a data frame of guesses of the kind we defined above. For each of these, it performs an iterated filtering computation.
|> mif2(starts=guesses) -> mifs vpM
Notice that, in the above, the settings of mif2
are not explicitly given here. They are inherited from vpM
, which we computed previously. If we had wanted to modify any of them (e.g., Nmif
, Np
, rw.sd
, etc.), we would have simply done so in the call to mif2
. For example, to increase the number of particles and the number of iterations, we would do:
|>
vpM mif2(
starts=guesses,
Nmif=30,
Np=2000
mifs ) ->
Let us examine the results of our mif2
computation. The following code chunk extracts the traces of the IF2 calculations. These are the movements of the parameters and the likelihood plotted against filter iteration.
|>
mifs traces() |>
melt() |>
filter(name!="b") |>
ggplot(aes(x=iteration,y=value,group=.L1,color=.L1))+
geom_line()+
facet_wrap(~name,scales="free_y")+
guides(color="none")
More simply, executing plot(mifs)
, would give us a less fancy plot.
Next, we examine the diagnostics produced in the last iteration of the filter.
|>
mifs as("data.frame") |>
pivot_longer(-c(.L1,year)) |>
ggplot(aes(x=year,y=value,group=.L1,color=.L1))+
geom_line()+
facet_wrap(~name,scales="free_y",ncol=1)+
guides(color="none")
Next, to estimate the log likelihoods, we run a few independent particle filters on the results of our mif2
computations. We combine the resulting log likelihood estimates with the parameter estimates themselves (obtained using coef()
).
|>
mifs pfilter(Nrep=5) |>
logLik() |>
melt() |>
separate(name,into=c(".id","rep")) |>
group_by(.id) |>
reframe(melt(logmeanexp(value,se=TRUE))) |>
ungroup() |>
bind_rows(
|> coef() |> melt()
mifs |>
) pivot_wider() |>
rename(
loglik=est,
loglik.se=se
estimates ) ->
To display the results of such a multi-start search pairwise scatterplot matrices are useful:
|>
estimates bind_rows(guesses) |>
filter(is.na(loglik) | loglik>max(loglik,na.rm=TRUE)-30) |>
mutate(col=if_else(is.na(loglik),"#99999955","#ff0000ff")) |>
{
pairs(~loglik+r+sigma+K+N_0,data=dat,col=dat$col,pch=16)
\(dat) }()
At this point, there’s no particular reason to suspect that our IF2 searches have arrived at their destination. In general, it is hard to know a priori how much effort will be required to find the MLE. Let’s continue the search, starting with the best points we’ve uncovered so far.
|>
estimates filter(!is.na(loglik)) |>
filter(loglik > max(loglik)-30) |>
select(-.id,-loglik,-loglik.se) -> starts
|>
vpM mif2(starts=starts) |>
mif2() -> mf
|>
mf pfilter(Nrep=5) |>
logLik() |>
melt() |>
separate(name,into=c(".id","rep")) |>
group_by(.id) |>
reframe(melt(logmeanexp(value,se=TRUE))) |>
ungroup() |>
bind_rows(
|> coef() |> melt()
mf |>
) pivot_wider() |>
rename(
loglik=est,
loglik.se=se
ests1 ) ->
Note that, in the above, we’ve performed a total of 100 mif2
iterations per starting point. The code above also does the post-mif2
likelihood estimation. It returns just the parameter and likelihood estimates.
We can combine the new estimates with the old ones into a general database:
|> bind_rows(ests1) |> select(-.id) -> estimates
estimates
|> arrange(-loglik) |> head() estimates
loglik | loglik.se | r | K | sigma | N_0 | b |
---|---|---|---|---|---|---|
-141.5831 | 0.1692943 | 4.547132 | 210.9662 | 0.7196171 | 152.3395 | 1 |
-141.5960 | 0.1536001 | 4.373002 | 208.9291 | 0.7153067 | 146.7092 | 1 |
-141.6461 | 0.1787445 | 2.814224 | 214.9345 | 0.6000872 | 145.8843 | 1 |
-141.6527 | 0.1702926 | 2.523582 | 209.8634 | 0.5630302 | 144.4279 | 1 |
-141.6584 | 0.0968557 | 7.079429 | 206.4207 | 0.8655555 | 146.2177 | 1 |
-141.6919 | 0.2298834 | 6.398381 | 214.9552 | 0.8801676 | 149.8789 | 1 |
|>
estimates filter(loglik>max(loglik,na.rm=TRUE)-4) |>
select(loglik,r,sigma,K,N_0) |>
plot_matrix()
In this plot, we begin to see the emergence of structure in the likelihood surface. In particular, what looks like a ridge of high likelihood is visible in the r-σ projection. Such structures are very interesting in that they contain clues as to the manner in which the model is fitting the data. They can also pose challenges to efficient estimation, since climbing up to a ridge is harder than traversing it.
We can improve the quality of our estimates and obtain likelihood-ratio-test-based confidence intervals by constructing profile likelihoods. In a likelihood profile, one varies the focal parameter (or parameters) across some range, maximizing the likelihood over the remaining parameters at each value of the focal parameter. The following codes construct a likelihood profile over r.
|>
estimates filter(loglik>max(loglik)-10) |>
select(r,K,sigma,N_0,b) |>
apply(2,range) -> ranges
ranges
## r K sigma N_0 b
## [1,] 0.06080388 185.6455 0.3044406 136.8871 1
## [2,] 13.18989842 339.3910 1.1530487 162.7820 1
profile_design(
r=10^seq(
from=log10(ranges[1,1]),
to=log10(ranges[2,1]),
length=20
),lower=ranges[1,-1],
upper=ranges[2,-1],
nprof=50
starts
) ->
dim(starts)
## [1] 1000 5
|>
starts select(r,sigma,K,N_0,b) |>
plot_matrix()
|>
vpM mif2(
starts=starts,
partrans=parameter_trans(log=c("K","sigma","N_0")),
rw.sd=rw_sd(K=0.02,sigma=0.02,N_0=ivp(0.02)),
paramnames=c("K","sigma","N_0","b")
|>
) mif2() -> mf
|>
mf pfilter(Nrep=5) |>
logLik() |>
melt() |>
separate(name,into=c(".id","rep")) |>
group_by(.id) |>
reframe(melt(logmeanexp(value,se=TRUE))) |>
ungroup() |>
bind_rows(
|> coef() |> melt()
mf |>
) pivot_wider() |>
select(-.id) |>
rename(
loglik=est,
loglik.se=se
r_prof ) ->
Notice that we’ve changed the mif2
perturbations (rw.sd
): we’ve removed the perturbation on the r parameter, since we want to hold this parameter fixed.
We add these points to our database:
|>
estimates bind_rows(r_prof) -> estimates
Next, we plot the likelihood profile. The following plot shows the top two estimates for each value of r with error bars showing ±2 s.e. and a loess smooth.
|>
r_prof group_by(r) |>
filter(rank(-loglik)<=2) |>
ungroup() |>
ggplot(aes(x=r,y=loglik,
ymin=loglik-2*loglik.se,ymax=loglik+2*loglik.se))+
geom_point()+
geom_errorbar()+
geom_smooth(method="loess",span=0.2)+
scale_x_log10()
We see that the 95% likelihood ratio test confidence interval (CI) appears to be one-sided: the CI is roughly r>0.73.
If we plot the profile trace of σ, we see that, as we increase r, we have to increase the intensity of the environmental stochasticity to maintain a good fit. Why is this?
|>
r_prof group_by(r) |>
filter(rank(-loglik)<=2) |>
ungroup() |>
ggplot(aes(x=r,y=sigma))+
geom_point()+
geom_smooth(method="loess",span=0.2)+
scale_x_log10()+
labs(y=expression(sigma))
Construct a likelihood profile for the K parameter. Plot the profile and profile traces. Comment on your findings.
To this point, we have focused on the particle filter as a means of computing the likelihood, for use in estimation of unknown parameters. It also gives us a way of estimating the likely trajectories of the latent state process. The filter_mean
, pred_mean
, and pred_var
functions give access to the filtering and prediction distributions of the latent state at each observation time. The prediction distribution is that of Xn|Y1,…,Yn−1, i.e., the distribution of the latent state conditional on the observations at all times up to but not including tn. The filter distribution, by contrast, incorporates the information of the n-th observation: it is the distribution of Xn|Y1,…,Yn.
One can think of the filtering distribution as a kind of hindcast, i.e., an estimate of the state of the latent process at earlier times. Although it is computed essentially at no additional cost, it is typically of limited value as a hindcast, since the amount of information used in the estimate decreases as one goes farther back into the past. A better hindcast is afforded by the smoothing distribution, which is the distribution of Xn|Y1,…,YN, i.e., that of the latent state conditional on all the data.
Computation of the smoothing distribution requires more work. Specifically, a single particle chosen at random from a particle fitler, with its entire history, is a draw from the sampling distribution (Andrieu, Doucet, and Holenstein 2010). To build up a picture of the sampling distribution, therefore, one can run some number of independent particle filters, sampling a single particle’s trajectory from each. The following codes accomplish this. Note the use of the filter_traj
function, which extracts the trajectory of a single, randomly chosen, particle.
|>
r_prof group_by(r) |>
filter(rank(-loglik)<=2) |>
ungroup() |>
select(-loglik,-loglik.se) |>
pivot_longer(-r) |>
group_by(name) |>
summarize(value=predict(loess(value~r),newdata=data.frame(r=2))) |>
ungroup() |>
pivot_wider() |>
bind_cols(r=2) -> theta
bake(file="hindcast1.rds",seed=174423157,{
|>
vpM pfilter(params=theta,Np=500,Nrep=200,filter.traj=TRUE) -> pf
list(
loglik=logLik(pf),
traj=filter_traj(pf)
fts
) -> fts
}) ->
$loglik |> logmeanexp(se=TRUE,ess=TRUE) fts
## est se ess
## -141.84236623 0.03764978 156.14471418
$traj |>
fts melt() |>
select(-rep) |>
left_join(
tibble(
chain=as.character(seq_along(fts$loglik)),
loglik=fts$loglik
),by="chain"
|>
) mutate(
year=time(vpM)[time]
|>
) group_by(name,year) |>
reframe(
label=c("lo","med","hi"),
p=c(0.05,0.5,0.95),
q=wquant(value,weights=exp(loglik-max(loglik)),probs=p)
|>
) ungroup() |>
select(-p) |>
pivot_wider(names_from=label,values_from=q) -> quants1
|>
quants1 ggplot()+
geom_line(aes(x=year,y=med),color="darkblue")+
geom_ribbon(aes(x=year,ymin=lo,ymax=hi),color=NA,fill="lightblue",alpha=0.5)+
geom_point(data=parus,aes(x=year,y=pop))+
labs(y="N")
Note also that the individual trajectories must we weighted by their Monte Carlo likelihoods (see the use of wquant
in the above).
The plot above shows the uncertainty surrounding the hindcast. Importantly, this includes uncertainty due to both measurement error and process noise. It does not, however, incorporate parametric uncertainty. To fold this additional uncertainty into the estimates requires further work. We discuss this in a later section.
pmcmc
chainsIf we seek to do full-information Bayesian inference, we can use particle Markov chain Monte Carlo, implemented in pomp in the pmcmc
function. The following codes cause parallel pMCMC chains to be run, each beginning at one of the estimates obtained using mif2
. Note that we add a prior by furnishing a prior probability density evaluation function (dprior=...
). In this case, we assume a product prior: marginally uniform in each of the parameters r, σ, K, and N0.
|>
r_prof group_by(r) |>
filter(loglik==max(loglik)) |>
ungroup() |>
filter(
> 0.75,
r < 6,
r < 2,
sigma > 100,
K < 600,
K 0 > 100,
N_0 < 600
N_|>
) select(-loglik,-loglik.se) -> starts
starts
r | K | sigma | N_0 | b |
---|---|---|---|---|
0.7773284 | 215.4178 | 0.3653957 | 152.1069 | 1 |
1.0317351 | 205.7089 | 0.3627431 | 150.4647 | 1 |
1.3694050 | 214.9155 | 0.4035579 | 148.0049 | 1 |
1.8175887 | 216.0507 | 0.4976989 | 148.4357 | 1 |
2.4124554 | 212.0175 | 0.5278700 | 151.5174 | 1 |
3.2020123 | 208.6416 | 0.6082845 | 148.4935 | 1 |
4.2499780 | 213.5120 | 0.7205008 | 150.5958 | 1 |
5.6409256 | 214.2077 | 0.8187588 | 152.1653 | 1 |
|>
vpM pmcmc(
starts=starts,
Nmcmc=2000,Np=200,
dprior=Csnippet(r"{
lik = dunif(r,0,10,1)+dunif(sigma,0,2,1)+
dunif(K,0,600,1)+dunif(N_0,0,600,1);
lik = (give_log) ? lik : exp(lik);}"),
paramnames=c("K","N_0","r","sigma"),
proposal=mvn_rw_adaptive(
rw.sd=c(r=0.02,sigma=0.02,K=50,N_0=50),
scale.start=100,shape.start=100
) chains
) ->
|>
chains pmcmc(
Nmcmc=20000,
proposal=mvn_rw(covmat(chains))
chains ) ->
In the above, we first subset out those mif2
estimates that are consistent with our prior. At each of these, we then perform a number of MCMC moves, using an adaptive multivariate-normal random-walk proposal distribution mvn.rw.adaptive
. To complete the inference, we do more MCMC iterations using a multivariate random-walk proposal (mvn.rw
) with covariance matrix derived from the preceding computation (covmat
).
Now we investigate the chains, to try and determine whether they have converged.
library(coda)
|> traces() -> traces
chains rejectionRate(traces[,c("r","sigma","K","N_0")])
## r sigma K N_0
## 0.80305 0.80305 0.80305 0.80305
We see that the rejection rate is very good. Let us examine the autocorrelation in the chains.
|> autocorr.diag(lags=c(1,5,10,50,100)) traces
## loglik log.prior r K sigma N_0
## Lag 1 0.921904125 NaN 0.92912857 0.900417161 0.92131389 0.913381363
## Lag 5 0.687302389 NaN 0.70673570 0.612788250 0.67499667 0.643977060
## Lag 10 0.491929039 NaN 0.51514803 0.395923723 0.46792262 0.420396349
## Lag 50 0.053282006 NaN 0.08105866 0.037563084 0.04404668 0.020580567
## Lag 100 -0.004827143 NaN 0.01141717 0.002176966 0.00643389 -0.004917029
## b
## Lag 1 NaN
## Lag 5 NaN
## Lag 10 NaN
## Lag 50 NaN
## Lag 100 NaN
|> effectiveSize() traces
## loglik log.prior r K sigma N_0 b
## 5526.945 0.000 5250.458 7623.421 5968.610 6809.830 0.000
window(traces,thin=100,start=2000)
traces <-|> effectiveSize() traces
## loglik log.prior r K sigma N_0 b
## 1600.959 0.000 1573.956 1391.675 1638.215 1366.734 0.000
The autocorrelation is strong, but drops to small values by lag 100. Accordingly, we thin the chains by a factor of 100. We discard 2000 iterations as a burn-in.
Now let us examine the traces.
|>
traces lapply(as.data.frame) |>
lapply(rowid_to_column,"iter") |>
bind_rows(.id="chain") |>
select(chain,iter,loglik,r,sigma,K,N_0) |>
pivot_longer(c(-chain,-iter)) |>
ggplot(aes(x=iter,group=chain,color=chain,y=value))+
guides(color="none")+
labs(x="iteration",y="")+
geom_line(alpha=0.3)+geom_smooth(method="loess",se=FALSE)+
facet_wrap(name~.,scales="free_y",strip.position="left",ncol=2)+
theme(
strip.placement="outside",
strip.background=element_rect(fill=NA,color=NA)
)
gelman.diag(traces[,c("r","sigma","K","N_0")])
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## r 1.01 1.02
## sigma 1.01 1.03
## K 1.00 1.01
## N_0 1.00 1.01
##
## Multivariate psrf
##
## 1.01
The trace-plots show good mixing and the Gelman-Rubin statistic confirms that the chains are mixing among themselves: a good indicator of convergence. Thus, insofar as we can tell by these diagnostics, the MCMC iterations appear to have converged to their stationary distribution and we are sampling it well after thinning.
The plots of the marginal posterior densities are shown below.
|>
traces lapply(as.data.frame) |>
lapply(rowid_to_column,"iter") |>
bind_rows(.id="chain") |>
select(chain,iter,loglik,r,sigma,K,N_0) |>
pivot_longer(c(-chain,-iter)) |>
ggplot(aes(x=value))+
geom_density()+
geom_rug()+
labs(x="")+
facet_wrap(~name,scales="free",strip.position="bottom")+
theme(
strip.placement="outside",
strip.background=element_rect(fill=NA,color=NA)
)
|> summary() traces
##
## Iterations = 2000:20000
## Thinning interval = 100
## Number of chains = 8
## Sample size per chain = 181
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## loglik -143.0255 1.5496 0.040722 0.039211
## log.prior -15.7896 0.0000 0.000000 0.000000
## r 6.0521 2.5672 0.067465 0.070220
## K 213.0830 13.1067 0.344437 0.352401
## sigma 0.8603 0.2423 0.006369 0.006101
## N_0 149.0913 12.3110 0.323526 0.342936
## b 1.0000 0.0000 0.000000 0.000000
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## loglik -146.6449 -143.8293 -142.8085 -141.981 -140.731
## log.prior -15.7896 -15.7896 -15.7896 -15.790 -15.790
## r 1.2998 3.9063 6.3150 8.312 9.823
## K 191.1281 204.0227 211.8928 220.369 242.887
## sigma 0.3971 0.6795 0.8702 1.022 1.320
## N_0 126.0597 140.5763 148.6913 157.230 173.183
## b 1.0000 1.0000 1.0000 1.000 1.000
|>
traces lapply(as.data.frame) |>
lapply(rowid_to_column,"iter") |>
bind_rows(.id="chain") |>
select(loglik,r,sigma,K,N_0) |>
plot_matrix()
Let us revisit the matter of hindcasting. In a preceding section, using an ensemble of particle filters, we were able to estimate the likely trajectory that the latent state process followed in generating the data we see, conditional on a given parameter vector. Naturally, since our estimate of the parameters is itself uncertain, our actual uncertainty regarding the hindcast is somewhat larger. We can use particle Markov chain Monte Carlo to fold in the parametric uncertainty. The following codes select one randomly-chosen particle trajectory from each iteration of our pMCMC chains. The chains are thinned and the burn-in period is discarded, of course. Note that it is not necessary to weight the samples by the likelihood: if the pMCMC chains have converged, the samples can be taken to be equally weighted samples from the smoothing distribution.
|>
chains filter_traj() |>
melt() |>
filter(rep > 1000, rep %% 100 == 0) |>
mutate(year=time(vpM)[time]) |>
pivot_wider(names_from=name) |>
group_by(year) |>
summarize(
label=c("lo","med","hi"),
p=c(0.025,0.5,0.975),
q=wquant(N,probs=p)
|>
) ungroup() |>
select(-p) |>
pivot_wider(names_from=label,values_from=q) -> quants2
bind_rows(
with=quants2,
without=quants1,
.id="uncert"
|>
) ggplot()+
geom_line(aes(x=year,y=med,color=uncert))+
geom_ribbon(aes(x=year,ymin=lo,ymax=hi,fill=uncert),color=NA,alpha=0.4)+
geom_point(data=parus,aes(x=year,y=pop))+
labs(y="N",fill="parametric\nuncertainty",color="parametric\nuncertainty")
Estimating model parameters by fitting the model to data is typically only a step in the process of trying to understand the processes that generated the data. The next step involves trying to understand how and why the model fits the data the way it does, whether it fits it well, and what scope for improvement there might be. pomp provides a number of tools to facilitate answering these questions through interaction with a fitted model.
Ultimately, since the model is viewed, at least hypothetically, as the process that generated the data, simulation of the fitted model is a central tool we have for model criticism. Let’s plot the data and several simulated realizations of the model process on the same axes.
|>
r_prof filter(loglik==max(loglik)) -> mle
as(mifs[[1]],"pomp")
mlepomp <-coef(mlepomp) <- mle
|>
mlepomp simulate(nsim=8,format="data.frame",include.data=TRUE) |>
ggplot(mapping=aes(x=year,y=pop,group=.id,alpha=(.id=="data")))+
scale_alpha_manual(values=c(`TRUE`=1,`FALSE`=0.2),
labels=c(`FALSE`="simulation",`TRUE`="data"))+
labs(alpha="")+
geom_line()+
theme_bw()
The first lines above simply extract the maximum likelihood estimates (mle
) from our profle computation. The next pair of lines plug these MLE parameters into a ‘pomp’ object (mlepomp
) containing the model and the data. The last set of lines do the simulation and the plotting.
Although it is clear from these plots that the estimated model has more variability and is thus able to explain the data better, it can be hard to read much from spaghetti plots such as this. It’s almost always a good idea to plot the data together with several simulated realizations in order to help assess how similar the two are.
|>
mlepomp simulate(nsim=11,format="data.frame",include.data=TRUE) |>
ggplot(mapping=aes(x=year,y=pop,group=.id,color=(.id=="data")))+
scale_color_manual(values=c(`TRUE`="black",`FALSE`="grey50"),
labels=c(`FALSE`="simulation",`TRUE`="data"))+
labs(color="")+
geom_line()+
facet_wrap(~.id)+
theme_bw()
Visual comparison of simulations and data is always a good idea. An indication that the data are not a plausible realization of the model is evidence for lack of fit. In particular, if we have any set of summary statistics, or probes, we can apply them to both simulated and actual data. The probe
function facilitates this comparison. Let’s perform this operation using several of the summary statistics provided with pomp: we’ll use the mean, several quantiles, and the autocorrelation at lags 1 and 3.
|>
mlepomp probe(nsim=200,probes=list(
mean=probe_mean("pop"),
q=probe_quantile("pop",probs=c(0.05,0.25,0.5,0.75,0.95)),
probe_acf("pop",lags=c(1,3),type="corr",transform=log)
vp_probe
)) ->
vp_probe
## <object of class 'probed_pomp'>
For ‘probed_pomp’ object, there are summary
and plot
methods. There is also an as.data.frame
method.
summary(vp_probe)
## $coef
## loglik loglik.se r K sigma N_0
## -141.4358208 0.3305147 7.4871075 214.7936236 0.9507384 150.9856427
## b
## 1.0000000
##
## $nsim
## [1] 200
##
## $quantiles
## mean q.5% q.25% q.50% q.75% q.95% acf[1] acf[3]
## 0.300 0.465 0.605 0.165 0.190 0.660 0.665 0.785
##
## $pvals
## mean q.5% q.25% q.50% q.75% q.95% acf[1] acf[3]
## 0.6069652 0.9353234 0.7562189 0.3383085 0.3880597 0.6766169 0.6766169 0.4378109
##
## $synth.loglik
## [1] -19.67887
Evidently, summary
returns a list with several elements. The quantiles
element contains, for each probe, what fraction of the nsim
simulations had probe values below the value of the probe applied to the data. The pvals
element contains P-values associated with the two-sided test of the hypothesis that the data were generated by the model.
plot(vp_probe)
The plot depicts the multivariate distribution of the probes under the model, with the data-values superimposed. On the diagonal, we see the marginal distributions of the individual probes, represented as histograms, with the vertical line signifying the value of the corresponding probe on the data. Above the diagonal, the scatterplots show the pairwise distributions of probes and the crosshairs, the corresponding data-values. Below the diagonal, the panels contains the pairwise correlations among the simulated probes.
To this point, we’ve seen how to implement POMP models, simulate them, to compute and maximize likelihoods, and perform certain kinds of diagnostic checks. The pomp website contains more documentation, including the full package manual, and a variety of tutorials and short courses. The package itself contains a number of built-in examples and datasets that can be explored.
pomp provides a large toolbox of different inference methods, only a few of which have been explored here. In particular, the package provides other methods for parameter estimation, both in the frequentist and Bayesian modes. See for example (abc
, pmcmc
, probe.match
, spect.match
, nlf
, enkf
, bsmc2
). It also provides a variety of tools for model checking (spect
, nlf
). It is frequently the case that an approach that makes use of more than one approach has advantages over more “purist” approaches: the main goal of the package is to facilitate effective inference by bringing a variety of tools, with complementary strengths and weaknesses, to the user in a common format.
Although the goal of this document has been to introduce the beginner to the package through a display of the pomp toolbox in the context of a rudimentary and incomplete data analysis of a short time series with toy models, it is important to realize that these tools have proven their utility on some extremely challenging problems, including some for which other existing methods are either less efficient or entirely infeasible. The bibliography has links to peer-reviewed publications that have used these methods.
This document was produced with the following software versions:
R | 4.4.2 |
pomp | 5.11 |
circumstance | 0.0.10.1 |
coda | 0.19.4.1 |
foreach | 1.5.2 |
doFuture | 1.0.1 |
subplex | 1.9 |
tidyverse | 2.0.0 |
Andrieu C, Doucet A, Holenstein R (2010). “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B, 72(3), 269–342. https://doi.org/10.1111/j.1467-9868.2009.00736.x.
Arulampalam MS, Maskell S, Gordon N, Clapp T (2002). “A Tutorial on Particle Filters for Online Nonlinear, Non-Gaussian Bayesian Tracking.” IEEE Transactions on Signal Processing, 50, 174–188. https://doi.org/10.1109/78.978374.
Ionides EL, Bretó C, King AA (2006). “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences, 103(49), 18438–18443. https://doi.org/10.1073/pnas.0603181103.
Ionides EL, Nguyen D, Atchadé Y, Stoev S, King AA (2015). “Inference for Dynamic and Latent Variable Models via Iterated, Perturbed Bayes Maps.” Proceedings of the National Academy of Sciences, 112(3), 719–724. https://doi.org/10.1073/pnas.1410597112.
McCleery RH, Perrins CM (1991). “Effects of Predation on the Numbers of Great Tits Parus Major.” In Bird population studies. Relevence to conservation and management 129–147. Oxford University Press, Oxford. https://doi.org/10.1093/oso/9780198577300.003.0006.
Ricker WE (1954). “Stock and Recruitment.” Journal of the Fisheries Research Board of Canada, 11, 559–623. https://doi.org/10.1139/f54-039.