Number of particles
- The initial parameter swarm, $\{\Theta^0_j,\ j=1,\dots,J\}$, usually consists of $J$ identical replications of some starting parameter vector.
- $J$ is set to be sufficient for particle filtering. Because the addition of random perturbations acts to combat particle depletion, it is typically possible to take $J$ substantially smaller than the value needed to obtain precise likelihood estimates via `pfilter`. By the time of the last iteration ($m=M$), the effective sample size should not be close to 1.
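To make this concrete, here is a minimal sketch in R, assuming a pomp object `measSIR` (as in the Consett measles example); the parameter names and perturbation magnitudes are illustrative placeholders, not recommendations.

```r
library(pomp)
## Sketch only: 'measSIR' and the parameter names (Beta, Rho, Eta) are
## assumed from the Consett measles example; values are illustrative.
m1 <- mif2(
  measSIR,
  Np = 1000,                 # J: often smaller than the Np needed for pfilter
  Nmif = 50,                 # M: number of IF2 iterations
  cooling.fraction.50 = 0.5,
  rw.sd = rw_sd(Beta = 0.02, Rho = 0.02, Eta = ivp(0.05))
)
## The diagnostic plots include the effective sample size at the last
## iteration; it should not be close to 1.
plot(m1)
```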
Perturbations
- Perturbations are usually chosen to be multivariate normal, with $\sigma_m$ being a scale factor for iteration $m$: $h_n(\theta\,|\,\varphi;\sigma) \sim N[\varphi,\, \sigma^2_m V_n]$.
- $V_n$ is usually taken to be diagonal,
$$V_n = \begin{pmatrix}
v^2_{1,n} & 0 & 0 & \cdots & 0 \\
0 & v^2_{2,n} & 0 & \cdots & 0 \\
0 & 0 & v^2_{3,n} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & v^2_{p,n}
\end{pmatrix}.$$
- If $\theta_i$ is a parameter that affects the dynamics or observations throughout the time series, it is called a regular parameter, and it is often appropriate to specify $v_{i,n} = v_i$.
- If $\theta_j$ is a parameter that affects only the initial conditions of the dynamic model, it is called an initial value parameter (IVP) and it is appropriate to specify
$$v_{j,n} = \begin{cases} v_j & \text{if } n = 0 \\ 0 & \text{if } n > 0. \end{cases}$$
- If $\theta_k$ is a break-point parameter that models how the system changes at time $t_q$, then $\theta_k$ is like an IVP at time $t_q$ and it is appropriate to specify
$$v_{k,n} = \begin{cases} v_k & \text{if } n = q \\ 0 & \text{if } n \neq q. \end{cases}$$
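In `pomp`, all three cases are expressed through the `rw.sd` argument of `mif2`, which accepts expressions involving `time`. A minimal sketch, with hypothetical parameter names and illustrative magnitudes:

```r
library(pomp)
## Sketch: hypothetical parameter names; magnitudes are illustrative.
perturbations <- rw_sd(
  Beta  = 0.02,                          # regular parameter: v_{i,n} = v_i
  Rho   = 0.02,                          # another regular parameter
  Eta   = ivp(0.05),                     # IVP: perturbed at n = 0 only
  Delta = ifelse(time == 1950, 0.05, 0)  # break-point: perturbed at t_q = 1950 only
)
```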
Cooling schedule
- $\sigma_{1:M}$ is called a cooling schedule, following a thermodynamic analogy popularized by simulated annealing. As $\sigma_m$ becomes small, the system cools toward a "freezing point". If the algorithm is working successfully, the freezing point should be close to the lowest-energy state of the system, i.e., the MLE. Typical choices of the cooling schedule are geometric, $\sigma_m = \alpha^m$, and hyperbolic, $\sigma_m \propto 1/(1+\alpha m)$. In `mif2`, the cooling schedule is parameterized by $\sigma_{50}$, the cooling fraction after 50 IF2 iterations.
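For geometric cooling, this parameterization implies approximately $\sigma_m \approx \sigma_{50}^{m/50}$ (ignoring the additional within-iteration cooling that `mif2` applies); a short sketch of the implied schedule:

```r
## Sketch: the perturbation scale implied by cooling.fraction.50 under
## geometric cooling, ignoring within-iteration cooling.
cooling_fraction_50 <- 0.5
m <- 1:100
sigma_m <- cooling_fraction_50^(m / 50)   # sigma_50 equals cooling.fraction.50
plot(m, sigma_m, type = "l",
     xlab = "IF2 iteration, m", ylab = expression(sigma[m]))
```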
Maximizing the likelihood in stages
- Early in an investigation, one might take $M = 100$ and $\sigma_M = 0.1$. This allows for a relatively broad search of the parameter space.
- As the investigation proceeds, and one finds oneself in the heights of the likelihood surface, one can refine the search. In doing so, it helps to examine diagnostic plots.
- In particular, one typically needs to reduce the magnitude of the perturbations (`rw.sd`) and perhaps adjust the cooling schedule (`cooling.fraction.50`) to eke out the last few units of log likelihood; see the sketch after this list.
- Profile likelihood computations are not only valuable as a way of obtaining confidence intervals. It is often the case that by profiling, one simplifies the task of finding the MLE. Of course, the price that is paid is that a profile calculation requires multiple parallel IF2 computations.
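A minimal sketch of such a refinement, assuming `m1` is a `mif2d_pomp` object from an earlier broad search; the reduced perturbation magnitudes are illustrative. `continue` resumes the iterations with modified settings, and `profile_design` can lay out the grid of fixed parameter values for a profile:

```r
library(pomp)
## Sketch: 'm1' is assumed to come from an earlier, broader mif2 run;
## magnitudes and parameter names are illustrative.
m2 <- continue(
  m1,
  Nmif = 100,
  rw.sd = rw_sd(Beta = 0.005, Rho = 0.005, Eta = ivp(0.01)),  # smaller perturbations
  cooling.fraction.50 = 0.3                                   # cool faster
)
plot(m2)   # examine the diagnostic traces before refining further

## For a profile over Rho: fix Rho on a grid and draw random starts for the
## other parameters; each row then seeds one of the parallel IF2 computations.
pd <- profile_design(
  Rho   = seq(0.1, 0.9, length.out = 20),
  lower = c(Beta = 5, Eta = 0),
  upper = c(Beta = 80, Eta = 1),
  nprof = 25
)
```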
Exercises
Assessing and improving algorithmic parameters
Develop your own heuristics to try to improve the performance of `mif2` in the Consett measles example. Specifically, for a global optimization procedure carried out using random starting values in the specified box, let $\hat\Theta_{\max}$ be a random Monte Carlo estimate of the resulting MLE, and let $\hat\theta$ be the true (unknown) MLE. We can define the maximization error in the log likelihood to be
$$e = \ell(\hat\theta) - \mathbb{E}\big[\ell(\hat\Theta_{\max})\big].$$
We cannot directly evaluate $e$, since there is also Monte Carlo error in our evaluation of $\ell(\theta)$, but we can compute it up to a known precision. Plan some code to estimate $e$ for a search procedure using a computational effort of $JM = 2\times 10^7$, comparable to that used for each `mif2` computation in the global search. Discuss the strengths and weaknesses of this quantification of optimization success. See if you can choose $J$ and $M$ subject to this constraint, together with choices of `rw.sd` and the cooling rate, `cooling.fraction.50`, to arrive at a quantifiably better procedure. Computationally, you may not be readily able to run your full procedure, but you could run a quicker version of it.
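One possible skeleton for the budget bookkeeping and the error estimate, not a full solution; `measSIR`, `guesses` (a data frame of random starting values in the box), and all tuning values are placeholders:

```r
library(pomp)
## Skeleton only: 'measSIR' (pomp object), 'guesses' (data frame of random
## starting values), and all tuning values are placeholders.
budget <- 2e7            # computational effort J*M
J <- 2000                # particles (one possible choice)
M <- budget / J          # IF2 iterations implied by the budget

fits <- lapply(seq_len(nrow(guesses)), function (i) {
  mif2(
    measSIR,
    params = unlist(guesses[i, ]),
    Np = J, Nmif = M,
    cooling.fraction.50 = 0.5,
    rw.sd = rw_sd(Beta = 0.02, Rho = 0.02, Eta = ivp(0.05))
  )
})

## Estimate ell(Theta_max) for each fit by replicated particle filters,
## then compare the mean against a reference value for ell(theta-hat).
lls <- sapply(fits, function (f) {
  logmeanexp(replicate(10, logLik(pfilter(f, Np = 5000))), se = TRUE)[1]
})
e_hat <- max(lls) - mean(lls)   # crude stand-in: best search result as reference
```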