Exercise

In all of the foregoing, we have assumed a fixed value of the dispersion parameter \(k\) of the negative binomial measurement model. We have also fixed one or the other of \(\mu_{IR}\) and \(\eta\). Now attempt to estimate all the parameters simultaneously. How much is the fit improved?


Solution

Setup

First, we set up the problem: we load packages, initialize a parallel environment, construct our SIR model, and identify the putative MLE. The pomp object constructed by the source command below is named measSIR. For our putative MLE, we take the point in our database with the highest measured likelihood.

library(tidyverse)
library(pomp)
library(iterators)
library(doFuture)
plan(multisession)   # parallel computation on local worker processes

source("https://kingaa.github.io/sbied/pfilter/model.R")

read_csv("measles_params.csv") |>
  filter(!is.na(loglik), loglik.se<1) |>
  filter(loglik==max(loglik)) |>
  select(-loglik,-loglik.se) -> coef(measSIR)

The putative MLE is

Beta    mu_IR   rho     k       eta     N
3.63    2.00    0.06    10.00   0.61    38000.00
A profile calculation

Because the likelihood surface appears flat along a ridge in the \(\eta\)-direction, we profile over \(\eta\) to improve our estimates. The following sets up a design of guesses to be used as starting points for the profile computation.

read_csv("fitall_params.csv") |>
  filter(loglik>max(loglik)-10,loglik.se<1) |>
  sapply(range) -> box

freeze(
  profile_design(
    eta=seq(0.01,0.99,length=40),
    lower=box[1,c("Beta","rho","k","mu_IR")],
    upper=box[2,c("Beta","rho","k","mu_IR")],
    nprof=25, type="runif"
  ),
  seed=1893696051
) -> guesses
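Each row of guesses pairs one of the 40 values of \(\eta\) with randomly drawn values of the remaining parameters. A quick sanity check of the design:

dim(guesses)    # 40 eta values x nprof=25 draws = 1000 starting points
head(guesses)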
foreach(guess=iter(guesses,"row"), .combine=rbind,
  .options.future=list(seed=830007657)
) %dofuture% {
  library(tidyverse)
  library(pomp)
  measSIR |>
    mif2(
      Nmif=100, Np=2000,
      params=c(unlist(guess),fixed_params),
      rw.sd=rw_sd(Beta=0.02, rho=0.02, mu_IR=0.02, k=0.02),  # eta is not perturbed: we are profiling over it
      cooling.fraction.50=0.5,
      partrans=parameter_trans(log=c("Beta","mu_IR","k"),logit="rho"),
      paramnames=c("Beta","mu_IR","k","rho")
    ) |>
    mif2(Nmif=100,cooling.fraction.50=0.3) -> mf
  ## evaluate the log likelihood at the mif2 estimate using replicated particle filters
  replicate(
    10,
    mf |> pfilter(Np=2000) |> logLik()
  ) |> logmeanexp(se=TRUE) -> ll
  mf |> coef() |> bind_rows() |>
    bind_cols(loglik=ll[1],loglik.se=ll[2])
} -> results
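As elsewhere in these lessons, it is good practice to archive the estimates. A sketch, assuming fitall_params.csv is the running database for this exercise, as the read above suggests:

read_csv("fitall_params.csv") |>
  bind_rows(results) |>         # append the new profile results
  filter(is.finite(loglik)) |>
  arrange(-loglik) |>
  write_csv("fitall_params.csv")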

This profile calculation used about \(110\) cpu s per start.
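One rough way to arrive at such a figure is to time the loop and scale by the number of workers; a hypothetical sketch:

tic <- Sys.time()
## ... run the foreach loop above ...
toc <- Sys.time()
## approximate cpu s per start: elapsed time x number of workers / number of starts
as.numeric(toc-tic,units="secs")*nbrOfWorkers()/nrow(guesses)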

We now examine the profile traces. Note that we include a trace of the basic reproductive number, \(R_0 = \beta/\mu_{IR}\).

results |>
  filter(
    is.finite(loglik),
    loglik.se<1
  ) |>
  group_by(eta) |>
  filter(rank(-loglik)<=2) |>   # keep the two best fits at each value of eta
  ungroup() |>
  reframe(
    eta=eta,
    `log~L`=loglik,
    `R[0]`=Beta/mu_IR,
    beta=Beta,
    `mu[IR]`=mu_IR,
    k=k,
    rho=rho
  ) |>
  pivot_longer(-`eta`) |>
  ggplot(aes(x=`eta`,y=value))+
  geom_point()+
  labs(y=NULL,x=expression(eta))+
  facet_wrap(~name,scales="free_y",labeller=label_parsed)
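From these results, an approximate 95% confidence interval for \(\eta\), such as the interval quoted below, can be read off the profile using the usual chi-squared cutoff of \(1.92\) log-likelihood units. A minimal sketch:

results |>
  filter(is.finite(loglik), loglik.se<1) |>
  group_by(eta) |>
  summarize(loglik=max(loglik), .groups="drop") |>     # the profile itself
  filter(loglik>max(loglik)-0.5*qchisq(p=0.95,df=1)) |>
  summarize(lower=min(eta), upper=max(eta))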

Conclusions

Releasing all constraints on the parameters results in only a small improvement in the log likelihood. Indeed, by AIC, the improvement is not judged to be worthwhile. Nevertheless, removing the constraints allows us to evaluate whether our assumptions about the known values of certain parameters play a role in the conclusions. In the event, they turn out not to have played much of a role.
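To make the AIC comparison concrete, recall that \(\mathrm{AIC} = 2p - 2\log L\), where \(p\) is the number of estimated parameters. The following sketch shows the computation; the log likelihood values are hypothetical placeholders, to be replaced by the maxima from your own searches:

aic <- function(loglik, npar) 2*npar - 2*loglik
ll_fixed <- -104.3    # hypothetical: max log likelihood with k and mu_IR held fixed
ll_free  <- -103.8    # hypothetical: max log likelihood with all parameters free
aic(ll_fixed, npar=3)   # Beta, rho, eta estimated
aic(ll_free, npar=5)    # Beta, rho, eta, mu_IR, k estimated

A lower AIC favors a model; a gain of half a log unit, as in these placeholder values, does not offset the cost of two extra parameters.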

Let us return to the central question: How does the model account for the data?

Clearly, the model can account for the data equally well for all but the smallest values of \(\eta\). That is, unless the fraction of susceptibles in the population was too small to allow for an outbreak of the observed size, one can obtain an outbreak that looks like the observed one from an SIR model in a population of sufficient size. This interpretation is confirmed by the fact that, as one decreases \(\eta\), one has to increase the reporting efficiency \(\rho\) to explain the data.
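This trade-off can be seen directly by plotting \(\rho\) against \(\eta\) for the high-likelihood parameter sets:

results |>
  filter(is.finite(loglik), loglik.se<1) |>
  filter(loglik>max(loglik)-10) |>
  ggplot(aes(x=eta,y=rho))+
  geom_point()+
  labs(x=expression(eta),y=expression(rho))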

Now, across the confidence interval for \(\eta\), say \((0.03, 1)\), it is clear from the profile trace plots above that the model needs both \(\beta\) and \(\rho\) to be relatively low and the infectious period to be relatively short (though mean infectious periods as long as \(1.7\) wk, corresponding to \(\mu_{IR} \approx 0.6\,\mathrm{wk}^{-1}\), lie within the confidence interval).

In sum, the model accounts for the data by positing a large and mostly unobserved outbreak in a population, very many of whom were susceptible. As we discussed in the Lesson, the conclusion of a short infectious period is broadly compatible with household studies suggesting that the duration of viral shedding is less than a week. On the other hand, the conclusion that \(\rho<0.6\) is more difficult to reconcile with our understanding of measles reporting during this period. Indeed, it would have had to be the case that 1948 was a very sloppy year for the public health authorities in this part of England, since their average reporting efficiency, as we estimated in the Lesson, was close to 60%.



Licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.