Produced in R version 3.6.1.

We’re going to demonstrate what happens when we attempt to compute the likelihood for the Consett measles outbreak data by direct simulation from the model.

First, let’s reconstruct the toy SIR model we were working with.

library(tidyverse)
library(pomp)

sir_step <- Csnippet("
double dN_SI = rbinom(S,1-exp(-Beta*I/N*dt)); // S -> I transitions this step
double dN_IR = rbinom(I,1-exp(-mu_IR*dt));    // I -> R transitions this step
S -= dN_SI;
I += dN_SI - dN_IR;
R += dN_IR;
H += dN_IR;                                   // accumulator used for reporting
")
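In symbols, the step above draws the transition counts over each Euler increment $\Delta t$ as

$$\Delta N_{SI}\sim\mathrm{Binomial}\!\left(S,\;1-e^{-\beta I\Delta t/N}\right),\qquad \Delta N_{IR}\sim\mathrm{Binomial}\!\left(I,\;1-e^{-\mu_{IR}\Delta t}\right),$$

where each individual’s probability of making a transition in a step is one minus the probability of no event at the current hazard.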

sir_init <- Csnippet("
S = nearbyint(eta*N);
I = 1;
R = nearbyint((1-eta)*N);
H = 0;
")

dmeas <- Csnippet("
lik = dbinom(reports,H,rho,give_log);
")

rmeas <- Csnippet("
reports = rbinom(H,rho);
")
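The two measurement snippets encode the same binomial model, reports ~ Binomial(H, rho): rmeas draws from it, dmeas evaluates its density. In plain R, with illustrative values (not fitted ones):

```r
## Binomial measurement model: reports ~ Binomial(H, rho).
H <- 100; rho <- 0.5
dbinom(45, size=H, prob=rho, log=TRUE)  # log density, as dmeas computes for reports=45
rbinom(1, size=H, prob=rho)             # one simulated report, as rmeas draws
```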

meas %>%   # `meas` assumed here: data frame of Consett measles case counts with columns `week` and `cases`
select(week,reports=cases) %>%
filter(week<=42) %>%
pomp(
times="week",t0=0,
rprocess=euler(sir_step,delta.t=1/7),
rinit=sir_init,
rmeasure=rmeas,
dmeasure=dmeas,
accumvars="H",
statenames=c("S","I","R","H"),
paramnames=c("Beta","mu_IR","eta","rho","N"),
params=c(Beta=15,mu_IR=0.5,rho=0.5,eta=0.06,N=38000)
) -> measSIR

Let’s generate a large number of simulated trajectories at some particular point in parameter space.

measSIR %>%
simulate(nsim=5000,format="arrays") -> x
sims <- coef(measSIR,"rho")*x$states["H",,]

matplot(time(measSIR),t(sims[1:50,]),type='l',lty=1,
  xlab="time",ylab=expression(rho*H),bty='l',col='blue')
lines(time(measSIR),obs(measSIR,"reports"),lwd=2,col='black')

We can use the function dmeasure to evaluate the log likelihood of the data given the states, the model, and the parameters:

ell <- dmeasure(measSIR,y=obs(measSIR),x=x$states,times=time(measSIR),log=TRUE,
  params=coef(measSIR))
dim(ell)
## [1] 5000   42

According to the general equation for likelihood by direct simulation, we should sum up the log likelihoods across time:
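In symbols (our notation: $y^*_{1:T}$ are the data and $X^{(i)}_{1:T}$, $i=1,\dots,N$, the simulated trajectories), the direct-simulation estimate is

$$\hat{\mathcal{L}}(\theta)=\frac{1}{N}\sum_{i=1}^{N}\prod_{t=1}^{T} f_{Y_t\mid X_t}\!\left(y^*_t\mid X^{(i)}_t;\theta\right)=\frac{1}{N}\sum_{i=1}^{N}\exp\!\left\{\sum_{t=1}^{T}\log f_{Y_t\mid X_t}\!\left(y^*_t\mid X^{(i)}_t;\theta\right)\right\},$$

so each row of ell (one row per simulation) is summed over time to give one log likelihood per simulation.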

ell <- apply(ell,1,sum)
summary(ell)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    -Inf    -Inf    -Inf    -Inf    -Inf  -285.2
summary(exp(ell))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
##  0.000e+00  0.000e+00  0.000e+00 4.194e-128  0.000e+00 1.416e-124
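To combine the per-simulation values into a single Monte Carlo likelihood estimate without raw exponentiation (which underflows, as seen above), one can compute log(mean(exp(ell))) on a stable scale; pomp provides logmeanexp for exactly this, continuing the session above:

```r
library(pomp)
## Stable log of the Monte Carlo likelihood estimate, log(mean(exp(ell))),
## together with a jackknife standard error.
logmeanexp(ell, se=TRUE)
```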
• The variability in the individual likelihoods is high and therefore the likelihood estimate is imprecise. We will need many simulations to get an estimate of the likelihood sufficiently precise to be of any use in parameter estimation or model selection.

• What is the problem?

• Essentially, very few of the trajectories pass anywhere near the data and therefore almost all have extremely bad likelihoods. Moreover, once a trajectory diverges from the data, it almost never comes back. While the calculation is “correct” in that it will converge to the true likelihood as the number of simulations tends to $\infty$, we waste a lot of effort investigating trajectories of very low likelihood.

• This is a consequence of the fact that we are proposing trajectories in a way that is completely unconditional on the data.

• The problem will get much worse with longer data sets.
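A toy illustration of the last point (iid normal stand-ins for per-week log-likelihood contributions, not the measles model): the per-simulation log likelihood is a sum over the weeks, so its spread grows with the series length T, and underflow and imprecision worsen accordingly.

```r
set.seed(1)
## 5000 simulations x 42 weeks of hypothetical per-week log-likelihood terms
ll_week <- matrix(rnorm(5000*42, mean=-7, sd=2), nrow=5000)
## spread of the summed log likelihood for increasing series lengths
sapply(c(7, 21, 42), function(T) sd(rowSums(ll_week[, 1:T, drop=FALSE])))
```

The standard deviation grows roughly like the square root of T, so for long series almost all simulations fall far below the best ones on the log scale.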
