Produced in R version 3.5.2.

• We’re going to demonstrate what happens when we attempt to compute the likelihood for the boarding school flu data by direct simulation from.

• First, let’s reconstruct the toy SIR model we were working with.

rproc <- Csnippet("
double N = 763;
double t1 = rbinom(S,1-exp(-Beta*I/N*dt));
double t2 = rbinom(I,1-exp(-mu_I*dt));
double t3 = rbinom(R1,1-exp(-mu_R1*dt));
double t4 = rbinom(R2,1-exp(-mu_R2*dt));
S  -= t1;
I  += t1 - t2;
R1 += t2 - t3;
R2 += t3 - t4;
")

init <- Csnippet("
S = 762;
I = 1;
R1 = 0;
R2 = 0;
")

dmeas <- Csnippet("
lik = dpois(B,rho*R1+1e-6,give_log);
")

rmeas <- Csnippet("
B = rpois(rho*R1+1e-6);
")

bsflu %>%
select(day,B) %>%
pomp(times="day",t0=0,
rprocess=euler(rproc,delta.t=1/5),
rinit=init,rmeasure=rmeas,dmeasure=dmeas,
statenames=c("S","I","R1","R2"),
paramnames=c("Beta","mu_I","mu_R1","mu_R2","rho")) -> flu

Let’s generate a large number of simulated trajectories at some particular point in parameter space.

flu %>%
simulate(params=c(Beta=3,mu_I=1/2,mu_R1=1/4,mu_R2=1/1.8,rho=0.9),
nsim=5000,format="arrays") -> x
matplot(time(flu),t(x$states["R1",1:50,]),type='l',lty=1, xlab="time",ylab=expression(R[1]),bty='l',col='blue') lines(time(flu),obs(flu,"B"),lwd=2,col='black') We can use the function dmeasure to evaluate the log likelihood of the data given the states, the model, and the parameters: ell <- dmeasure(flu,y=obs(flu),x=x$states,times=time(flu),log=TRUE,
params=c(Beta=3,mu_I=1/2,mu_R1=1/4,mu_R2=1/1.8,rho=0.9))
dim(ell)
## [1] 5000   14

According to the general equation for likelihood by direct simulation, we should sum up the log likelihoods across time:

ell <- apply(ell,1,sum)
summary(exp(ell))
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
##  0.000e+00  0.000e+00  0.000e+00 1.090e-139  0.000e+00 5.446e-136
• The variability in the individual likelihoods is high and therefore the likelihood esitmate is imprecise. We will need many simulations to get an estimate of the likelihood sufficiently precise to be of any use in parameter estimation or model selection.

• What is the problem?

• Essentially, very few of the trajectories pass anywhere near the data and therefore almost all have extremely bad likelihoods. Moreover, once a trajectory diverges from the data, it almost never comes back. While the calculation is “correct” in that it will converge to the true likelihood as the number of simulations tends to $$\infty$$, we waste a lot of effort investigating trajectories of very low likelihood.

• This is a consequence of the fact that we are proposing trajectories in a way that is completely unconditional on the data.

• The problem will get much worse with longer data sets.