This lesson is licensed under the Creative Commons Attribution-NonCommercial license. Please share and remix noncommercially, mentioning its origin.
We have seen that fitting mechanistic models to data is a powerful and general approach to estimating parameters. We saw too that least-squares fitting is a straightforward way to do this. However, several issues arose. First, there was an element of arbitrariness in the choice of discrepancy measure. Second, although we could fairly easily obtain point estimates of model parameters using least-squares, it was not clear how we could obtain concomitant estimates of parameter uncertainty (e.g., confidence intervals). Finally, we began to see that there are limits to our ability to estimate parameters. In this lab, we’ll explore these issues, and see that likelihood offers an attractive resolution to the first and second of these, but that the third is a fundamental challenge.
Likelihood has a number of advantages, and some disadvantages, as we will see.
Likelihood is the probability of a given set of data \(D\) having occurred under a particular hypothesis \(H\): \[{\mathcal{L}}(H,D)={\mathbb{P}\left[{D|H}\right]}\]
A simple example: suppose \(n\) individuals participate in a serological survey and \(k\) of these individuals are found to be seropositive. One parameter of interest is the true fraction, \(p\), of the population that has seroconverted. Assuming the sample was drawn at random and the population is large, then the probability of the data (\(k\) of \(n\) individuals seropositive) given the hypothesis that the true probability is \(p\) is \[{\mathbb{P}\left[{D|H}\right]} = \binom{n}{k}\,p^k\,(1-p)^{n-k}.\]
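As a quick sanity check, we can verify this formula directly against R’s built-in dbinom function; the particular values of \(n\), \(k\), and \(p\) below are arbitrary and chosen only for illustration.
n <- 50; k <- 12; p <- 0.3   # arbitrary illustrative values
choose(n,k)*p^k*(1-p)^(n-k)  # the formula above
dbinom(x=k,size=n,prob=p)    # R's binomial probability mass function
The two computations give the same number.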
If the true seroprevalence was, say, \(p=0.3\), what does the probability of observing \(k\) seropositives in a sample of size \(n=50\) look like?
# probability of observing k seropositives in a sample of size n
# when the true seroprevalence is p
p <- 0.3
n <- 50
k <- seq(0,n,by=1)
prob <- dbinom(x=k,size=n,prob=p)
plot(k,prob,type='h',lwd=5,lend=1,
     ylab="probability")
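Since these are the probabilities of every possible outcome \(k=0,1,\dots,n\), they sum to one, which we can confirm directly:
sum(prob)   # probabilities over all possible outcomes sum to 1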
The likelihood is a function of the unknown parameters. In this case, if we assume \(n\) is known, then the likelihood is a function of \(p\) alone: \[{\mathcal{L}}(p) = \binom{n}{k}\,p^k\,(1-p)^{n-k}\] Typically the logarithm of this function is more interesting than \({\mathcal{L}}\) itself. Looking at this function for each of two different surveys:
# log-likelihood for the first survey: k1 seropositive out of n1 sampled
k1 <- 18
n1 <- 50
p <- seq(0,1,by=0.001)
plot(p,dbinom(x=k1,size=n1,prob=p,log=TRUE),
     ylim=c(-10,-2),ylab="log-likelihood",
     type='l')
abline(v=k1/n1,col='blue')
# log-likelihood for the second survey: k2 seropositive out of n2 sampled
k2 <- 243
n2 <- 782
p <- seq(0,1,by=0.001)
plot(p,dbinom(x=k2,size=n2,prob=p,log=TRUE),
     ylim=c(-10,-2),ylab="log-likelihood",
     type='l')
abline(v=k2/n2,col='blue')
In the two plots above, the log-likelihood is plotted as a function of the model parameter \(p\). Vertical lines show the maximum likelihood estimate (MLE) of \(p\), which for the binomial model is simply the observed proportion, \(\hat{p}=k/n\).
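Although the binomial MLE is available in closed form, it can also be located numerically, which is the approach we need when no analytical solution exists. Here is a minimal sketch using R’s optimize on the log-likelihood of the first survey; the numerical answer should agree with \(k_1/n_1\).
# numerically maximize the log-likelihood of the first survey
fit <- optimize(function (p) dbinom(x=k1,size=n1,prob=p,log=TRUE),
                interval=c(0,1),maximum=TRUE)
fit$maximum   # numerical MLE
k1/n1         # analytical MLE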
How do the two curves just plotted differ from one another? What features of the data are responsible for the differences?
Let’s suppose we have three samples, \(D_1, D_2, D_3\), taken by three different researchers, from the same large population. If these samples are independent, then \[{\mathbb{P}\left[{D|H}\right]} = {\mathbb{P}\left[{D_1|H}\right]}\times{\mathbb{P}\left[{D_2|H}\right]}\times{\mathbb{P}\left[{D_3|H}\right]},\] which means that the likelihood of the full data set is the product of the likelihoods from each of the samples, or equivalently, that the log-likelihood of the full data set is the sum of the log-likelihoods. In other words, the likelihood gives a general recipe for combining data from independent studies. We’d compute the combined log-likelihood as follows:
# sample sizes and numbers seropositive in the three samples
n <- c(13,484,3200)
k <- c(4,217,1118)
dbinom(x=k,size=n,prob=0.2,log=TRUE)
## [1] -1.873761 -79.243371 -197.561806
sum(dbinom(x=k,size=n,prob=0.2,log=TRUE))
## [1] -278.6789
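Summing the log-likelihoods is equivalent to taking the logarithm of the product of the likelihoods; we work on the log scale because products of many small probabilities quickly underflow to zero in floating-point arithmetic. The check below should reproduce the value above.
# log of the product of the three likelihoods: same value as above
log(prod(dbinom(x=k,size=n,prob=0.2)))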
# combined log-likelihood of the three independent samples,
# as a function of p
ll.fn <- function (p) {
  sum(dbinom(x=k,size=n,prob=p,log=TRUE))
}
p <- seq(0,1,by=0.001)
loglik <- sapply(p,ll.fn)
plot(p,loglik,type='l',ylim=max(loglik)+c(-10,0))
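As before, we can locate the maximum of this combined log-likelihood numerically; for independent binomial samples sharing a common \(p\), the MLE is the pooled proportion \(\sum_i k_i / \sum_i n_i\). A sketch:
# numerically maximize the combined log-likelihood
fit <- optimize(ll.fn,interval=c(0,1),maximum=TRUE)
fit$maximum     # numerical MLE
sum(k)/sum(n)   # pooled proportion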