Think about possible improvements on the assignment of randomized starting values for the parameter estimation searches. Propose and try out a modification of the procedure. Does it make a difference?


The starting points are determined by the specification of the region in which they are chosen, and by how they are selected within this region. We have followed a fairly simple and natural choice: independent uniform draws from a specified box. profileDesign() also supports a quasi-random (low-discrepancy) Sobol design rather than independent draws. Whether a specified box has a reasonable extent can be checked by plotting the starting values on a scatterplot together with the candidate maximizations obtained, as demonstrated in the global search example from the iterated filtering tutorial.

The transformed scale may be more reasonable for selecting uniformly distributed starting values. For example, if we are not sure about the order of magnitude of a non-negative parameter, and we set its box interval to \([0.1,10]\), we likely want to spend half our search effort with initial values in the interval \([0.1,1]\). Sampling uniformly on the log scale achieves this. We can change our code as follows to implement this for the polio profile calculation.

idx <- colnames(box)!="rho"
) -> starts


idx <- which(colnames(box)!="rho")
) |>
starts <- data.frame(t(partrans(polio,t(trans_starts),dir="fromEst")))

A comparable change was also made for the global search. The results from running the modified version are at starting-values-exercise/main.Rout.save. The maximized log likelihood of \(-794.7\) is no substantial improvement on the previous value of \(-794.7\).

However, the pairs plot of estimates from the global search does reveal an advantage for the modified scale. The distinct mode with low \(\sigma_{\mathrm{env}}\) is better explored with the modified starting value distribution. When making uniform draws on an untransformed scale, very few initial values fall in a region where a parameter has a small order of magnitude. In this case, the small \(\sigma_{\mathrm{env}}\) mode has a likelihood around 15 log units short of the maximum, so it is not a competitive explanation of the data. Nevertheless, this shows how transforming the starting value distribution could be useful in other situations.
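The effect of the sampling scale can be checked in isolation with a small base R sketch. It is not part of the polio analysis and does not use pomp; the interval \([0.1,10]\) is the illustrative box from the discussion above, and the variable names, seed, and sample size are arbitrary choices made for this example.

```r
# Illustrative sketch in base R; not the course's polio code.
set.seed(1)      # arbitrary seed, for reproducibility only
n <- 10000       # number of hypothetical starting values
lower <- 0.1     # illustrative box limits from the text
upper <- 10

# Independent uniform draws on the natural (untransformed) scale
natural <- runif(n, min = lower, max = upper)

# Independent uniform draws on the log scale, mapped back to the natural scale
log_scale <- exp(runif(n, min = log(lower), max = log(upper)))

# Fraction of starting values falling in [0.1, 1] under each scheme.
# In expectation these are (1 - 0.1) / (10 - 0.1), about 0.09, and 0.5.
frac_natural <- mean(natural <= 1)
frac_log <- mean(log_scale <= 1)
print(c(natural = frac_natural, log = frac_log))
```

Under these assumptions, roughly one draw in eleven on the natural scale starts below 1, against about half of the draws on the log scale, which is consistent with the better exploration of the small \(\sigma_{\mathrm{env}}\) mode.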