One commonly requested feature is to be able to run a post hoc Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows examples of such an analysis. Some caveats:
- As with any MCMC analysis, you should check mixing and convergence diagnostics (e.g. with functions from the coda package) before drawing conclusions. tmbstan does some checking automatically, so it produces warnings where the DIY sampler below does not.
- The first MCMC sampler illustrated below is conceptually simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling with less code or on a regular basis should consider the tmbstan package, which does more efficient hybrid/Hamiltonian Monte Carlo sampling and can take full advantage of glmmTMB's ability to provide gradients of the log-posterior with respect to the parameters. More information on tmbstan and the methods available for use on the samples can be found on GitHub (e.g. methods(class="stanfit")). It is safe to skip the Metropolis-Hastings (DIY) section of the salamander example below if you prefer to use tmbstan.
In general, you can plug the objective function from glmmTMB
(i.e., fitted_model$obj$fn
) into any general-purpose MCMC package (e.g. adaptMCMC, MCMCpack (using MCMCmetrop1R()
), ensemblemcmc, ...). However, most of these algorithms do not take advantage of glmmTMB
's automatic computation of gradients; for that, we will demonstrate the use of tmbstan, which uses glmmTMB
's objective and gradient functions in conjunction with the sampling algorithms from Stan.
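For example, here is a minimal (untested) sketch of the MCMCpack route; the chain length and the use of the Wald covariance matrix as the proposal are arbitrary choices, and the model is the same salamander fit used below:
library(glmmTMB)
library(MCMCpack)
fitted_model <- glmmTMB(count ~ mined + (1 | site), zi = ~mined,
                        family = poisson, data = Salamanders)
## $obj$fn returns the *negative* log-likelihood as a function of the
## top-level (non-random) parameters, so flip its sign for the sampler
start <- with(fitted_model$obj$env, last.par[-random])
mm <- MCMCmetrop1R(fun = function(p) -fitted_model$obj$fn(p),
                   theta.init = start,
                   V = as.matrix(vcov(fitted_model, full = TRUE)),
                   mcmc = 5000, logfun = TRUE)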
Load packages:
library(glmmTMB)
library(coda) ## MCMC utilities
library(reshape2) ## for melt()
## graphics
library(lattice)
library(ggplot2); theme_set(theme_bw())
Fit basic model:
fm1 <- glmmTMB(count ~ mined + (1|site),
               zi = ~mined,
               family = poisson, data = Salamanders)
This is a basic block Metropolis sampler, based on code by Florian Hartig.
##' @param start starting value
##' @param V variance-covariance matrix of MVN candidate distribution
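##' @param logpost_fun function returning the log-posterior (here, the log-likelihood) for a parameter vector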
##' @param iterations total iterations
##' @param nsamp number of samples to store
##' @param burnin number of initial samples to discard
##' @param thin thinning interval
##' @param tune tuning parameters; expand/contract V
##' @param seed random-number seed
run_MCMC <- function(start,
                     V,
                     logpost_fun,
                     iterations = 10000,
                     nsamp = 1000,
                     burnin = iterations/2,
                     thin = floor((iterations-burnin)/nsamp),
                     tune = NULL,
                     seed = NULL) {
    ## initialize
    if (!is.null(seed)) set.seed(seed)
    if (!is.null(tune)) {
        tunesq <- if (length(tune) == 1) tune^2 else outer(tune, tune)
        V <- V * tunesq
    }
    chain <- matrix(NA, nsamp + 1, length(start))
    chain[1, ] <- cur_par <- start
    postval <- logpost_fun(cur_par)
    j <- 1
    for (i in 1:iterations) {
        proposal <- MASS::mvrnorm(1, mu = cur_par, Sigma = V)
        newpostval <- logpost_fun(proposal)
        accept_prob <- exp(newpostval - postval)
        if (runif(1) < accept_prob) {
            cur_par <- proposal
            postval <- newpostval
        }
        if ((i > burnin) && (i %% thin == 1)) {
            chain[j + 1, ] <- cur_par
            j <- j + 1
        }
    }
    chain <- na.omit(chain)
    colnames(chain) <- names(start)
    chain <- coda::mcmc(chain)
    return(chain)
}
Set up for MCMC: define a log-posterior function (in this case just the log-likelihood, i.e. implicit flat priors); extract the fitted coefficients to use as a starting point and the full variance-covariance matrix to use as the candidate distribution.
## FIXME: is there a better way for user to extract full coefs?
rawcoef <- with(fm1$obj$env,last.par[-random])
names(rawcoef) <- make.names(names(rawcoef), unique=TRUE)
## log-likelihood function
## (run_MCMC wants *positive* log-lik)
logpost_fun <- function(x) -fm1$obj$fn(x)
## check definitions
stopifnot(all.equal(c(logpost_fun(rawcoef)),
                    c(logLik(fm1)),
                    tolerance = 1e-7))
V <- vcov(fm1, full = TRUE)
Run the chain:
t1 <- system.time(m1 <- run_MCMC(start = rawcoef,
                                 V = V, logpost_fun = logpost_fun,
                                 seed = 1001))
(running this chain takes 7.7 seconds)
Add more informative names (the last column is the internal parameter for the site random effect, i.e. its log-standard deviation; see the vignette on covariance structures and parameters):
colnames(m1) <- colnames(vcov(fm1, full = TRUE))
colnames(m1)[ncol(m1)] <- "sd_site"
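If we want the posterior of the standard deviation itself rather than its log, we can exponentiate that column (an optional step, done here on a copy of the chain):
## the internal parameter for a (1|site) term is log(SD);
## exponentiate (in a copy) to get the SD on its natural scale
m1_sd <- m1
m1_sd[, "sd_site"] <- exp(m1_sd[, "sd_site"])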
We could look at a traceplot (redacted for space, but appears OK):
lattice::xyplot(m1,layout=c(2,3),asp="fill")
print(effectiveSize(m1),digits=3)
## (Intercept) minedno zi~(Intercept) zi~minedno sd_site
## 262 243 258 259 249
These effective sizes are probably still too small. In a real analysis we would stop and make sure we had addressed the mixing/convergence problems before proceeding; for this simple sampler, our options would be to (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using tune to scale some parameters, as sketched below, or perhaps by switching to a multivariate Student t distribution [see the mvtnorm package]); or (3) add regularizing priors.
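For example (the scaling factors and chain length here are arbitrary, untested values), we could shrink the candidate steps for the standard-deviation parameter and run a longer chain:
## rerun with a longer chain and a smaller candidate step for the last
## (log-SD) parameter; tune values are illustrative only
m1_tuned <- run_MCMC(start = rawcoef, V = V, logpost_fun = logpost_fun,
                     iterations = 30000, tune = c(1, 1, 1, 1, 0.5),
                     seed = 1001)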
Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (coda::HPDinterval) to get confidence intervals, and plot the posterior distributions (omitting the intercept because it is on a very different scale).
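A minimal sketch of those summaries and plots (the probability levels and binning are arbitrary choices):
## 95% quantile-based and highest-posterior-density intervals
t(apply(m1, 2, quantile, c(0.025, 0.975)))
coda::HPDinterval(m1)
## posterior histograms, dropping the intercept (first column)
m1_long <- reshape2::melt(as.matrix(m1)[, -1])
ggplot(m1_long, aes(x = value)) +
    geom_histogram(bins = 50) +
    facet_wrap(~ Var2, scales = "free")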
The tmbstan
package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the $obj
component of a glmmTMB
fit is such an object. (To run this example you'll need to install the tmbstan
package and its dependencies.)
library(tmbstan)
t2 <- system.time(m2 <- tmbstan(fm1$obj, seed = 101))
Running this command, which creates 4 chains, takes 5.6 seconds with no parallelization. If you're going to do this a lot, you can pass the argument cores=n to run chains in parallel; the argument is passed through to rstan::sampling, so see the help file ?rstan::sampling for details.
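For example (a minimal sketch; the number of cores is an assumption about your machine):
## run the same model with chains sampled in parallel;
## 'cores' (and other arguments) are passed through to rstan::sampling()
m2p <- tmbstan(fm1$obj, chains = 4, cores = 4, seed = 101)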
Running bayestestR::diagnostic_posterior() on the fit gives the following results:
| Parameter | ESS | MCSE | Rhat |
|-----------|-----|------|------|
| beta[1]   | 736 | 0.010 | 1.001 |
| beta[2]   | 709 | 0.011 | 1.001 |
| betazi[1] | 998 | 0.008 | 1.001 |
| betazi[2] | 1137 | 0.008 | 1.001 |
| theta     | 556 | 0.015 | 1.012 |
A trace plot (rstan::traceplot(m2, pars=c("beta","betazi","theta"))
):
(also redacted for space, also looks OK)
Pairs plot (pairs(m2, pars = c("beta", "betazi"), gap = 0)
): (also redacted for space, also looks OK)
We now turn to the classic sleepstudy data set included with the lme4 package:
data("sleepstudy", package = "lme4")
fm2 <- glmmTMB(Reaction ~ Days + (Days | Subject), data = sleepstudy)
t3 <- system.time(m3 <- tmbstan(fm2$obj, seed = 101))
Running this chain produces many warnings about divergent transitions, low effective sample size, etc.; the diagnostic table confirms this.
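If the warnings have scrolled out of view, rstan's built-in checks can redisplay the sampler problems:
## summarize divergences, tree-depth saturation, and energy (E-BFMI) problems
rstan::check_hmc_diagnostics(m3)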
Diagnostics:
dp3 <- bayestestR::diagnostic_posterior(m3)
| Parameter | ESS | MCSE | Rhat |
|-----------|-----|------|------|
| beta[1]   | 11 | 2.339 | 1.138 |
| beta[2]   | 14 | 0.473 | 1.164 |
| betadisp  | 8  | 0.026 | 1.201 |
| theta[1]  | 18 | 0.074 | 1.157 |
| theta[2]  | 23 | 0.056 | 1.143 |
| theta[3]  | 3  | 205.252 | 2.073 |
Most of the trace plot (redacted) looks OK. However, one sub-panel shows that one of the parameters (theta[3], controlling the intercept/slope correlation) is problematic for one of the chains (trace plot not shown):
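To zoom in on the offending parameter, we could plot just its trace:
## trace plot for the problematic correlation parameter only
rstan::traceplot(m3, pars = "theta[3]")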
One way to address these problems is to add bounds that prevent the chains from going outside a range of \(\pm 5\) standard errors from the MLE (we originally used \(\pm 3\) standard errors, but it looked like those bounds were being hit too frequently; if possible we want bounds that prevent crazy behaviour but do not otherwise constrain the chains ...)
sdrsum <- summary(fm2$sdr)
par_est <- sdrsum[,"Estimate"]
par_sd <- sdrsum[,"Std. Error"]
t4 <- system.time(m4 <- tmbstan(fm2$obj,
                                lower = par_est - 5*par_sd,
                                upper = par_est + 5*par_sd,
                                seed = 101))
The diagnostics and the trace plot look much better now (only theta[3]
shown) ...
dp4 <- bayestestR::diagnostic_posterior(m4)
| Parameter | ESS | MCSE | Rhat |
|-----------|-----|------|------|
| beta[1]   | 2782 | 0.078 | 1.000 |
| beta[2]   | 514  | 0.070 | 1.006 |
| betadisp  | 2410 | 0.001 | 1.001 |
| theta[1]  | 445  | 0.021 | 1.004 |
| theta[2]  | 1275 | 0.006 | 1.002 |
| theta[3]  | 1184 | 0.014 | 1.002 |
The last step (for now) is to extract the sampled values, give them interpretable names, and transform the correlation parameter back to a more interpretable scale (see the "Covariance structures" vignette):
samples4 <- as.data.frame(extract(m4, pars = c("beta", "betadisp", "theta")))
colnames(samples4) <- c(names(fixef(fm2)$cond),
                        "log(sigma)",
                        c("log(sd_Intercept)", "log(sd_Days)", "cor"))
samples4$cor <- sapply(samples4$cor, get_cor)
m4_long <- reshape2::melt(as.matrix(samples4))
ggplot(m4_long, aes(x = value)) + geom_histogram(bins = 50) + facet_wrap(~ Var2, scales = "free")
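For a single correlation, the transformation that get_cor applies is \(\rho = \theta/\sqrt{1+\theta^2}\) (see the covariance structures vignette); a quick sanity check of that relationship:
## check that get_cor matches the documented transformation for one parameter
all.equal(unname(c(get_cor(0.5))), 0.5 / sqrt(1 + 0.5^2))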
A more principled, properly Bayesian way to handle this problem would be to add sensible regularizing priors. (To be done/tested; as of this writing, priors can be set on fixed effects and on random effect standard deviations, but not on random effect correlations ...)
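As a purely hypothetical sketch (assuming a glmmTMB version that supports the priors argument; the gamma(1e8, 2.5) specification follows the example in glmmTMB's priors vignette and regularizes the random-effect standard deviations, not the correlation itself):
## hypothetical: refit with a regularizing prior on the random-effect SDs,
## then sample from the regularized fit
fm2_prior <- glmmTMB(Reaction ~ Days + (Days | Subject), data = sleepstudy,
                     priors = data.frame(prior = "gamma(1e8, 2.5)",
                                         class = "ranef", coef = ""))
m5 <- tmbstan(fm2_prior$obj, seed = 101)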