Post-hoc MCMC with glmmTMB

Ben Bolker and Mollie Brooks

2025-04-02

Overview

One commonly requested feature is to be able to run a post hoc Markov chain Monte Carlo analysis based on the results of a frequentist fit. This is often a reasonable shortcut for computing confidence intervals and p-values that allow for finite-sized samples rather than relying on asymptotic sampling distributions. This vignette shows examples of such an analysis. Some caveats:

The first MCMC sampler illustrated below is conceptually simple (Metropolis with a multivariate normal candidate distribution). Users who want to do MCMC sampling with less code, or on a regular basis, should consider the tmbstan package, which does more efficient hybrid/Hamiltonian Monte Carlo sampling and can take full advantage of glmmTMB's ability to provide gradients of the log-posterior with respect to the parameters. More information about tmbstan, and about the methods available for working with the samples (e.g. methods(class="stanfit")), can be found on GitHub. It is safe to skip the Metropolis-Hastings (DIY) section of the salamander example below if you prefer to use tmbstan.

In general, you can plug the objective function from glmmTMB (i.e., fitted_model$obj$fn) into any general-purpose MCMC package (e.g. adaptMCMC, MCMCpack (using MCMCmetrop1R()), ensemblemcmc, ...). However, most of these algorithms do not take advantage of glmmTMB's automatic computation of gradients; for that, we will demonstrate the use of tmbstan, which uses glmmTMB's objective and gradient functions in conjunction with the sampling algorithms from Stan.
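As a sketch (not run): passing a glmmTMB objective function to a general-purpose sampler might look like the following, where fitted_model, logpost, and samp are placeholder names and adaptMCMC is used purely as an example of such a package.

## not run; 'fitted_model' stands for an existing glmmTMB fit
## obj$fn returns the *negative* log-likelihood, so negate it to get a log-density
logpost <- function(p) -fitted_model$obj$fn(p)
samp <- adaptMCMC::MCMC(logpost, n = 10000,
                        init = fitted_model$fit$par,  ## fitted top-level parameters as a start
                        scale = vcov(fitted_model, full = TRUE))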

Setup

Load packages:

library(glmmTMB)
library(coda)     ## MCMC utilities
library(reshape2) ## for melt()
## graphics
library(lattice)
library(ggplot2); theme_set(theme_bw())

Salamander example

Fit basic model:

fm1 <- glmmTMB(count ~ mined + (1|site),
    zi=~mined,
    family=poisson, data=Salamanders)

Metropolis-Hastings (DIY)

This is a basic block Metropolis sampler, based on code by Florian Hartig.

##' @param start starting value
##' @param V variance-covariance matrix of MVN candidate distribution
##' @param logpost_fun function returning the (unnormalized) log-posterior; here, the log-likelihood
##' @param iterations total iterations
##' @param nsamp number of samples to store
##' @param burnin number of initial samples to discard
##' @param thin thinning interval
##' @param tune tuning parameters; expand/contract V
##' @param seed random-number seed
run_MCMC <- function(start,
                     V,   
                     logpost_fun,
                     iterations = 10000,
                     nsamp = 1000,
                     burnin = iterations/2,
                     thin = floor((iterations-burnin)/nsamp),
                     tune = NULL,
                     seed=NULL
                     ) {
    ## initialize
    if (!is.null(seed)) set.seed(seed)
    if (!is.null(tune)) {
        tunesq <- if (length(tune)==1) tune^2 else outer(tune,tune)
        V <-  V*tunesq
    }
    chain <- matrix(NA, nsamp+1, length(start))
    chain[1,] <- cur_par <- start
    postval <- logpost_fun(cur_par)
    j <- 1
    for (i in 1:iterations) {
        proposal <- MASS::mvrnorm(1, mu=cur_par, Sigma=V)
        newpostval <- logpost_fun(proposal)
        accept_prob <- exp(newpostval - postval)
        if (runif(1) < accept_prob) {
            cur_par <- proposal
            postval <- newpostval
        }
        if ((i>burnin) && (i %% thin == 1)) {
            chain[j+1,] <- cur_par
            j <- j + 1
        }
    }
    chain <- na.omit(chain)
    colnames(chain) <- names(start)
    chain <- coda::mcmc(chain)
    return(chain)
}

Set up for MCMC: define a scaled log-posterior function (in this case just the log-likelihood); extract the fitted coefficients as a starting point and the variance-covariance matrix to use as the candidate (proposal) covariance.

## FIXME: is there a better way for user to extract full coefs?
rawcoef <- with(fm1$obj$env,last.par[-random])
names(rawcoef) <- make.names(names(rawcoef), unique=TRUE)
## log-likelihood function 
## (run_MCMC wants *positive* log-lik)
logpost_fun <- function(x) -fm1$obj$fn(x)
## check definitions
stopifnot(all.equal(c(logpost_fun(rawcoef)),
                    c(logLik(fm1)),
          tolerance=1e-7))
V <- vcov(fm1,full=TRUE)

Run the chain:

t1 <- system.time(m1 <- run_MCMC(start=rawcoef,
                                 V=V, logpost_fun=logpost_fun,
                                 seed=1001))

(running this chain takes 7.7 seconds)

Add more informative names (the last column is the parameter for the site random-effect standard deviation, which is stored internally on the log scale; see the vignette on covariance structures and parameters):

colnames(m1) <- colnames(vcov(fm1, full = TRUE))
## the final column is the random-effects parameter (log-SD of the site intercept)
colnames(m1)[ncol(m1)] <- "sd_site"

We could look at a trace plot (redacted for space, but it appears OK) and check the effective sample sizes:

lattice::xyplot(m1,layout=c(2,3),asp="fill")
print(effectiveSize(m1),digits=3)
##    (Intercept)        minedno zi~(Intercept)     zi~minedno        sd_site 
##            262            243            258            259            249

These effective sizes are probably still too small. In a real analysis we would stop and make sure we had addressed the mixing/convergence problems before proceeding; for this simple sampler, the main options are to (1) simply run the chain for longer; (2) tune the candidate distribution (e.g. by using tune to scale some parameters, or perhaps by switching to a multivariate Student-t candidate distribution [see the mvtnorm package]); (3) add regularizing priors. A longer, re-tuned run is sketched below.
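For illustration only (not run; the longer run length and the tuning factor of 0.5 are arbitrary choices for this sketch, not recommendations):

m1b <- run_MCMC(start = rawcoef, V = V, logpost_fun = logpost_fun,
                iterations = 50000, nsamp = 1000,
                tune = 0.5, seed = 1001)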

Ignoring the problems and proceeding, we can compute column-wise quantiles or highest posterior density intervals (coda::HPDinterval) to get confidence intervals. We could also plot the posterior distributions (omitted here for space), leaving out the intercept because it is on a very different scale from the other parameters.
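For example, a minimal sketch of both kinds of interval, applied to the chain computed above:

## 95% credible intervals from column-wise quantiles
t(apply(m1, 2, quantile, c(0.025, 0.975)))
## highest posterior density intervals
coda::HPDinterval(m1)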

tmbstan

The tmbstan package allows direct, simple access to a hybrid/Hamiltonian Monte Carlo algorithm for sampling from a TMB object; the $obj component of a glmmTMB fit is such an object. (To run this example you'll need to install the tmbstan package and its dependencies.)

library(tmbstan)
t2 <- system.time(m2 <- tmbstan(fm1$obj, seed = 101))

Running this command, which runs 4 chains (the default), takes 5.6 seconds with no parallelization. If you're going to do this a lot, you can use the cores=n argument to run chains in parallel; this and other arguments are passed through to rstan::sampling, so see ?rstan::sampling for details. Running bayestestR::diagnostic_posterior() on the fit gives the following results:
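(For completeness, a sketch of the call, paralleling the one used in the sleepstudy example later in this vignette:)

dp2 <- bayestestR::diagnostic_posterior(m2)
print(dp2, digits = 3)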

Parameter ESS MCSE Rhat
beta[1] 736 0.010 1.001
beta[2] 709 0.011 1.001
betazi[1] 998 0.008 1.001
betazi[2] 1137 0.008 1.001
theta 556 0.015 1.012

A trace plot (rstan::traceplot(m2, pars=c("beta","betazi","theta"))):

(also redacted for space, also looks OK)

Pairs plot (pairs(m2, pars = c("beta", "betazi"), gap = 0)): (also redacted for space, also looks OK)

Sleep study example

Next, we use the (now) classic sleepstudy data set included with the lme4 package:

data("sleepstudy", package = "lme4")
fm2 <- glmmTMB(Reaction ~ Days + (Days | Subject), data = sleepstudy)
t3 <- system.time(m3 <- tmbstan(fm2$obj, seed = 101))

Running this chain produces many warnings about divergent transitions, low effective sample size, etc.; the diagnostic table confirms this.
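The number of divergent transitions can also be queried directly; a minimal sketch using rstan's diagnostic helpers (assuming a reasonably recent rstan version):

rstan::check_divergences(m3)
rstan::get_num_divergent(m3)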

Diagnostics:

dp3 <- bayestestR::diagnostic_posterior(m3)
Parameter ESS MCSE Rhat
beta[1] 11 2.339 1.138
beta[2] 14 0.473 1.164
betadisp 8 0.026 1.201
theta[1] 18 0.074 1.157
theta[2] 23 0.056 1.143
theta[3] 3 205.252 2.073

Most of the trace plot (redacted) looks OK. However, one sub-panel shows that one of the parameters (theta[3], which controls the intercept/slope correlation) is problematic for one of the chains:
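That panel could be reproduced with something like the following (the parameter label matches the default naming shown in the diagnostics above):

rstan::traceplot(m3, pars = "theta[3]")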

One way to address these problems is to add bounds that prevent the chains from going outside a range of ±5 standard errors from the MLE. (We originally used ±3 standard errors, but it looked like those bounds were being hit too frequently; if possible, we want bounds that prevent crazy behaviour but do not otherwise constrain the chains.)

sdrsum <- summary(fm2$sdr)
par_est <- sdrsum[,"Estimate"]
par_sd <- sdrsum[,"Std. Error"]
t4 <- system.time(m4 <- tmbstan(fm2$obj,
                                lower = par_est - 5*par_sd,
                                upper = par_est + 5*par_sd,
                                seed = 101))

The diagnostics and the trace plot look much better now (only theta[3] shown) ...

dp4 <- bayestestR::diagnostic_posterior(m4)
Parameter ESS MCSE Rhat
beta[1] 2782 0.078 1.000
beta[2] 514 0.070 1.006
betadisp 2410 0.001 1.001
theta[1] 445 0.021 1.004
theta[2] 1275 0.006 1.002
theta[3] 1184 0.014 1.002

The last step (for now) is to extract the sampled values, give them interpretable names, and transform the correlation parameter back to a more interpretable scale (see the "Covariance structures" vignette):

samples4 <- as.data.frame(extract(m4, pars=c("beta","betadisp","theta")))
colnames(samples4) <- c(names(fixef(fm2)$cond),
                        "log(sigma)",
                        "log(sd_Intercept)", "log(sd_Days)", "cor")
samples4$cor <- sapply(samples4$cor, get_cor)
m4_long <- reshape2::melt(as.matrix(samples4))
ggplot(m4_long, aes(x = value)) + geom_histogram(bins = 50) + facet_wrap(~Var2, scale = "free")
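Summaries on this more interpretable scale can then be computed directly from samples4, e.g. posterior medians and 95% quantile intervals (a sketch):

t(apply(samples4, 2, quantile, c(0.025, 0.5, 0.975)))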

A more principled, properly Bayesian way to handle this problem would be to add sensible regularizing priors. (To be done/tested; as of this writing, priors can be set on fixed effects and on random effect standard deviations, but not on random effect correlations ...)