Specification document - a mathematical description of models used by bage.
Note: some features described here have not been implemented yet.
Let y_i be a count of events in cell i = 1, \cdots, n and let w_i be the corresponding exposure measure, with the possibility that w_i \equiv 1. The likelihood under the Poisson model is then \begin{align} y_i & \sim \text{Poisson}(\gamma_i w_i) \\ \gamma_i & \sim \text{Gamma}\left(\xi^{-1}, (\mu_i \xi)^{-1}\right), \end{align} using the shape-rate parameterisation of the Gamma distribution. Parameter \xi governs dispersion, with \text{var}(\gamma_i \mid \mu_i, \xi) = \xi \mu_i^2 and \text{var}(y_i \mid \mu_i, \xi, w_i) = (1 + \xi \mu_i w_i) \times \mu_i w_i. We allow \xi to equal 0, in which case the model reduces to y_i \sim \text{Poisson}(\mu_i w_i).
For \xi > 0, Equations (3.1) and (3.2) are equivalent to \begin{equation} y_i \sim \text{NegBinom}\left(\xi^{-1}, (1 + \mu_i w_i \xi)^{-1}\right) \end{equation} (Norton, Christen, and Fox 2018; Simpson 2022). This is the format we use internally for estimation. When values for \gamma_i are needed, we generate them on the fly, using the fact that \begin{equation} \gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Gamma}\left(y_i + \xi^{-1}, w_i + (\xi \mu_i)^{-1}\right). \end{equation}
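As an illustration of how such conditional draws can be generated, the following R sketch (not the internal bage code; the function and variable names are hypothetical) draws \gamma_i from the conditional Gamma distribution above.

```r
## Minimal sketch (not bage internals): draw gamma_i given y_i, w_i, mu_i and xi
## from the conditional Gamma distribution above (shape-rate parameterisation).
draw_gamma_pois <- function(y, w, mu, xi) {
  if (xi == 0)
    return(mu)                        # no dispersion: gamma_i equals mu_i
  rgamma(n = length(y),
         shape = y + 1 / xi,
         rate  = w + 1 / (xi * mu))
}

set.seed(1)
mu <- c(0.02, 0.05)                   # hypothetical cell means
w  <- c(1000, 500)                    # hypothetical exposures
y  <- rpois(2, lambda = mu * w)       # hypothetical observed counts
draw_gamma_pois(y, w, mu, xi = 0.1)
```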
The likelihood under the binomial model is \begin{align} y_i & \sim \text{Binomial}(w_i, \gamma_i) \\ \gamma_i & \sim \text{Beta}\left(\xi^{-1} \mu_i, \xi^{-1} (1 - \mu_i)\right). \end{align} Parameter \xi again governs dispersion, with \text{var}(\gamma_i \mid \mu_i, \xi) = \frac{\xi}{1 + \xi} \times \mu_i (1 - \mu_i) and \text{var}(y_i \mid w_i, \mu_i, \xi) = \frac{\xi w_i + 1}{\xi + 1} \times w_i \mu_i (1 - \mu_i).
We allow \xi to equal 0, in which case the model reduces to y_i \sim \text{Binomial}(w_i, \mu_i). Equations (3.3) and (3.4) are equivalent to \begin{equation} y_i \sim \text{BetaBinom}\left(w_i, \xi^{-1} \mu_i, \xi^{-1} (1 - \mu_i)\right), \end{equation} which is what we use internally. Values for \gamma_i can be generated using \begin{equation} \gamma_i \mid y_i, w_i, \mu_i, \xi \sim \text{Beta}\left(y_i + \xi^{-1} \mu_i, w_i - y_i + \xi^{-1} (1 - \mu_i)\right). \end{equation}
The normal model is \begin{equation} \tilde{y}_i \sim \text{N}\left(\mu_i, \tilde{w}_i^{-1} \xi^2\right), \end{equation} where \tilde{y}_i is a standardized version of outcome y_i, and \tilde{w}_i is a standardized version of weight w_i. The standardization is carried out using \begin{align} \tilde{y}_i & = \frac{y_i - \bar{y}}{s} \\ \tilde{w}_i & = \frac{w_i}{\bar{w}}, \end{align} where \begin{align} \bar{y} & = \frac{\sum_{i=1}^n y_i}{n} \\ s & = \sqrt{\frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n - 1}} \\ \bar{w} & = \frac{\sum_{i=1}^n w_i}{n}. \end{align}
Standardizing allows us to apply the same priors as we use for the Poisson and binomial models.
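The following R sketch (illustrative only; y and w are hypothetical vectors) carries out the standardization described above.

```r
## Minimal sketch of the standardisation used for the normal model;
## y and w are hypothetical vectors of outcomes and weights.
standardise <- function(y, w) {
  y_bar <- mean(y)
  s     <- sd(y)        # sample standard deviation, with denominator n - 1
  w_bar <- mean(w)
  list(y_tilde = (y - y_bar) / s,
       w_tilde = w / w_bar)
}

set.seed(1)
std <- standardise(y = rnorm(10, mean = 100, sd = 20), w = runif(10, 1, 5))
```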
Let \pmb{\mu} = (\mu_1, \cdots, \mu_n)^{\top}. Our model for \pmb{\mu} is \begin{equation} \pmb{\mu} = \sum_{m=0}^{M} \pmb{X}^{(m)} \pmb{\beta}^{(m)} + \pmb{Z} \pmb{\zeta}, \end{equation} where each \pmb{X}^{(m)} is a known matrix mapping the elements of \pmb{\beta}^{(m)} to cells, \pmb{Z} is a matrix of covariates, and \pmb{\zeta} is the corresponding vector of coefficients.
Each \pmb{\beta}^{(m)}, m > 0, can be a main effect, involving a single dimension, or an interaction, involving two or more dimensions. Some priors, when applied to an interaction, treat one dimension, referred to as the ‘along’ dimension, differently from the remaining dimensions, referred to as ‘by’ dimensions. A random walk prior (Section 5.4), for instance, consists of an independent random walk along the ‘along’ dimension within each combination of the ‘by’ dimensions.
We use v = 1, \cdots, V_m to denote position within the ‘along’ dimension, and u = 1, \cdots, U_m to denote position within the classification formed by the ‘by’ dimensions. When there are no sum-to-zero constraints (see below), U_m = \prod_k d_k, where d_k is the number of elements in the kth ‘by’ variable. When there are sum-to-zero constraints, U_m = \prod_k (d_k - 1).
If a prior involves an ‘along’ dimension but the user does not specify one, the procedure for choosing a dimension is as follows:
Priors that involve ‘along’ dimensions can optionally include sum-to-zero constraints. If these constraints are applied, then within each element v of the ‘along’ dimension, the sum of the \beta_j^{(m)} across each ‘by’ dimension is zero. For instance, if \pmb{\beta}^{(m)} is an interaction between time, region, and sex, with time as the ‘along’ variable, then within each combination of time and region, the values for females and males sum to zero, and within each combination of time and sex, the values for the regions sum to zero.
Except in the case of dynamic SVD-based priors (e.g. Section 5.15), the sum-to-zero constraints are implemented internally by drawing values within an unrestricted lower-dimensional space, and then transforming to the restricted higher-dimensional space. For instance, a random walk prior for a time-region interaction with R regions consists of R - 1 unrestricted random walks along time, which are converted into R random walks that sum to zero across region. Matrices for transforming between the unrestricted and restricted spaces are constructed using the QR decomposition, as described in Section 1.8.1 of Wood (2017). With dynamic SVD-based priors, we draw values for the SVD coefficients with no constraints, convert these to unconstrained values for \pmb{\beta}^{(m)}, and then subtract means.
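The following R sketch (not the bage implementation) illustrates the QR construction of a matrix that maps an unconstrained vector of length n - 1 to a length-n vector that sums to zero, in the spirit of Wood (2017), Section 1.8.1.

```r
## Minimal sketch (not the bage implementation) of the QR construction of a
## matrix that maps an unconstrained vector of length n - 1 to a length-n
## vector whose elements sum to zero (Wood 2017, Section 1.8.1).
make_sum_to_zero_matrix <- function(n) {
  C <- matrix(1, nrow = n, ncol = 1)     # constraint: 1' beta = 0
  Q <- qr.Q(qr(C), complete = TRUE)      # n x n orthogonal matrix
  Q[, -1, drop = FALSE]                  # columns orthogonal to the vector of ones
}

Z <- make_sum_to_zero_matrix(5)
beta_unconstrained <- rnorm(4)
beta <- drop(Z %*% beta_unconstrained)
sum(beta)                                # zero, up to numerical error
```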
The intercept term \beta^{(0)} can only be given a fixed-normal prior (Section 5.3) or a Known prior (Section 5.19).
Exchangeable normal
\begin{align} \beta_j^{(m)} & \sim \text{N}(0, \tau_m^2) \\ \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \end{align}
\begin{equation} \text{N}\left(\tau_m \mid 0, A_{\tau}^{(m)2}\right) \prod_{j=1}^{J_m} \text{N}\left(\beta_j^{(m)} \mid 0, \tau_m^2\right) \end{equation}
\begin{equation} \beta_{J_m+h}^{(m)} \sim \text{N}(0, \tau_m^2) \end{equation}
N(s = 1)

where
- s is A_{\tau}^{(m)}. Defaults to 1.

Exchangeable normal, with fixed standard deviation
\begin{equation} \beta_j^{(m)} \sim \text{N}\left(0, A_{\beta}^{(m)2}\right) \end{equation}
\begin{equation} \prod_{j=1}^{J_m} \text{N}\left(\beta_j^{(m)} \mid 0, A_{\beta}^{(m)2}\right) \end{equation}
\begin{equation} \beta_{J_m+h}^{(m)} \sim \text{N}\left(0, A_{\beta}^{(m)2}\right) \end{equation}
NFix(sd = 1)

where
- sd is A_{\beta}^{(m)}. Defaults to 1.

Random walk
\begin{align} \beta_{u,1}^{(m)} & \sim \text{N}\left(0, (A_0^{(m)})^2\right) \\ \beta_{u,v}^{(m)} & \sim \text{N}\left(\beta_{u,v-1}^{(m)}, \tau_m^2\right), \quad v = 2, \cdots, V_m \\ \tau_m & \sim \text{N}^+\left(0, (A_{\tau}^{(m)})^2\right) \end{align}

A_0^{(m)} can be 0, implying that \beta_{u,1}^{(m)} is fixed at 0.

When U_m > 1, sum-to-zero constraints (Section 5.1.2) can be applied.
\begin{equation} \text{N}\left(\tau_m \mid 0, A_{\tau}^{(m)2}\right) \prod_{u=1}^{U_m} \text{N}\left(\beta_{u,1}^{(m)} \mid 0, (A_0^{(m)})^2\right) \prod_{v=2}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} \mid \beta_{u,v-1}^{(m)}, \tau_m^2\right) \end{equation}
\begin{equation} \beta_{u,V_m+h}^{(m)} \sim \text{N}\left(\beta_{u,V_m+h-1}^{(m)}, \tau_m^2\right) \end{equation}
If the prior includes sum-to-zero constraints, means are subtracted from the forecasted values within each combination of ‘along’ and ‘by’ variables.
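As an illustration, the following R sketch (hypothetical values, not bage internals) forecasts a random walk prior H steps ahead for several ‘by’ groups and then subtracts means across the groups within each forecasted period.

```r
## Minimal sketch (hypothetical values, not bage internals): forecast a random
## walk prior H steps ahead for each of several 'by' groups, then subtract
## means across the groups within each forecasted period.
forecast_rw <- function(beta_last, H, tau, zero_sum = TRUE) {
  U <- length(beta_last)                 # one current value per 'by' group
  forecast <- matrix(NA_real_, nrow = U, ncol = H)
  prev <- beta_last
  for (h in seq_len(H)) {
    prev <- rnorm(U, mean = prev, sd = tau)   # one random walk step per group
    forecast[, h] <- prev
  }
  if (zero_sum)
    forecast <- sweep(forecast, 2, colMeans(forecast))  # centre across groups
  forecast
}

set.seed(1)
forecast_rw(beta_last = c(-0.2, 0.1, 0.1), H = 4, tau = 0.05)
```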
RW(s = 1, sd = 1, along = NULL, zero_sum = TRUE)
where
- s is A_{\tau}^{(m)}. Defaults to 1.
- sd is A_0^{(m)}. Defaults to 1.
- along is used to identify the ‘along’ and ‘by’ dimensions.
- If zero_sum is TRUE, sum-to-zero constraints are applied.

Second-order random walk
\begin{align} \beta_{u,1}^{(m)} & \sim \text{N}\left(0, (A_0^{(m)})^2\right) \\ \beta_{u,2}^{(m)} & \sim \text{N}\left(\beta_{u,1}^{(m)}, (A_{\eta}^{(m)})^2\right) \\ \beta_{u,v}^{(m)} & \sim \text{N}\left(2 \beta_{u,v-1}^{(m)} - \beta_{u,v-2}^{(m)}, \tau_m^2\right), \quad v = 3, \cdots, V_m \\ \tau_m & \sim \text{N}^+\left(0, (A_{\tau}^{(m)})^2\right) \end{align}

A_0^{(m)} can be 0, implying that \beta_{u,1}^{(m)} is fixed at 0.

When U_m > 1, sum-to-zero constraints (Section 5.1.2) can be applied.
\begin{equation} \text{N}\left(\tau_m \mid 0, A_{\tau}^{(m)2}\right) \prod_{u=1}^{U_m} \text{N}\left(\beta_{u,1}^{(m)} \mid 0, (A_0^{(m)})^2\right) \text{N}\left(\beta_{u,2}^{(m)} \mid \beta_{u,1}^{(m)}, (A_{\eta}^{(m)})^2\right) \prod_{v=3}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} \mid 2 \beta_{u,v-1}^{(m)} - \beta_{u,v-2}^{(m)}, \tau_m^2\right) \end{equation}
\begin{equation} \beta_{u,V_m+h}^{(m)} \sim \text{N}\left(2 \beta_{u,V_m+h-1}^{(m)} - \beta_{u,V_m+h-2}^{(m)}, \tau_m^2\right) \end{equation}
If the prior includes sum-to-zero constraints, means are subtracted from the forecasted values within each combination of ‘along’ and ‘by’ variables.
RW2(s = 1, sd = 1, sd_slope = 1, along = NULL, zero_sum = TRUE)
where
- s is A_{\tau}^{(m)}
- sd is A_0^{(m)}
- sd_slope is A_{\eta}^{(m)}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

Random walk with seasonal effect
\begin{align} \beta_{u,v}^{(m)} & = \alpha_{u,v} + \lambda_{u,v}, \quad v = 1, \cdots, V_m \\ \alpha_{u,1}^{(m)} & \sim \text{N}\left(0, (A_0^{(m)})^2\right) \\ \alpha_{u,v}^{(m)} & \sim \text{N}\left(\alpha_{u,v-1}^{(m)}, \tau_m^2\right), \quad v = 2, \cdots, V_m \\ \lambda_{u,v}^{(m)} & \sim \text{N}\left(0, (A_{\lambda}^{(m)})^2\right), \quad v = 1, \cdots, S_m - 1 \\ \lambda_{u,v}^{(m)} & = -\sum_{s=1}^{S_m-1} \lambda_{u,v-s}^{(m)}, \quad v = S_m,\; 2S_m, \cdots \\ \lambda_{u,v}^{(m)} & \sim \text{N}\left(\lambda_{u,v-S_m}^{(m)}, \omega_m^2\right), \quad \text{otherwise} \\ \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\ \omega_m & \sim \text{N}^+\left(0, A_{\omega}^{(m)2}\right) \end{align}

A_0^{(m)} can be 0, implying that \alpha_{u,1}^{(m)} is fixed at 0.

A_{\omega}^{(m)2} can be set to zero, implying that seasonal effects are fixed over time.

When U_m > 1, sum-to-zero constraints (Section 5.1.2) can be applied.
\begin{equation} \begin{split} & \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{N}(\omega_m \mid 0, A_{\omega}^{(m)2}) \\ \quad \times & \prod_{u=1}^{U_m} \left( \text{N}(\alpha_{u,1}^{(m)} \mid 0, (A_0^{(m)})^2) \prod_{v=2}^{V_m} \text{N}(\alpha_{u,v}^{(m)} \mid \alpha_{u,v-1}^{(m)}, \tau_m^2) \prod_{v=1}^{S_m-1} \text{N}(\lambda_{u,v}^{(m)} \mid 0, (A_{\lambda}^{(m)})^2) \prod_{v > S_m, (v-1) \bmod S_m \neq 0}^{V_m} \text{N}(\lambda_{u,v}^{(m)} \mid \lambda_{u,v-S_m}^{(m)}, \omega_m^2) \right) \end{split} \end{equation}
\begin{align} \alpha_{J_m+h}^{(m)} & \sim \text{N}(\alpha_{J_m+h-1}^{(m)}, \tau_m^2) \\ \lambda_{J_m+h}^{(m)} & \sim \text{N}(\lambda_{J_m+h-S_m}^{(m)}, \omega_m^2) \\ \beta_{J_m+h}^{(m)} & = \alpha_{J_m+h}^{(m)} + \lambda_{J_m+h}^{(m)} \end{align}
RW_Seas(n_seas, s = 1, sd = 1, s_seas = 0, sd_seas = 1, along = NULL, zero_sum = FALSE)
where
- n_seas is S_m
- s is A_{\tau}^{(m)}
- sd is A_0^{(m)}
- s_seas is A_{\omega}^{(m)}
- sd_seas is A_{\lambda}^{(m)}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

Second-order random walk, with seasonal effect
\begin{align} \beta_{u,v}^{(m)} & = \alpha_{u,v} + \lambda_{u,v}, \quad v = 1, \cdots, V_m \\ \alpha_{u,1}^{(m)} & \sim \text{N}\left(0, (A_0^{(m)})^2 \right) \\ \alpha_{u,2}^{(m)} & \sim \text{N}\left(\alpha_{u,1}^{(m)}, (A_{\eta}^{(m)})^2\right) \\ \alpha_{u,v}^{(m)} & \sim \text{N}(2 \alpha_{u,v-1}^{(m)} - \alpha_{u,v-2}^{(m)}, \tau_m^2), \quad v = 3, \cdots, V_m \\ \lambda_{u,v}^{(m)} & \sim \text{N}(0, (A_{\lambda}^{(m)})^2), \quad v = 1, \cdots, S_m - 1 \\ \lambda_{u,v}^{(m)} & = -\sum_{s=1}^{S_m-1} \lambda_{u,v-s}^{(m)}, \quad v = S_m,\; 2S_m, \cdots \\ \lambda_{u,v}^{(m)} & \sim \text{N}(\lambda_{u,v-S_m}^{(m)}, \omega_m^2), \quad \text{otherwise} \\ \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right) \\ \omega_m & \sim \text{N}^+\left(0, A_{\omega}^{(m)2}\right) \end{align}
A_0^{(m)} can be 0, implying that \alpha_{u,1}^{(m)} is fixed at 0.
A_{\omega}^{(m)2} can be set to zero, implying that seasonal effects are fixed over time.
When U_m > 1, sum-to-zero constraints (Section 5.1.2) can be applied.
\begin{equation} \begin{split} & \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{N}(\omega_m \mid 0, A_{\omega}^{(m)2}) \\ \quad \times & \prod_{u=1}^{U_m} \left( \text{N}(\alpha_{u,1}^{(m)} \mid 0, (A_0^{(m)})^2 ) \text{N}(\alpha_{u,2}^{(m)} \mid \alpha_{u,1}^{(m)}, (A_{\eta}^{(m)})^2 ) \prod_{v=3}^{V_m} \text{N}(\alpha_{u,v}^{(m)} \mid 2 \alpha_{u,v-1}^{(m)} - \alpha_{u,v-2}^{(m)}, \tau_m^2 ) \\ \prod_{v=1}^{S_m-1} \text{N}(\lambda_{u,v}^{(m)} \mid 0, (A_{\lambda}^{(m)})^2) \prod_{v > S_m, (v-1) \bmod S_m \neq 0}^{V_m} \text{N}(\lambda_{u,v}^{(m)} \mid \lambda_{u,v-S_m}^{(m)}, \omega_m^2) \right) \end{split} \end{equation}
\begin{align} \alpha_{J_m+h}^{(m)} & \sim \text{N}(2 \alpha_{J_m+h-1}^{(m)} - \alpha_{J_m+h-2}^{(m)}, \tau_m^2) \\ \lambda_{J_m+h}^{(m)} & \sim \text{N}(\lambda_{J_m+h-S_m}^{(m)}, \omega_m^2) \\ \beta_{J_m+h}^{(m)} & = \alpha_{J_m+h}^{(m)} + \lambda_{J_m+h}^{(m)} \end{align}
RW2_Seas(n_seas, s = 1, sd = 1, sd_slope = 1, s_seas = 0, sd_seas = 1, along = NULL, zero_sum = FALSE)
where
- n_seas is S_m
- s is A_{\tau}^{(m)}
- sd is A_0^{(m)}
- sd_slope is A_{\eta}^{(m)}
- s_seas is A_{\omega}^{(m)}
- sd_seas is A_{\lambda}^{(m)}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

Autoregressive

\begin{equation} \beta_{u,v}^{(m)} \sim \text{N}\left(\phi_1^{(m)} \beta_{u,v-1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \beta_{u,v-{K_m}}^{(m)}, \omega_m^2\right), \quad v = K_m + 1, \cdots, V_m. \end{equation} Internally, TMB derives values for \beta_{u,v}^{(m)}, v = 1, \cdots, K_m, and for \omega_m, that imply a stationary distribution, and that give every term \beta_{u,v}^{(m)} the same marginal variance. We denote this marginal variance \tau_m^2, and assign it a prior \begin{equation} \tau_m \sim \text{N}^+(0, A_{\tau}^{(m)2}). \end{equation} Each of the \phi_k^{(m)} has prior \begin{equation} \frac{\phi_k^{(m)} + 1}{2} \sim \text{Beta}(S_1^{(m)}, S_2^{(m)}). \end{equation}
\begin{equation} \text{N}^+\left(\tau_m \mid 0, A_{\tau}^{(m)2} \right) \prod_{k=1}^{K_m} \text{Beta}\left(\tfrac{1}{2} \phi_k^{(m)} + \tfrac{1}{2} \mid 2, 2 \right) \prod_{u=1}^{U_m} p\left( \beta_{u,1}^{(m)}, \cdots, \beta_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) \end{equation} where p\left( \beta_{u,1}^{(m)}, \cdots, \beta_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) is calculated internally by TMB.
\begin{equation} \beta_{u,V_m + h}^{(m)} \sim \text{N}\left(\phi_1^{(m)} \beta_{u,V_m + h - 1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \beta_{u,V_m+h-K_m}^{(m)}, \tau_m^2\right) \end{equation}
AR(n_coef = 2,
s = 1,
shape1 = 5,
shape2 = 5,
along = NULL,
zero_sum = FALSE)
where
- n_coef is K_m
- s is A_{\tau}^{(m)}
- shape1 is S_1^{(m)}
- shape2 is S_2^{(m)}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

Special case of AR(), with extra options for the autocorrelation coefficient.
\begin{align} \beta_{u,1}^{(m)} & \sim \text{N}(0, \tau_m^2), \quad u = 1, \cdots, U_m \\ \beta_{u,v}^{(m)} & \sim \text{N}(\phi_m \beta_{u,v-1}^{(m)}, (1 - \phi_m^2) \tau_m^2), \quad u = 1, \cdots, U_m, \quad v = 2, \cdots, V_m \\ \phi_m & = a_{0,m} + (a_{1,m} - a_{0,m}) \phi_m^{\prime} \\ \phi_m^{\prime} & \sim \text{Beta}(S_1^{(m)}, S_2^{(m)}) \\ \tau_m & \sim \text{N}^+\left(0, A_{\tau}^{(m)2}\right). \end{align} This is adapted from the specification used for AR1 densities in TMB. It implies that the marginal variance of all \beta_{u,v}^{(m)} is \tau_m^2. We require that -1 < a_{0m} < a_{1m} < 1.
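The following R sketch (illustrative only, not the TMB code) simulates a single series under this parameterisation, in which every \beta_{u,v}^{(m)} has marginal variance \tau_m^2 and \phi_m is rescaled from a Beta draw to lie between a_{0m} and a_{1m}.

```r
## Minimal sketch (illustrative only, not the TMB code): simulate one series
## under this parameterisation, in which every beta_{u,v} has marginal
## variance tau^2 and phi lies between a0 and a1.
set.seed(1)
V   <- 20
a0  <- 0.8
a1  <- 0.98
tau <- 0.3
phi <- a0 + (a1 - a0) * rbeta(1, shape1 = 5, shape2 = 5)
beta <- numeric(V)
beta[1] <- rnorm(1, mean = 0, sd = tau)
for (v in 2:V)
  beta[v] <- rnorm(1, mean = phi * beta[v - 1], sd = sqrt(1 - phi^2) * tau)
```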
\begin{equation} \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \text{Beta}( \phi_m^{\prime} \mid S_1^{(m)}, S_2^{(m)}) \prod_{u=1}^{U_m} \text{N}\left(\beta_{u,1}^{(m)} \mid 0, \tau_m^2 \right) \prod_{u=1}^{U_m} \prod_{v=2}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} \mid \phi_m \beta_{u,v-1}^{(m)}, (1 - \phi_m^2) \tau_m^2 \right) \end{equation}
\begin{equation} \beta_{J_m + h}^{(m)} \sim \text{N}\left(\phi_m \beta_{J_m + h - 1}^{(m)}, (1 - \phi_m^2) \tau_m^2\right) \end{equation}
AR1(s = 1,
shape1 = 5,
shape2 = 5,
min = 0.8,
max = 0.98,
along = NULL,
zero_sum = TRUE)
where
- s is A_{\tau}^{(m)}
- shape1 is S_1^{(m)}
- shape2 is S_2^{(m)}
- min is a_{0m}
- max is a_{1m}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

The defaults for min and max are based on the defaults for function ets() in R package forecast (Hyndman and Khandakar 2008).

Linear
\begin{align} \beta_{u,v}^{(m)} & = \alpha_{u,v}^{(m)} + \epsilon_{u,v}^{(m)} \\ \alpha_{u,v}^{(m)} & = \left(v - \frac{V_m + 1}{2}\right) \eta_u^{(m)} \\ \eta_u^{(m)} & \sim \text{N}\left(B_{\eta}^{(m)}, (A_{\eta}^{(m)})^2\right)\\ \epsilon_{u,v}^{(m)} & \sim \text{N}(0, \tau_m^2) \\ \tau_m & \sim \text{N}^+\left(0, (A_{\tau}^{(m)})^2\right) \end{align}
Note that \sum_{v=1}^{V_m} \alpha_{u,v}^{(m)} = 0.
\begin{equation} \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \text{N}(\eta_u^{(m)} \mid B_{\eta}^{(m)}, A_{\eta}^{(m)2}) \prod_{u=1}^{U_m} \prod_{v=1}^{V_m} \text{N}\left(\beta_{u,v}^{(m)} \mid \left(v - \frac{V_m + 1}{2}\right) \eta_u^{(m)}, \tau_m^2 \right) \end{equation}
\begin{equation} \beta_{u,V_m + h}^{(m)} \sim \text{N}\left(\left(\frac{V_m - 1}{2}+ h\right) \eta_u^{(m)}, \tau_m^2\right) \end{equation}
Lin(s = 1, mean_slope = 0, sd_slope = 1, along = NULL, zero_sum = FALSE)
where
- s is A_{\tau}^{(m)}
- mean_slope is B_{\eta}^{(m)}
- sd_slope is A_{\eta}^{(m)}
- along is used to identify the ‘along’ and ‘by’ dimensions
- If zero_sum is TRUE, sum-to-zero constraints are applied

Linear with autoregressive errors

\begin{align} \beta_{u,v}^{(m)} & = \alpha_{u,v}^{(m)} + \epsilon_{u,v}^{(m)} \\ \alpha_{u,v}^{(m)} & = \left(v - \frac{V_m + 1}{2}\right) \eta_u^{(m)} \\ \eta_u^{(m)} & \sim \text{N}\left(B_{\eta}^{(m)}, (A_{\eta}^{(m)})^2\right)\\ \epsilon_{u,v}^{(m)} & \sim \text{N}\left(\phi_1^{(m)} \epsilon_{u,v-1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \epsilon_{u,v-{K_m}}^{(m)}, \omega_m^2\right), \quad v = K_m + 1, \cdots, V_m \end{align}
Note that \sum_{v=1}^{V_m} \alpha_{u,v}^{(m)} = 0.
Internally, TMB derives values for \epsilon_{u,v}^{(m)}, v = 1, \cdots, K_m, and for \omega_m, that provide the \epsilon_{u,v}^{(m)} with a stationary distribution in which each term has the same marginal variance. We denote this marginal variance \tau_m^2, and assign it a prior \begin{equation} \tau_m \sim \text{N}^+(0, A_{\tau}^{(m)2}). \end{equation} Each of the individual \phi_k^{(m)} has prior \begin{equation} \frac{\phi_k^{(m)} + 1}{2} \sim \text{Beta}(S_1^{(m)}, S_2^{(m)}). \end{equation}
\begin{equation} \begin{split} & \text{N}^+\left(\tau_m \mid 0, A_{\tau}^{(m)2} \right) \prod_{k=1}^{K_m} \text{Beta}\left( \tfrac{1}{2} \phi_k^{(m)} + \tfrac{1}{2} \mid 2, 2 \right) \\ & \quad \times \prod_{u=1}^{U_m} \text{N}(\eta_u^{(m)} \mid B_{\eta}^{(m)}, A_{\eta}^{(m)2}) p\left( \epsilon_{u,1}^{(m)}, \cdots, \epsilon_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) \end{split} \end{equation} where p\left( \epsilon_{u,1}^{(m)}, \cdots, \epsilon_{u,V_m}^{(m)} \mid \phi_1^{(m)}, \cdots, \phi_{K_m}^{(m)}, \tau_m \right) is calculated internally by TMB.
\begin{align} \beta_{u, V_m + h}^{(m)} & = \left(\frac{V_m - 1}{2}+ h\right) \eta_u^{(m)} + \epsilon_{u,V_m+h}^{(m)} \\ \epsilon_{u,V_m+h}^{(m)} & \sim \text{N}\left(\phi_1^{(m)} \epsilon_{u,V_m + h - 1}^{(m)} + \cdots + \phi_{K_m}^{(m)} \epsilon_{u,V_m+h-K_m}^{(m)}, \omega_m^2\right) \end{align}
Lin_AR(n_coef = 2,
s = 1,
shape1 = 5,
shape2 = 5,
mean_slope = 0,
sd_slope = 1,
along = NULL,
zero_coef = FALSE)
where
- n_coef is K_m
- s is A_{\tau}^{(m)}
- shape1 is S_1^{(m)}
- shape2 is S_2^{(m)}
- mean_slope is B_{\eta}^{(m)}
- sd_slope is A_{\eta}^{(m)}
- along is used to identify the ‘along’ and ‘by’ variables

TODO - WRITE DETAILS
Penalised spline (P-spline)
\begin{equation} \pmb{\beta}_u^{(m)} = \pmb{B}^{(m)} \pmb{\alpha}_u^{(m)}, \quad u = 1, \cdots, U_m \end{equation} where \pmb{\beta}_u^{(m)} is the subvector of \pmb{\beta}^{(m)} composed of elements from the uth combination of the ‘by’ variables, \pmb{B}^{(m)} is a V_m \times K_m matrix of B-splines, and \pmb{\alpha}_u^{(m)} has a second-order random walk prior (Section 5.5).
\pmb{B}^{(m)} = (\pmb{b}_1^{(m)}(\pmb{v}), \cdots, \pmb{b}_{K_m}^{(m)}(\pmb{v})), with \pmb{v} = (1, \cdots, V_m)^{\top}. The B-splines are centered, so that \pmb{1}^{\top} \pmb{b}_k^{(m)}(\pmb{v}) = 0, k = 1, \cdots, K_m.
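The following R sketch shows one way of constructing a centred B-spline basis along v = 1, \cdots, V_m using splines::bs(); the basis that bage actually constructs may differ in details such as knot placement and degree, so treat this as illustrative only.

```r
## Minimal sketch of a centred B-spline basis along v = 1, ..., V_m, using
## splines::bs(). The basis that bage constructs may differ in details such
## as knot placement and degree; treat this as illustrative only.
library(splines)
V <- 30
K <- 10
B <- bs(seq_len(V), df = K, intercept = TRUE)  # V x K B-spline basis
B <- sweep(B, 2, colMeans(B))                  # centre so each column sums to zero
round(colSums(B), 10)                          # all (numerically) zero
```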
\begin{equation} \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{k=1}^2 \text{N}(\alpha_{u,k}^{(m)} \mid 0, 1) \prod_{u=1}^{U_m}\prod_{k=3}^{K_m} \text{N}\left(\alpha_{u,k}^{(m)} - 2 \alpha_{u,k-1}^{(m)} + \alpha_{u,k-2}^{(m)} \mid 0, \tau_m^2 \right) \end{equation}
Terms with a P-Spline prior cannot be forecasted.
Sp(n = NULL, s = 1)
where
- n is K_m. Defaults to \max(0.7 J_m, 4).
- s is the A_{\tau}^{(m)} from the second-order random walk prior. Defaults to 1.
- along is used to identify the ‘along’ and ‘by’ variables

SVD

Age but no sex or gender
Let \pmb{\beta}_u be the age effect for the uth combination of the ‘by’ variables. With an SVD prior, \begin{equation} \pmb{\beta}_u^{(m)} = \pmb{F}^{(m)} \pmb{\alpha}_u^{(m)} + \pmb{g}^{(m)}, \quad u = 1, \cdots, U_m \end{equation} where \pmb{F}^{(m)} is a V_m \times K_m matrix, and \pmb{g}^{(m)} is a vector with V_m elements, both derived from a singular value decomposition (SVD) of an external dataset of age-specific values for all sexes/genders combined. The construction of \pmb{F}^{(m)} and \pmb{g}^{(m)} is described in Appendix 13.2. The centering and scaling used in the construction allow use of the simple prior \begin{equation} \alpha_{u,k}^{(m)} \sim \text{N}(0, 1), \quad u = 1, \cdots, U_m, k = 1, \cdots, K_m. \end{equation}
Joint model of age and sex/gender
In the joint model, vector \pmb{\beta}_u represents the interaction between age and sex/gender for the uth combination of the ‘by’ variables. Matrix \pmb{F}^{(m)} and vector \pmb{g}^{(m)} are calculated from data that separate sexes/genders. The model is otherwise unchanged.
Independent models for each sex/gender
In the independent model, vector \pmb{\beta}_{s,u} represents age effects for sex/gender s and the uth combination of the ‘by’ variables, and we have \begin{equation} \pmb{\beta}_{s,u}^{(m)} = \pmb{F}_s^{(m)} \pmb{\alpha}_{s,u}^{(m)} + \pmb{g}_s^{(m)}, \quad s = 1, \cdots, S; \quad u = 1, \cdots, U_m \end{equation} Matrix \pmb{F}_s^{(m)} and vector \pmb{g}_s^{(m)} are calculated from data that separate sexes/genders. The prior is \begin{equation} \alpha_{s,u,k}^{(m)} \sim \text{N}(0, 1), \quad s = 1, \cdots, S; \quad u = 1, \cdots, U_m; \quad k = 1, \cdots, K_m. \end{equation}
\begin{equation} \prod_{u=1}^{U_m}\prod_{k=1}^{K_m} \text{N}\left(\alpha_{uk}^{(m)} \mid 0, 1 \right) \end{equation} for the age-only and joint models, and \begin{equation} \prod_{s=1}^S \prod_{u=1}^{U_m}\prod_{k=1}^{K_m} \text{N}\left(\alpha_{s,u,k}^{(m)} \mid 0, 1 \right) \end{equation} for the independent model
Terms with an SVD prior cannot be forecasted.
SVD(ssvd, n_comp = NULL, indep = TRUE)
where
- ssvd is an object containing \pmb{F} and \pmb{g}
- n_comp is the number of components to be used (defaults to ceiling(n/2), where n is the number of components in ssvd)
- indep determines whether an independent or joint model will be used if the term being modelled contains a sex or gender variable.
The SVD_RW() prior is identical to the SVD() prior except that the coefficients evolve over time, following independent random walks. For instance, in the combined-sex/gender and joint models with K_m SVD components,
\begin{align} \pmb{\beta}_{u,t}^{(m)} & = \pmb{F}^{(m)} \pmb{\alpha}_{u,t}^{(m)} + \pmb{g}^{(m)} \\ \alpha_{u,k,1}^{(m)} & \sim \text{N}(0, (A_0^{(m)})^2), \\ \alpha_{u,k,t}^{(m)} & \sim \text{N}(\alpha_{u,k,t-1}^{(m)}, \tau_m^2), \quad t = 2, \cdots, T \\ \tau_m & \sim \text{N}^+\left(0, (A_{\tau}^{(m)})^2\right) \end{align}
In the combined-sex/gender and joint models,
\begin{equation} \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{k=1}^{K_m} \text{N}(\alpha_{u,k,1}^{(m)} \mid 0, (A_0^{(m)})^2) \prod_{t=2}^{T} \text{N}\left(\alpha_{u,k,t}^{(m)} \mid \alpha_{u,k,t-1}^{(m)}, \tau_m^2 \right), \end{equation}
and in the independent model,
\begin{equation} \text{N}(\tau_m \mid 0, A_{\tau}^{(m)2}) \prod_{u=1}^{U_m} \prod_{s=1}^{S} \prod_{k=1}^{K_m} \text{N}(\alpha_{u,s,k,1}^{(m)} \mid 0, (A_0^{(m)})^2) \prod_{t=2}^{T} \text{N}\left(\alpha_{u,s,k,t}^{(m)} \mid \alpha_{u,s,k,t-1}^{(m)}, \tau_m^2 \right) \end{equation}
\begin{align} \alpha_{u,k,T+h}^{(m)} & \sim \text{N}(\alpha_{u,k,T+h-1}^{(m)}, \tau_m^2) \\ \pmb{\beta}_{u,T+h}^{(m)} & = \pmb{F}^{(m)} \pmb{\alpha}_{u,T+h}^{(m)} + \pmb{g}^{(m)} \end{align}
SVD_RW(ssvd,
n_comp = NULL,
indep = TRUE,
s = 1,
sd = 1,
zero_sum = FALSE)
where
- ssvd is an object containing \pmb{F} and \pmb{g}
- n_comp is K_m
- indep determines whether an independent or joint model will be used if the term being modelled contains a sex or gender variable
- s is A_{\tau}^{(m)}
- sd is A_0^{(m)}
- If zero_sum is TRUE, sum-to-zero constraints are applied
Same structure as SVD_RW(). TODO - write details.

Same structure as SVD_RW(). TODO - write details.

Same structure as SVD_RW(). TODO - write details.
Elements of \pmb{\beta}^{(m)} are treated as known with certainty.
Known priors make no contribution to the posterior density.
Main effects with a known prior cannot be forecasted.
Known(values)
where
- values is a vector containing the \beta_j^{(m)}.

Covariates

The columns of matrix \pmb{Z} are assumed to be standardised to have mean 0 and standard deviation 1. \pmb{Z} does not contain a column for an intercept.
We implement two priors for coefficient vector \pmb{\zeta}. The first prior is designed for the case where P, the number of columns of \pmb{Z}, is small, and most \zeta_p are likely to be distinguishable from zero. The second prior is designed for the case where P is large, and only a few \zeta_p are likely to be distinguishable from zero.
\begin{align} \zeta_p \mid \varphi & \sim \text{N}(0, \varphi^2) \\ \varphi & \sim \text{N}^+(0, 1) \end{align}
Regularized horseshoe prior (Piironen and Vehtari 2017)
\begin{align} \zeta_p \mid \vartheta_p, \varphi & \sim \text{N}(0, \vartheta_p^2 \varphi^2) \\ \vartheta_p & \sim \text{Cauchy}^+(0, 1) \\ \varphi & \sim \text{Cauchy}^+(0, A_{\varphi}^2) \\ A_{\varphi} & = \frac{p_0}{p_0 + P} \frac{\hat{\sigma}}{\sqrt{n}} \end{align} where p_0 is an initial guess at the number of \zeta_p that are non-zero, and \hat{\sigma} is obtained as follows:
The quantities used for Poisson and binomial likelihoods are derived from normal approximations to GLMs (Piironen and Vehtari 2017; Gelman et al. 2014, sec. 16.2).
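For illustration, the following R sketch computes the global scale A_{\varphi} from the formula above; p0, P, n, and sigma_hat are all supplied by the user, and the construction of sigma_hat via a normal approximation to the GLM is not reproduced here.

```r
## Minimal sketch: the global scale A_phi implied by the formula above.
## p0, P, n, and sigma_hat are supplied by the user; the construction of
## sigma_hat via a normal approximation to the GLM is not reproduced here.
global_scale <- function(p0, P, n, sigma_hat) {
  p0 / (p0 + P) * sigma_hat / sqrt(n)
}

global_scale(p0 = 5, P = 50, n = 2000, sigma_hat = 1.3)
```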
TODO - write
TODO - write
set_covariates(formula, data = NULL, n_coef = NULL)
where
- formula is a one-sided R formula describing the covariates to be used
- data is a data frame. If a value for data is supplied, then formula is interpreted in the context of this data frame. If a value for data is not supplied, then formula is interpreted in the context of the data frame used for the original call to mod_pois(), mod_binom(), or mod_norm().
- n_coef is the effective number of non-zero coefficients. If a value is supplied, the shrinkage prior is used; otherwise the standard prior is used.

Examples:
set_covariates(~ mean_income + distance * employment)
set_covariates(~ ., data = cov_data, n_coef = 5)
We use an exponential distribution, parameterised in terms of its mean: \begin{equation} \xi \sim \text{Exp}(\mu_{\xi}) \end{equation}
\begin{equation} p(\xi) = \frac{1}{\mu_{\xi}} \exp\left(\frac{-\xi}{\mu_{\xi}}\right) \end{equation}
set_disp(mean = 1)
where
- mean is \mu_{\xi}

Random rounding to base 3 (RR3) is a confidentialization method used by some statistical agencies. It is applied to counts data. Each count x is rounded randomly as follows: if x is divisible by 3, it is left unchanged; if x mod 3 equals 1, it is rounded down to x - 1 with probability 2/3 or up to x + 2 with probability 1/3; and if x mod 3 equals 2, it is rounded up to x + 1 with probability 2/3 or down to x - 2 with probability 1/3.
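The following is a minimal R sketch of this rounding rule, for illustration only; it is not code from bage or from any statistical agency.

```r
## Minimal sketch of random rounding to base 3 applied to a vector of counts;
## illustrative only, not code from bage or from any statistical agency.
rr3 <- function(x) {
  r <- x %% 3
  up <- runif(length(x)) < r / 3        # round up with probability r / 3
  x + ifelse(up, 3 - r, -r)             # otherwise round down by r
}

set.seed(1)
rr3(c(0, 1, 2, 3, 7, 11))
```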
RR3 data models can be used with Poisson or binomial likelihoods. Let y_i denote the observed value for the outcome, and y_i^* the true value. The likelihood with an RR3 data model is then
\begin{align} p(y_i | \gamma_i, w_i) & = \sum_{y_i^*} p(y_i | y_i^*) p(y_i^* | \gamma_i, w_i) \\ & = \sum_{k = -2, -1, 0, 1, 2} p(y_i | y_i + k) p(y_i + k | \gamma_i, w_i) \\ & = \tfrac{1}{3} p(y_i - 2 | \gamma_i, w_i) + \tfrac{2}{3} p(y_i - 1 | \gamma_i, w_i) + p(y_i | \gamma_i, w_i) + \tfrac{2}{3} p(y_i + 1 | \gamma_i, w_i) + \tfrac{1}{3} p(y_i + 2 | \gamma_i, w_i) \end{align}
The data that we supply to TMB is a filtered and aggregated version of the data that the user provides through the data argument.
In the filtering stage, we remove any rows where (i) the offset is 0 or NA, or (ii) the outcome variable is NA.
In the aggregation stage, we identify any rows in the data that have duplicated combinations of the classification variables. For instance, if the classification variables are age and sex, and we have two rows where age is "20-24" and sex is "Female", then these rows count as a duplicated combination. We aggregate the offset and outcome variables across these duplicates. With Poisson and binomial models, the aggregation formula for outcomes is
\begin{equation}
y^{\text{new}} = \sum_{i=1}^D y_i^{\text{old}},
\end{equation}
and the aggregation formula for exposure/size is
\begin{equation}
w^{\text{new}} = \sum_{i=1}^D w_i^{\text{old}},
\end{equation}
where D is the number of times a particular combination is duplicated. With normal models, the aggregation formula for outcomes is
\begin{equation}
y^{\text{new}} = \frac{\sum_{i=1}^D y_i^{\text{old}}}{D},
\end{equation}
and the aggregation formula for weights is
\begin{equation}
w^{\text{new}} = \frac{1}{\sum_{i=1}^D \frac{1}{w_i^{\text{old}}}}.
\end{equation}
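The following R sketch (base R, with a hypothetical data frame and variable names, not bage internals) illustrates the aggregation of duplicated combinations for a Poisson or binomial model; for normal models the outcome is instead averaged and the weights combined harmonically, as described above.

```r
## Minimal sketch (base R, hypothetical data frame and variable names) of the
## aggregation of duplicated combinations for a Poisson or binomial model.
df <- data.frame(age = c("20-24", "20-24", "25-29"),
                 sex = c("Female", "Female", "Female"),
                 y   = c(3, 5, 4),
                 w   = c(100, 150, 120))
aggregate(cbind(y, w) ~ age + sex, data = df, FUN = sum)
```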
Select the variables to be used in the inner model. By default, these are the age, sex, and time variables in the model. All remaining variables are ‘outer’ variables.
Aggregate the data using the classification formed by the inner variables. (See Section 9.1 on aggregation procedures.) Remove all terms not involving ‘inner’ variables, other than the intercept term, from the model. Set dispersion to 0. Fit the resulting model.
Let \hat{\mu}_i^{\text{in}} be point estimates for the linear predictor \mu_i obtained from the inner model.
Poisson model
Aggregate the data using the classification formed by the outer variables. Remove all terms involving the ‘inner’ variables, plus the intercept, from the model. Set dispersion to 0. Set exposure to w_i^{\text{out}} = \hat{\mu}^{\text{in}} w_i. Fit the model.
Binomial model
Fit the original model, but set dispersion to 0, and for all terms from the ‘inner’ model, use Known priors using point estimates from the inner model.
Normal model
Aggregate the data using the classification formed by the outer variables. Remove all terms involving the ‘inner’ variables, plus the intercept, from the model. Set the outcome variable to y_i^{\text{out}} = y_i - \hat{\mu}^{\text{in}}. Fit the model.
Concatenate posterior distributions for the inner terms from the inner model to posterior distributions for the outer terms from the outer model.
If the original model includes a dispersion term, then estimate dispersion. Let \hat{\mu}_i^{\text{comb}} be point estimates for the linear predictor obtained from the concatenated estimates.
Poisson model
Use the original disaggregated data, or, if the original data contains more than 10,000 rows, select 10,000 rows at random from the original data. Remove all terms from the original model except for the intercept. Set exposure to w_i^{\text{out}} = \hat{\mu}^{\text{comb}} w_i.
Binomial model
Fit the original model, but with all terms except the intercept having Known priors, where the values are obtained from point estimates from the concatenated estimates.
Normal model
Use the original disaggregated data, or, if the original data contains more than 10,000 rows, select 10,000 rows at random from the original data. Remove all terms from the original model except for the intercept. Set the outcome to y_i^{\text{out}} = y_i - \hat{\mu}^{\text{comb}}.
Running TMB yields a set of means \pmb{m} and a precision matrix \pmb{Q}^{-1}, which together define the approximate joint posterior distribution of the model's unknown parameters, other than \pmb{\gamma} and any terms given Known priors.
We use \tilde{\pmb{\theta}} to denote a vector containing all these quantities.
We perform a Cholesky decomposition of \pmb{Q}^{-1}, to obtain \pmb{R} such that \begin{equation} \pmb{R}^{\top} \pmb{R} = \pmb{Q}^{-1} \end{equation} We store \pmb{R} as part of the model object.
We draw values for \tilde{\pmb{\theta}} by generating a vector of standard normal variates \pmb{z}, back-solving the equation \begin{equation} \pmb{R} \pmb{v} = \pmb{z} \end{equation} and setting \begin{equation} \tilde{\pmb{\theta}} = \pmb{v} + \pmb{m}. \end{equation}
Next we convert any transformed hyper-parameters back to the original units, and insert values for \pmb{\beta}^{(m)} for terms that have Known priors. We denote the resulting vector \pmb{\theta}.
Finally we draw from the distribution of \pmb{\gamma} \mid \pmb{y}, \pmb{\theta} using the methods described in Sections 3.1-3.3.
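The following R sketch (illustrative only, not the bage implementation) shows one such draw, given a mean vector m and an upper-triangular matrix R, for example from chol(), satisfying t(R) %*% R = Q^{-1}.

```r
## Minimal sketch (illustrative only) of one draw from the approximate
## posterior: m is the mean vector and R is an upper-triangular matrix,
## for example from chol(), with t(R) %*% R equal to the precision matrix.
draw_theta <- function(m, R) {
  z <- rnorm(length(m))    # standard normal variates
  v <- backsolve(R, z)     # solve R v = z
  m + v
}
```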
To generate one set of simulated values, we start with values for exposure, trials, or weights, \pmb{w}, and possibly covariates \pmb{Z}, then go through the following steps:
\begin{equation} y_i^{\text{rep}} \sim \text{Poisson}(\gamma_i w_i) \end{equation}
\begin{align} y_i^{\text{rep}} & \sim \text{Poisson}(\gamma_i^{\text{rep}} w_i) \\ \gamma_i^{\text{rep}} & \sim \text{Gamma}(\xi^{-1}, (\xi \mu_i)^{-1}) \end{align} which is equivalent to \begin{equation} y_i^{\text{rep}} \sim \text{NegBinom}\left(\xi^{-1}, (1 + \mu_i w_i \xi)^{-1}\right) \end{equation}
\begin{equation} y_i^{\text{rep}} \sim \text{Binomial}(w_i, \gamma_i) \end{equation}
\begin{align} y_i^{\text{rep}} & \sim \text{Binomial}(w_i, \gamma_i^{\text{rep}}) \\ \gamma_i^{\text{rep}} & \sim \text{Beta}\left(\xi^{-1} \mu_i, \xi^{-1}(1 - \mu_i)\right) \end{align}
\begin{equation} y_i^{\text{rep}} \sim \text{N}(\gamma_i, \xi^2 / w_i) \end{equation}
If the overall model includes a data model for the outcome, then a further set of draws is made, deriving values for the observed outcomes, given values for the true outcomes.
replicate_data(x, condition_on = c("fitted", "expected"), n = 20)
Quantity | Definition |
---|---|
i | Index for cell, i = 1, \cdots, n. |
y_i | Value for outcome variable. |
w_i | Exposure, number of trials, or weight. |
\gamma_i | Super-population rate, probability, or mean. |
\mu_i | Cell-specific mean. |
\xi | Dispersion parameter. |
g() | Log, logit, or identity function. |
m | Index for intercept, main effect, or interaction. m = 0, \cdots, M. |
j | Index for element of a main effect or interaction. |
u | Index for combination of ‘by’ variables for an interaction. u = 1, \cdots U_m. U_m V_m = J_m |
v | Index for the ‘along’ dimension of an interaction. v = 1, \cdots V_m. U_m V_m = J_m |
\beta^{(0)} | Intercept. |
\pmb{\beta}^{(m)} | Main effect or interaction. m = 1, \cdots, M. |
\beta_j^{(m)} | jth element of \pmb{\beta}^{(m)}. j = 1, \cdots, J_m. |
\pmb{X}^{(m)} | Matrix mapping \pmb{\beta}^{(m)} to \pmb{y}. |
\pmb{Z} | Matrix of covariates. |
\pmb{\zeta} | Parameter vector for covariates \pmb{Z}^{(m)}. |
A_0 | Scale parameter in prior for intercept \beta^{(0)} or initial value. |
\tau_m | Standard deviation parameter for main effect or interaction. |
A_{\tau}^{(m)} | Scale parameter in prior for \tau_m. |
\pmb{\alpha}^{(m)} | Parameter vector for P-spline and SVD priors. |
\alpha_k^{(m)} | kth element of \pmb{\alpha}^{(m)}. k = 1, \cdots, K_m. |
\pmb{V}^{(m)} | Covariance matrix for multivariate normal prior. |
h_j^{(m)} | Linear covariate |
\eta^{(m)} | Parameter specific to main effect or interaction \pmb{\beta}^{(m)}. |
\eta_u^{(m)} | Parameter specific to uth combination of ‘by’ variables in interaction \pmb{\beta}^{(m)}. |
A_{\eta}^{(m)} | Standard deviation in normal prior for \eta_m. |
\omega_m | Standard deviation of parameter \eta_c in multivariate priors. |
\phi_m | Correlation coefficient in AR1 densities. |
a_{0m}, a_{1m} | Minimum and maximum values for \phi_m. |
\pmb{B}^{(m)} | B-spline matrix in P-spline prior. |
\pmb{b}_k^{(m)} | B-spline. k = 1, \cdots, K_m. |
\pmb{F}^{(m)} | Matrix in SVD prior. |
\pmb{g}^{(m)} | Offset in SVD prior. |
\pmb{\beta}_{\text{trend}} | Trend effect. |
\pmb{\beta}_{\text{cyc}} | Cyclical effect. |
\pmb{\beta}_{\text{seas}} | Seasonal effect. |
\varphi | Global shrinkage parameter in shrinkage prior. |
A_{\varphi} | Scale term in prior for \varphi. |
\vartheta_p | Local shrinkage parameter in shrinkage prior. |
p_0 | Expected number of non-zero coefficients in \pmb{\zeta}. |
\hat{\sigma} | Empirical scale estimate in prior for \varphi. |
\pi | Vector of hyper-parameters |
Let \pmb{A} be a matrix of age-specific estimates from an international database, transformed to take values in the range (-\infty, \infty). Each column of \pmb{A} represents one set of age-specific estimates, such as log mortality rates in Japan in 2010, or logit labour participation rates in Germany in 1980.
Let \pmb{U}, \pmb{D}, \pmb{V} be the matrices from a singular value decomposition of \pmb{A}, where we have retained the first K components. Then \begin{equation} \pmb{A} \approx \pmb{U} \pmb{D} \pmb{V}. \tag{13.1} \end{equation}
Let m_k and s_k be the mean and sample standard deviation of the elements of the kth row of \pmb{V}, with \pmb{m} = (m_1, \cdots, m_K)^{\top} and \pmb{s} = (s_1, \cdots, s_K)^{\top}. Then \begin{equation} \tilde{\pmb{V}} = (\text{diag}(\pmb{s}))^{-1} (\pmb{V} - \pmb{m} \pmb{1}^{\top}) \end{equation} is a standardized version of \pmb{V}.
We can rewrite (13.1) as \begin{align} \pmb{A} & \approx \pmb{U} \pmb{D} (\text{diag}(\pmb{s}) \tilde{\pmb{V}} + \pmb{m} \pmb{1}^{\top}) \\ & = \pmb{F} \tilde{\pmb{V}} + \pmb{g} \pmb{1}^{\top}, \tag{13.2} \end{align} where \pmb{F} = \pmb{U} \pmb{D} \text{diag}(\pmb{s}) and \pmb{g} = \pmb{U} \pmb{D} \pmb{m}.
Let \tilde{\pmb{v}}_l be a randomly-selected column from \tilde{\pmb{V}}. From the construction of \tilde{\pmb{V}} we have \text{E}[\tilde{v}_{kl}] = 0 and \text{var}[\tilde{v}_{kl}] = 1. If \pmb{z} is a vector of standard normal variables, then \begin{equation} \pmb{F} \pmb{z} + \pmb{g} \end{equation} should look approximately like a randomly-selected column from the original data matrix \pmb{A}.
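The following R sketch (hypothetical data, not the bage code) illustrates the construction of \pmb{F} and \pmb{g} from a matrix \pmb{A} of transformed age-specific estimates, and the generation of a new age profile \pmb{F} \pmb{z} + \pmb{g}.

```r
## Minimal sketch (hypothetical data, not the bage code) of the construction
## of F and g from a matrix A of transformed age-specific estimates
## (rows are ages, columns are populations), keeping the first K components.
build_svd_basis <- function(A, K) {
  sv <- svd(A, nu = K, nv = K)
  U <- sv$u                                # (number of ages) x K
  D <- diag(sv$d[seq_len(K)], nrow = K)    # K x K
  V <- t(sv$v)                             # K x (number of populations)
  m <- apply(V, 1, mean)
  s <- apply(V, 1, sd)
  list(F = U %*% D %*% diag(s, nrow = K),  # F = U D diag(s)
       g = drop(U %*% D %*% m))            # g = U D m
}

set.seed(1)
A <- matrix(rnorm(101 * 50, mean = -5), nrow = 101)   # e.g. logged rates
basis <- build_svd_basis(A, K = 5)
new_profile <- basis$F %*% rnorm(5) + basis$g         # resembles a column of A
```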