In efficiency analysis, we often study firms that operate under fundamentally different technologies. Steel producers using electric arc furnaces (EAF) face a different production possibility set than those using the blast furnace-basic oxygen furnace (BF-BOF) route. Hospitals in rural areas face different constraints than urban ones. Banks in developing economies operate under different regulatory and technological environments than those in advanced economies.
Standard stochastic frontier analysis (SFA) or data envelopment analysis (DEA) applied to the pooled sample implicitly assumes all firms share the same technology – an assumption that may be unrealistic. Estimating separate frontiers for each group solves this problem but makes efficiency scores incomparable across groups: a firm that is 90% efficient relative to a less advanced group frontier may actually be less productive than a firm that is 70% efficient relative to a more advanced frontier.
The metafrontier framework, introduced by Battese, Rao, and O’Donnell (2004) and extended by Huang, Huang, and Liu (2014) and O’Donnell, Rao, and Battese (2008), resolves this by:
\[TE^*_i = TE_i \times TGR_i\]
where:
The metafrontier package provides a unified interface for estimating metafrontier models using both SFA and DEA approaches.
library(metafrontier)The package includes simulate_metafrontier() for generating data from a known data-generating process. This is useful for Monte Carlo studies and for learning the package.
sim <- simulate_metafrontier(
n_groups = 3,
n_per_group = 200,
beta_meta = c(1.0, 0.5, 0.3), # intercept, elasticity_1, elasticity_2
tech_gap = c(0, 0.25, 0.5), # intercept shifts (0 = best technology)
sigma_u = c(0.2, 0.3, 0.4), # inefficiency SD per group
sigma_v = 0.15, # noise SD
seed = 42
)
str(sim$data[, c("log_y", "log_x1", "log_x2", "group")])
#> 'data.frame': 600 obs. of 4 variables:
#> $ log_y : num 4.05 3.99 3.16 4.04 2.52 ...
#> $ log_x1: num 4.57 4.69 1.43 4.15 3.21 ...
#> $ log_x2: num 4.426 2.586 4.26 2.214 0.789 ...
#> $ group : Factor w/ 3 levels "G1","G2","G3": 1 1 1 1 1 1 1 1 1 1 ...
table(sim$data$group)
#>
#> G1 G2 G3
#> 200 200 200The simulation generates a Cobb-Douglas frontier: \[\ln y_i = \beta_0^{(j)} + \beta_1 \ln x_{1i} + \beta_2 \ln x_{2i} + v_i - u_i\] where the intercept \(\beta_0^{(j)} = \beta_0^* - \delta_j\) is shifted down from the metafrontier by the technology gap \(\delta_j\) for group \(j\).
fit <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
method = "sfa",
meta_type = "deterministic"
)
fit
#>
#> Metafrontier Model
#> ------------------
#> Method: sfa
#> Metafrontier: deterministic
#> Groups: G1, G2, G3
#> Total obs: 600
#> G1: 200 obs
#> G2: 200 obs
#> G3: 200 obs
#>
#> Group log-likelihoods:
#> G1: 35.49
#> G2: 0.84652
#> G3: -25.399
#>
#> Mean TGR by group:
#> G1: 1
#> G2: 0.7593
#> G3: 0.6047The deterministic metafrontier is estimated in two stages:
This is the default method:
fit_det <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
meta_type = "deterministic"
)
summary(fit_det)
#>
#> Metafrontier Model Summary
#> ==========================
#>
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data,
#> group = "group", meta_type = "deterministic")
#>
#> Method: sfa
#> Metafrontier: deterministic
#>
#> --- Group: G1 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.025588 0.060881 16.85 <2e-16 ***
#> log_x1 0.493508 0.009658 51.10 <2e-16 ***
#> log_x2 0.294569 0.009805 30.04 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 35.49
#>
#> --- Group: G2 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.76703 0.06478 11.84 <2e-16 ***
#> log_x1 0.48428 0.01239 39.10 <2e-16 ***
#> log_x2 0.29703 0.01210 24.55 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 0.84652
#>
#> --- Group: G3 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.53665 0.05513 9.734 <2e-16 ***
#> log_x1 0.49950 0.01238 40.352 <2e-16 ***
#> log_x2 0.28205 0.01186 23.782 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -25.399
#>
#> --- Metafrontier ---
#> Estimate
#> (Intercept) 1.0256
#> log_x1 0.4935
#> log_x2 0.2946
#>
#> --- Efficiency Decomposition ---
#> Group Mean_TE Mean_TGR Mean_TE_star
#> G1 0.8571 1.0000 0.8571
#> G2 0.8249 0.7593 0.6263
#> G3 0.7401 0.6047 0.4475
#>
#> --- Technology Gap Ratio Summary ---
#> Group N Mean SD Min Q1 Median Q3 Max
#> G1 200 1.0000 0.0000 1.0000 1.0000 1.0000 1.0000 1.0000
#> G2 200 0.7593 0.0099 0.7386 0.7518 0.7592 0.7679 0.7799
#> G3 200 0.6047 0.0125 0.5769 0.5947 0.6056 0.6143 0.6289The stochastic metafrontier replaces the LP in Stage 2 with a second-stage SFA, using the fitted group frontier values as the dependent variable:
\[\ln \hat{f}(x_i; \hat\beta_j) = x_i'\beta^* + v^*_i - u^*_i\]
where \(u^*_i \ge 0\) captures the technology gap stochastically. This provides a distributional framework for the TGR, enabling standard errors and hypothesis testing.
fit_sto <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
meta_type = "stochastic"
)
summary(fit_sto)
#>
#> Metafrontier Model Summary
#> ==========================
#>
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data,
#> group = "group", meta_type = "stochastic")
#>
#> Method: sfa
#> Metafrontier: stochastic
#>
#> --- Group: G1 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.025588 0.060881 16.85 <2e-16 ***
#> log_x1 0.493508 0.009658 51.10 <2e-16 ***
#> log_x2 0.294569 0.009805 30.04 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 35.49
#>
#> --- Group: G2 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.76703 0.06478 11.84 <2e-16 ***
#> log_x1 0.48428 0.01239 39.10 <2e-16 ***
#> log_x2 0.29703 0.01210 24.55 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 0.84652
#>
#> --- Group: G3 (n = 200) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.53665 0.05513 9.734 <2e-16 ***
#> log_x1 0.49950 0.01238 40.352 <2e-16 ***
#> log_x2 0.28205 0.01186 23.782 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -25.399
#>
#> --- Metafrontier ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.784379 0.186836 4.198 2.69e-05 ***
#> log_x1 0.493270 0.005918 83.349 < 2e-16 ***
#> log_x2 0.289361 0.005832 49.613 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 96.292
#>
#> --- Efficiency Decomposition ---
#> Group Mean_TE Mean_TGR Mean_TE_star
#> G1 0.8571 1.2894 1.1051
#> G2 0.8249 0.9794 0.8078
#> G3 0.7401 0.7797 0.5771
#>
#> --- Technology Gap Ratio Summary ---
#> Group N Mean SD Min Q1 Median Q3 Max
#> G1 200 1.2894 0.0099 1.2735 1.2808 1.2887 1.2973 1.3073
#> G2 200 0.9794 0.0157 0.9428 0.9687 0.9796 0.9906 1.0185
#> G3 200 0.7797 0.0111 0.7537 0.7717 0.7798 0.7876 0.8027The stochastic metafrontier provides a variance-covariance matrix:
vcov(fit_sto)
#> (Intercept) log_x1 log_x2
#> (Intercept) 3.490785e-02 -8.889894e-05 -7.974475e-05
#> log_x1 -8.889894e-05 3.502404e-05 -4.372711e-07
#> log_x2 -7.974475e-05 -4.372711e-07 3.401693e-05For a nonparametric approach, set method = "dea":
fit_dea <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
method = "dea",
rts = "vrs"
)
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.
fit_dea
#>
#> Metafrontier Model
#> ------------------
#> Method: dea
#> Metafrontier: deterministic
#> Groups: G1, G2, G3
#> Total obs: 600
#> G1: 200 obs
#> G2: 200 obs
#> G3: 200 obs
#>
#> Mean TGR by group:
#> G1: 0.9959
#> G2: 0.9221
#> G3: NAThe DEA metafrontier computes:
Use efficiencies() to extract the three components of the decomposition:
te <- efficiencies(fit_det, type = "group")
tgr <- efficiencies(fit_det, type = "tgr")
te_star <- efficiencies(fit_det, type = "meta")
# Verify the fundamental identity: TE* = TE x TGR
all.equal(te_star, te * tgr)
#> [1] TRUEThe technology_gap_ratio() function returns TGR values grouped by technology:
tgr_by_group <- technology_gap_ratio(fit_det)
lapply(tgr_by_group, summary)
#> $G1
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1 1 1 1 1 1
#>
#> $G2
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.7386 0.7518 0.7592 0.7593 0.7679 0.7799
#>
#> $G3
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.5769 0.5947 0.6056 0.6047 0.6143 0.6289For a formatted summary table:
tgr_summary(fit_det)
#> Group N Mean SD Min Q1 Median Q3
#> 1 G1 200 1.0000000 0.000000000 1.0000000 1.0000000 1.0000000 1.0000000
#> 2 G2 200 0.7592884 0.009947071 0.7385586 0.7517742 0.7592422 0.7679385
#> 3 G3 200 0.6047017 0.012456784 0.5769063 0.5947118 0.6056158 0.6142562
#> Max
#> 1 1.0000000
#> 2 0.7798918
#> 3 0.6289392# Metafrontier coefficients
coef(fit_det, which = "meta")
#> (Intercept) log_x1 log_x2
#> 1.0255883 0.4935078 0.2945688
# Group-specific coefficients
coef(fit_det, which = "group")
#> $G1
#> (Intercept) log_x1 log_x2
#> 1.0255883 0.4935078 0.2945688
#>
#> $G2
#> (Intercept) log_x1 log_x2
#> 0.7670252 0.4842818 0.2970328
#>
#> $G3
#> (Intercept) log_x1 log_x2
#> 0.5366478 0.4994989 0.2820525# Log-likelihood (sum of group log-likelihoods for deterministic)
logLik(fit_det)
#> 'log Lik.' 10.9376 (df=3)
# Number of observations
nobs(fit_det)
#> [1] 600
# AIC and BIC (available automatically via logLik method)
AIC(fit_det)
#> [1] -15.87521The package provides four built-in plot types:
plot(fit_det, which = "tgr")plot(fit_det, which = "efficiency")Points below the 45-degree line indicate a technology gap (TE* < TE). The vertical distance from the line reflects the TGR.
plot(fit_det, which = "decomposition")Side-by-side boxplots of TE, TGR, and TE* by group.
The poolability test evaluates whether group-specific frontiers are statistically different from a single pooled frontier:
poolability_test(fit_det)
#>
#> Likelihood Ratio Test for Poolability of Group Frontiers
#>
#> data: metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data, group = "group", meta_type = "deterministic")
#> LR = 504.71, df = 10, p-value < 2.2e-16A significant result (small p-value) indicates that the technology groups have genuinely different production technologies, justifying the metafrontier approach.
The package supports three distributional assumptions for the one-sided inefficiency term \(u_i\) in SFA:
# Half-normal (default): u ~ |N(0, sigma_u^2)|
fit_hn <- metafrontier(log_y ~ log_x1 + log_x2,
data = sim$data, group = "group",
dist = "hnormal")
# Truncated normal: u ~ N+(mu, sigma_u^2)
fit_tn <- metafrontier(log_y ~ log_x1 + log_x2,
data = sim$data, group = "group",
dist = "tnormal")
# Exponential: u ~ Exp(1/sigma_u)
fit_exp <- metafrontier(log_y ~ log_x1 + log_x2,
data = sim$data, group = "group",
dist = "exponential")Since we used simulated data, we can compare estimated values against the truth:
# True vs estimated metafrontier coefficients
cbind(
True = sim$params$beta_meta,
Estimated = coef(fit_det, which = "meta")
)
#> True Estimated
#> (Intercept) 1.0 1.0255883
#> log_x1 0.5 0.4935078
#> log_x2 0.3 0.2945688
# True vs estimated mean TGR by group
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)
est_tgr <- tapply(fit_det$tgr, fit_det$group_vec, mean)
cbind(True = true_tgr, Estimated = est_tgr)
#> True Estimated
#> G1 1.0000000 1.0000000
#> G2 0.7788008 0.7592884
#> G3 0.6065307 0.6047017
# Correlation between true and estimated efficiency
cor(sim$data$true_te, fit_det$te_group)
#> [1] 0.80659
cor(sim$data$true_te_star, fit_det$te_meta)
#> [1] 0.9400828The package supports panel data via the Battese-Coelli (1992) and (1995) models. Use the panel argument:
# Simulate panel data
panel_sim <- simulate_panel_metafrontier(
n_groups = 2, n_firms_per_group = 20, n_periods = 5, seed = 42
)
# BC92: time-varying inefficiency u_it = u_i * exp(-eta*(t-T))
fit_panel <- metafrontier(
log_y ~ log_x1 + log_x2,
data = panel_sim$data,
group = "group",
panel = list(id = "firm", time = "year"),
panel_dist = "bc92"
)
summary(fit_panel)
# The eta parameter captures time-varying inefficiency
# eta > 0: inefficiency decreasing over time
# eta < 0: inefficiency increasing over timeThe boot_tgr() function provides parametric and nonparametric bootstrap confidence intervals for the technology gap ratio:
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)
fit <- metafrontier(log_y ~ log_x1 + log_x2, data = sim$data,
group = "group", meta_type = "stochastic")
# Nonparametric bootstrap (case resampling within groups)
boot <- boot_tgr(fit, R = 499, type = "nonparametric", seed = 1)
print(boot)
# Observation-level CIs
ci <- confint(boot)
head(ci)
# Group-level mean TGR CIs
boot$ci_group
# Parametric bootstrap (resample from estimated error distributions)
boot_par <- boot_tgr(fit, R = 499, type = "parametric", seed = 1)The stochastic metafrontier is a two-stage estimator where Stage 2 uses fitted values from Stage 1 as regressors. This “generated regressor” problem means naive standard errors understate uncertainty. The Murphy-Topel (1985) correction adjusts for this:
fit <- metafrontier(log_y ~ log_x1 + log_x2, data = sim$data,
group = "group", meta_type = "stochastic")
# Naive (uncorrected) standard errors
vcov(fit)
# Murphy-Topel corrected standard errors
vcov(fit, correction = "murphy-topel")
# Corrected confidence intervals
confint(fit, correction = "murphy-topel")When group membership is unobserved, use latent_class_metafrontier():
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)
# Fit with 2 latent classes
lc <- latent_class_metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data, n_classes = 2,
n_starts = 5, seed = 123
)
print(lc)
summary(lc)
# Select optimal number of classes via BIC
bic_table <- select_n_classes(
log_y ~ log_x1 + log_x2, data = sim$data,
n_classes_range = 2:4, n_starts = 3, seed = 42
)
print(bic_table) # choose n_classes with lowest BICFor additive efficiency decomposition, use DDF-based metafrontier:
sim <- simulate_metafrontier(n_groups = 2, n_per_group = 50, seed = 42)
# Use raw (non-log) data for DEA
sim$data$y <- exp(sim$data$log_y)
sim$data$x1 <- exp(sim$data$log_x1)
sim$data$x2 <- exp(sim$data$log_x2)
fit_ddf <- metafrontier(
y ~ x1 + x2, data = sim$data, group = "group",
method = "dea", type = "directional", direction = "output"
)
summary(fit_ddf)
# Additive decomposition: beta_meta = beta_group + ddf_tgr
head(data.frame(
beta_meta = fit_ddf$beta_meta,
beta_group = fit_ddf$beta_group,
ddf_tgr = fit_ddf$ddf_tgr
))Battese, G.E., Rao, D.S.P. and O’Donnell, C.J. (2004). A metafrontier production function for estimation of technical efficiencies and technology gaps for firms operating under different technologies. Journal of Productivity Analysis, 21(1), 91–103.
Huang, C.J., Huang, T.-H. and Liu, N.-H. (2014). A new approach to estimating the metafrontier production function based on a stochastic frontier framework. Journal of Productivity Analysis, 42(3), 241–254.
O’Donnell, C.J., Rao, D.S.P. and Battese, G.E. (2008). Metafrontier frameworks for the study of firm-level efficiencies and technology ratios. Empirical Economics, 34(2), 231–255.