---
title: Introduction to `hesim`
date: "`r Sys.Date()`"
output:
html_vignette:
toc: yes
toc_depth: 2
number_sections: TRUE
pkgdown:
as_is: false
vignette: >
%\VignetteIndexEntry{Introduction to `hesim`}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
bibliography: references.bib
---
```{r, include = FALSE}
library("knitr")
opts_chunk$set(cache = TRUE)
```
# Overview
This article provides an overview of the `hesim` package and a quick example. The other articles provide more in depth examples.
`hesim` supports three types of health economic models: (i) cohort discrete time state transition models (cDTSTMs), (ii) N-state partitioned survival models (PSMs), and (iii) individual-level continuous time state transition models (iCTSTMs). cDTSTMs are Markov cohort models and can be time-homogeneous or time-inhomogeneous. iCTSTMs are individual-level simulations that can encompass both Markov and semi-Markov processes. All models are implemented as [R6](https://r6.r-lib.org/index.html) classes and have methods for simulating disease progression, QALYs, and costs.
```{r echo = FALSE, message = FALSE, warning = FALSE}
library("kableExtra")
psm <- c("N-state partitioned survival model (PSM)", "`hesim::Psm`")
cdtstm <- c("Cohort discrete time state transition model (cDTSTM)", "`hesim::CohortDtstm`")
ictstm <- c("Individual-level continuous time state transition model (iCTSTM)", "`hesim::IndivCtstm`")
tbl <- rbind(psm, cdtstm, ictstm)
colnames(tbl) <- c("Economic model", "R6 class")
knitr::kable(tbl, row.names = FALSE) %>% # Pipe imported from kableExtra
kableExtra::kable_styling()
```
Each economic model consists of submodels for disease progression, utility, and costs (usually for multiple cost categories). As shown in the figure, a typical analysis proceeds in a 3-steps:
```{r, out.width = "600px", echo = FALSE}
knitr::include_graphics("econ-eval-process-hesim.png")
```
1. **Parameterization**: An economic model is parameterized by estimating statistical models for disease progression, utilities, and costs using "estimation" datasets, such as individual patient data (IPD) from a single study or aggregate data from multiple studies.
2. **Simulation**: The statistical models estimated in Step 1 are combined to construct an economic model. For a given model structure, disease progression, QALYs, and costs are simulated from "input data", based on the target population and treatment strategies of interest.
3. **Decision analysis**: Simulated outcomes from Step 2 are used to perform decision analysis using approaches such as [cost-effectiveness analysis (CEA)](https://en.wikipedia.org/wiki/Cost-effectiveness_analysis) and [multi-criteria decision analysis (MCDA)](https://en.wikipedia.org/wiki/Multiple-criteria_decision_analysis), although only CEA is currently supported.
The entire analysis is inherently Bayesian, as uncertainty in the parameters from the statistical models is propagated throughout the economic model and decision analysis with probabilistic sensitivity analysis (PSA). Furthermore, since the statistical and economic models are integrated, patient heterogeneity can be easily introduced with patient level covariates.
# Treatment strategies, target population, and model structure
Before beginning an analysis, it is necessary to define the treatment strategies of interest, the target population, and the model structure. This can be done in `hesim` by creating a `hesim_data` object with the function `hesim_data()`. Integer valued identification (ID) variables are used to uniquely identify strategies (`strategy_id`), patients (`patient_id`), non-death health-states (`state_id`), and (if applicable) health-state transitions (`transition_id`). Subgroups can optionally be identified with `grp_id`.
Let's consider an example where we use an iCTSTM to evaluate two competing treatment strategies, the *standard of care (SOC)* and a *New* treatment. We will consider a generic model of disease progression with three health states (*stage 1*, *stage 2*, and *death*) with four transitions (*stage 1 -> stage 2*, *stage 2 -> stage 1*, *stage 1 -> death*, and *stage 2 -> death*). Since we are using an individual-level model, we must simulate a target population that is sufficiently large so that uncertainty reflects uncertainty in the model parameters, rather than variability across simulated individuals. For the sake of illustration, we will create subgroups stratified by sex.
```{r warning = FALSE, message = FALSE}
library("hesim")
library("data.table")
# Treatment strategies
strategies <- data.table(strategy_id = c(1, 2),
strategy_name = c("SOC", "New"))
# Patients
n_patients <- 1000
patients <- data.table(patient_id = 1:n_patients,
age = rnorm(n_patients, mean = 45, sd = 7),
female = rbinom(n_patients, size = 1, prob = .51))
patients[, grp_id := ifelse(female == 1, 1, 2)]
patients[, grp_name := ifelse(female == 1, "Female", "Male")]
# (Non-death) health states
states <- data.table(state_id = c(1, 2),
state_name = c("Stage 1", "Stage 2"))
# Transitions
tmat <- rbind(c(NA, 1, 2),
c(3, NA, 4),
c(NA, NA, NA))
colnames(tmat) <- rownames(tmat) <- c("Stage 1", "Stage 2", "Death")
transitions <- create_trans_dt(tmat)
transitions[, trans := factor(transition_id)]
# Combining
hesim_dat <- hesim_data(strategies = strategies,
patients = patients,
states = states,
transitions = transitions)
print(hesim_dat)
```
When presenting results, it may be preferable to have more informative labels that the ID variables. These can be generated from a `hesim_data` object using `get_labels()`.
```{r}
labs <- get_labels(hesim_dat)
print(labs)
```
# Parameterization
Each submodel contains fields for the model parameters and the input data. Models can be parameterized by either fitting statistical models using `R`, inputting values directly, or from a combination of the two. There are two types of parameter objects, standard parameter objects prefixed by “params” and “transformed” parameter objects prefixed by “tparams”. The former contain the underlying parameters of a statistical model and are used alongside the input data to make predictions. The latter contain parameters more immediate to prediction that have already been transformed as function of the input data. The regression coefficients of a logistic regression are an example of a parameter objects while the predicted probabilities are examples of a transformed parameter object.
## Disease progression
As shown in the table below, the statistical model used to parameterize the disease model varies by the type of economic model. For example, multinomial logistic regressions can be used to parameterize a cDTSTM, a set of *N-1* independent survival models are used to parameterize an *N*-state partitioned survival model, and multi-state models can be used to parameterize an iCTSTM.
```{r echo = FALSE, message = FALSE, warning = FALSE}
tbl <- rbind(
c("`hesim::CohortDtstm`",
"Custom",
"`hesim::tparams_transprobs`",
"`msm::msm`"),
c("`hesim::CohortDtstm`",
"Multinomial logistic regressions",
"`hesim::params_mlogit_list`",
"`hesim::multinom_list`"),
c("`hesim::Psm`",
"Independent survival models",
"`hesim::params_surv_list`",
"`hesim::flexsurvreg_list`"),
c("`hesim::IndivCtstm`",
"Multi-state model (joint likelihood)",
"`hesim::params_surv`",
"`flexsurv::flexsurvreg`"),
c("`hesim::IndivCtstm`",
"Multi-state model (transition-specific)",
"`hesim::params_surv_list`",
"`hesim::flexsurvreg_list`")
)
colnames(tbl) <- c("Economic model (R6 class)", "Statistical model",
"Parameter object", "Model object")
knitr::kable(tbl, row.names = FALSE) %>%
kableExtra::kable_styling() %>%
kableExtra::collapse_rows(columns = 1, valign = "top")
```
The parameters of a survival model are stored in a `params_surv` object and a `params_surv_list` can be used to store the parameters of multiple survival models. The latter is useful for storing the parameters of a multi-state model or the independent survival models required for a PSM. The parameters of a multinomial logistic regression are stored in a `params_mlogit` object and can be created by fitting a model for each row in a transition probability matrix with `nnet::multinom()`. `tparams_transprobs` objects are examples of transformed parameter objects that store transition probability matrices. They can be predicted from a fitted multi-state model using the `msm` package or constructed "by hand" in a custom manner.
We illustrate an example of a statistical model of disease progression fit with `R` by estimating a multi-state model with a joint likelihood using `flexsurv::flexsurvreg()`.
```{r, message = FALSE, warning = FALSE}
library("flexsurv")
mstate_data <- data.table(mstate3_exdata$transitions)
mstate_data[, trans := factor(trans)]
fit_wei <- flexsurv::flexsurvreg(Surv(years, status) ~ trans +
factor(strategy_id):trans +
age:trans +
female: trans +
shape(trans),
data = mstate_data,
dist = "weibull")
```
## Costs and utility
State values (i.e., utilities and costs) do not depend on the choice of disease model. They can currently either be modeled using a linear model or with predicted means.
```{r echo = FALSE, message = FALSE, warning = FALSE}
tbl <- rbind(
c("Predicted means",
"`hesim::tparams_mean`",
"`hesim::stateval_tbl`"),
c("Linear model",
"`hesim::params_lm`",
"`stats::lm`")
)
colnames(tbl) <- c("Statistical model", "Parameter object", "Model object")
knitr::kable(tbl, row.names = FALSE) %>%
kableExtra::kable_styling() %>%
kableExtra::collapse_rows(columns = 1, valign = "top")
```
The most straightforward way to construct state values is with `stateval_tbl()`, which creates a special object used to assign values (i.e. predicted means) to health states that can vary across PSA samples, treatment strategies, patients, and/or time intervals. State values can be specified either as moments (e.g., mean and standard error) or parameters (e.g., shape and scale of gamma distribution) of a probability distribution, or by pre-simulating values from a suitable probability distribution (e.g., from a Bayesian model). Here we will use `stateval_tbl` objects for utility and two cost categories (drug and medical).
```{r}
# Utility
utility_tbl <- stateval_tbl(
data.table(state_id = states$state_id,
mean = mstate3_exdata$utility$mean,
se = mstate3_exdata$utility$se),
dist = "beta"
)
# Costs
drugcost_tbl <- stateval_tbl(
data.table(strategy_id = strategies$strategy_id,
est = mstate3_exdata$costs$drugs$costs),
dist = "fixed"
)
medcost_tbl <- stateval_tbl(
data.table(state_id = states$state_id,
mean = mstate3_exdata$costs$medical$mean,
se = mstate3_exdata$costs$medical$se),
dist = "gamma"
)
```
```{r}
print(utility_tbl)
print(drugcost_tbl)
print(medcost_tbl)
```
# Simulation
## Constructing an economic model
The utility and cost models are always `hesim::StateVals` objects, whereas the disease models vary by economic model. The disease model is used to simulate survival curves in a PSM and health state transitions in a cDTSTM and iCTSTM.
```{r echo = FALSE, message = FALSE, warning = FALSE}
dtstm <- c("`hesim::CohortDtstm`", "`hesim::CohortDtstmTrans`",
"`hesim::StateVals`", "`hesim::StateVals`")
psm <- c("`hesim::Psm`", "`hesim::PsmCurves`",
"`hesim::StateVals`", "`hesim::StateVals`")
ictstm <- c("`hesim::IndivCtstm`", "`hesim::IndivCtstmTrans`",
"`hesim::StateVals`", "`hesim::StateVals`")
tbl <- rbind(dtstm, psm, ictstm)
colnames(tbl) <- c("Economic model", "Disease model", "Utility model", "Cost model(s)")
knitr::kable(tbl, row.names = FALSE) %>%
kableExtra::kable_styling()
```
The submodels are constructed from (i) parameter or model objects and (ii) input data (if a transformed parameter object is not used). They can be instantiated using `S3` generic methods prefixed by "`create`" or with the `R6` constructor method `$new()`. We illustrate use of the former below.
In all cases, it is necessary to specify the number of parameter samples to use for the PSA.
```{r}
n_samples <- 1000
```
### Disease model
The disease model is constructed as a function of the fitted multi-state model (using the stored regression coefficients) and input data. The input data must be an object of class `expanded_hesim_data`, which is a [`data.table`](https://rdatatable.gitlab.io/data.table/) containing the covariates for the statistical model. In our multi-state model, each row is a unique treatment strategy, patient, and health-state transition.
An `expanded_hesim_data` object can be created by expanding an object of class `hesim_data` using `expand.hesim_data()`.
```{r warning = FALSE, message = FALSE}
transmod_data <- expand(hesim_dat,
by = c("strategies", "patients", "transitions"))
head(transmod_data)
```
The disease model is instantiated using the `create_IndivCtstmTrans()` generic method. Parameters for the PSA are, by default, drawn from the multivariate normal distribution of the maximum likelihood estimate of the regression coefficients, although we make this explicit with the uncertainty argument.
```{r}
transmod <- create_IndivCtstmTrans(fit_wei, transmod_data,
trans_mat = tmat, n = n_samples,
uncertainty = "normal")
class(transmod)
```
### Cost and utility models
Since we are using predicted means for utilities and costs, we do not need to specify input data. Instead, we can construct the utility and cost models directly from the `stateval_tbl` objects.
```{r}
# Utility
utilitymod <- create_StateVals(utility_tbl, n = n_samples, hesim_data = hesim_dat)
# Costs
drugcostmod <- create_StateVals(drugcost_tbl, n = n_samples, hesim_data = hesim_dat)
medcostmod <- create_StateVals(medcost_tbl, n = n_samples, hesim_data = hesim_dat)
costmods <- list(drugs = drugcostmod,
medical = medcostmod)
```
### Combining the disease progression, cost, and utility models
Once the disease, utility, and cost models have been constructed, we combine them to create the full economic model using `$new()`.
```{r}
ictstm <- IndivCtstm$new(trans_model = transmod,
utility_model = utilitymod,
cost_models = costmods)
```
## Simulating outcomes
Each economic model contains methods (i.e., functions) for simulating disease progression, QALYs, and costs.
```{r echo = FALSE, message = FALSE, warning = FALSE}
cdtstm_methods <- c("`hesim::CohortDtstm`", "$sim_stateprobs()", "$sim_qalys()", "$sim_costs()")
psm_methods <- c("`hesim::Psm`", "$sim_survival() and $sim_stateprobs()", "$sim_qalys()", "$sim_costs()")
ictstm_methods <- c("`hesim::IndivCtstm`", "$sim_disease() and $sim_stateprobs()", "$sim_qalys()", "$sim_costs()")
tbl <- rbind(cdtstm_methods, psm_methods, ictstm_methods)
colnames(tbl) <- c("Economic model (R6 class)", "Disease progression", "QALYs", "Costs")
knitr::kable(tbl, row.names = FALSE) %>%
kableExtra::kable_styling()
```
Although all models simulate state probabilities, they do so in different ways. The cDTSTM uses discrete time Markov chains, the PSM calculates differences in probabilities from simulated survival curves, and the iCTSTM aggregates individual trajectories simulated using random number generation. The individual-level simulation is advantageous because it can be used for semi-Markov processes where transition rates depend on time since entering a health state (rather than time since the start of the model).
The utility and cost models always simulate QALYs and costs from the simulated progression of disease with the methods `$sim_qalys()` and `$sim_costs()`, respectively. In the cohort models, QALYs and costs are computed as a function of the state probabilities whereas in individual-level models they are based on the simulated individual trajectories. Like the disease model, the individual-level simulation is more flexible because QALYs and costs can depend on time since entering the health state.
We illustrate with the iCTSTM. The first step is to simulate disease progression for each patient.
```{r}
ictstm$sim_disease()
head(ictstm$disprog_)
```
The disease trajectory is summarized with `$sim_stateprobs()`.
```{r}
ictstm$sim_stateprobs(t = c(0:10))
head(ictstm$stateprobs_)
```
Finally, we compute QALYs and costs (using a discount rate of 3 percent).
```{r}
# QALYs
ictstm$sim_qalys(dr = .03)
head(ictstm$qalys_)
# Costs
ictstm$sim_costs(dr = .03)
head(ictstm$costs_)
```
# Decision analysis
Once output has been simulated with an economic model, a decision analysis can be performed. CEAs can be conducted using other `R` packages such as [BCEA](https://sites.google.com/a/statistica.it/gianluca/bcea) or directly with `hesim`.
To perform a CEA, simulated QALYs and costs are summarized and a `ce` object is created, which contains mean QALYs and costs for each sample from the PSA by treatment strategy. QALYs and costs can either be summarized by subgroup (`by_grp = TRUE`) or aggregated across all patients (`by_grp = FALSE`).
```{r}
ce <- ictstm$summarize(by_grp = FALSE)
print(ce)
```
The functions `cea()` and `cea_pw()` are used to perform a CEA. The former simultaneously accounts for all treatment strategies while the latter makes pairwise comparisons between interventions and a chosen comparator.
```{r}
cea_out <- cea(ce, dr_qalys = .03, dr_costs = .03)
cea_pw_out <- cea_pw(ce, dr_qalys = .03, dr_costs = .03, comparator = 1)
```
Summary and plotting functions are available to analyze the output. For instance, we can use `plot_ceac()` to quickly plot a cost-effectiveness acceptability curve (CEAC), which displays the probability that each treatment strategy is the most cost-effective at a given willingness to pay for a QALY. The labels we constructed earlier are used to give the treatment strategies informative names.
```{r ceac_plot, warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4}
library("ggplot2")
plot_ceac(cea_out, labels = labs) +
theme_minimal()
```
# Next steps
This article provided an overview of the `hesim` package. We recommend exploring the examples in the other articles to learn more.
cDTSTMs (i.e., Markov cohort models) are probably the most commonly used models in health economics and there are examples demonstrating multiple ways to build them with `hesim`. One approach that has not yet been discussed is a functional one that allows users to define a model (with `define_model()`) in terms of expressions that transform underlying parameter draws from a PSA into relevant transformed parameters (e.g., transition probability matrices, mean state values) as a function of input data.
Other relevant topics include more through treatments of CEA, multi-state modeling, individual-level simulations based on aggregate data, and partitioned survival analysis. As the examples illustrate, any analysis can be performed either for a single group or in the context of multiple subgroups.