Competing Risks Survival Analysis with jointCompRisk

Methods, Implementation, and Applications

2025-10-08

1 Introduction

In clinical studies, patients often face multiple mutually exclusive outcomes of equal clinical importance. Traditional survival analysis focusing on a single endpoint can provide incomplete insights, particularly when competing events like recovery and death, disease-specific versus non-disease mortality, or treatment success versus treatment-related toxicity occur simultaneously. This jointCompRisk package implements advanced methods for analyzing competing risks data with joint inference capabilities, providing both traditional competing risks analysis and novel severity-weighted approaches that incorporate longitudinal ordinal severity scores.

Key Features: 1. Simultaneous analysis of multiple disease outcomes - Compares cumulative incidence functions across different competing risks in a single statistical model 2. Disease severity integration - Incorporates repeated measurements of disease progression over time to weight the analysis by clinical severity 3. Clinically interpretable results - Provides outcomes in meaningful time units such as days of life gained or lost, rather than abstract statistical measures 4. Methodologically validated - Statistical approaches verified and published in peer-reviewed literature (Wen, Wang & Hu, 2023, Statistics in Medicine)

2 Clinical Applications and Rationale

Hospital-based Treatment Studies: patients either recover (treatment success, resource availability for new patients) or experience adverse outcomes including death (treatment failure).

Cancer Research: Multiple disease-related events (local recurrence, distant metastasis, secondary cancers, death) may respond differently to treatment interventions.

Cardiovascular Studies: Various clinical events may show differential treatment responses across mortality and morbidity outcomes.

The package enables simultaneous analysis of multiple competing outcomes while respecting the clinical importance of each endpoint.

3 Theoretical Background

3.0.1 Cumulative Incidence Functions

The cumulative incidence function (CIF) describes the probability of experiencing a specific event type over time while accounting for competing events. Unlike standard survival analysis that treats competing events as censored, CIF provides realistic estimates by properly incorporating all competing risks.

Mathematically, the CIF for event type \(k\) is defined as

\[ F_{k}(t) = P\bigl(T \le t,\ \delta = k \bigr) = \int_{0}^{t} S\!\bigl(u^{-}\bigr)\, d\Lambda_{k}(u) \] where
- \(T\) is the time to the first event of any type,
- \(\delta\) indicates the type of event that occurred,
- \(S(t) = P(T > t)\) is the overall survival function (the probability no event has occurred by time \(t\)),
- \(\Lambda_k(t)\) is the cumulative hazard function for event type \(k\),
- \(S(u^-) = \lim_{s \uparrow u} S(s)\) denotes the left-continuous limit of the survival function at time \(u\).

This formulation shows that the CIF for event type \(k\) accumulates over time as the hazard for that event type increases, weighted by the probability of surviving free of any event up to that point.

3.0.2 Restricted Mean Survival Time

The restricted mean survival time (RMST) up to a specified time \(\tau\) represents the average survival time accumulated by that point. Unlike the median survival, RMST uses the entire survival curve up to \(\tau\) and is particularly useful when the survival curve does not reach 50%.

It is defined as

\[ \text{RMST}(\tau) = E\bigl[\min(T, \tau)\bigr] = \int_0^\tau S(t) \, dt \]

where \(S(t)\) is the overall survival function.

3.0.3 Endpoint-Specific Restricted Mean Times

In competing risks settings, we can define endpoint-specific restricted mean times:

where \(F_k(t)\) is the cumulative incidence function for event type \(k\).

4 Installation and Data Requirements

Load the package

# Install package
# devtools::install_github("cathyzzzhang/jointCompRisk")
library(jointCompRisk)

Sample Data Throughout this vignette, we use simulated data that follows the structure of Adaptive COVID-19 Treatment Trials (ACTT). The dataset represents a clinical trial of 150 patients with competing risks survival data (recovery vs. death) and longitudinal ordinal severity scores measured over a 30-day follow-up period. 1. Treatment group: 1.5× faster recovery times, 1.8× improved survival 2. Severity scores: 1-8 scale (mild to severe illness) measured at scheduled visits 3. Realistic treatment effects demonstrating significant benefits across standard and weighted analyses

4.1 Load the sample data

#Sample datasets included with package
main_df <- read.csv("main_df.csv")
long_df <- read.csv("long_df.csv")

#View data structure
head(main_df)
##            ID TimeToRecovery TimeToDeath RecoveryCensoringIndicator
## 1 Patient_001              3          12                          0
## 2 Patient_002             13           7                          1
## 3 Patient_003              2          30                          0
## 4 Patient_004             15          30                          1
## 5 Patient_005              9          14                          0
## 6 Patient_006             11          30                          0
##   DeathCensoringIndicator BaselineScore Treatment
## 1                       0             5         1
## 2                       1             6         0
## 3                       1             4         1
## 4                       1             6         0
## 5                       0             4         1
## 6                       1             6         1
head(long_df)
##      PersonID OrdinalScore RelativeDay
## 1 Patient_001            5           0
## 2 Patient_001            6           1
## 3 Patient_001            6           3
## 4 Patient_001            5           5
## 5 Patient_001            5           7
## 6 Patient_001            5          10
#Access help documentation
?main_df
?long_df

main_df contains baseline demographics, treatment assignment, censoring indicators, and time-to-event outcomes (TimeToRecovery, TimeToDeath). long_df contains longitudinal severity measurements with one row per visit per participant, enabling joint modeling of how baseline characteristics and evolving severity scores predict competing outcomes.

4.2 Data Structure Requirements

4.2.1 For standard CIF analysis: your dataset needs:

  1. Patient ID: Unique identifier for each patient
  2. Time to Recovery: Time from enrollment to recovery/discharge
  3. Time to Death: Time from enrollment to death
  4. Recovery Censoring: Indicator for recovery censoring (0=event, 1=censored)
  5. Death Censoring: Indicator for death censoring (0=event, 1=censored)
  6. Treatment: Treatment group indicator (0=control, 1=treatment)

4.2.2 For Severity-Weighted CIF Analysis, your dataset needs:

In addition to the above columns, we need an additional column 1. Baseline Score: baseline disease severity As well as we need an additional longitudinal dataset with 1. Patient ID: matching the main dataset 2. Time Point: Days since treatment starts 3. Severity Score: Disease severity score at each time point

4.3 Part 1: Standard CIF Analysis:

Standard CIF analysis provides traditional competing risks analysis with joint inference capabilities.

4.3.1 Step 1: Data preparation

The prep_data_cif() function preprocesses your data by excluding patients with zero survival time, handling “discharge-to-die” cases (recoded as censored for recovery), creating event time variables (etime, estatus, etype2), and splitting data by treatment group.

Customize column names to match your dataset:

mydata_std <- prep_data_cif(
  data             = main_df,
  ID               = "ID",                             #change this
  TimeToRecovery   = "TimeToRecovery",                   #change this
  TimeToDeath      = "TimeToDeath",                      #change this
  Recov_Censoring  = "RecoveryCensoringIndicator",       #change this
  Death_Censoring  = "DeathCensoringIndicator",          #change this
  Treatment        = "Treatment"                         #change this
)

4.3.1.1 Step 2: Analysis

The do_cif_analysis() function performs two analyses: RMLT1 (time lost through death1) and RMLT2 (time lost through death2). The tau parameter sets the time horizon (e.g., tau=15 analyzes the first 15 days).

Each analysis returns formatted results with group-specific estimates and treatment contrasts, containing estimates, standard errors, confidence intervals, and p-values for both treatment groups and their difference. The function provides comprehensive statistical inference for competing risk endpoints in an easy-to-interpret format with groups (individual group results) and contrast (treatment difference) components. For RMLT1, positive differences typically indicate beneficial recovery outcomes, while for RMLT2, negative differences indicate beneficial survival outcomes (less time lost to death).

res_std <- do_cif_analysis(mydata_std, tau=15)

# Access RMLT1 results (recovery/discharge)
res_std$RMLT1$groups     # Group estimates
##            Group Estimate        se   Lower95  Upper95
## 1 Group1 (trt=1) 3.760347 0.4769287 2.8255670 4.695128
## 2 Group2 (trt=0) 1.167820 0.3530140 0.4759129 1.859728
res_std$RMLT1$contrast   # Treatment difference
##     Contrast Estimate        se  Lower95  Upper95      p_value
## 1 Difference 2.592527 0.5933632 1.429535 3.755519 1.246981e-05
# Access RMLT2 results (death)
res_std$RMLT2$groups     # Group estimates  
##            Group Estimate        se   Lower95  Upper95
## 1 Group1 (trt=1) 1.082678 0.3039021 0.4870301 1.678326
## 2 Group2 (trt=0) 2.743811 0.4912417 1.7809770 3.706644
res_std$RMLT2$contrast   # Treatment difference
##     Contrast  Estimate       se   Lower95    Upper95  p_value
## 1 Difference -1.661133 0.577646 -2.793319 -0.5289465 1.995969

4.3.2 Step 3: Result Interpretation

According to the results above, using the simulated example data, we can interpret the result as below:

RMLT1 (Recovery Analysis): Treatment group gains 2.59 more days through recovery (3.76 vs 1.17 days, p < 0.001). The positive difference indicates that treated patients experience significantly more recovery events and accumulate more time in recovered states, demonstrating enhanced recovery outcomes with strong statistical significance.

RMLT2 (Death Analysis): Treatment group loses 1.66 fewer days to death (1.08 vs 2.74 days, p = 0.004). The negative difference indicates that treated patients experience reduced mortality risk, with fewer deaths occurring and delayed time to death when it does occur, representing a clear survival benefit with strong statistical significance.

Clinical Conclusion: Both standard competing risks measures consistently show treatment benefits - more time gained through recovery (beneficial) and less time lost to death (beneficial), demonstrating that treatment enhances recovery while reducing mortality across both competing endpoints. The treatment effect is statistically significant for both outcomes, indicating robust clinical efficacy.

4.4 Part 2: Weighted CIF Analysis:

Weighted analysis incorporates disease severity changes over time, providing insights when patient conditions vary during follow-up.

4.4.1 Step 1: Data Preparation

The function merges time-to-event data with longitudinal severity measurements, calculating time spent in different severity states and applying clinical weights. Severity scores are flexible (example uses scores 4-7, but any range can be used to meet your dataset’s needs) with matching weights required for each state. In practice, death weights often increase with severity (example - State 4: 2.0, State 5: 1.5, State 6: 1.0, State 7: 0.5) to emphasize concerning states, while discharge weights (if applicable) often decrease with severity (example - State 4: 0.5, State 5: 1.0, State 6: 1.5, State 7: 2.0) to emphasize favorable recovery states. This produces separate datasets enabling computation of weighted restricted mean lifetime 1 and 2. Weights can also be customized based on stakeholder priorities—hospital administrators might prioritize shorter stays, while patients might weight quality of life differently than clinical severity.

Handling Missing Measurements in Weighted Analysis: Here the weighted CIF analysis uses a “last observation carried forward” approach for gaps between measurement time points. When longitudinal severity scores are missing between recorded visits (e.g., measurements on day 4 and day 7 with no intermediate assessments), the algorithm assumes the patient’s severity state remains constant at the last observed value throughout the entire interval. Specifically, if a patient has a severity score of 5 on day 4 and the next measurement is on day 7, regardless of the severity score on day 7, the algorithm credits all 3 intervening days (days 4-6) to severity state 5. This approach calculates interval_length = day_7 - day_4 = 3 days and adds this entire duration to the appropriate severity state counter without interpolation or imputation of intermediate values. While this method may not capture dynamic changes within measurement intervals, it provides a conservative and computationally tractable approach for incorporating longitudinal severity information into competing risks analysis. Researchers should consider the frequency of their measurement schedule when interpreting weighted results, as longer intervals between assessments may reduce the precision of severity-weighted estimates.

prepped_w <- prep_data_weighted_cif2(
  data_main = main_df,
  data_long = long_df,

  wID_main              = "ID",
  wTimeToRecovery_main  = "TimeToRecovery",
  wTimeToDeath_main     = "TimeToDeath",
  wRecov_Censoring_main = "RecoveryCensoringIndicator",
  wDeath_Censoring_main = "DeathCensoringIndicator",
  wTreatment_main       = "Treatment",
  wBaselineScore_main   = "BaselineScore",
  
  wID_long              = "PersonID",
  wADY_long             = "RelativeDay",
  wScore_long           = "OrdinalScore",

  wStates_death         = c(4,5,6,7,8), 
  wWeights_death        = c(2,1.5,1,0.5,0.2),
  wStates_discharge     = c(4,5,6,7,8),
  wWeights_discharge    = c(0.2,0.5,1,1.5,2)
)

4.4.2 Step 2: Analysis

The function do_weighted_cif_analysis() performs weighted competing risks analysis using the datasets prepared by prep_data_weighted_cif2(), computing Weighted RMLT1 (in our simulated example it would be recovery/discharge analysis with severity-weighted time) and Weighted RMLT2 (in our simulated example it would be death analysis with severity-weighted time) at the specified time horizon tau. Unlike standard analysis that treats all time equally, this approach emphasizes clinically relevant periods - time spent in better states before discharge receives higher weight in WRMLT1 calculations, while time spent in severe states before death receives higher weight in WRMLT2 calculations. The function returns formatted results with $groups and $contrast components for both weighted measures, containing estimates, standard errors, confidence intervals, and p-values that reflect the severity-adjusted treatment effects on competing outcomes.

Different tau values reveal treatment effects over time: short-term (e.g., tau=15) shows immediate treatment effects, long-term (e.g., tau=29) reveals sustained or delayed benefits, and multiple horizons demonstrate the evolution of treatment effects. Treatment benefits may be immediate, delayed, or diminishing, so analyzing multiple time points provides comprehensive insights.

For weighted analysis, here in this example, WRMLT1 represents recovery/discharge time weighted by clinical severity states, while WRMLT2 shows death time weighted by severity progression. Both measures capture timing and severity of clinical outcomes simultaneously. Positive differences in WRMLT1 indicate more severity-weighted recovery time (beneficial), while negative differences in WRMLT2 indicate less severity-weighted time lost to death (beneficial). Results provide clinically meaningful metrics in time units, facilitating evidence-based treatment decisions.

#Analysis at 15 days
res_w15 <- do_weighted_cif_analysis(prepped_w, tau=15)

# Display WRMLT1 results (discharge-focused analysis)  
#Group estimates with standard errors
res_w15$WRMLT1$groups
##            Group Estimate        se   Lower95  Upper95
## 1 Group1 (trt=1) 3.955174 0.6654671 2.6508587 5.259490
## 2 Group2 (trt=0) 1.674784 0.5287521 0.6384295 2.711138
#Treatment difference: Treatment1 - Treatment0
res_w15$WRMLT1$contrast
##     Contrast Estimate       se  Lower95  Upper95     p_value
## 1 Difference 2.280391 0.849956 0.614477 3.946304 0.007297557
# Display WRMLT2 results (discharge-focused analysis)  
res_w15$WRMLT2$groups
##            Group Estimate        se   Lower95  Upper95
## 1 Group1 (trt=1) 1.587423 0.4903069 0.6264214 2.548424
## 2 Group2 (trt=0) 3.007981 0.6181803 1.7963481 4.219615
res_w15$WRMLT2$contrast
##     Contrast  Estimate        se   Lower95   Upper95  p_value
## 1 Difference -1.420559 0.7890169 -2.967032 0.1259145 1.928205

4.4.3 Step 3: Result Interpretation

WRMLT1 (Recovery/Discharge Analysis): Treatment group shows 2.28 more severity-weighted recovery days (3.96 vs 1.67 days, p = 0.007). The positive difference indicates that treated patients accumulate significantly more time in recovery-related severity states, representing enhanced recovery outcomes weighted by clinical improvement trajectories.

WRMLT2 (Death Analysis): Treatment group shows 1.42 fewer severity-weighted death days (1.59 vs 3.01 days, p = 0.072). The negative difference indicates that treated patients lose less time to death when accounting for disease severity progression, representing a survival benefit, though this approaches but does not reach statistical significance at the 0.05 level.

Clinical Conclusion: Both weighted measures consistently show treatment benefits - more severity-weighted recovery time (beneficial) and less severity-weighted time lost to death (beneficial), confirming that treatment advantages persist even when accounting for disease severity trajectories.