Generalized Additive Models with Hyper Column

Tingting Zhan

Introduction

This vignette documents applications based on package hyper.gam (GitHub, RPubs).

Prerequisite

New features are first implemented on Github.

devtools::install_github('tingtingzhan/hyper.gam')

They eventually make their way to CRAN.

utils::install.packages('hyper.gam')

Getting Started

Examples in this vignette require that the search path has

library(hyper.gam)
library(survival)

Terms and Abbreviations

Term / Abbreviation             Description                                   Reference
|>                              Forward pipe operator                         ?base::pipeOp, introduced in R 4.1.0
attr                            Attributes                                    base::attr; base::attributes
contour                         Contours                                      graphics::contour; hyper.gam::contour.hyper_gam
coxph                           Cox model                                     survival::coxph
gam                             Generalized additive models                   mgcv::gam
groupedHyperframe, hypercolumn  (Hyper column of) (grouped) hyper data frame  groupedHyperframe::as.groupedHyperframe; spatstat.geom::hyperframe
htmlwidget                      HTML Widgets                                  ?htmlwidgets::`htmlwidgets-package`; plotly::plotly
persp                           Perspective plot                              graphics::persp; hyper.gam::persp.hyper_gam
PFS                             Progression/recurrence free survival          https://en.wikipedia.org/wiki/Progression-free_survival
quantile                        Quantile                                      stats::quantile
S3, generic, methods            S3 object oriented system                     base::UseMethod; utils::methods; utils::getS3method; https://adv-r.hadley.nz/s3.html
Surv                            Survival object                               survival::Surv

Acknowledgement

The authors thank Erjia Cui for his contribution to function hyper_gam().

This work is supported by NCI R01CA222847 (I. Chervoneva, T. Zhan, and H. Rui) and R01CA253977 (H. Rui and I. Chervoneva).

Quantile Index Predictor

Related publications include Yi et al. (2023b, 2023a).

Data Preparation

Data set groupedHyperframe::Ki67 is a groupedHyperframe with a numeric-hypercolumn logKi67 and a nested grouping structure ~patientID/tissueID.

data(Ki67, package = 'groupedHyperframe')
Ki67
Grouped Hyperframe: ~patientID/tissueID

645 tissueID nested in
622 patientID

Preview of first 10 (or less) rows:

     logKi67 tissueID Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo
1  (numeric) TJUe_I17      2 100+             100          0   FALSE     FALSE
2  (numeric) TJUe_G17      1   22              22          1   FALSE     FALSE
3  (numeric) TJUe_F17      1  99+              99          0   FALSE        NA
4  (numeric) TJUe_D17      1  99+              99          0   FALSE      TRUE
5  (numeric) TJUe_J18      1  112             112          1    TRUE      TRUE
6  (numeric) TJUe_N17      4   12              12          1    TRUE     FALSE
7  (numeric) TJUe_J17      2  64+              64          0   FALSE     FALSE
8  (numeric) TJUe_F19      2  56+              56          0   FALSE     FALSE
9  (numeric) TJUe_P19      2  79+              79          0   FALSE     FALSE
10 (numeric) TJUe_O19      2   26              26          1   FALSE      TRUE
   histology  Her2   HR  node  race age patientID
1          3  TRUE TRUE  TRUE White  66   PT00037
2          3 FALSE TRUE FALSE Black  42   PT00039
3          3 FALSE TRUE FALSE White  60   PT00040
4          3 FALSE TRUE  TRUE White  53   PT00042
5          3 FALSE TRUE  TRUE White  52   PT00054
6          2  TRUE TRUE  TRUE Black  51   PT00059
7          3 FALSE TRUE  TRUE Asian  50   PT00062
8          2  TRUE TRUE  TRUE White  37   PT00068
9          3  TRUE TRUE FALSE White  68   PT00082
10         2  TRUE TRUE FALSE Black  55   PT00084

Analysis in the next section is based on the aggregated quantiles by patientID. Users are encouraged to learn more about the groupedHyperframe class and the function aggregate_quantile() from package groupedHyperframe vignettes.

s = Ki67 |>
  aggregate_quantile(by = ~ patientID, probs = seq.int(from = .01, to = .99, by = .01))
s |> head()
Hyperframe:
  Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo histology  Her2   HR
1      2 100+             100          0   FALSE     FALSE         3  TRUE TRUE
2      1   22              22          1   FALSE     FALSE         3 FALSE TRUE
3      1  99+              99          0   FALSE        NA         3 FALSE TRUE
4      1  99+              99          0   FALSE      TRUE         3 FALSE TRUE
5      1  112             112          1    TRUE      TRUE         3 FALSE TRUE
6      4   12              12          1    TRUE     FALSE         2  TRUE TRUE
   node  race age patientID logKi67.quantile
1  TRUE White  66   PT00037        (numeric)
2 FALSE Black  42   PT00039        (numeric)
3 FALSE White  60   PT00040        (numeric)
4  TRUE White  53   PT00042        (numeric)
5  TRUE White  52   PT00054        (numeric)
6  TRUE Black  51   PT00059        (numeric)
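
To make the aggregation step concrete, here is a minimal base-R sketch of what taking quantiles per patient might look like. It is not the implementation of aggregate_quantile(). The data are simulated, and the names cells, probs and Q are hypothetical. Whether the package pools cells across tissues or combines tissue-level quantiles is not stated in this vignette; pooling is an assumption of this sketch.

```r
set.seed(1)
# hypothetical cell-level data: 3 patients, 2 tissues each, 50 cells per tissue
cells = data.frame(
  patientID = rep(c('A', 'B', 'C'), each = 100L),
  tissueID = rep(paste0('t', 1:6), each = 50L),
  logKi67 = rnorm(300L)
)
probs = seq.int(from = .01, to = .99, by = .01)

# assumption: pool all cells of a patient across tissues, then take sample quantiles
Q = cells$logKi67 |>
  split(f = cells$patientID) |>
  lapply(FUN = quantile, probs = probs) |>
  do.call(what = rbind)

dim(Q) # one row per patient, one column per probability
```

Each row of Q plays the role of one element of the numeric-hypercolumn logKi67.quantile, i.e., the sample quantile function of one patient evaluated on the grid probs.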

Linear Quantile Index

We fit a linear hyper_gam model, i.e., a gam model with the numeric-hypercolumn logKi67.quantile using mgcv::s smooth.

m0 = hyper_gam(PFS ~ logKi67.quantile, data = s)
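
The internals of hyper_gam() are not shown in this vignette. As a rough illustration of a gam model with a functional predictor and an mgcv::s smooth, the sketch below fits a Cox model directly with mgcv, using its documented summation convention for matrix arguments. All data are simulated, the names n, p, P, Q, beta and d are hypothetical, and the formula is an assumption about the general model form, not the package's actual call.

```r
library(mgcv)

set.seed(1)
n = 200L # hypothetical number of subjects
p = seq.int(from = .01, to = .99, by = .01)
J = length(p)

# simulated quantile functions: row i holds the sample quantiles of subject i
Q = t(replicate(n, expr = {
  cell = rnorm(200L, mean = rnorm(1L), sd = runif(1L, min = .5, max = 2))
  quantile(cell, probs = p, names = FALSE)
}))
# matching matrix of probabilities, one identical row per subject
P = matrix(p, nrow = n, ncol = J, byrow = TRUE)

# simulated survival times; log-hazard is the average of beta(p) * Q_i(p) over the grid
beta = 2 * p # hypothetical true coefficient function
eta = as.vector(Q %*% beta) / J
time = rexp(n, rate = exp(eta))
cens = rexp(n, rate = .5)
d = list(obs = pmin(time, cens), event = as.integer(time <= cens), P = P, Q = Q / J)

# matrix arguments trigger the summation convention of mgcv:
# the term contributes sum_j f(P[i, j]) * Q[i, j] to the linear predictor of subject i
fit = gam(obs ~ s(P, by = Q), family = cox.ph(), weights = event, data = d)
```

In this sketch the smooth f plays the role of the coefficient function $\hat{\beta}(p)$, and the event indicator is passed through the weights argument, as required by mgcv::cox.ph.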

Visualization

Function integrandSurface() creates an interactive htmlwidget, using package plotly, to illustrate the integrand surface of the linear quantile index over $p\in[0,1]$ and $q\in\text{range}\big(Q_i(p)\big)$ for all subjects $i=1,\cdots,n$,

$$\hat{S}_0(p,q) = \hat{\beta}(p)\cdot q$$

as well as the projections of the integrand curves of selected subjects onto the $(p,q)$- and $(S,p)$-planes. This htmlwidget is suppressed in the vignette on CRAN due to the package size limit; it can be viewed and interacted with on RPubs.

integrandSurface(m0)
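
The relationship between the surface and one subject's integrand curve can be sketched numerically in base R. The coefficient function beta and the quantile function Qi below are hypothetical stand-ins, not estimates from m0. Reading the index as the integral of the curve over p is an assumption suggested by the term "integrand".

```r
p = seq.int(from = .01, to = .99, by = .01)
beta = function(p) sin(pi * p) # hypothetical coefficient function
Qi = qnorm(p, mean = 1)        # hypothetical quantile function of one subject

# surface beta(p) * q evaluated on a (p, q) grid
q = seq.int(from = min(Qi), to = max(Qi), length.out = 50L)
S = outer(p, q, FUN = function(p, q) beta(p) * q)

# integrand curve of the subject: the surface evaluated along q = Q_i(p)
curve_i = beta(p) * Qi

# trapezoidal approximation of the integral of the curve over p
QI_i = sum(diff(p) * (head(curve_i, n = -1L) + tail(curve_i, n = -1L)) / 2)
```

The matrix S can be passed to graphics::persp(x = p, y = q, z = S) for a static view of this toy surface.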

Less fancy illustrations of the integrand surface include perspective and contour plots provided by package graphics shipped with vanilla R.

op = par(mar = c(2, 2, 0, 0)) # par() returns the previous settings
persp(m0)

par(op) # restore the previous settings
op = par(mar = c(4, 5, 1, 0))
contour(m0)

par(op)

k-Fold Prediction

The linear quantile index is the k-fold prediction of the linear hyper_gam model m0.

set.seed(145)
QI = m0 |>
  kfoldPredict.hyper_gam(k = 10L, mc.cores = 1L)
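
For readers unfamiliar with k-fold prediction, the idea can be sketched in base R: each subject's prediction comes from a model fitted without that subject's fold. The sketch uses stats::lm on simulated data as a stand-in for the hyper_gam model and is not the implementation of kfoldPredict.hyper_gam(); the names d, fold and pred are hypothetical. Storing the fold assignment as attribute 'fold' mirrors how QI is used in the diagnostic plot.

```r
set.seed(145)
n = 100L
k = 10L
d = data.frame(x = rnorm(n))
d$y = 1 + 2 * d$x + rnorm(n)

# random assignment of subjects to k folds of equal size
fold = sample(rep_len(seq_len(k), length.out = n))

pred = numeric(n)
for (j in seq_len(k)) {
  test = (fold == j)
  fit = lm(y ~ x, data = d[!test, , drop = FALSE])               # train without fold j
  pred[test] = predict(fit, newdata = d[test, , drop = FALSE])   # predict fold j only
}
attr(pred, which = 'fold') = fold
```

Because no subject contributes to the model that predicts it, the resulting predictor can be used in a downstream regression with less risk of optimism than an in-sample prediction.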

Diagnosis of linear quantile index QI.

op = par(mar = c(4, 5, 1, 0)) # par() returns the previous settings, restored by par(op)
boxplot(QI ~ attr(QI, 'fold'), xlab = 'Fold')

par(op)

Regression Model using Linear Quantile Index

suppressWarnings(sQI <- cbind(s, spatstat.geom::hyperframe(QI = QI)) |> 
                   as.data.frame())
coxph(PFS ~ QI, data = sQI) |> summary()
Call:
coxph(formula = PFS ~ QI, data = sQI)

  n= 622, number of events= 118 

      coef exp(coef) se(coef)    z Pr(>|z|)
QI 0.07052   1.07306  0.18550 0.38    0.704

   exp(coef) exp(-coef) lower .95 upper .95
QI     1.073     0.9319     0.746     1.544

Concordance= 0.514  (se = 0.027 )
Likelihood ratio test= 0.14  on 1 df,   p=0.7
Wald test            = 0.14  on 1 df,   p=0.7
Score (logrank) test = 0.14  on 1 df,   p=0.7
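
The derived columns of this summary can be reproduced from coef and se(coef) alone. The short sketch below recomputes them from the rounded values printed above, so small discrepancies in the last digit are expected.

```r
b = 0.07052  # coef of QI, copied from the summary above
se = 0.18550 # se(coef) of QI, copied from the summary above

hr = exp(b)                               # hazard ratio, exp(coef)
ci = exp(b + c(-1, 1) * qnorm(.975) * se) # Wald 95% confidence interval of the hazard ratio
z = b / se                                # Wald statistic
pval = 2 * pnorm(-abs(z))                 # two-sided p-value

round(c(hr = hr, lower = ci[1L], upper = ci[2L], z = z, p = pval), digits = 3L)
```

The confidence interval of the hazard ratio contains 1, in agreement with the Wald p-value of 0.704 reported in the summary.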

Nonlinear Quantile Index

We fit a nonlinear hyper_gam model, i.e., a gam model with the numeric-hypercolumn logKi67.quantile using tensor product interaction mgcv::ti smooth.

m1 = hyper_gam(PFS ~ logKi67.quantile, data = s, nonlinear = TRUE)
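
As with the linear case, the internals of hyper_gam(nonlinear = TRUE) are not shown here. The sketch below illustrates one way a bivariate surface $F(p,q)$ could be fitted with a tensor product interaction mgcv::ti smooth, again through the summation convention of mgcv for matrix arguments. The data are simulated, the names n, p, P, Q, L and d are hypothetical, and the formula is an assumption about the general model form, not the package's actual call.

```r
library(mgcv)

set.seed(1)
n = 200L # hypothetical number of subjects
p = seq.int(from = .01, to = .99, by = .01)
J = length(p)

# simulated quantile functions: row i holds the sample quantiles of subject i
Q = t(replicate(n, expr = {
  cell = rnorm(200L, mean = rnorm(1L), sd = runif(1L, min = .5, max = 2))
  quantile(cell, probs = p, names = FALSE)
}))
P = matrix(p, nrow = n, ncol = J, byrow = TRUE) # probabilities, one row per subject
L = matrix(1 / J, nrow = n, ncol = J)           # equal integration weights on the grid

# simulated survival times; log-hazard is the average of p * Q_i(p) over the grid
eta = rowMeans(P * Q)
time = rexp(n, rate = exp(eta))
cens = rexp(n, rate = .5)
d = list(obs = pmin(time, cens), event = as.integer(time <= cens), P = P, Q = Q, L = L)

# the term contributes sum_j L[i, j] * F(P[i, j], Q[i, j]) to the linear predictor of subject i
fit = gam(obs ~ ti(P, Q, by = L), family = cox.ph(), weights = event, data = d)
```

Compared with s(P, by = Q), the quantile value enters the smooth as an argument rather than as a multiplier, so the fitted surface need not be linear in q.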

Visualization

Function integrandSurface() illustrates the integrand surface of the nonlinear quantile index over $p\in[0,1]$ and $q\in\text{range}\big(Q_i(p)\big)$ for all subjects $i=1,\cdots,n$,

$$\hat{S}_0(p,q) = \hat{F}(p,q)$$

as well as the projections of the integrand curves of selected subjects onto the $(p,q)$- and $(S,p)$-planes. This htmlwidget is suppressed in the vignette on CRAN due to the package size limit; it can be viewed and interacted with on RPubs.

integrandSurface(m1)

Less fancy illustrations of the integrand surface include perspective and contour plots provided by package graphics shipped with vanilla R.

op = par(mar = c(2, 2, 0, 0)) # par() returns the previous settings
persp(m1)

par(op) # restore the previous settings
op = par(mar = c(4, 5, 1, 0))
contour(m1)

par(op)

k-Fold Prediction

The nonlinear quantile index is the k-fold prediction of the nonlinear hyper_gam model m1.

set.seed(145)
nlQI = m1 |>
  kfoldPredict.hyper_gam(k = 10L, mc.cores = 1L)

Diagnosis of nonlinear quantile index nlQI.

op = par(mar = c(4, 5, 1, 0)) # par() returns the previous settings, restored by par(op)
boxplot(nlQI ~ attr(nlQI, 'fold'), xlab = 'Fold')

par(op)

Regression Model using Nonlinear Quantile Index

suppressWarnings(s_nlQI <- cbind(s, spatstat.geom::hyperframe(nlQI = nlQI)) |> 
                   as.data.frame())
coxph(PFS ~ nlQI, data = s_nlQI) |> summary()
Call:
coxph(formula = PFS ~ nlQI, data = s_nlQI)

  n= 622, number of events= 118 

       coef exp(coef) se(coef)    z Pr(>|z|)
nlQI 0.0735    1.0763   0.1669 0.44     0.66

     exp(coef) exp(-coef) lower .95 upper .95
nlQI     1.076     0.9291    0.7759     1.493

Concordance= 0.527  (se = 0.029 )
Likelihood ratio test= 0.2  on 1 df,   p=0.7
Wald test            = 0.19  on 1 df,   p=0.7
Score (logrank) test = 0.19  on 1 df,   p=0.7

References

Yi, Misung, Tingting Zhan, Amy R. Peck, Jeffrey A. Hooke, Albert J. Kovatich, Craig D. Shriver, Hai Hu, Yunguang Sun, Hallgeir Rui, and Inna Chervoneva. 2023a. “Quantile Index Biomarkers Based on Single-Cell Expression Data.” Laboratory Investigation 103 (8): 100158. https://doi.org/10.1016/j.labinv.2023.100158.
———. 2023b. “Selection of Optimal Quantile Protein Biomarkers Based on Cell-Level Immunohistochemistry Data.” BMC Bioinformatics 24 (1): 298. https://doi.org/10.1186/s12859-023-05408-8.