Grouped Hyper Data Frame

Tingting Zhan

Introduction

This vignette of package groupedHyperframe (CRAN, GitHub, RPubs) documents the creation of a groupedHyperframe object, the batch processes for a groupedHyperframe, and the aggregation of various statistics over a multi-level grouping structure.

Prerequisite

Package groupedHyperframe may require the development versions of the spatstat family.

devtools::install_github('spatstat/spatstat')
devtools::install_github('spatstat/spatstat.data')
devtools::install_github('spatstat/spatstat.explore')
devtools::install_github('spatstat/spatstat.geom')
devtools::install_github('spatstat/spatstat.linnet')
devtools::install_github('spatstat/spatstat.model')
devtools::install_github('spatstat/spatstat.random')
devtools::install_github('spatstat/spatstat.sparse')
devtools::install_github('spatstat/spatstat.univar')
devtools::install_github('spatstat/spatstat.utils')

Note to Users

Examples in this vignette require that the search path has

library(groupedHyperframe)
library(spatstat.data)
library(survival) # to help hyperframe understand Surv object

Users should remove the parameter mc.cores = 1L from all examples to engage all CPU cores on the current host under macOS. This vignette sets mc.cores = 1L throughout only to pass CRAN’s submission check.

Terms and Abbreviations

Term / Abbreviation       Description                                    Reference
|>                        Forward pipe operator                          ?base::pipeOp; introduced in R 4.1.0
attr                      Attributes                                     base::attr; base::attributes
CRAN, R                   The Comprehensive R Archive Network            https://cran.r-project.org
data.frame                Data frame                                     base::data.frame
formula                   Formula                                        stats::formula
fv, fv.object, fv.plot    (Plot of) function value table                 spatstat.explore::fv.object; spatstat.explore::plot.fv
groupedData, ~ g1/.../gm  Grouped data frame; nested grouping structure  nlme::groupedData; nlme::lme
hypercolumns, hyperframe  (Hyper columns of) hyper data frame            spatstat.geom::hyperframe
inherits                  Class inheritance                              base::inherits
kerndens                  Kernel density                                 stats::density.default()$y
mc.cores                  Number of CPU cores to use                     parallel::mclapply; parallel::detectCores
multitype                 Multitype object                               spatstat.geom::is.multitype
object.size               Memory allocation                              utils::object.size
pmean, pmedian            Parallel mean and median                       groupedHyperframe::pmean; groupedHyperframe::pmedian
pmax, pmin                Parallel maxima and minima                     base::pmax; base::pmin
ppp, ppp.object           (Marked) point pattern                         spatstat.geom::ppp.object
quantile                  Quantile                                       stats::quantile
save, xz                  Save with xz compression                       base::save(., compress = 'xz'); base::saveRDS(., compress = 'xz'); https://en.wikipedia.org/wiki/XZ_Utils
S3, generic, methods      S3 object oriented system                      base::UseMethod; utils::methods; utils::getS3method; https://adv-r.hadley.nz/s3.html
search                    Search path                                    base::search
Surv                      Survival object                                survival::Surv
trapz, cumtrapz           (Cumulative) trapezoidal integration           pracma::trapz; pracma::cumtrapz; https://en.wikipedia.org/wiki/Trapezoidal_rule

Acknowledgement

This work is supported by NCI R01CA222847 (I. Chervoneva, T. Zhan, and H. Rui) and R01CA253977 (H. Rui and I. Chervoneva).

groupedHyperframe Class

The S3 class groupedHyperframe inherits from the hyperframe class, in a similar fashion as the groupedData class inherits from the data.frame class.

A groupedHyperframe object, in addition to a hyperframe object, has attribute(s)

Create a groupedHyperframe

From a hyperframe

The S3 method as.groupedHyperframe.hyperframe() converts a hyperframe to a groupedHyperframe. In the data set spatstat.data::osteo, the serial number of the sampling volume (brick) is nested in the bone sample (id):

osteo |> as.groupedHyperframe(group = ~ id/brick)
#> Grouped Hyperframe: ~id/brick
#> 
#> 40 brick nested in
#> 4 id
#> 
#>        id shortid brick   pts depth
#> 1  c77za4       4     1 (pp3)    45
#> 2  c77za4       4     2 (pp3)    60
#> 3  c77za4       4     3 (pp3)    55
#> 4  c77za4       4     4 (pp3)    60
#> 5  c77za4       4     5 (pp3)    85
#> 6  c77za4       4     6 (pp3)    90
#> 7  c77za4       4     7 (pp3)    95
#> 8  c77za4       4     8 (pp3)    65
#> 9  c77za4       4     9 (pp3)   100
#> 10 c77za4       4    10 (pp3)   100
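
The header line "40 brick nested in 4 id" counts bricks within ids, not distinct brick serial numbers. The base R sketch below illustrates that distinction on a made-up data.frame (toy and its values are hypothetical, and it assumes serial numbers restart within each id); it is not the package's own counting code.

```r
# Toy nesting: brick serial numbers 1..3 restart within each id (values made up)
toy <- data.frame(
  id    = rep(c('a', 'b'), each = 3L),
  brick = rep(1:3, times = 2L)
)

n_id <- length(unique(toy$id))                         # 2 ids
n_brick_naive  <- length(unique(toy$brick))            # 3 distinct serial numbers
n_brick_nested <- nrow(unique(toy[c('id', 'brick')]))  # 6 bricks nested in 2 ids
```

Counting the unique (id, brick) combinations is what makes a serial number such as 1 refer to a different brick in each id.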

From a data.frame

The S3 method dispatch as.groupedHyperframe.data.frame() converts a data.frame to a groupedHyperframe. This function inspects the input by the (nested) grouping structure, identifies the column(s) with elements not identical within the lowest group, and converts them into hypercolumns. Data set Ki67. in this package has non-identical column logKi67 in the nested grouping structure ~ patientID/tissueID.

(Ki67g = Ki67. |> as.groupedHyperframe(group = ~ patientID/tissueID, mc.cores = 1L))
#> Grouped Hyperframe: ~patientID/tissueID
#> 
#> 6 tissueID nested in
#> 6 patientID
#> 
#>     logKi67 tissueID Tstage  PFS recfreesurv_mon recurrence adj_rad adj_chemo
#> 1 (numeric) TJUe_I17      2 100+             100          0   FALSE     FALSE
#> 2 (numeric) TJUe_G17      1   22              22          1   FALSE     FALSE
#> 3 (numeric) TJUe_F17      1  99+              99          0   FALSE        NA
#> 4 (numeric) TJUe_D17      1  99+              99          0   FALSE      TRUE
#> 5 (numeric) TJUe_J18      1  112             112          1    TRUE      TRUE
#> 6 (numeric) TJUe_N17      4   12              12          1    TRUE     FALSE
#>   histology  Her2   HR  node  race age patientID
#> 1         3  TRUE TRUE  TRUE White  66   PT00037
#> 2         3 FALSE TRUE FALSE Black  42   PT00039
#> 3         3 FALSE TRUE FALSE White  60   PT00040
#> 4         3 FALSE TRUE  TRUE White  53   PT00042
#> 5         3 FALSE TRUE  TRUE White  52   PT00054
#> 6         2  TRUE TRUE  TRUE Black  51   PT00059
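
The column inspection described above can be sketched in base R. This is only an illustration on a made-up data.frame (toy, patient, tissue, age and intensity are hypothetical names), not the implementation of as.groupedHyperframe.data.frame(): a column is flagged when any lowest-level group holds more than one distinct value, and flagged columns are kept as one vector per group.

```r
# Toy data: one tissue per patient, several cells per tissue (all names hypothetical)
toy <- data.frame(
  patient   = c('P1', 'P1', 'P1', 'P2', 'P2'),
  tissue    = c('T1', 'T1', 'T1', 'T2', 'T2'),
  age       = c(66, 66, 66, 42, 42),      # identical within each tissue
  intensity = c(0.3, 1.2, 0.7, 2.1, 0.4)  # varies within each tissue
)

# Lowest level of ~ patient/tissue: one group per patient-tissue combination
g <- interaction(toy$patient, toy$tissue, drop = TRUE, lex.order = TRUE)

# A column "varies" if any lowest-level group holds more than one distinct value
varies <- vapply(toy[c('age', 'intensity')], function(col) {
  any(tapply(col, g, function(v) length(unique(v)) > 1L))
}, logical(1))

# Constant columns collapse to one row per group;
# varying columns are kept as one numeric vector per group
collapsed <- toy[!duplicated(g), c('patient', 'tissue', names(varies)[!varies])]
nested <- split(toy$intensity, g)
```

Here age collapses to a single value per tissue, while intensity becomes a list of numeric vectors, which is the role a numeric-hypercolumn plays in the output above.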

Converting a data.frame with cell intensities, etc., into a groupedHyperframe reduces memory allocation, but does not much reduce the size of the saved file if xz compression is used.

unclass(object.size(Ki67g)) / unclass(object.size(Ki67.))
#> [1] 0.1148083
f_g = tempfile(fileext = '.rds')
Ki67g |> saveRDS(file = f_g, compress = 'xz')
f = tempfile(fileext = '.rds')
Ki67. |> saveRDS(file = f, compress = 'xz')
file.size(f_g) / file.size(f) # not much reduction
#> [1] 0.9629481
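
The same comparison can be tried without the package. The sketch below uses simulated data and a plain list as a stand-in for a hyperframe (flat, nested and all column names are hypothetical), so the ratios it yields will differ from those shown above; it only illustrates how the two ratios are computed.

```r
set.seed(1)
n_id <- 20L; n_cell <- 500L

# Flat layout: covariates repeated on every row
flat <- data.frame(
  id  = rep(sprintf('ID%02d', seq_len(n_id)), each = n_cell),
  age = rep(sample(40:80, n_id, replace = TRUE), each = n_cell),
  x   = rnorm(n_id * n_cell)
)

# Nested layout: one row of covariates per id, measurements as a list of vectors
nested <- list(
  covariates = flat[!duplicated(flat$id), c('id', 'age')],
  x = split(flat$x, flat$id)
)

# In-memory ratio: the nested layout no longer stores the repeated covariates
mem_ratio <- unclass(object.size(nested)) / unclass(object.size(flat))

# On-disk ratio under xz compression
f_nested <- tempfile(fileext = '.rds')
f_flat <- tempfile(fileext = '.rds')
saveRDS(nested, file = f_nested, compress = 'xz')
saveRDS(flat, file = f_flat, compress = 'xz')
file_ratio <- file.size(f_nested) / file.size(f_flat)
```

One likely reason the file sizes stay close is that long runs of repeated values compress very well, so the flat file is already dominated by the measurements themselves.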

Create a groupedHyperframe with ppp-hypercolumn

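Function grouped_ppp() creates a groupedHyperframe with one-and-only-one ppp-hypercolumn. In the following example, the argument formula specifies the marks of the ppp-hypercolumn (hladr and phenotype) on the left-hand side, the columns carried along for downstream analysis (OS, gender and age) before the | separator on the right-hand side, and the nested grouping structure (image_id nested in patient_id) after the | separator.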

(s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id, 
                 data = wrobel_lung, mc.cores = 1L))
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp.
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)

Batch Process on ppp-hypercolumn

In this section, we outline the batch processes of spatial point pattern analyses applicable to the one-and-only-one ppp-hypercolumn of a hyperframe. These batch processes are not intended for a hyperframe with multiple ppp-hypercolumns in the foreseeable future, as that would require checking for name clashes in the $marks from multiple ppp-hypercolumns.

… which adds a fv-hypercolumn

Batch Process  Workhorse in spatstat.explore  Applicable To    fv-hypercolumn Suffix
Emark_()       Emark()                        numeric marks    .E
Vmark_()       Vmark()                        numeric marks    .V
markcorr_()    markcorr()                     numeric marks    .k
markvario_()   markvario()                    numeric marks    .gamma
Gcross_()      Gcross()                       multitype marks  .G
Kcross_()      Kcross()                       multitype marks  .K
Jcross_()      Jcross()                       multitype marks  .J

… which adds a numeric-hypercolumn

Batch Process  Workhorse in spatstat.geom     Applicable To    numeric-hypercolumn Suffix
nncross_()     nncross.ppp(., what = 'dist')  multitype marks  .nncross
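
To make concrete what a nearest-neighbour cross distance is, the base R sketch below computes, for each point of one type, the distance to the nearest point of another type. The data and the names pts, from and to are made up, and the brute-force distance matrix is a stand-in for illustration rather than the algorithm used by spatstat.geom.

```r
# Toy marked point pattern: coordinates plus a two-level type (values made up)
pts <- data.frame(
  x    = c(0, 1, 4, 0, 5),
  y    = c(0, 0, 0, 3, 0),
  type = c('A', 'A', 'A', 'B', 'B')
)
from <- pts[pts$type == 'A', c('x', 'y')]
to   <- pts[pts$type == 'B', c('x', 'y')]

# Pairwise Euclidean distances: rows index type-A points, columns type-B points
d <- sqrt(outer(from$x, to$x, '-')^2 + outer(from$y, to$y, '-')^2)

# Distance from each type-A point to its nearest type-B neighbour
nn_dist <- apply(d, 1L, min)  # 3, sqrt(10), 1
```

The result has one distance per type-A point, which is why this batch process yields a numeric-hypercolumn rather than an fv-hypercolumn.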

Pipe operator compatible

Multiple batch processes may be applied to a hyperframe (or groupedHyperframe) in a pipeline.

r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast
#> 

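The returned hyperframe (or groupedHyperframe) has the new fv-hypercolumns hladr.E and phenotype.G, and the new numeric-hypercolumn phenotype.nncross.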

out
#> Grouped Hyperframe: ~patient_id/image_id
#> 
#> 25 image_id nested in
#> 5 patient_id
#> 
#>       OS gender age    patient_id          image_id  ppp. hladr.E phenotype.G
#> 1  3488+      F  85 #01 0-889-121 [40864,18015].im3 (ppp)    (fv)        (fv)
#> 2  3488+      F  85 #01 0-889-121 [42689,19214].im3 (ppp)    (fv)        (fv)
#> 3  3488+      F  85 #01 0-889-121 [42806,16718].im3 (ppp)    (fv)        (fv)
#> 4  3488+      F  85 #01 0-889-121 [44311,17766].im3 (ppp)    (fv)        (fv)
#> 5  3488+      F  85 #01 0-889-121 [45366,16647].im3 (ppp)    (fv)        (fv)
#> 6   1605      M  66 #02 1-037-393 [56576,16907].im3 (ppp)    (fv)        (fv)
#> 7   1605      M  66 #02 1-037-393 [56583,15235].im3 (ppp)    (fv)        (fv)
#> 8   1605      M  66 #02 1-037-393 [57130,16082].im3 (ppp)    (fv)        (fv)
#> 9   1605      M  66 #02 1-037-393 [57396,17896].im3 (ppp)    (fv)        (fv)
#> 10  1605      M  66 #02 1-037-393 [57403,16934].im3 (ppp)    (fv)        (fv)
#>    phenotype.nncross
#> 1          (numeric)
#> 2          (numeric)
#> 3          (numeric)
#> 4          (numeric)
#> 5          (numeric)
#> 6          (numeric)
#> 7          (numeric)
#> 8          (numeric)
#> 9          (numeric)
#> 10         (numeric)

Aggregation Over Nested Grouping Structure

When a nested grouping structure ~g1/g2/.../gm is present, we may aggregate over the fv-hypercolumns, the numeric-hypercolumns, and the numeric mark(s) in the ppp-hypercolumn, by any one of the grouping levels ~g1, ~g2, …, or ~gm. If the lowest grouping level ~gm is specified, then no aggregation is performed.

Aggregation of fv-hypercolumns

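Function aggregate_fv() aggregates each fv-hypercolumn into numeric-hypercolumns with suffixes .value and .cumtrapz (see trapz, cumtrapz under Terms and Abbreviations).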

(afv = out |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = pmean, mc.cores = 1L))
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id hladr.E.value hladr.E.cumtrapz
#> 1 3488+      F  85 #01 0-889-121     (numeric)        (numeric)
#> 2  1605      M  66 #02 1-037-393     (numeric)        (numeric)
#> 3   176      M  84 #03 2-080-378     (numeric)        (numeric)
#> 4 2042+      M  79 #04 2-223-153     (numeric)        (numeric)
#> 5 3747+      M  68 #05 2-286-740     (numeric)        (numeric)
#>   phenotype.G.value phenotype.G.cumtrapz
#> 1         (numeric)            (numeric)
#> 2         (numeric)            (numeric)
#> 3         (numeric)            (numeric)
#> 4         (numeric)            (numeric)
#> 5         (numeric)            (numeric)

Each of the numeric-hypercolumns contains tabulated values on the common grid of r. One “slice” of this grid may be extracted by

afv$hladr.E.cumtrapz |> .slice(j = '50')
#>         1         2         3         4         5 
#> 10.489960 10.463419 31.248955  3.162186 23.635120
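
The three ideas used here, a pointwise mean across curves, a cumulative trapezoidal integral along r, and a slice at one value of r, can be sketched in base R. The curves below are made up, and the sketch makes no claim about how pmean, the .cumtrapz columns, or .slice() are implemented in the package (for instance, whether the leading zero of the cumulative integral is kept).

```r
r <- seq.int(from = 0, to = 250, by = 10)

# Two made-up curves on the common grid, standing in for two images of one patient
curves <- list(
  img1 = setNames(r / 10, r),
  img2 = setNames(r / 5, r)
)

# Pointwise ("parallel") mean across curves: one value per grid point
avg <- Reduce(`+`, curves) / length(curves)

# Cumulative trapezoidal integral of the averaged curve along r
cum <- setNames(c(0, cumsum(diff(r) * (head(avg, -1L) + tail(avg, -1L)) / 2)), r)

# One "slice": the value at r = 50, looked up by name
avg[['50']]  # 7.5
cum[['50']]  # 187.5
```

Naming each vector by the grid values is what lets a slice be requested with a character index such as '50'.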

Aggregation of numeric-hypercolumns and numeric mark(s) in ppp-hypercolumn

Function aggregate_quantile() aggregates the quantile of the numeric-hypercolumns and of the numeric mark(s) in the ppp-hypercolumn.

out |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L)
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id phenotype.nncross.quantile hladr.quantile
#> 1 3488+      F  85 #01 0-889-121                  (numeric)      (numeric)
#> 2  1605      M  66 #02 1-037-393                  (numeric)      (numeric)
#> 3   176      M  84 #03 2-080-378                  (numeric)      (numeric)
#> 4 2042+      M  79 #04 2-223-153                  (numeric)      (numeric)
#> 5 3747+      M  68 #05 2-286-740                  (numeric)      (numeric)

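Function aggregate_kerndens() aggregates the kernel density of the numeric-hypercolumns and of the numeric mark(s) in the ppp-hypercolumn. The kernel densities are evaluated on a common range, here from 0 to the largest nearest-neighbour distance.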

(mdist = out$phenotype.nncross |> unlist() |> max())
#> [1] 354.2968
out |> 
  aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L)
#> Column(s) image_id removed; as they are not identical per aggregation-group
#> Hyperframe:
#>      OS gender age    patient_id phenotype.nncross.kerndens hladr.kerndens
#> 1 3488+      F  85 #01 0-889-121                  (numeric)      (numeric)
#> 2  1605      M  66 #02 1-037-393                  (numeric)      (numeric)
#> 3   176      M  84 #03 2-080-378                  (numeric)      (numeric)
#> 4 2042+      M  79 #04 2-223-153                  (numeric)      (numeric)
#> 5 3747+      M  68 #05 2-286-740                  (numeric)      (numeric)
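
The summaries themselves are plain stats functions: stats::quantile, and the $y element of stats::density.default as listed under Terms and Abbreviations. The sketch below applies them to simulated distances and then averages pointwise across images. That last step is an assumption made for illustration; this vignette does not state how aggregate_quantile() and aggregate_kerndens() combine the per-image summaries.

```r
set.seed(1)

# Made-up nearest-neighbour distances for two images of one patient
dists <- list(img1 = runif(50, 0, 100), img2 = runif(80, 0, 100))

# Deciles per image: 11 values each, comparable across images
probs <- seq.int(from = 0, to = 1, by = .1)
q <- lapply(dists, quantile, probs = probs)

# Kernel density per image on a shared grid: fixing from, to and n
# makes every image use the same evaluation points
mdist <- max(unlist(dists))
kd <- lapply(dists, function(x) density(x, from = 0, to = mdist, n = 512L)$y)

# Assumed aggregation step: pointwise mean across images
q_patient  <- Reduce(`+`, q) / length(q)
kd_patient <- Reduce(`+`, kd) / length(kd)
```

Without a shared from and to, each density would be evaluated on its own grid and the pointwise mean would mix values taken at different distances.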