ukbflow

ukbflow is an R package for UK Biobank analysis on the Research
Analysis Platform (RAP). It covers the full midstream-to-downstream
pipeline, from phenotype derivation and association analysis to
publication-ready figures and genetic risk scoring. It is designed for
RAP-native UKB workflows, with local simulated data for development and
testing.

df <- df |>
  derive_missing() |>                                    # recode "Prefer not to answer" → NA
  derive_selfreport(name = "t2dm", regex = "diabetes",   # T2DM self-report
                    field = "noncancer") |>
  derive_icd10(name = "t2dm", icd10 = "E11", source = "hes") |>  # T2DM from HES
  derive_case(name = "t2dm") |>                          # → t2dm_status, t2dm_date
  derive_followup(name = "t2dm",
                  event_col    = "t2dm_date",
                  baseline_col = "p53_i0",               # assessment centre date
                  censor_date  = as.Date("2022-06-01"))

# Forest plot: see vignette("plot") for full usage
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 7L   # res_df has 6 cols before HR; CI graphic goes here
)

# Table 1
plot_tableone(
  data   = as.data.frame(df),
  vars   = c("p21022",      # age_at_recruitment
             "p31",         # sex
             "p21001_i0"),  # bmi
  strata = "t2dm_status"
)

| Module | Key functions | Vignette |
|---|---|---|
| Auth | auth_login(), auth_select_project() | vignette("auth") |
| Fetch | fetch_ls(), fetch_file(), fetch_tree() | vignette("fetch") |
| Extract | extract_pheno(), extract_batch(), extract_ls() | vignette("extract") |
| Job | job_wait(), job_status(), job_result() | vignette("job") |
| Decode | decode_values(), decode_names() | vignette("decode") |
| Derive | derive_missing(), derive_icd10(), derive_case() | vignette("derive") |
| Survival | derive_timing(), derive_age(), derive_followup() | vignette("derive-survival") |
| Assoc | assoc_coxph(), assoc_logistic(), assoc_subgroup() | vignette("assoc") |
| Plot | plot_forest(), plot_tableone() | vignette("plot") |
| GRS | grs_check(), grs_score(), grs_validate() | vignette("grs") |
| Ops | ops_setup(), ops_toy(), ops_snapshot() | vignette("ops") |
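
To give a feel for what the Derive and Survival steps involve, here is a minimal base-R sketch of the same kind of logic: recoding a non-informative answer to NA, flagging cases, and computing follow-up time against a censoring date. It does not call ukbflow and is not its implementation. The `toy` data frame, its values, the `smoking` and `prevalent` columns, and the rule that events after the censoring date count as censored are all assumptions made for illustration. Only the column names `p53_i0`, `t2dm_date`, `t2dm_status` and the censoring date come from the example above.

```r
# Toy cohort with made-up values; p53_i0 is the assessment centre date.
toy <- data.frame(
  eid       = 1:4,
  p53_i0    = as.Date(c("2008-03-01", "2009-07-15", "2010-01-10", "2007-11-30")),
  t2dm_date = as.Date(c("2015-03-01", NA, "2023-02-01", "2006-05-20")),
  smoking   = c("Never", "Prefer not to answer", "Current", "Previous"),
  stringsAsFactors = FALSE
)

censor_date <- as.Date("2022-06-01")

# 1. Recode the non-informative answer to NA.
toy$smoking[toy$smoking == "Prefer not to answer"] <- NA

# 2. Case status: an event date exists and falls on or before the
#    censoring date (assumed rule; later events are treated as censored).
toy$t2dm_status <- !is.na(toy$t2dm_date) & toy$t2dm_date <= censor_date

# 3. Follow-up ends at the event date for cases, otherwise at censoring.
end_date <- toy$t2dm_date
end_date[!toy$t2dm_status] <- censor_date
toy$followup_years <- as.numeric(end_date - toy$p53_i0) / 365.25

# 4. Negative follow-up means the event preceded baseline (prevalent case).
toy$prevalent <- toy$followup_years < 0

print(toy)
```

In this sketch prevalent cases are only flagged, not dropped. Whether to exclude them is an analysis decision, and the package vignettes are the place to check how ukbflow handles it.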

For a complete worked example using a simulated UK Biobank cohort (covering data loading, phenotype derivation, cohort assembly, Cox regression, and publication-ready visualisation), see:

vignette("smoking_lung_cancer"), "Smoking and Lung Cancer Risk: A Complete Analysis Workflow"
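
As a rough, package-independent illustration of the Cox regression step, the sketch below fits a model with `survival::coxph()` on the `lung` dataset bundled with the survival package and collects hazard ratios with 95% confidence intervals into a data frame. The column names `HR`, `CI_lower` and `CI_upper` are borrowed from the forest plot example. Whether `assoc_coxph()` returns exactly this layout is not shown here, so treat the shape of `res_df` as an assumption.

```r
library(survival)  # recommended package, included with standard R installs

# survival::lung codes status as 1 = censored, 2 = dead.
fit <- coxph(Surv(time, status) ~ age + sex, data = survival::lung)

# summary()$conf.int holds exp(coef) and its 95% confidence limits.
ci <- summary(fit)$conf.int

res_df <- data.frame(
  term      = rownames(ci),
  HR        = ci[, "exp(coef)"],
  CI_lower  = ci[, "lower .95"],
  CI_upper  = ci[, "upper .95"],
  row.names = NULL
)

print(res_df)
```

A table of this shape, one row per term with an estimate and interval bounds, is the kind of input a forest plot needs.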

For package documentation, run ?ukbflow or help(package = "ukbflow").

“All models are wrong, but some are publishable.”
— after George Box