The HVT package offers a suite of R functions for constructing topology-preserving maps for in-depth analysis of multivariate data. It is particularly well suited to datasets with many records. The package organizes the typical workflow into several key stages:
Data Compression: Long datasets are compressed using Hierarchical Vector Quantization (HVQ) to achieve the desired level of data reduction.
Data Projection: Compressed cells are projected into one and two dimensions using dimensionality reduction algorithms, producing embeddings that preserve the original topology. This allows intuitive visualization of complex data structures.
Tessellation: Voronoi tessellation partitions the projected space into distinct cells, supporting hierarchical visualizations. Heatmaps and interactive plots help users explore the underlying data patterns.
Scoring: A test dataset is scored against previously generated maps, placing each record within the existing structure. Scoring can be applied sequentially across multiple maps if required.
Temporal Analysis and Visualization: Functions in this stage examine time-series data to identify patterns, estimate transition probabilities, and visualize data flow over time.
Dynamic Forecasting: Monte Carlo simulations of Markov chains provide forecasting for both ex-post and ex-ante scenarios, with dedicated handling of problematic states when they occur.
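The compression stage can be illustrated with a short base R sketch. It makes two assumptions:

- k-means is the quantizer at every level.
- The quantization error of a cell is the mean Euclidean distance from its members to its centroid.

A cell is split again only while that error exceeds the threshold. The function name `hvq_sketch` and its arguments are hypothetical. They do not mirror the actual interface or algorithm of `trainHVT`.

```r
# Hypothetical helper, not part of HVT: recursive k-means quantization.
# A cell whose mean distance to its centroid exceeds quant_error is
# re-quantized one level deeper, up to max_depth.
hvq_sketch <- function(x, n_cells = 4, quant_error = 0.5, max_depth = 3,
                       depth = 1) {
  km <- kmeans(x, centers = n_cells, iter.max = 50, nstart = 5)
  leaves <- lapply(seq_len(n_cells), function(i) {
    members <- x[km$cluster == i, , drop = FALSE]
    centroid <- km$centers[i, ]
    # Quantization error: mean Euclidean distance of members to the centroid
    err <- mean(sqrt(rowSums(sweep(members, 2, centroid)^2)))
    # Only split cells that still have enough rows to be quantized again
    can_split <- depth < max_depth && nrow(members) >= 5 * n_cells
    if (err > quant_error && can_split) {
      hvq_sketch(members, n_cells, quant_error, max_depth, depth + 1)
    } else {
      data.frame(depth = depth, size = nrow(members), error = err,
                 t(centroid))
    }
  })
  do.call(rbind, leaves)
}

# Usage on synthetic data: two Gaussian blobs in 2D
set.seed(123)
x <- rbind(matrix(rnorm(400, mean = 0), ncol = 2),
           matrix(rnorm(400, mean = 5), ncol = 2))
colnames(x) <- c("x", "y")
cells <- hvq_sketch(x, n_cells = 4, quant_error = 0.5)

# Share of leaf cells whose quantization error is within the threshold
compression <- mean(cells$error <= 0.5)
```

Each row of `cells` is one leaf cell. The leaf cells partition the input rows, so their sizes sum to `nrow(x)`.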
The HVT package can create detailed tessellations that showcase topology-preserving maps. Below is an image of a tessellation of a torus; see the vignette for more details.
Figure 1: Voronoi tessellation at layer 1 with 500 cells, with a heatmap overlaid for variable ‘z’.
Following are the links to the vignettes for the HVT package:
Version | Vignette Title | Description |
---|---|---|
v18.05.17 | HVT Vignette | Contains the workflow of the functions used for vector quantization and construction of Hierarchical Voronoi Tessellations for data analysis. |
v18.05.17 | HVT Model Diagnostics Vignette | Contains demonstrations of the functions used to perform model diagnostics and validation for a trained HVT model. |
v23.05.16 | HVT Scoring Cells with Layers using scoreLayeredHVT | Explains the functions used for scoring cells with layers based on a sequence of maps using scoreLayeredHVT. |
v23.10.26 | Temporal Analysis and Visualization: Leveraging Time Series Capabilities in HVT | Contains implementations of the functions used for analyzing time series data and creating state transition flow maps. |
v24.05.16 | Visualizing LLM Embeddings using HVT | Contains the implementation and analysis of hierarchical clustering, using functions to evaluate and visualize OpenAI-generated token embeddings in 2D space. |
v24.08.14 | Implementation of t-SNE and UMAP in trainHVT function | Contains enhancements to the trainHVT function with advanced dimensionality reduction techniques such as t-SNE and UMAP, and includes a table of evaluation metrics to improve interpretability. |
v25.03.01 | Dynamic Forecasting of Macroeconomic Time Series Dataset using HVT | Contains enhancements to the HVT package for dynamic forecasting using Monte Carlo Simulations of Markov Chain (MSM) on a macroeconomic time series dataset. |
v25.08.25 | Hyperparameter Experimentation for Champion Model Selection in MSM Dynamic Forecasting | Contains enhancements that enable selection of the champion model, the one with the lowest Mean Absolute Error, through hyperparameter tuning of MSM dynamic forecasting. |
14th October, 2025
In this version of the HVT package, the following new feature and vignette have been introduced:
Feature
msm: This update introduces a new function called HVTMSMoptimization that runs grid-search experiments across different hyperparameters (number of cells, number of clusters (k), and nearest neighbors (nn)). For each combination, it trains and scores an HVT model and runs MSM simulations. It returns the tabulated results and plotly visualizations that highlight the champion model (i.e., the combination with the lowest MAE).
Vignette
HVTMSMoptimization covers the complete workflow:
- initial dataset handling and train/test selection;
- hyperparameter tuning and identification of the champion model;
- implementation of the champion model;
- comparison of MAE results.
The issue with time-series animation plots from the previous release has been resolved with the latest gganimate update.
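Champion selection amounts to a grid search that keeps the combination with the lowest MAE. The base R sketch below shows only that selection step. Two things in it are not HVT:

- `evaluate_combo` is a hypothetical placeholder for training, scoring, and running MSM. It does not call `HVTMSMoptimization` or any other HVT function.
- The grid values and the "optimum" it encodes are arbitrary.

```r
# Mean Absolute Error between actual and forecast values
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Hypothetical stand-in for "train HVT, score, run MSM": the "forecast" is the
# actual series plus an error that grows as the hyperparameters move away
# from an arbitrary optimum (n_cells = 80, k = 6, nn = 7).
actual <- sin(seq(0, 2 * pi, length.out = 24))
evaluate_combo <- function(n_cells, k, nn) {
  bias <- abs(n_cells - 80) / 100 + abs(k - 6) / 10 + abs(nn - 7) / 20
  mae(actual, actual + bias)
}

# Grid over number of cells, number of clusters (k), and nearest neighbors (nn)
grid <- expand.grid(n_cells = c(60, 80, 100), k = c(4, 6, 8), nn = c(5, 7, 9))
grid$mae <- mapply(evaluate_combo, grid$n_cells, grid$k, grid$nn)

# The champion model is the combination with the lowest MAE
grid <- grid[order(grid$mae), ]
champion <- grid[1, ]
print(champion)
```

In a real run, each call to the evaluation step would be expensive. Storing every result in the grid data frame keeps the full table available for comparison and plotting, not just the winner.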
04th July, 2025
Dropping the time-series animation plots from the package since the latest version of gganimate doesn’t support them — a patched release will follow once the issue is resolved.
04th June, 2025
In this version of the HVT package, the following new features and vignette have been introduced:
Features
Dynamic Forecasting of a Time Series Dataset: This update introduces a new function called msm (Monte Carlo Simulations of Markov Chain) for dynamic forecasting of states in a time series dataset. It supports both ex-post and ex-ante forecasting, offering insight into future trends. It resolves state-transition problems through clustering and nearest-neighbor methods to improve simulation accuracy.
Z-score Plots: This update introduces a new function called plotZscore that generates Z-score plots corresponding to the HVT cells for the given data. These plots offer a visual representation of the data distribution and highlight potential outliers.
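The core idea behind `msm` is to estimate transition probabilities between states and then simulate many future paths. This can be sketched in base R under simplifying assumptions:

- States are observed as a plain character sequence.
- Transition probabilities are row-normalized transition counts.
- A state with no observed outgoing transition is treated as absorbing.

The helper names are hypothetical. The clustering and nearest-neighbor handling of problematic states that `msm` performs is not reproduced here.

```r
# Hypothetical helper: first-order transition matrix from a state sequence
transition_matrix <- function(states) {
  lv <- sort(unique(states))
  counts <- unclass(table(factor(head(states, -1), levels = lv),
                          factor(tail(states, -1), levels = lv)))
  row_tot <- rowSums(counts)
  P <- counts / row_tot  # divides each row by its total
  # Simplifying assumption: a state that is never left becomes absorbing
  dead <- which(row_tot == 0)
  P[dead, ] <- 0
  P[cbind(dead, dead)] <- 1
  P
}

# Hypothetical helper: one simulated path of n_steps transitions
simulate_path <- function(P, start, n_steps) {
  path <- character(n_steps + 1)
  path[1] <- start
  for (step in seq_len(n_steps)) {
    path[step + 1] <- sample(colnames(P), 1, prob = P[path[step], ])
  }
  path
}

set.seed(1)
states <- c("A", "A", "B", "C", "B", "A", "B", "B", "C", "A", "B", "C", "C", "A")
P <- transition_matrix(states)

# Monte Carlo: 1000 simulated paths of 3 steps, starting from the last state
sims <- replicate(1000, simulate_path(P, start = tail(states, 1), n_steps = 3))

# Each column is one path; row 4 holds the simulated state 3 steps ahead
horizon_dist <- prop.table(table(sims[4, ]))
print(horizon_dist)
```

The share of simulated paths that end in each state approximates the forecast distribution at that horizon. Increasing the number of simulations tightens that approximation.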
Vignette
4th September, 2024
In this version of the HVT package, the following new features and vignettes have been introduced:
Features
Implementation of t-SNE and UMAP in trainHVT: This update incorporates dimensionality reduction methods such as t-SNE and UMAP into the trainHVT function, complementing the existing Sammon’s projection. It also enables visualization of these techniques across all hierarchical levels within the HVT framework.
Implementation of dimensionality reduction evaluation metrics: This update adds dimensionality reduction evaluation metrics to the output list of the trainHVT function. These metrics are organized into two levels: Level 1 (L1) and Level 2 (L2). The L1 metrics cover the key areas of dimensionality reduction to support a comprehensive evaluation of performance.
clustHVT function: This update introduces a new function, clustHVT, designed for hierarchical clustering analysis. The function clusters cells only when the hierarchy level is set to 1, and it is not applicable at higher levels. It determines the optimal number of clusters by evaluating various indices. Based on user input, it then performs hierarchical clustering using AGNES with the default ward.D2 method. The output includes a dendrogram and an interactive 2D clustered HVT map that reveals cell context on hover.
Vignettes
Implementation of t-SNE and UMAP in trainHVT function: This vignette showcases the integration of t-SNE and UMAP in the trainHVT function, offering a comprehensive guide on how to apply and visualize these dimensionality reduction techniques. It also covers the dimensionality reduction evaluation metrics and provides insights into their interpretation.
Visualizing LLM Embeddings using HVT (Hierarchical Voronoi Tessellation): This vignette outlines the process of analyzing OpenAI-generated token embeddings using the HVT package. It covers data compression, visualization, and hierarchical clustering, and compares domain name assignments for clusters. It also examines how well HVT preserves contextual relationships between embeddings. Finally, it gives a brief overview of the newly added clustHVT function and its parameters.
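The clustering step described for `clustHVT` can be approximated in a few lines of R, with three substitutions:

- Base R's `hclust()` with the `ward.D2` method stands in for AGNES.
- The number of clusters is chosen by average silhouette width from the recommended `cluster` package, one of many possible indices.
- A synthetic centroid matrix stands in for level-1 HVT cell centroids.

Nothing here calls `clustHVT`.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(7)
# Hypothetical cell centroids: three well-separated groups of 10 in 2D
centroids <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2),
  matrix(c(rnorm(10, 0, 0.3), rnorm(10, 3, 0.3)), ncol = 2)
)
d <- dist(centroids)
tree <- hclust(d, method = "ward.D2")

# Choose k by maximizing the average silhouette width over a range of k
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(tree, k = k), d)
  mean(sil[, "sil_width"])
})
best_k <- ks[which.max(avg_sil)]

# Final cluster label for each centroid; plot(tree) draws the dendrogram
clusters <- cutree(tree, k = best_k)
```

Cutting one fitted tree at several values of k avoids refitting the hierarchy for each candidate cluster count.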
2nd May, 2024
In this version of the HVT package, the following new features have been introduced:
HVT renamed to trainHVT
predictHVT renamed to scoreHVT
predictLayerHVT renamed to scoreLayeredHVT
The trainHVT function now resides within the Training_or_Compression section.
The plotHVT function now resides within the Tessellation_and_Heatmap section.
The scoreHVT function now resides within the Scoring section.
Enhancements: The pre-existing functions hvtHmap and exploded_hmap have been combined and incorporated into the plotHVT function. Additionally, plotHVT now supports 1D plotting.
Temporal Analysis
Below are the new functions and their brief descriptions:
plotStateTransition: Provides the time series flowmap plot.
getTransitionProbability: Provides a list of transition probabilities.
reconcileTransitionProbability: Provides plots and tables comparing transition probabilities calculated manually with those obtained from the markovchain function.
plotAnimatedFlowmap: Creates flowmaps and animations for both the with-self-state and without-self-state scenarios.
17th November, 2023
This version of the HVT package offers functionality to score cells with layers using scoreLayeredHVT, based on a sequence of maps. The steps to create the successive set of maps are given below.
Map A - The output of the trainHVT function trained on the parent data.
Map B - The output of the trainHVT function trained on the ‘data with novelty’ created by the removeNovelty function.
Map C - The output of the trainHVT function trained on the ‘data without novelty’ created by the removeNovelty function.
The scoreLayeredHVT function uses these three maps to score the test data points.
Let us try to understand the steps with the help of the diagram below.
Figure 2: Data Segregation for scoring based on a sequence of maps using scoreLayeredHVT()
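The segregation in Figure 2 can be sketched as nearest-centroid routing. This is a simplified reading of the workflow above, not `scoreLayeredHVT`'s implementation:

1. Each test point is first assigned to its nearest Map A cell.
2. Points that land in cells flagged as novelty are scored against Map B.
3. All other points are scored against Map C.

The centroids, the novelty cell ID, and the `nearest_cell` helper are all hypothetical. `scoreLayeredHVT` itself may apply additional rules.

```r
# Hypothetical helper: index of the nearest centroid (Euclidean) for each row
nearest_cell <- function(x, centroids) {
  apply(x, 1, function(row) which.min(colSums((t(centroids) - row)^2)))
}

# Hypothetical maps, each represented only by its cell centroids
map_a <- rbind(c(0, 0), c(5, 5), c(10, 0))    # trained on parent data
novelty_cells <- 3                             # cells flagged as novelty on Map A
map_b <- rbind(c(9.5, 0), c(10.5, 0))          # trained on data with novelty
map_c <- rbind(c(0, 0), c(2, 2), c(5, 5))      # trained on data without novelty

test <- rbind(c(0.2, -0.1), c(10.4, 0.3), c(4.8, 5.1), c(9.4, -0.2))

# Step 1: score every test point on Map A
cell_a <- nearest_cell(test, map_a)
is_novel <- cell_a %in% novelty_cells

# Step 2: route novelty points to Map B and the rest to Map C
scored <- data.frame(row = seq_len(nrow(test)), map_a_cell = cell_a,
                     layer = ifelse(is_novel, "Map B", "Map C"))
scored$layer_cell <- NA_integer_
scored$layer_cell[is_novel] <- nearest_cell(test[is_novel, , drop = FALSE], map_b)
scored$layer_cell[!is_novel] <- nearest_cell(test[!is_novel, , drop = FALSE], map_c)
print(scored)
```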
06th December, 2022
This version of the HVT package offers features for both training an HVT model and eliminating outlier cells from the trained model.
Training or Compression: The initial step is to train on the parent data using the trainHVT function, specifying the desired compression percentage and quantization error.
Remove novelty cells: After training, outlier cells can be identified manually from the 2D HVT plot. These outlier cells can then be passed to the removeNovelty function, which produces two datasets in its output: one containing ‘data with novelty’ and the other containing ‘data without novelty’.
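The split performed by `removeNovelty` can be illustrated by partitioning rows on their cell membership. This base R sketch makes two substitutions:

- `kmeans()` stands in for a trained HVT map.
- Novelty cells are picked by a hypothetical distance rule rather than manually from the 2D plot.

Only the final partition into ‘data with novelty’ and ‘data without novelty’ mirrors the description above.

```r
set.seed(3)
# Synthetic parent data: a main Gaussian cloud (50 rows) plus 5 far-away rows
parent_data <- rbind(matrix(rnorm(100), ncol = 2),
                     matrix(rnorm(10, mean = 8), ncol = 2))

# Stand-in for a trained map: each row is assigned to one of six cells
km <- kmeans(parent_data, centers = 6, nstart = 10)

# Hypothetical stand-in for manual selection: flag cells whose centroid
# lies more than 5 units from the origin as novelty cells
novelty_cells <- which(sqrt(rowSums(km$centers^2)) > 5)

# Partition rows by whether their cell is a novelty cell
in_novelty <- km$cluster %in% novelty_cells
data_with_novelty <- parent_data[in_novelty, , drop = FALSE]
data_without_novelty <- parent_data[!in_novelty, , drop = FALSE]
```

Every row lands in exactly one of the two outputs. Either output can then serve as training data for a further map.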
CRAN Installation
install.packages("HVT")
GitHub Installation
library(devtools)
devtools::install_github(repo = "Mu-Sigma/HVT")