% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/hcsvd.R
\name{hcsvd}
\alias{hcsvd}
\title{Hierarchical Clustering Using Singular Vectors (HC-SVD).}
\usage{
hcsvd(S, linkage = "average", q = 1, h.power = 2, max.iter, verbose = TRUE)
}
\arguments{
\item{S}{A scaled \eqn{p \times p}{p x p} similarity matrix. For example, this may be a correlation matrix.}

\item{linkage}{The linkage function to be used. This should be one of \code{"average"}, \code{"single"}, or
\code{"RV"} (for RV-coefficient). Note that the RV-coefficient might not yield an ultrametric distance.}

\item{q}{Number of sparse eigenvectors to be used. This should be either a numeric value between zero and one, interpreted as a share, or \code{"Kaiser"} to use as many sparse eigenvectors as
there are eigenvalues greater than or equal to one. For a numeric value, the number of sparse eigenvectors is the corresponding share of the total number of eigenvectors.
For example, \code{q = 1} (100\%) uses all sparse eigenvectors and \code{q = 0.5} (50\%) uses half of them. Identification is best for \code{q = 1} (see Bauer (202Xa) for details).}

\item{h.power}{Exponent \code{h} of the Hadamard (element-wise) power of \code{S}. This should be a positive integer. Using the Hadamard power increases the robustness of the method, as described in Bauer (202Xa).}

\item{max.iter}{Maximum number of iterations performed when computing the sparse eigenvectors.
Default is \code{500}.}

\item{verbose}{Whether to print progress while the \eqn{p-1} iterations for divisive hierarchical clustering are performed.
Default is \code{TRUE}.}
}
\value{
A list with four components:
\item{hclust}{
The clustering structure identified by HC-SVD as an object of class \code{hclust}.
}
\item{dist.matrix}{
The ultrametric distance matrix (cophenetic matrix) of the HC-SVD structure as an object of class \code{dist}.
}
\item{u.sim}{
The ultrametric similarity matrix of \eqn{S} obtained by HC-SVD as an object of class \code{matrix}. The ultrametric similarity matrix
is calculated as \code{1-dist.matrix}.
}
\item{q.p}{
A vector of length \eqn{p-1} containing the ratio \eqn{q_i/p_i} of the \eqn{q_i} sparse eigenvectors used relative to all \eqn{p_i} sparse
eigenvectors for the split of each cluster. The ratio is set to \code{NA} if the cluster contains only two variables, since no search
for sparse eigenvectors is required to find this obvious split.
}
}
\description{
Performs HC-SVD to reveal the hierarchical structure as described in Bauer (202Xa). This divisive approach iteratively splits each cluster into two subclusters.
Candidate splits are determined by the first sparse eigenvectors (sparse approximations of the first eigenvectors, i.e., vectors with many zero entries) of the similarity matrix.
The selected split is the one that yields the best block-diagonal approximation of the similarity matrix according to a specified linkage function. The procedure continues until each object is assigned to its own cluster.
}
\details{
The sparse loadings are computed using the method proposed by Shen & Huang (2008). The corresponding implementation is written in \code{Rcpp}/\code{RcppArmadillo}
for computational efficiency and is based on the \code{R} implementation by Baglama, Reichel, and Lewis in \code{\link[irlba]{ssvd}} (\pkg{irlba}).
However, the implementation has been adapted to better align with the scope of the \pkg{bdsvd} package, which is the basis for the \pkg{blox} package.

Supplementary details are in \code{\link[blox]{hc.beta}} and in Bauer (202Xb).
}
\examples{
#We give one example for variable clustering directly on a correlation matrix,
#and we replicate the USArrests example in Bauer (202Xa) for observation clustering.
#More elaborate code alongside a different example for variable clustering can be
#found in the corresponding supplementary material of that manuscript.

\donttest{
### VARIABLE CLUSTERING

#Load the correlation matrix Bechtoldt from the psych
#package (see ?Bechtoldt for more information).
#The whole example is guarded so that it is skipped if psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  data("Bechtoldt", package = "psych")

  #Compute HC-SVD (with average linkage).
  hcsvd.obj <- hcsvd(Bechtoldt)

  #The object of type hclust with corresponding dendrogram can be obtained
  #directly from hcsvd(...):
  hc.div <- hcsvd.obj$hclust
  plot(hc.div, ylab = "")

  #The dendrogram can also be obtained from the ultrametric distance matrix:
  plot(hclust(hcsvd.obj$dist.matrix), main = "HC-SVD", sub = "", xlab = "")
}


### OBSERVATION CLUSTERING

#Correct the known transcription error for Maryland (see ?USArrests).
data("USArrests")
USArrests["Maryland", "UrbanPop"] <- 76.6

#The distance matrix is scaled (divided by max(D)) to later allow a
#transformation to a matrix S that fulfills the properties of a similarity
#matrix.
D <- as.matrix(dist(USArrests))
D <- D / max(D)
S <- 1 - D
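
#Sanity check (a sketch, not part of the package): we assume here that a
#scaled similarity matrix is symmetric, has a unit diagonal, and has all
#entries in [0, 1]. The call stops with an error if one condition fails.
stopifnot(isSymmetric(S), all(diag(S) == 1), all(S >= 0 & S <= 1))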

#Compute HC-SVD (with average linkage).
hcsvd.obj <- hcsvd(S)
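
#The other documented options can be set explicitly. A sketch, assuming the
#settings below suit the data: single linkage, the Kaiser rule for the number
#of sparse eigenvectors, and no progress output. The result is stored under
#the illustrative name hcsvd.single so that hcsvd.obj is left unchanged.
hcsvd.single <- hcsvd(S, linkage = "single", q = "Kaiser", verbose = FALSE)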

#The object of type hclust with corresponding dendrogram can be obtained
#directly from hcsvd(...):
hc.div <- hcsvd.obj$hclust
plot(hc.div, ylab = "")
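
#A flat partition can be read off the hclust object with stats::cutree().
#The choice of k = 4 clusters is only an illustrative assumption.
clusters <- cutree(hc.div, k = 4)
table(clusters)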

#The dendrogram can also be obtained from the ultrametric distance matrix:
plot(hclust(hcsvd.obj$dist.matrix), main = "HC-SVD", sub = "", xlab = "")
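
#A minimal sketch for inspecting the remaining components. The comparisons
#follow the descriptions in the Value section and are not part of the package.
#u.sim is documented as 1 - dist.matrix, so the two should agree:
all.equal(hcsvd.obj$u.sim, 1 - as.matrix(hcsvd.obj$dist.matrix),
          check.attributes = FALSE)

#dist.matrix is documented as the cophenetic matrix of the hclust object:
all.equal(as.matrix(cophenetic(hcsvd.obj$hclust)),
          as.matrix(hcsvd.obj$dist.matrix), check.attributes = FALSE)

#Share of sparse eigenvectors used at each of the p-1 splits (NA for
#clusters with only two elements):
hcsvd.obj$q.p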
}


}
\references{
\cite{Bauer, J.O. (202Xa). Divisive hierarchical clustering using block diagonal matrix approximations. Working paper.}

\cite{Bauer, J.O. (202Xb). Revelle's beta: The wait is over - we can compute it! Working paper.}

\cite{Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal. 99, 1015–1034.}
}
\seealso{
\code{\link[bdsvd]{bdsvd}}
}
