This vignette provides an introduction to using the
backbone
package to extract the backbone of a
network. A backbone is a sparse and unweighted subgraph of a weighted or
unweighted network that contains only the most ‘important’ or
‘significant’ edges A backbone can be useful when the original network
is too dense, when edge weights are not needed, or when edge weights are
difficult to interpret. For a complete example using empirical data,
please see the Empirical Example of
Backbone Extraction vignette.
The backbone
package can be cited as:
Neal, Z. P. (2025). backbone: An R package to Extract Network Backbones. CRAN. https://doi.org/10.32614/CRAN.package.backbone
For additional resources on the backbone package, please see https://www.rbackbone.net/.
If you have questions about the backbone package or would like a backbone hex sticker, please contact the maintainer Zachary Neal by email (zpneal@msu.edu). Please report bugs in the backbone package at https://github.com/zpneal/backbone/issues.
After the backbone package has been installed from CRAN using
install.packages("backbone")
, it can be loaded in the usual
way:
library(backbone)
#> ____ backbone v3.0.0
#> | _ \ Cite: Neal, Z. P., (2025). backbone: An R package to extract network
#> |#|_) | backbones. CRAN. https://doi.org/10.32614/CRAN.package.backbone
#> |# _ <
#> |#|_) | Help: type vignette("backbone"); email zpneal@msu.edu; github zpneal/backbone
#> |____/ Beta: type devtools::install_github("zpneal/backbone", ref = "devel")
Upon successful loading, a startup message will display that shows the version number, citation, ways to get help, and ways to contact me.
This installs and loads the current release of the package. However,
the package is constantly being updated with new features and bug fixes.
You can install the development version of the package using
devtools::install_github("zpneal/backbone", ref = "devel", build_vignettes = TRUE)
.
Although the development version has been tested, some new features may
not yet work as intended.
For ease of illustration, this vignette also makes use of functions
in the igraph
package, which can be installed from CRAN
using install.packages("igraph")
and loaded:
The primary use of the backbone package is backbone extraction, that is, identifying and preserving only the most important edges in a network. Different types of networks require different backbone extraction methods, however the basic workflow using the backbone package is the same for all methods:
You begin with some network data dat
, which may take
the form of an an adjacency/incidence matrix or Matrix, or an igraph
object.
Depending on the type of network contained in dat
(e.g., weighted, bipartite, unweighted), you run the applicable backbone
function (backbone_from_projection()
,
backbone_from_weighted()
, or
backbone_from_unweighted()
), which returns a backbone of
the same class as dat
.
Each of these functions includes a narrative
parameter. Setting narrative = TRUE
displays narrative text
describing the backbone extraction with the relevant citations. This
text is suitable for use in manuscripts, and ensures complete and
consistent reporting of the backbone extraction.
The sections below describe and illustrate the backbone functions applicable to different types of networks, including weighted networks, weighted projections, and unweighted networks. Additionally, the package contains some utility functions that are used by the backbone methods, but may also have independent applications.
New in Version 3.0.0 – In older versions, each
backbone extraction model required a different function. For example,
you’d use disparity()
to extract a backbone using the
disparity filter, and mlf()
to extract it using the
marginal likelihood filter. Now all models for a specific type of
network are accessible in a single function. For example, you can use
backbone_from_weighted(model = "disparity")
to use the
disparity filter, and backbone_from_weighted(model = "mlf")
to use the marginal likelihood filter.
The functions in the backbone package support three data formats that can be used to represent a network or graph in R:
matrix or Matrix - A bipartite
network can be represented as an incidence matrix, and a unipartite
network can be represented as an adjacency matrix. An
incidence/adjacency matrix can be supplied to a backbone extraction
function as a matrix object or as a Matrix object
created by the Matrix
package.
igraph - A network can be supplied to a backbone
extraction function as an igraph object from the
igraph
package.
The backbone package does not support edgelist formats
because these can be ambiguous when isolated nodes are present. It also
does not support network objects from the network
package (from the statnet
family of packages) because these
require loading many dependencies. You can convert a
network
bipartite graph into an incidence matrix using
network::as.matrix.network(graph, type = "incidence")
, and
can convert a network
unipartite graph into an adjacency
matrix using
network::as.matrix.network(graph, type = "adjacency")
.
A weighted unipartite network contains one type of node, and pairs of nodes are connected by edges that have weights. These weights might indicate the intensity or frequency of the relationship between them. A weighted unipartite network can be represented by an adjacency matrix W, where Wij indicates the weight of the edge between node i and node j. Weighted networks can be undirected (i.e., Wij = Wji) or directed (i.e., Wij != Wji). The goal of backbone extraction from a weighted network is to identify edges in W that are strong and should be preserved in the backbone.
In the subsections below, we illustrate backbone extraction methods
in an example weighted network W
:
W <- matrix(c(0,10,10,10,10,75,0,0,0,0, #Adjacency matrix of example network
10,0,1,1,1,0,0,0,0,0,
10,1,0,1,1,0,0,0,0,0,
10,1,1,0,1,0,0,0,0,0,
10,1,1,1,0,0,0,0,0,0,
75,0,0,0,0,0,100,100,100,100,
0,0,0,0,0,100,0,10,10,10,
0,0,0,0,0,100,10,0,10,10,
0,0,0,0,0,100,10,10,0,10,
0,0,0,0,0,100,10,10,10,0),10)
W <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE, diag = FALSE) #Convert to an igraph object
This network has strongly heterogeneous edge weights. Some edges are quite weak (10) while others are quite strong (100), which gives the network a multi-scalar character. This network might represent the number of passengers flying between each of 10 airports, where there are two hub airports (one large, one small) that each serve their local areas.
Although it is possible to visualize this network by using thicker lines to represent stronger edges, in larger networks this often leads to cluttered visualizations. Additionally, many network analytic techniques (e.g., ERGM, SAOM) have not yet been extended for application to weighted. In such cases, it can be useful to focus on this network’s backbone.
Structural models choose edges for retention based on edges’ structural properties, mainly the values of their edge weights. The backbone package implements one structural model, the Global Threshold, which simply preserves all edges whose weights are larger than a given threshold value.
We could extract a global threshold backbone using a threshold of 0
using the backbone_from_weighted()
function and specifying
model = "global"
and parameter = 0
. This
preserves all the edges in the backbone and yields a simpler unweighted
network, but one that fails to preserve the network’s true hub-and-spoke
structure.
We could choose the threshold value based on the edge weights. For example, we could use the average edge weight as the threshold to extract a backbone that preserves the stronger-than-average edges. However, this yields a network that is focused only on high-degree nodes (i.e., the larger hub airport), while ignoring the low-degree nodes (i.e., the smaller hub airport):
bb <- backbone_from_weighted(W, model = "global", parameter = mean(E(W)$weight))
plot(bb, vertex.label = NA)
The choice of a global threshold is often arbitary, but can significantly impact the structure of the backbone. Moreover, backbones extracted using a global threshold fail to preserve network structure when edge weights are heterogeneous. For this reason, global threshold backbones should generally be avoided.
Statistical models choose edges to retain in the backbone by testing
whether their weights are statistically significantly stronger than
expected under a null model, with the stringency of the statistical test
controlled by the alpha
parameter. Among the most widely
used statistical backbone model is the disparity filter (Serrano, Boguná,
and Vespignani 2009).
We can extract a disparity filter backbone at a conventional level of
statistical significance using the backbone_from_weighted()
function and specifying model = "disparity"
and
alpha = 0.05
. Because this model evaluates the strength of
edges from the local perspective of each node, it preserves the
network’s true hub-and-spoke structure.
Two additional statistical models are also available:
model = "lans"
(Foti, Hughes, and Rockmore 2011)model = "mlf"
(Dianati
2016)Despite differences in their null models, they operate similarly to the disparity filter. Currently there is limited guidance on model selection, however the disparity filter is the most widely known and thus is often the default choice unless there is a specific reason to choose a different model.
The backbone models available using
backbone_from_weighted()
can be applied to any weighted
network. However, specialized models exist for weighted networks that
are the result of projecting a bipartite network or hypergraph.
A bipartite network contains two types of nodes, which we call agents and artifacts, and edges that exist only between nodes of different types (i.e., an edge may connect an agent to an artifact, but never an agent to another agent). For example, a bipartite network may represent authors (agents) connected to the papers (artifacts) they they wrote. A hypergraph contains nodes that we call agents. Unlike an ordinary network, edges in a hypergraph (i.e., hyperedges) can connect more than two agents simultaneously. For example, a single edge in a hypergraph (i.e., a hyperedge) can connect all the authors (agents) who wrote a given paper. Both bipartite networks and hypergraphs can be represented by an incidence matrix B, where Bik = 1 if agent i is connected to artifact k or if agent i is connected to hyperedge k, and otherwise is 0 (e.g., author i wrote paper k).
A bipartite or hypergraph projection is a network of agents that are connected by the fact that they share artifacts or hyperedges in common. A projection can be represented by a square symmetric matrix P, which is formed as P = B * B’, where B’ is the transpose of B. In P, Pij indicates the number of artifacts or hyperedges k that are shared by agents i and j, and can be viewed as the weight of the edge connecting them in the network (e.g., the number of papers co-authored by authors i and j). However, these weights can be noisy. The goal of backbone extraction from weighted projections is to identify edges in P whose weights are strong and should be preserved in the backbone.
In the subsections below, we illustrate backbone extraction methods
in an example bipartite network B
:
B <- rbind(cbind(matrix(rbinom(250,1,.8),10), #Incidence matrix of example bipartite network
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10)))
B <- graph_from_biadjacency_matrix(B) #Convert to an igraph object
plot(B, layout = layout_as_bipartite(B), vertex.label = NA, vertex.size = 1)
This network contains 30 agents and 75 artifacts, and might represent 30 authors’ participation in 75 papers across three overlapping disciplines.
We can construct this network’s weighted projection P
and retain the shared number of artifacts between agents as edge
weights. However, this yields a very dense network with no obvious
structure:
P <- bipartite_projection(B, which = "true")
plot(P, vertex.label = NA, edge.width = sqrt(E(P)$weight), edge.color = rgb(0,0,0,.1))
All of the backbone models implemented in the backbone package for
weighted projections are statistical models, which choose edges to
retain in the backbone by testing whether their weights are
statistically significantly stronger than expected under a null model,
with the stringency of the statistical test controlled by the
alpha
parameter. Among the fastest and most widely used
backbone models is the Stochastic Degree Sequency Model (SDSM) (Neal, Domagalski, and
Sagan 2021).
We can extract an SDSM backbone at a conventional level of
statistical significance using the
backbone_from_projection()
function and specifying
model = "sdsm"
and alpha = 0.05
. Importantly,
weighted projection backbones are based on the original bipartite
network or hypergraph, not on the weighted projection. Therefore, we
must supply B
and not P
to the
backbone_from_projection()
function. Because this model
uses information from the original network to identify strong edges, it
preserves the network’s true three-community structure.
Four additional statistical models are also available:
model = "fdsm"
(Zweig and
Kaufmann 2011)model = "fixedrow"
(Neal, Domagalski, and
Sagan 2021)model = "fixedcol"
(Neal, Domagalski, and
Sagan 2021)model = "fixedfill"
(Neal, Domagalski, and
Sagan 2021)Each of these models uses different null models that constrain
different features of B
‘s incidence matrix when evaluating
edges’ strength. The SDSM and FDSM are often most useful, but have
trade-offs. The FDSM offers more statistical power, but requires
computationally expensive monte carlo methods to estimate edges
p-values. In contrast, the SDSM is less powerful (although this can be
compensated by using a more liberal alpha
), but can quickly
compute edges’ exact p-values.
An unweighted unipartite network contains one type of node, and pairs of nodes are either connected or not. An unweighted unipartite network can be represented by an adjacency matrix U where Uij = 1 if nodes i and j are connected, and otherwise is 0. Unweighted networks can be undirected (i.e., Uij = Uji) or directed (i.e., Uij != Uji), but currently the models implemented in the backbone package only allow undirected networks.
The goal of backbone extraction from an unweighted network is to identify edges in U that are important to the network’s structure, and thus should be preserved in the backbone. Because the network is already unweighted, and the goal is simply to reduce the number of edges, this process is sometimes also called sparsification. The challenge is that, unlike weighted networks and weighted projections, the edges in unweighted networks do not have weights that can be used to evaluate their importance. To overcome this challenge, backbone models designed for unweighted networks first assign each edge a score that is intended to capture its importance.
All of the backbone models implemented in the backbone package for unweighted networks are structural models. Different models are designed to preserve or uncover different structural features of the network, but all are characterized by four steps:
Some models, such as the Local Sparsification (L-Spar) model (Satuluri,
Parthasarathy, and Ruan 2011), are designed to uncover hidden
clustering. This model involves assigning each edge a score using the
Jaccard coefficient, normalizing these scores by ranking them from the
perspective of each node, and filtering them based on the degree of each
node. To illustrate, we can generate a dense unweighted network
U
that contains three hidden communities, then extract its
backbone using the backbone_from_unweighted()
function and
specifying model = "lspar"
and
parameter = 0.5
. The backbone reveals the network’s known
three communities.
par(mfrow = c(1, 2), mar = c(0,0,2,0)) #Display plots side-by-side
U <- sample_sbm(60, matrix(c(.75,.25,.25,.25,.75,.25,.25,.25,.75),3,3), c(20,20,20)) #Example unweighted network
plot(U, main = "Original Network", vertex.label = NA)
bb <- backbone_from_unweighted(U, model = "lspar", parameter = 0.5)
plot(bb, main = "L-Spar Backbone", vertex.label = NA) #Communities are clearly visible
Other models, such as the Local Degree model (Hamann et al.
2016), are designed to uncover hubs. This model involves
assigning each edge a score using the degree of an endpoint node,
normalizing these scores by ranking them from the perspective of each
node, and filtering them based on the degree of each node. To
illustrate, we can generate a dense network U
that contains
some hubs, then extract its backbone using the
backbone_from_unweighted()
function and specifying
model = "degree"
and parameter = 0.25
. The
backbone preserves the highest-degree nodes as hubs.
par(mfrow = c(1, 2), mar = c(0,0,2,0)) #Display plots side-by-side
U <- sample_pa(n = 60, m = 3, directed = FALSE) #Example unweighted network
V(U)$name <- ""
V(U)$name[which(degree(U)>=sort(degree(U), decreasing = TRUE)[5])] <- LETTERS[1:5] #Label five highest-degree nodes
plot(U, vertex.size = degree(U), vertex.label.family = "sans", vertex.label.font = 2, main = "Original Network")
bb <- backbone_from_unweighted(U, model = "degree", parameter = 0.25)
plot(bb, vertex.size = degree(bb), vertex.label.family = "sans", vertex.label.font = 2, main = "Local Degree Backbone")
Eight additional structural models are also available:
model = "skeleton"
(Karger
1994)model = "gspar"
(Satuluri,
Parthasarathy, and Ruan 2011)model = "simmelian"
(Nick et al.
2013)model = "jaccard"
(Goldberg and Roth
2003)model = "meetmin"
(Goldberg and Roth
2003)model = "geometric"
(Goldberg and Roth
2003)model = "hyper"
(Goldberg and Roth
2003)model = "quadrilateral"
(Nocaj, Ortmann, and Brandes
2015).Each of these models involve different ways of scoring, normalizing, and filtering edges. Currently there is limited guidance on model selection, however Hamann et al. (2016) compared some of these models to understand how they yield different backbone.
By default, backbone_from_
functions return a backbone
in the same class (matrix, Matrix, or igraph) as the original network.
However, to facilitate transparency and reproducibility, specifying
return = "everything"
returns a list that contains:
backbone_from_projection()
)For example:
B <- matrix(sample(c(0,1), 18, replace = TRUE), 3, 6) #A simple incidence matrix
bb <- backbone_from_projection(B, model = "sdsm", alpha = 0.05, return = "everything") #Extract backbone, return everything
bb$bipartite #The original bipartite network
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 0 0 1 1 0
#> [2,] 1 1 0 0 1 0
#> [3,] 1 1 0 0 1 0
bb$projection #The weighted projection
#> [,1] [,2] [,3]
#> [1,] 2 1 1
#> [2,] 1 3 3
#> [3,] 1 3 3
bb$pvalues #The edgewise p-values
#> $upper
#> [,1] [,2] [,3]
#> [1,] NA 0.9753388 0.9753388
#> [2,] 0.9753388 NA 0.4596459
#> [3,] 0.9753388 0.4596459 NA
bb$backbone #The backbone (as a matrix)
#> [,1] [,2] [,3]
#> [1,] 0 0 0
#> [2,] 0 0 0
#> [3,] 0 0 0
bb$narrative #Narrative description
#> [1] "We used the backbone package for R (v3.0.0; Neal, 2025) to extract the unweighted backbone of the weighted projection of a bipartite network containing 3 agents and 6 artifacts. An edge was retained in the backbone if its weight was statistically significant (alpha = 0.05) using the stochastic degree sequence model (SDSM; Neal, Domagalski, and Sagan, 2021), which reduced the number of edges by 100%.\n\nNeal, Z. P. 2025. backbone: An R Package to Extract Network Backbones. CRAN. https://doi.org/10.32614/CRAN.package.backbone\n\nNeal, Z. P., Domagalski, R., and Sagan, B. (2021). Comparing Alternatives to the Fixed Degree Sequence Model for Extracting the Backbone of Bipartite Projections. Scientific Reports, 11, 23929. https://doi.org/10.1038/s41598-021-03238-3"
bb$call #Function call
#> backbone_from_projection(B = B, alpha = 0.05, model = "sdsm",
#> return = "everything")
By default, statistical backbone models return a backbone that
contains edges whose weights were statistically significantly
stronger than expected under the corresponding null model using
a one-tailed test. However, it is also possible to return a signed
backbone that contains positive edges whose weights were statistically
significantly stronger than expected, and negative edges whose
weights were statistically significantly weaker than expected,
using a two-tailed test. Signed backbones are obtained by setting
signed = TRUE
.
Returning to the example bipartite
backbone extraction, we can compare the default non-signed backbone
to a signed backbone. Signed backbone extraction relies on a two-tailed
test for each edge weight, so to make the two backbones comparable in
stringency, we double the alpha
parameter
(alpha = 0.1
) when extracting the signed backbone. The
example network is known to contain three communities, which are clearly
visible in the non-signed backbone. In the signed backbone, these
communities are connected by positive (green) edges, while negative
(red) edges exist between communities. If the original bipartite network
represents authors connected to papers they have written, then the
signed backbone may represent relations of both collaboration (positive)
and avoidance (negative).
par(mfrow = c(1, 2), mar = c(0,0,2,0)) #Display plots side-by-side
B <- rbind(cbind(matrix(rbinom(250,1,.8),10), #Incidence matrix of example bipartite network
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10),
matrix(rbinom(250,1,.2),10)),
cbind(matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.2),10),
matrix(rbinom(250,1,.8),10)))
B <- graph_from_biadjacency_matrix(B) #Convert to an igraph object
bb1 <- backbone_from_projection(B, model = "sdsm", alpha = 0.05) #Extract non-signed backbone
layout <- layout_nicely(bb1)
plot(bb1, vertex.labels = NA, layout = layout, main = "Non-signed")
bb2 <- backbone_from_projection(B, model = "sdsm", alpha = 0.1, signed = TRUE) #Extract signed backbone
E(bb2)$color <- "green"
E(bb2)$color[which(E(bb2)$sign==-1)] <- "red"
plot(bb2, vertex.labels = NA, layout = layout, main = "Signed")
The model
argument in the
backbone_from_unweighted()
function allows the extraction
of backbones from unweighted networks using 10 different backbone models
that have previously been described in the literature. However, it is
also possible to specify custom backbone models with
model = "custom"
and by using different combinations of
escore
, normalize
, filter
, and
umst
. For example, consider a backbone defined by the
fewest, strongest (weighted by Jaccard coefficient) edges necessary to
ensure a connected network. Although such a backbone has not been
described in the literature (and may not be meaningful), it can be
extracted using backbone_from_unweighted()
with
model = "custom"
:
par(mfrow = c(1, 2), mar = c(0,0,2,0)) #Display plots side-by-side
U <- sample_gnp(100, .25)
plot(U, vertex.label = NA, main = "Original Network")
bb <- backbone_from_unweighted(U, model = "custom", escore = "jaccard", normalize = "none", filter = "proportion", parameter = 0, umst = TRUE)
plot(bb, vertex.label = NA, main = "Custom Backbone")
The model = "custom"
argument specifies that a custom
backbone model specification should be used. The
escore = "jaccard"
argument specifies that each unweighted
edge should receive a score equal to the Jaccard coefficient of its two
endpoints’ neighborhoods, and the normalize = "none"
argument specifies that these scores should not be normalized. The
filter = "proportion"
and parameter = 0
arguments specify that 0% of these edges should be retained in the
backbone, while the umst = TRUE
argument specifies that the
union of maximum spanning trees should be included in the backbone.
Extracting a backbone from a bipartite projection using the fixed
degree sequence model with
backbone_from_projection(model = "fdsm")
requires randomly
sampling matrices from the space of all binary matrices with given row
and column sums, which is performed using the fastball algorithm (Godard and Neal
2022). Given a matrix, fastball()
returns a new
matrix randomly sampled from the space of all binary matrices with the
same row and column sums:
Extracting a backbone from a bipartite projection using the
stochastic degree sequence model with
backbone_from_projection(model = "sdsm")
requires
determining the probability that Bij = 1 in the space of all
matrices with given row and column sums. For example, if the row sums of
B are [1,1,2] and the column sums of B
are [1,1,2], then in the space of all matrices with these row and column
sums, the probability that Bij = 1 is:
.2 | .2 | .6 |
.2 | .2 | .6 |
.6 | .6 | .8 |
These probabilities are usually unknown and must be estimated. The
fastest and most accurate way to estimate them (Neal, Domagalski, and
Sagan 2021) is using the Bipartite Configuration Model (Saracco et al.
2015). Given a matrix mat
with specific row and
column sums, bicm()
estimates these probabilities: