Single cell analysis is a powerful method that allows for the deconvolution of the effect of treatments on complex populations containing different cell types, that may or may not respond to specific treatments. Depending on the technology used, the analytes can be genes, transcripts, proteins or metabolites. Using mass cytometry, bodenmiller et al measured the level of 9 proteins and 14 post-translational modifications. After using signal intensity from the 9 proteins (so called phenotypic markers) to define 14 sub-populations, they monitored the effect of several treatments using the 14 post-translational modifications.
Modeling and visualization of these type of data is challenging: the large number of events measured combined to the complexity of each samples is making the modeling complex, while the high dimensionality of the data precludes the use of standard visualizations.
The goal of this package is to enable the development of new methods by providing a curated set of data for testing and benchmarking.
For details on data acquisition please refer to Bodenmiller et al Nat Biotech 2012. Briefly, after treatment cells where profiled using a CyTOF, dead cells and debris were excluded and live cells were assigned to 1 of the 14 sub-populations using signal intensity from 9 phenotypic markers.
Cells from samples samples corresponding to untreated cells,
stimulated with BCR/FcR-XL, PMA/Ionomycin or vanadate or unstimulated,
were extracted and data was transformed using the arcsinh
function with a cofactor of 5.
Due to recent changes in Cytobank the original data is not available anymore, so this vignette relies on the bodenmiller package for access to data.
We begin by assembling a full dataset from the
bodenmiller package, before filtering down to T cells:
data(refPhenoMat)
data(refFuncMat)
data(refAnnots)
ref.df <- data.frame(refAnnots,
                     refPhenoMat,
                     refFuncMat)
data(untreatedPhenoMat)
data(untreatedFuncMat)
data(untreatedAnnots)
untreated.df <- bind_rows(ref.df %>% 
                            mutate(Treatment='unstimulated',
                                   Source=as.character(Source),
                                   Cells=as.character(Cells)),
                          data.frame(untreatedAnnots,
                                     untreatedPhenoMat,
                                     untreatedFuncMat) %>% 
                            mutate(Treatment=as.character(Treatment),
                                   Source=as.character(Source),
                                   Cells=as.character(Cells))) %>% 
  mutate(Treatment=factor(Treatment),
         Treatment=relevel(Treatment,'unstimulated'),
         Cells=factor(Cells))
btcells.df <- untreated.df %>% 
  filter(Cells %in% c('cd8+','igm+')) %>% 
  mutate(Cells=droplevels(Cells)) %>% 
  group_by(Cells,Treatment) %>% 
  mutate(cellID=seq(length(Cells))) %>% 
  unite('cellID',one_of(c('Treatment','Cells','cellID')),sep = '_',remove = FALSE)We end up with 19807 cells to analyse, broken down by stimulation condition as follows:
left_join(btcells.df %>% 
  count(Cells,Treatment),
  untreated.df %>% 
    count(Treatment,name = 'Total'),
  by=c('Treatment')) %>% 
  mutate(Fraction=round(100*n/Total,1))## # A tibble: 8 × 5
## # Groups:   Cells, Treatment [8]
##   Cells Treatment         n Total Fraction
##   <fct> <fct>         <int> <int>    <dbl>
## 1 cd8+  unstimulated   5068 15792     32.1
## 2 cd8+  BCR/FcR-XL     5098 15252     33.4
## 3 cd8+  PMA/Ionomycin  5122 14576     35.1
## 4 cd8+  vanadate       1200  6316     19  
## 5 igm+  unstimulated   1069 15792      6.8
## 6 igm+  BCR/FcR-XL      863 15252      5.7
## 7 igm+  PMA/Ionomycin   992 14576      6.8
## 8 igm+  vanadate        395  6316      6.3The Vanadate condition seems to contain less cells, and the fraction of CD4+ T cells is roughly 25% of the unstimulated sample. There is also a drop in the fraction of CD8+ T cells.
To simplify processing we will sample 1000 cells from each condition, with replacement where appropriate:
We used fan plots to visualize the phenotypic and functional profiles of CD8+ T cells and IgM+ B cells. As expected there is no change in phenotype while stimulation is changing the functional profile of each population.
btcells.df %>% 
  gather('Channel','value',
         one_of(colnames(refPhenoMat),colnames(refFuncMat))) %>% 
  filter(Channel %in% colnames(refPhenoMat)) %>% 
  ggplot(aes(x=Channel,y=value))+
  geom_fan()+
  facet_grid(Treatment~Cells)+
  theme_light(base_size = 16)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.
## ℹ The deprecated feature was likely used in the cytofan package.
##   Please report the issue at <https://github.com/yannabraham/cytofan/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.## Warning: The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?btcells.df %>% 
  gather('Channel','value',
         one_of(colnames(refPhenoMat),colnames(refFuncMat))) %>% 
  filter(Channel %in% colnames(refFuncMat)) %>% 
  ggplot(aes(x=Channel,y=value))+
  geom_fan()+
  facet_grid(Treatment~Cells)+
  theme_light(base_size = 16)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))## Warning: The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?We select channels for analysis where 20% of cells from any treatment, any cell type, have a recorded intensity of at least 1 unit:
btcells.channels <- btcells.df %>% 
  gather('Channel','value',
         one_of(colnames(refPhenoMat),colnames(refFuncMat))) %>% 
  group_by(Channel,Treatment,Cells) %>% 
  summarize(value=quantile(value,0.8)) %>% 
  group_by(Channel) %>% 
  summarise(value=max(value)) %>% 
  filter(value>1) %>% 
  .$Channel## `summarise()` has grouped output by 'Channel', 'Treatment'. You can override
## using the `.groups` argument.The following 20 were selected out of 23:
We start by projecting cells using the classical Radviz algorithm:
sim.mat <- cosine(as.matrix(btcells.df[,btcells.channels]))
classic.S <- make.S(btcells.channels)
classic.optim <- do.optimRadviz(classic.S,sim.mat)## Selected optimization function: in.da 
## Starting performance: -101.7308 
## 0 Current performance: -104.4382 
## 1 Current performance: -105.3105 
## 2 Current performance: -106.9203 
## 3 Current performance: -107.0964 
## 4 Current performance: -107.1451 
## 5 Current performance: -107.532 
## 6 Current performance: -108.4229 
## 7 Current performance: -108.4341 
## 8 Current performance: -108.6118 
## 9 Current performance: -108.7111 
## 10 Current performance: -108.7934 
## 11 Current performance: -108.8249 
## 12 Current performance: -108.8284 
## 13 Current performance: -108.8284 
## 14 Current performance: -108.8284 
## 15 Current performance: -108.8284 
## 16 Current performance: -108.8284 
## Execution stopped after 16 iterations: no better solution found in the last 5 iterationsNext we visualize the projection, coloring individual cells by their treatment of origin:
Because of the spring paradigm used by Radviz the cells are
concentrated to the center of the plot. Using the
rescalePlot function we can zoom inside the circle :
We see a clear effect of Vanadate, and 2 separate groups of points. We confirm with the next plot that those points actually correspond to B and T cells:
One can facet the display by population to visualize the effects of stimulation:
plot(btcells.rv)+
  geom_point(aes(color=Treatment))+
  facet_wrap(~Cells)+
  theme_radviz(base_size = 16)We can now visualize the effects of Vanadate and Ionomycin stimulations compared to unstimulated, but the contributions of individual channels are unclear and the effects of BCR/FcR-XL on B cells are unclear.
Moreover, the visualization depends on the optimized order of channels, which in turn depends on the relative amount of the different conditions. To address these challenges we implemented the Freeviz algorithm described in the next section.
Freeviz will optimize the order of channels as well as their weights based on predefined classes in the data. In that example we use stimulation and cell type :
btcells.df <- btcells.df %>% 
  unite('Condition',c('Treatment','Cells'),sep='_',remove = FALSE)
treat.S <- do.optimFreeviz(btcells.df[,btcells.channels],
                           classes = btcells.df$Condition)## [1] "# iters: 19"The final projection is shown below, with cells colored by stimulation:
Compared to Radviz, where the contribution of each channel is fixed through its position on the circle, in Freeviz the channels with the largest contribution to the difference between classes are the furthest away from the center of the projection. It is therefore easier to differentiate B cells from T cells, and to identify the channels that are affected the most by stimulation.
plot(btcells.fv)+
  geom_point(aes(color=Treatment))+
  facet_wrap(~Cells)+
  theme_radviz(base_size = 16)Freeviz plots can also be rescaled : if after rescaling the points extend beyond a particular anchor, the exact contribution of this anchor to the projection is lost except for the direction. The function will issue a warning whenever this occurs.
## Warning in rescalePlot(btcells.fv, fraction = 0.5): After rescaling the following anchors will be below the maximum range of points:
## pp38, pStat3, pZap70, pBtk, pErk, pStat1, pStat5, CD33, pSHP2, pLat, pAkt, pNFkBAfter rescaling it is clear that a subset of B and T cells is not responding to the treatment:
In any case, it is possible to filter out the channels that have low influence on the projection:
We compute the cosine distance between cells based on all available channels:
btcells.dist <- as.matrix(btcells.df[,btcells.channels])
rownames(btcells.dist) <- btcells.df$cellID
btcells.dist <- btcells.dist%*% t(btcells.dist)
btcells.dist <- btcells.dist/(sqrt(diag(btcells.dist)) %*% t(sqrt(diag(btcells.dist))))
btcells.dist[btcells.dist>1] <- 1
btcells.dist[btcells.dist<0] <- 0
btcells.dist <- 2*acos(btcells.dist)/pi
diag(btcells.dist) <- NA # avoid self loopsNext we build an adjacency matrix, selecting the 63 nearest neighbors foreach cell:
K <- floor(nrow(btcells.df)^0.5)
btcells.adj <- apply(btcells.dist,1,rank,na.last = TRUE)
btcells.adj[btcells.adj<=K] <- btcells.dist[btcells.adj<=K]
btcells.adj[btcells.adj>K] <- 0For 4000 vertices (cells) 250183 edges are recovered. We can compare the distribution of distances overall with that of selected nearest neighbors:
bind_rows(data.frame(value=btcells.dist[sample(1000)],
                     Type='Overall',
                     stringsAsFactors = FALSE),
          data.frame(value=btcells.adj[btcells.adj!=0][sample(1000)],
                     Type='Nearest Neighbors',
                     stringsAsFactors = FALSE)) %>% 
  mutate(Type=factor(Type),
         Type=relevel(Type,'Overall')) %>% 
  filter(!is.na(value)) %>% 
  ggplot(aes(x=value))+
  geom_histogram(aes(fill=Type),
                 position = 'identity',
                 bins=50,
                 alpha=0.5)+
  theme_light(base_size=16)We enriched for cells that are closer together than the average population.
Next we turn the distances into weights using a gaussian kernel:
Next we build a kNN graph based on weights:
btcells.graph <- graph_from_adjacency_matrix(btcells.weights,
                                            mode='undirected',
                                            weighted = TRUE,
                                            diag = FALSE)## Warning: The `adjmatrix` argument of `graph_from_adjacency_matrix()` must be symmetric
## with mode = "undirected" as of igraph 1.6.0.
## ℹ Use mode = "max" to achieve the original behavior.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.We use the Louvain algorithm to identify groups in the weighted T cells graph:
btcells.groups <- cluster_louvain(btcells.graph)
btcells.df <- btcells.df %>% 
  ungroup() %>% 
  mutate(Group=membership(btcells.groups),
         Group=as.numeric(Group),
         Group=factor(Group))6 communities were identified.
btcells.df %>% 
  gather('Channel','value',
         one_of(colnames(refPhenoMat),colnames(refFuncMat))) %>% 
  filter(Channel %in% btcells.channels) %>% 
  ggplot(aes(x=Channel,y=value))+
  geom_fan()+
  facet_grid(Group~.)+
  theme_light(base_size = 16)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))## Warning: The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: y.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?Each community captures a specific population and a specific functional state, as illustrated in the following tables:
## # A tibble: 6 × 3
## # Groups:   Group [6]
##   Group `cd8+`   `igm+`
##   <fct>  <dbl>    <dbl>
## 1 1      1     NA      
## 2 2      0.994  0.00641
## 3 3      1     NA      
## 4 4     NA      1      
## 5 5     NA      1      
## 6 6     NA      1And with respect to treatment:
btcells.df %>% 
  count(Group,Treatment) %>% 
  group_by(Group) %>% 
  mutate(n=n/sum(n)) %>% 
  spread(Treatment, n)## # A tibble: 6 × 5
## # Groups:   Group [6]
##   Group unstimulated `BCR/FcR-XL` `PMA/Ionomycin` vanadate
##   <fct>        <dbl>        <dbl>           <dbl>    <dbl>
## 1 1           0.409       0.415           0.114    0.0620 
## 2 2           0.0470      0.0321          0.00427  0.917  
## 3 3           0.0107      0.00804         0.979    0.00268
## 4 4           0.526       0.169           0.196    0.108  
## 5 5           0.0190      0.505           0.476   NA      
## 6 6           0.0469      0.00939        NA        0.944Compared to Freeviz, where anchors are optimized based on classes, in Graphviz the anchors are optimized after the structure of the graph itself, similar to a force-directed layout with the context provided by anchors.
As with Radviz and Freeviz we start with optimizing the anchors:
## [1] "# iters: 20"We can then plot the graph in the context of the selected channels, colored by community :
This plot is to be compared to the classical force directed layout used to visualize weighted graphs:
btcells.graph$layout <- layout_with_drl(btcells.graph)
community.cols <- hue_pal()(length(btcells.groups))
plot(btcells.graph,
     vertex.label=NA,
     vertex.color=community.cols[membership(btcells.groups)])Points can be colored by cell assignment:
Or treatment:
Confirming previous observations made with fan plots. Compared to classical visualization, one can quickly identify channels associated with specific communities.