Contents

# Loading required packages
library(Statial)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(dplyr)
library(SingleCellExperiment)
library(ggplot2)
library(ggsurvfit)
library(survival)
library(tibble)

theme_set(theme_classic())
nCores <- 1

1 Installation

# Install the package from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Statial")

2 Overview

There are over 37 trillion cells in the human body, each taking up different forms and functions. The behaviour of these cells can be described by canonical characteristics, but their functions can also dynamically change based on their environmental context, leading to cells with diverse states. Understanding changes in cell state and the interplay between cells is key to understanding their mechanisms of action and how they contribute to human disease. Statial is a suite of functions for identifying changes in cell state. This guide will provide a step-by-step overview of some key functions within Statial.

3 Loading example data

In the following we will analyse breast cancer data from Keren et al. 2018. These images are stored in a SingleCellExperiment object. This object contains 57811 cells across 10 images and includes information on cell type and patient survival.

Note: The original dataset was reduced down from 41 images to 10 images for the purposes of this vignette, due to size restrictions.

# Load head and neck data
data("kerenSCE")

kerenSCE
#> class: SingleCellExperiment 
#> dim: 48 57811 
#> metadata(0):
#> assays(1): intensities
#> rownames(48): Na Si ... Ta Au
#> rowData names(0):
#> colnames(57811): 1 2 ... 171281 171282
#> colData names(8): x y ... Survival_days_capped Censored
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

4 Evaluating cell localisation

Kontextual is a method to evaluate the localisation relationship between two cell types in an image. Kontextual builds on the L-function by contextualising the relationship between two cell types in reference to the typical spatial behaviour of a \(3^{rd}\) cell type/population. By taking this approach, Kontextual is invariant to changes in the window of the image as well as tissue structures which may be present.

The definitions of cell types and cell states are somewhat ambiguous, cell types imply well defined groups of cells that serve different roles from one another, on the other hand cell states imply that cells are a dynamic entity which cannot be discretised, and thus exist in a continuum. For the purposes of using Kontextual we treat cell states as identified clusters of cells, where larger clusters represent a “parent” cell population, and finer sub-clusters representing a “child” cell population. For example a CD4 T cell may be considered a child to a larger parent population of Immune cells. Kontextual thus aims to see how a child population of cells deviate from the spatial behaviour of their parent population, and how that influences the localisation between the child cell state and another cell state.

5 Kontextual

The first step in analysing these images is to organise all the cells present into cell state populations or clusters, e.g. all the different B cell types are put in a vector called bcells.

# Examine all cell types in image
unique(kerenSCE$cellType)
#>  [1] "Keratin_Tumour" "CD3_Cell"       "B"              "CD4_Cell"      
#>  [5] "Dc/Mono"        "Unidentified"   "Macrophages"    "CD8_Cell"      
#>  [9] "other immune"   "Endothelial"    "Mono/Neu"       "Mesenchymal"   
#> [13] "Neutrophils"    "NK"             "Tumour"         "DC"            
#> [17] "Tregs"

# Set up cell populations
tumour <- c("Keratin_Tumour", "Tumour")

bcells <- c("B")
tcells <- c("CD3_Cell", "CD4_Cell", "CD8_Cell", "Tregs")
myeloid <- c("Dc/Mono", "DC", "Mono/Neu", "Macrophages", "Neutrophils")

endothelial <- c("Endothelial")
mesenchymal <- c("Mesenchymal")

tissue <- c(endothelial, mesenchymal)
immune <- c(bcells, tcells, myeloid, "NK", "other immune") # NK = Natural Killer cells

all <- c(tumour, tissue, immune, "Unidentified")

Kontextual accepts a SingleCellExperiment object, or a single image, or list of images from a SingleCellExperiment object, this gets passed into the cells argument. The two cell types which will be evaluated are specified in the to and from arguments. A parent population must also be specified in the parent argument, note the parent cell population must include the to cell type. The argument r will specify the radius which the cell relationship will be evaluated on. Kontextual supports parallel processing, the number of cores can be specified using the cores argument. Kontextual can take a single value or multiple values for each argument and will test all combinations of the arguments specified.

6 Kontextual Curve

Here we examine image 6 from the Keren et al. dataset where the relationship between 2 cell types depends on a parent cell population. In the image below we can see that p53 and Immune are dispersed. However when the behaviour of p53 is placed in the context of the spatial behaviour of its parent population Keratin+Tumour, p53 and Immune now appear localised.

#Select image 6 from the kerenSCE dataset
kerenImage6 = kerenSCE[, kerenSCE$imageID =="6"]

#Select for all cells that express higher than baseline level of p53
p53Pos = assay(kerenImage6)["p53",]  |> 
  as.numeric() > -0.300460
kerenImage6$cellType[p53Pos & kerenImage6$cellType %in% c("Keratin_Tumour")] <- "p53+Tumour"

#Group all immune cells under the name "Immune"
kerenImage6$cellType[kerenImage6$cellType %in% immune] <- "Immune"

kerenImage6 |>
  colData() %>%
  as.data.frame() %>%
  filter(cellType %in% c("Keratin_Tumour", "Immune", "p53+Tumour")) %>%
  arrange(cellType) %>%
  ggplot(aes(x = x, y = y, color = cellType)) +
  geom_point(size = 1) +
  scale_colour_manual(values = c("#505050", "#D6D6D6", "#64BC46"))

The kontextCurve function plots the L-function value and Kontextual values over a range of radii. If the points lie above the red line (expected pattern) then localisation is indicated for that radius, if the points lie below the red line then dispersion is indicated. As seen in the following plot Kontextual is able to correctly identify localisation between p53 and Immune in the example image for a certain range of radii. When the radius gets too large the overall relationship between p53 and Immune looks dispersed. The original L-function is not able to identify localisation at any value of radii.


#Select for all cells that express higher than baseline level of p53
kerenSCE$cellTypeNew <- kerenSCE$cellType
p53Pos = assay(kerenSCE)["p53",]  |> 
  as.numeric() > -0.300460
kerenSCE$cellTypeNew[p53Pos & kerenSCE$cellType %in% c("Keratin_Tumour")] <- "p53+Tumour"

#Group all immune cells under the name "Immune"
kerenSCE$cellTypeNew[kerenSCE$cellType %in% immune] <- "Immune"


curves <- kontextCurve(
  cells = kerenSCE,
  from = "p53+Tumour",
  to = "Immune",
  parent = c("p53+Tumour", "Keratin_Tumour"),
  rs = seq(10, 510, 100),
  image = "6",
  cellType = "cellTypeNew",
  cores = nCores
)

kontextPlot(curves)

We can calculate these relationships across all images for a single radius.

p53_Kontextual <- Kontextual(
  cells = kerenSCE,
  r = 50,
  from = "p53+Tumour",
  to = "Immune",
  parent = c("p53", "Keratin_Tumour"),
  cellType = "cellTypeNew"
)

p53_Kontextual
#>    imageID               test original kontextual  r weightQuantile inhom  edge
#> 1        1 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 2       14 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 3       18 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 4       21 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 5       29 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 6        3 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 7       32 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 8       35 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 9        5 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#> 10       6 p53+Tumour__Immune       NA         NA 50            0.8  TRUE FALSE
#>    includeZeroCells window window.length
#> 1              TRUE convex            NA
#> 2              TRUE convex            NA
#> 3              TRUE convex            NA
#> 4              TRUE convex            NA
#> 5              TRUE convex            NA
#> 6              TRUE convex            NA
#> 7              TRUE convex            NA
#> 8              TRUE convex            NA
#> 9              TRUE convex            NA
#> 10             TRUE convex            NA

Alternatively all pairwise cell relationships and their corresponding parent in the dataset can be tested. A data frame with all pairwise combinations can be creating using the parentCombinations function. This function takes in a vector of all the cells, as well as all the parent vectors set up earlier. As shown below the output is a data frame specifying the to, from, and parent arguments for Kontextual.

# Get all relationships between cell types and their parents
parentDf <- parentCombinations(
  all = all,
  tumour,
  bcells,
  tcells,
  myeloid,
  endothelial,
  mesenchymal,
  tissue,
  immune
)

Rather than specifying to, from, and parent in Kontextual, the output from parentCombinations can be inputed into Kontextual using the parentDf argument, to examine all pairwise relationships in the dataset. This chunk will take a signficant amount of time to run, for demonstration the results have been saved and are loaded in.

# Running Kontextual on all relationships across all images.
kerenKontextual <- Kontextual(
  cells = kerenSCE,
  parentDf = parentDf,
  r = 50,
  cores = nCores
)
data("kerenKontextual")
head(kerenKontextual, 10)
#>    imageID                      test   original kontextual  r weightQuantile
#> 1        1 B__Keratin_Tumour__bcells -12.992212  14.433756 50            0.8
#> 2       14 B__Keratin_Tumour__bcells         NA         NA 50            0.8
#> 3       18 B__Keratin_Tumour__bcells  -4.475831  -2.032805 50            0.8
#> 4       21 B__Keratin_Tumour__bcells         NA         NA 50            0.8
#> 5       29 B__Keratin_Tumour__bcells         NA         NA 50            0.8
#> 6        3 B__Keratin_Tumour__bcells -22.725228  -1.330939 50            0.8
#> 7       32 B__Keratin_Tumour__bcells -25.427475  -9.672005 50            0.8
#> 8       35 B__Keratin_Tumour__bcells -36.867563 -16.313205 50            0.8
#> 9        5 B__Keratin_Tumour__bcells -26.894815  22.394483 50            0.8
#> 10       6 B__Keratin_Tumour__bcells -19.788448 -10.515649 50            0.8
#>    inhom  edge includeZeroCells window window.length
#> 1   TRUE FALSE             TRUE convex            NA
#> 2   TRUE FALSE             TRUE convex            NA
#> 3   TRUE FALSE             TRUE convex            NA
#> 4   TRUE FALSE             TRUE convex            NA
#> 5   TRUE FALSE             TRUE convex            NA
#> 6   TRUE FALSE             TRUE convex            NA
#> 7   TRUE FALSE             TRUE convex            NA
#> 8   TRUE FALSE             TRUE convex            NA
#> 9   TRUE FALSE             TRUE convex            NA
#> 10  TRUE FALSE             TRUE convex            NA

7 Examining Cell-to-cell interactions with marker expression

In the next section of this vignette, we will utilise marker expression features from the Keren et al. dataset to computationally identify and quantify evidence of cell interactions that catalyse cell state changes. This approach measures how protein markers in a cell change with spatial proximity and abundance to other cell types. The methods utilised here will provide a framework to explore how the dynamic behaviour of cells are altered by the agents they are surrounded by.

The first step in analysing these changes is to calculate the spatial proximity (getDistances) and abundance (getAbundances) of each cell to every cell type. These values will then be stored in the reducedDims slot of the SingleCellExperiment object under the names distances and abundances respectively.


kerenSCE <- getDistances(kerenSCE,
                    maxDist = 200,
                    nCores = 1)

kerenSCE <- getAbundances(kerenSCE,
                     r = 200,
                     nCores = 1)

First, let’s examine the same effect observed earlier with Kontextual - the localisation between p53-positive keratin/tumour cells and macrophages in the context of total keratin/tumour cells for image 6 of the Keren et al. dataset.

Statial provides two main functions to assess this relationship - calcStateChanges and plotStateChanges. We can use calcStateChanges to examine the relationship between 2 cell types for 1 marker in a specific image. In this case, we’re examining the relationship between keratin/tumour cells (from = Keratin_Tumour) and macrophages (to = "Macrophages") for the marker p53 (marker = "p53") in image = "6". We can appreciate that the fdr statistic for this relationship is significant, with a negative tvalue, indicating that the expression of p53 in keratin/tumour cells decreases as distance from macrophages increases.

stateChanges <- calcStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "p53",
  nCores = 1)

stateChanges
#>   imageID primaryCellType otherCellType marker         coef      tval
#> 1       6  Keratin_Tumour   Macrophages    p53 -0.001402178 -7.010113
#>           pval          fdr
#> 1 2.868257e-12 2.868257e-12

Statial also provides a convenient function for visualising this interaction - plotStateChanges. Here, again we can specify image = 6 and our main cell types of interest, keratin/tumour cells and macrophages, and our marker p53, in the same format as calcStateChanges.

Through this analysis, we can observe that keratin/tumour cells closer to a group of macrophages tend to have higher expression of p53, as observed in the first graph. This relationship is quantified with the second graph, showing an overall decrease of p53 expression in keratin/tumour cells as distance to macrophages increase.

These results allow us to essentially arrive at the same result as Kontextual, which calculated a localisation between p53+ keratin/tumour cells and macrophages in the wider context of keratin/tumour cells.

p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "p53",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm")

p
#> $image

#> 
#> $scatter

Beyond looking at single cell-to-cell interactions for a single image, we can also look at all interactions across all images. The calcStateChanges function provided by Statial can be expanded for this exact purpose - by not specifying cell types, a marker, or an image, calcStateChanges will examine the most significant correlations between distance and marker expression across the entire dataset. Here, we’ve filtered out the most significant interactions to only include those found within image 6 of the Keren et al. dataset.

stateChanges <- calcStateChanges(
  cells = kerenSCE,
  type = "distances",
  nCores = 1,
  minCells = 100)

stateChanges |> head(n = 10)
#>      imageID primaryCellType otherCellType     marker         coef      tval
#> 8674      35        CD4_Cell             B       CD20 -0.029185750 -40.57355
#> 8770      35        CD4_Cell       Dc/Mono       CD20  0.019125946  40.53436
#> 1819      35               B       Dc/Mono phospho.S6  0.005282065  40.41385
#> 8779      35        CD4_Cell       Dc/Mono phospho.S6  0.004033218  34.72882
#> 1813      35               B       Dc/Mono     HLA.DR  0.011120703  34.15344
#> 1971      35               B  other immune          P  0.011182182  34.14375
#> 8626      35        CD4_Cell      CD3_Cell       CD20  0.016349492  33.91901
#> 1816      35               B       Dc/Mono     H3K9ac  0.005096632  33.99856
#> 2011      35               B  other immune phospho.S6  0.005986586  33.66466
#> 1818      35               B       Dc/Mono   H3K27me3  0.006980810  33.22740
#>               pval           fdr
#> 8674 7.019343e-282 3.553472e-277
#> 8770 1.891267e-281 4.787176e-277
#> 1819 5.306590e-278 8.954694e-274
#> 8779 4.519947e-219 5.720445e-215
#> 1813 8.401034e-212 8.505879e-208
#> 1971 1.056403e-211 8.913225e-208
#> 8626 1.219488e-210 8.819335e-207
#> 1816 3.266533e-210 2.067062e-206
#> 2011 8.545691e-207 4.806856e-203
#> 1818 2.438769e-202 1.234603e-198

stateChanges |> 
  filter(imageID == 6) |>
  head(n = 10)
#>    imageID primaryCellType otherCellType       marker         coef      tval
#> 1        6  Keratin_Tumour  Unidentified           Na  0.004218419  25.03039
#> 2        6  Keratin_Tumour   Macrophages  HLA_Class_1 -0.003823497 -24.69629
#> 3        6  Keratin_Tumour      CD4_Cell  HLA_Class_1 -0.003582774 -23.87797
#> 4        6  Keratin_Tumour  Unidentified Beta.catenin  0.005893120  23.41953
#> 5        6  Keratin_Tumour      CD8_Cell  HLA_Class_1 -0.003154544 -23.13804
#> 6        6  Keratin_Tumour       Dc/Mono  HLA_Class_1 -0.003353834 -22.98944
#> 7        6  Keratin_Tumour      CD3_Cell  HLA_Class_1 -0.003123446 -22.63197
#> 8        6  Keratin_Tumour        Tumour  HLA_Class_1  0.003684079  21.94265
#> 9        6  Keratin_Tumour      CD4_Cell           Fe -0.003457338 -21.43550
#> 10       6  Keratin_Tumour      CD4_Cell   phospho.S6 -0.002892457 -20.50767
#>             pval           fdr
#> 1  6.971648e-127 1.176442e-123
#> 2  7.814253e-124 1.236215e-120
#> 3  1.745242e-116 2.208779e-113
#> 4  1.917245e-112 2.257178e-109
#> 5  5.444541e-110 5.991836e-107
#> 6  1.053130e-108 1.110701e-105
#> 7  1.237988e-105 1.205229e-102
#> 8  8.188258e-100  7.025803e-97
#> 9   1.287478e-95  9.727951e-93
#> 10  3.928912e-88  2.583081e-85

Let’s take a look at the top 10 most significant gene-to-distance pairwise interactions. In image 6, the majority of the top 10 most significant interactions occur between keratin/tumour cells and an immune population, and many of these interactions appear to involve the HLA class I ligand.

We can examine some of these interactions further with the plotStateChanges function. Taking a closer examination of the relationship between macrophages and keratin/tumour HLA class I expression, the plot below shows us a clear visual correlation - as macrophage density increases, keratin/tumour cells increase their expression HLA class I.

Biologically, HLA Class I is a ligand which exists on all nucleated cells, tasked with presenting internal cell antigens for recognition by the immune system, marking aberrant cells for destruction by either CD8+ T cells or NK cells. This increased expression of HLA Class I molecules on tumour cells is an interesting finding, and potentially uncovers a novel interaction between tumour cells and macrophages which drives aberrant HLA Class I expression in tumour cells.


p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "6",
  from = "Keratin_Tumour",
  to = "Macrophages",
  marker = "HLA_Class_1",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm")

p
#> $image

#> 
#> $scatter

Now, we can take a look at the top 10 most significant results across all images. Immediately, we can appreciate that a couple of interactions appear a bit strange. One of the most significant interactions occurs between B cells and CD4 T cells, where CD4 T cells are found to increase in CD20 expression when in close proximity to B cells. Biologically, CD20 is a highly specific ligand for B cells, and under healthy circumstances are usually not expressed in T cells.

Could this potentially be an artefact of calcStateChanges? We can examine the image through the plotStateChanges function, where we indeed observe a strong increase in CD20 expression in T cells nearby B cell populations.


p <- plotStateChanges(
  cells = kerenSCE,
  type = "distances",
  image = "35",
  from = "CD4_Cell",
  to = "B",
  marker = "CD20",
  size = 1,
  shape = 19,
  interactive = FALSE,
  plotModelFit = FALSE,
  method = "lm")

p
#> $image

#> 
#> $scatter

So why are T cells expressing CD20? This brings us to a key problem of cell segmentation - contamination.

8 Contamination

Contamination, or lateral marker spill over is an issue that results in a cell’s marker expressions being wrongly attributed to another adjacent cell. This issue arises from incorrect segmentation where components of one cell are wrongly determined as belonging to another cell. Alternatively, this issue can arise when antibodies used to tag and measure marker expressions don’t latch on properly to a cell of interest, thereby resulting in residual markers being wrongly assigned as belonging to a cell near the intended target cell. It is important that we either correct or account for this incorrect attribution of markers in our modelling process. This is critical in understanding whether significant cell-cell interactions detected are an artefact of technical measurement errors driven by spill over or are real biological changes that represent a shift in a cell’s state.

To circumvent this problem, Statial provides a function that predicts the probability that a cell is any particular cell type - calcContamination. calcContamination returns a dataframe of probabilities demarcating the chance of a cell being any particular cell type. This dataframe is stored under contaminations in the reducedDim slot of the SingleCellExperiment object. It also provides the rfMainCellProb column, which provides the probability that a cell is indeed the cell type it has been designated. E.g. For a cell designated as CD8, rfMainCellProb could give a 80% chance that the cell is indeed CD8, due to contamination.

We can then introduce these probabilities as covariates into our linear model by setting contamination = TRUE as a parameter in our calcStateChanges function. However, this is not a perfect solution for the issue of contamination. As we can see, despite factoring in contamination into our linear model, the correlation between B cell density and CD20 expression in CD4 T cells remains one of the most significant interactions in our model.

kerenSCE <- calcContamination(kerenSCE)

stateChangesCorrected <- calcStateChanges(
  cells = kerenSCE,
  type = "distances",
  nCores = 1,
  minCells = 100,
  contamination = TRUE)

stateChangesCorrected |> head(n = 20)
#>       imageID primaryCellType otherCellType      marker         coef      tval
#> 8674       35        CD4_Cell             B        CD20 -0.024355647 -34.01721
#> 8770       35        CD4_Cell       Dc/Mono        CD20  0.015618732  32.85899
#> 1819       35               B       Dc/Mono  phospho.S6  0.004261315  29.44082
#> 8779       35        CD4_Cell       Dc/Mono  phospho.S6  0.003534832  28.97376
#> 29188       3  Keratin_Tumour            DC          Ca -0.013722536 -29.12212
#> 8626       35        CD4_Cell      CD3_Cell        CD20  0.013232945  28.65811
#> 8629       35        CD4_Cell      CD3_Cell      HLA.DR  0.010029813  28.12833
#> 1669       35               B      CD3_Cell      HLA.DR  0.008777066  25.33554
#> 27641      21  Keratin_Tumour            DC Pan.Keratin -0.005859721 -24.35316
#> 31825       6  Keratin_Tumour  Unidentified          Na  0.004196466  24.57225
#> 8763       35        CD4_Cell       Dc/Mono      CSF.1R  0.008538209  24.75533
#> 1813       35               B       Dc/Mono      HLA.DR  0.008660381  24.67446
#> 2011       35               B  other immune  phospho.S6  0.004528999  24.35727
#> 1675       35               B      CD3_Cell  phospho.S6  0.003590518  23.91966
#> 8635       35        CD4_Cell      CD3_Cell  phospho.S6  0.002839680  23.85722
#> 1971       35               B  other immune           P  0.007425269  23.30130
#> 2008       35               B  other immune      H3K9ac  0.004647334  23.25937
#> 1816       35               B       Dc/Mono      H3K9ac  0.003733476  22.99443
#> 31918       6  Keratin_Tumour   Macrophages HLA_Class_1 -0.003427080 -22.70932
#> 31774       6  Keratin_Tumour      CD4_Cell HLA_Class_1 -0.003286863 -22.69682
#>                pval           fdr
#> 8674  1.835884e-211 9.293979e-207
#> 8770  1.494155e-199 3.782004e-195
#> 1819  1.042570e-164 1.759302e-160
#> 8779  5.429050e-161 6.871006e-157
#> 29188 2.642507e-158 2.675485e-154
#> 8626  5.764270e-158 4.863507e-154
#> 8629  6.305177e-153 4.559904e-149
#> 1669  8.880698e-127 5.619706e-123
#> 27641 1.749637e-123 9.841515e-120
#> 31825 1.157679e-122 5.860632e-119
#> 8763  4.013252e-122 1.846971e-118
#> 1813  5.584082e-121 2.355738e-117
#> 2011  3.129736e-118 1.218767e-114
#> 1675  1.782171e-114 6.444331e-111
#> 8635  2.496591e-114 8.425828e-111
#> 1971  3.041088e-109 9.622001e-106
#> 2008  6.833179e-109 2.034840e-105
#> 1816  1.112809e-106 3.129714e-103
#> 31918 2.922420e-106 7.786558e-103
#> 31774 3.738339e-106 9.462484e-103

However, this does not mean factoring in contamination into our linear model was ineffective. We can visualise this by plotting a ROC curve of true positives against false positives. In general, cell type specific markers such as CD4, CD8, and CD20 should not change in cells they are not specific to. Therefore, relationships detected to be significant involving these cell type markers are likely false positives and will be treated as such for the purposes of evaluation. Meanwhile, cell state markers are predominantly likely to be true positives.

Plotting the relationship between false positives and true positives, we’d expect the contamination correction to be greatest in the relationships with the top 100 lowest p values, where we indeed see more true positives than false positives with contamination correction.

cellTypeMarkers <- c("CD3", "CD4", "CD8", "CD56", "CD11c", "CD68", "CD45", "CD20")

values = c("blue", "red")
names(values) <- c("None", "Corrected")

df <- rbind(data.frame(TP =cumsum(stateChanges$marker %in% cellTypeMarkers), FP = cumsum(!stateChanges$marker %in% cellTypeMarkers), type = "None"),
            data.frame(TP =cumsum(stateChangesCorrected$marker %in% cellTypeMarkers), FP = cumsum(!stateChangesCorrected$marker %in% cellTypeMarkers), type = "Corrected"))

ggplot(df, aes(x = TP, y = FP, colour = type)) + geom_line()+ labs(y = "Cell state marker", x = "Cell type marker") + scale_colour_manual(values = values)


ggplot(df, aes(x = TP, y = FP, colour = type)) + geom_line()+ xlim(0,100) + ylim(0,1000)+ labs(y = "Cell state marker", x = "Cell type marker") + scale_colour_manual(values = values)

The next steps will involve comparing survival time to see if these distance-to-gene interactions are biologically significant.

9 Relating Statial features to patient groups or outcomes.

To examine whether the features obtained from Statial are associated with patient outcomes or groupings, we’ve provided the helper function coxTests to understand if survival outcomes differ significantly between 2 patient groups. Here we examine which features are most associated with patient survival using the Kontextual values as example. To do so, the survival data is extracted from kerenSCE and converted to a survival object, kerenSurv. In addition to this, the Kontextual results must be converted from a data.frame to a wide matrix, this can be done using prepMatrix. Note, to extract the original L-function values, specify column = "original" in prepMatrix. Finally, both the Kontextual matrix and survival object are passed into coxTests to obtain the survival results.

As we can see from the results Mesenchymal__other immune__tissue is the most significant pairwise relationship which contributes to patient survival. That is the relationship between Mesenchymal cells and other immune cells, relative to the parent population of all tissue cells. We can see that there is a positive coeffcient associated with this relationship, which tells us the increase in localisation of Mesenchymal and Macrophages lead to poorer survival outcomes for patients.

# Helper function to run coxPh models

coxTests = function(measurementMat, Surv) {
  
  result = apply(measurementMat, 2, function(measurementCol) {
    fit = coxph(Surv ~ measurementCol)
    summary(fit)$coefficients[1, c(1, 3, 5)]
  })
  
  result = result |> 
    t() |> 
    data.frame() |> 
    rownames_to_column() 
  
  colnames(result) = c("imageID", "coef", "se.coef", "p.value")
  result$`adj.p.value` = p.adjust(result$`p.value`, "fdr")
  
  result = result |> arrange(p.value)
  
  result[,sapply(result,is.numeric)] <- signif(result[,sapply(result,is.numeric)],2)
  
  return(result)
}

# Extracting survival data to create survival object
survData = kerenSCE |>
    colData() |> 
    data.frame() |> 
    select(imageID, Survival_days_capped, Censored) |> 
    unique()

# Creating survival vector
kerenSurv = Surv(survData$Survival_days_capped, survData$Censored)
names(kerenSurv) = survData$imageID


# Converting Kontextual result into data matrix
kontextMat = prepMatrix(kerenKontextual)

# Ensuring rownames of kontextMat match up with rownames of the survival vector 
kontextMat = kontextMat[names(kerenSurv), ]


# Running survival analysis
survivalResults = coxTests(kontextMat, kerenSurv)

head(survivalResults)
#>                               imageID   coef se.coef p.value adj.p.value
#> 1   Mesenchymal__other immune__tissue  0.096   0.048   0.044        0.64
#> 2  other immune__Endothelial__myeloid  0.047   0.024   0.049        0.64
#> 3       CD4_Cell__Neutrophils__immune  0.110   0.055   0.050        0.64
#> 4          CD4_Cell__CD8_Cell__tcells  0.580   0.300   0.050        0.64
#> 5    CD4_Cell__Keratin_Tumour__immune -0.150   0.075   0.051        0.64
#> 6 Neutrophils__Keratin_Tumour__immune  0.100   0.053   0.053        0.64

The association between Mesenchymal__other immune__tissue and survival can also be visualised on a Kaplan-Meier curve. We must first extract the Kontextual values of this relationship across all images. Next we determine if Mesenchymal and Macrophages are relatively attracted or avoiding in each image, by comparing the Kontextual value in each image to the median Kontextual value. Finally we plot the Kaplan-Meier curve using the ggsurvfit package.

As shown below, when Mesenchymal and Macrophages are relatively more dispersed to one another, patients tend to have better survival outcomes. When these cells are relatively localised, patients have a poorer survival outcome.

# Selecting most significant relationship
survRelationship = kontextMat[["Mesenchymal__other immune__tissue"]]
survRelationship = ifelse(survRelationship > median(survRelationship), "Localised", "Dispersed")
    
# Plotting Kaplan-Meier curve
survfit2(kerenSurv ~ survRelationship) |>
    ggsurvfit() +
    add_pvalue() +
    ggtitle("Mesenchymal__other immune__tissue")

Similar survival analysis can be done using the state changes results. Here, prepMatrix extracts the coefficients, or coef column of stateChanges by default. To use the t values instead, specify column = "tval" in the prepMatrix function.

For our state changes results, Keratin_Tumour__CD4_Cell__Keratin6 is the most significant pairwise relationship which contributes to patient survival. That is the relationship between Keratin6 expressing keratin/tumour cells and CD4 cells. We can see that there is a positive coeffcient associated with this relationship, which tells us that higher Keratin6 in keratin/tumour cells nearby CD4 populations lead to poorer survival outcomes for patients.

#Preparing features for Statial
stateMat <- prepMatrix(stateChanges)

# Ensuring rownames of stateMat match up with rownames of the survival vector 
stateMat = stateMat[names(kerenSurv), ]

survivalResults = coxTests(stateMat, kerenSurv)

head(survivalResults)
#>                            imageID  coef se.coef p.value adj.p.value
#> 1 CD4_Cell__other immune__Vimentin  1600     760   0.032        0.77
#> 2      CD4_Cell__Macrophages__CD68  -250     120   0.033        0.77
#> 3                  Dc/Mono__B__CD4 -1000     470   0.034        0.77
#> 4         CD4_Cell__CD4_Cell__CD16   910     430   0.034        0.77
#> 5             Dc/Mono__Tregs__CD45 -1500     690   0.034        0.77
#> 6         CD4_Cell__CD8_Cell__CD68  1400     650   0.036        0.77

Again, we can also visualise the Kaplan-Meier curve visualisation for the most significant interaction. As Kontextual and calcStateChanges provides different outputs, localisation here refers to increased localisation between Keratin6 expressing keratin/tumour cells and CD4 cells, i.e. localisation means keratin/tumour cells tend to increase Keratin6 expression nearby CD4 cells, whilst dispersion would mean keratin/tumour cells do not tend to express Keratin6 near CD4 cell populations.

As shown below, increased keratin6 expression in keratin/tumour cells near CD4 populations tends to worsen survival outcomes, whilst lower keratin6 expression appears to improve survival outcomes.

survRelationship = stateMat[["Keratin_Tumour__CD4_Cell__Keratin6"]]
survRelationship = ifelse(survRelationship > median(survRelationship), "Dispersed", "Localised")
    
# Plotting Kaplan-Meier curve
survfit2(kerenSurv ~ survRelationship) |>
    ggsurvfit() +
    add_pvalue() +
    ggtitle("Keratin_Tumour__CD4_Cell__Keratin6")

10 Region analysis using lisaClust

Next we can cluster areas with similar spatial interactions to identify regions using lisaClust. Here we set k = 5 to identify 5 regions.

set.seed(51773)

# Preparing features for lisaClust
kerenSCE <- lisaClust::lisaClust(kerenSCE, k = 5)

The regions identified by licaClust can be visualised using the hatchingPlot function.

# Use hatching to visualise regions and cell types.
lisaClust::hatchingPlot(kerenSCE,
  useImages = "5",
  line.spacing = 41, # spacing of lines
  nbp = 100 # smoothness of lines
) 

11 Marker Means

Statial provides functionality to identify the average marker expression of a given cell type in a given region, using the getMarkerMeans function. Similar to the analysis above, these features can also be used for survival analysis.


cellTypeRegionMeans <- getMarkerMeans(kerenSCE,
                              imageID = "imageID",
                              cellType = "cellType",
                              region = "region")

survivalResults = coxTests(cellTypeRegionMeans[names(kerenSurv),], kerenSurv)

head(survivalResults)
#>                          imageID coef se.coef p.value adj.p.value
#> 1      CD45RO__Dc/Mono__region_3  2.6     1.2   0.028        0.83
#> 2       CD11b__Dc/Mono__region_3  7.8     3.6   0.031        0.83
#> 3 HLA_Class_1__Dc/Mono__region_5  3.4     1.6   0.031        0.83
#> 4        HLA.DR__Tregs__region_3  3.6     1.7   0.032        0.83
#> 5        CD68__Dc/Mono__region_3  6.9     3.2   0.033        0.83
#> 6  HLA.DR__Mesenchymal__region_4  6.8     3.2   0.033        0.83

12 Patient classification

Finally we demonstrate how we can use ClassifyR to perform patient classification with the features generated from Statial. In addition to the kontextual, state changes, and marker means values, we also calculate cell type proportions and region proportions using the getProp function in spicyR. Here we perform 3 fold cross validation with 10 repeats, using a CoxPH model for survival classification, feature selection is also performed by selecting the top 10 features per fold using a CoxPH model.


# Calculate cell type and region proportions
cellTypeProp <- getProp(kerenSCE, 
                       feature = "cellType",
                       imageID = "imageID")
regionProp <- getProp(kerenSCE, 
                       feature = "region",
                       imageID = "imageID")

# Combine all the features into a list for classification 
featureList <- list(states = stateMat, 
                     kontextual = kontextMat,
                     regionMarkerMeans = cellTypeRegionMeans,
                     cellTypeProp = cellTypeProp,
                     regionProp = regionProp)

# Ensure the rownames of the features match the order of the survival vector
featureList <- lapply(featureList, function(x)x[names(kerenSurv),])


set.seed(51773)

kerenCV = crossValidate(
  measurements = featureList,
  outcome = kerenSurv,
  classifier = "CoxPH",
  selectionMethod  = "CoxPH",
  nFolds = 3,
  nFeatures = 10,
  nRepeats = 20,
  nCores = 1
  )

Here, we use the performancePlot function to assess the C-index from each repeat of the 3-fold cross-validation. We can see the resulting C-indexes are very variable due to the dataset only containing 10 images.

# Calculate AUC for each cross-validation repeat and plot.
performancePlot(kerenCV,
  characteristicsList = list(x = "Assay Name")
  ) + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

13 References

Appendix

Keren, L., Bosse, M., Marquez, D., Angoshtari, R., Jain, S., Varma, S., Yang, S. R., Kurian, A., Van Valen, D., West, R., Bendall, S. C., & Angelo, M. (2018). A Structured Tumor-Immune Microenvironment in Triple Negative Breast Cancer Revealed by Multiplexed Ion Beam Imaging. Cell, 174(6), 1373-1387.e1319. (DOI)

A sessionInfo

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] tibble_3.2.1                ggsurvfit_0.3.1            
#>  [3] ggplot2_3.4.3               SingleCellExperiment_1.22.0
#>  [5] dplyr_1.1.3                 lisaClust_1.8.1            
#>  [7] ClassifyR_3.4.9             survival_3.5-7             
#>  [9] BiocParallel_1.34.2         MultiAssayExperiment_1.26.0
#> [11] SummarizedExperiment_1.30.2 Biobase_2.60.0             
#> [13] GenomicRanges_1.52.0        GenomeInfoDb_1.36.3        
#> [15] IRanges_2.34.1              MatrixGenerics_1.12.3      
#> [17] matrixStats_1.0.0           S4Vectors_0.38.1           
#> [19] BiocGenerics_0.46.0         generics_0.1.3             
#> [21] spicyR_1.12.2               Statial_1.2.4              
#> [23] BiocStyle_2.28.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3        jsonlite_1.8.7           
#>   [3] magrittr_2.0.3            spatstat.utils_3.0-3     
#>   [5] magick_2.7.5              farver_2.1.1             
#>   [7] nloptr_2.0.3              rmarkdown_2.24           
#>   [9] zlibbioc_1.46.0           vctrs_0.6.3              
#>  [11] minqa_1.2.6               spatstat.explore_3.2-3   
#>  [13] DelayedMatrixStats_1.22.6 RCurl_1.98-1.12          
#>  [15] rstatix_0.7.2             htmltools_0.5.6          
#>  [17] S4Arrays_1.0.6            curl_5.0.2               
#>  [19] broom_1.0.5               Rhdf5lib_1.22.1          
#>  [21] rhdf5_2.44.0              sass_0.4.7               
#>  [23] bslib_0.5.1               htmlwidgets_1.6.2        
#>  [25] plyr_1.8.8                plotly_4.10.2            
#>  [27] cachem_1.0.8              lifecycle_1.0.3          
#>  [29] pkgconfig_2.0.3           Matrix_1.6-1             
#>  [31] R6_2.5.1                  fastmap_1.1.1            
#>  [33] GenomeInfoDbData_1.2.10   digest_0.6.33            
#>  [35] numDeriv_2016.8-1.1       colorspace_2.1-0         
#>  [37] tensor_1.5                dqrng_0.3.1              
#>  [39] ggpubr_0.6.0              beachmat_2.16.0          
#>  [41] labeling_0.4.3            spatstat.sparse_3.0-2    
#>  [43] fansi_1.0.4               httr_1.4.7               
#>  [45] polyclip_1.10-4           abind_1.4-5              
#>  [47] mgcv_1.9-0                compiler_4.3.1           
#>  [49] withr_2.5.0               backports_1.4.1          
#>  [51] carData_3.0-5             HDF5Array_1.28.1         
#>  [53] ggforce_0.4.1             R.utils_2.12.2           
#>  [55] ggsignif_0.6.4            MASS_7.3-60              
#>  [57] concaveman_1.1.0          DelayedArray_0.26.7      
#>  [59] rjson_0.2.21              tools_4.3.1              
#>  [61] ranger_0.15.1             goftest_1.2-3            
#>  [63] R.oo_1.25.0               glue_1.6.2               
#>  [65] nlme_3.1-163              rhdf5filters_1.12.1      
#>  [67] grid_4.3.1                reshape2_1.4.4           
#>  [69] gtable_0.3.4              spatstat.data_3.0-1      
#>  [71] class_7.3-22              R.methodsS3_1.8.2        
#>  [73] tidyr_1.3.0               data.table_1.14.8        
#>  [75] car_3.1-2                 utf8_1.2.3               
#>  [77] XVector_0.40.0            spatstat.geom_3.2-5      
#>  [79] pillar_1.9.0              stringr_1.5.0            
#>  [81] limma_3.56.2              splines_4.3.1            
#>  [83] tweenr_2.0.2              lattice_0.21-8           
#>  [85] deldir_1.0-9              tidyselect_1.2.0         
#>  [87] locfit_1.5-9.8            scuttle_1.10.2           
#>  [89] knitr_1.44                V8_4.3.3                 
#>  [91] bookdown_0.35             edgeR_3.42.4             
#>  [93] xfun_0.40                 DropletUtils_1.20.0      
#>  [95] pheatmap_1.0.12           fftwtools_0.9-11         
#>  [97] scam_1.2-14               stringi_1.7.12           
#>  [99] lazyeval_0.2.2            yaml_2.3.7               
#> [101] boot_1.3-28.1             evaluate_0.21            
#> [103] codetools_0.2-19          BiocManager_1.30.22      
#> [105] cli_3.6.1                 munsell_0.5.0            
#> [107] jquerylib_0.1.4           Rcpp_1.0.11              
#> [109] spatstat.random_3.1-6     parallel_4.3.1           
#> [111] sparseMatrixStats_1.12.2  bitops_1.0-7             
#> [113] lme4_1.1-34               SpatialExperiment_1.10.0 
#> [115] viridisLite_0.4.2         lmerTest_3.1-3           
#> [117] scales_1.2.1              purrr_1.0.2              
#> [119] crayon_1.5.2              rlang_1.1.1