R version: R version 4.4.1 (2024-06-14)
Bioconductor version: 3.20
Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high dimensional in situ cytometry technologies have fundamentally revolutionised our ability to observe these complex cellular relationships providing an unprecedented characterisation of cellular heterogeneity in a tissue environment.
We have developed an analytical framework for analysing data from high dimensional in situ cytometry assays including CODEX, CycIF, IMC and High Definition Spatial Transcriptomics. Implemented in R, this framework makes use of functionality from our Bioconductor packages spicyR, lisaClust, treekoR, FuseSOM, simpleSeg and ClassifyR. Below we will provide an overview of key steps which are needed to interrogate the comprehensive spatial information generated by these exciting new technologies including cell segmentation, feature normalisation, cell type identification, micro-environment characterisation, spatial hypothesis testing and patient classification. Ultimately, our modular analysis framework provides a cohesive and accessible entry point into spatially resolved single cell data analysis for any R-based bioinformaticians.
To install the current release of spicyWorkflow, run the following code.
if (!require("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("spicyWorkflow")
library(cytomapper)
library(dplyr)
library(ggplot2)
library(simpleSeg)
library(FuseSOM)
library(ggpubr)
library(scater)
library(spicyR)
library(ClassifyR)
library(lisaClust)
library(Statial)
library(tidySingleCellExperiment)
library(SpatialExperiment)
library(SpatialDatasets)
It is convenient to set the number of cores for running code in parallel. Please choose a number that is appropriate for your resources. Set the use_mc
flag to TRUE
if you would like to use parallel processing for the rest of the vignette. A minimum of 2 cores is suggested since running this workflow is rather computationally intensive.
use_mc <- TRUE
if (use_mc) {
nCores <- max(parallel::detectCores()/2, 1)
} else {
nCores <- 2
}
BPPARAM <- simpleSeg:::generateBPParam(nCores)
theme_set(theme_classic())
In the following we will re-analyse some IMC data (Ferguson et al, 2022) profiling the spatial landscape of head and neck cutaneous squamous cell carcinomas (HNcSCC), the second most common type of skin cancer. The majority of HNcSCC can be treated with surgery and good local control, but a subset of large tumours infiltrate subcutaneous tissue and are considered high risk for local recurrence and metastases. The key conclusion of this manuscript (amongst others) is that spatial information about cells and the immune environment can be used to predict primary tumour progression or metastases in patients. We will use our spicy workflow to reach a similar conclusion.
The R code for this analysis is available on github https://github.com/SydneyBioX/spicyWorkflow.
Once the spicyWorkflow package is installed, these images will be located within the spicyWorkflow
folder where the spicyWorkflow
package is installed, under inst/extdata/images
.
Here we use loadImages()
from the cytomapper
package to load all the tiff images into a CytoImageList
object and store the images as h5 file on-disk in a temporary directory using the h5FilesPath = HDF5Array::getHDF5DumpDir()
parameter.
We will also assign the metadata columns of the CytoImageList
object using the mcols()
function.
pathToImages <- SpatialDatasets::Ferguson_Images()
tmp <- tempfile()
unzip(pathToImages, exdir = tmp)
# Store images in a CytoImageList on_disk as h5 files to save memory.
images <- cytomapper::loadImages(
tmp,
single_channel = TRUE,
on_disk = TRUE,
h5FilesPath = HDF5Array::getHDF5DumpDir(),
BPPARAM = BPPARAM
)
mcols(images) <- S4Vectors::DataFrame(imageID = names(images))
As we’re reading the image channels directly from the names of the TIFF image, often these channel names will need to be cleaned for ease of downstream processing.
The channel names can be accessed from the CytoImageList
object using the channelNames()
function.
cn <- channelNames(images) # Read in channel names
head(cn)
## [1] "139La_139La_panCK.ome" "141Pr_141Pr_CD20.ome"
## [3] "142Nd_142Nd_HH3.ome" "143Nd_143Nd_CD45RA.ome"
## [5] "146Nd_146Nd_CD8a.ome" "147Sm_147Sm_podoplanin.ome"
cn <- sub(".*_", "", cn) # Remove preceding letters
cn <- sub(".ome", "", cn) # Remove the .ome
head(cn)
## [1] "panCK" "CD20" "HH3" "CD45RA" "CD8a"
## [6] "podoplanin"
channelNames(images) <- cn # Reassign channel names
Similarly, the image names will be taken from the folder name containing the individual TIFF images for each channel. These will often also need to be cleaned.
head(names(images))
## [1] "ROI001_ROI 01_F3_SP16-001550_1E" "ROI002_ROI 02_F4_SP16-001550_1E"
## [3] "ROI003_ROI 03_F5_SP16-001550_1E" "ROI005_ROI 05_G4_SP17-002069_1F"
## [5] "ROI006_ROI 06_G5_SP17-002069_1F" "ROI007_ROI 07_G6_SP17-005715_1B"
nam <- sapply(strsplit(names(images), "_"), `[`, 3)
head(nam)
## [1] "F3" "F4" "F5" "G4" "G5" "G6"
names(images) <- nam # Reassigning image names
mcols(images)[["imageID"]] <- nam # Reassigning image names
Our simpleSeg R package on https://github.com/SydneyBioX/simpleSeg provides a series of functions to generate simple segmentation masks of images. These functions leverage the functionality of the EBImage package on Bioconductor. For more flexibility when performing your segmentation in R we recommend learning to use the EBimage package. A key strength of the simpleSeg package is that we have coded multiple ways to perform some simple segmentation operations as well as incorporating multiple automatic procedures to optimise some key parameters when these aren’t specified.
If your images are stored in a list
or CytoImageList
they can be segmented with a simple call to simpleSeg()
. To summarise, simpleSeg()
is an R implementation of a simple segmentation technique which traces out the nuclei using a specified channel using nucleus
then dilates around the traced nuclei by a specified amount using discSize
. The nucleus can be traced out using either one specified channel, or by using the principal components of all channels most correlated to the specified nuclear channel by setting pca = TRUE
.
In the particular example below, we have asked simpleSeg
to do the following:
By setting nucleus = c("HH3")
, we’ve asked simpleSeg to trace out the nuclei signal in the images using the HH3 channel.
By setting pca = TRUE
, simpleSeg segments out the nuclei mask using a principal component analysis of all channels and using the principal components most aligned with the nuclei channel, in this case, HH3.
By setting cellBody = "dilate"
, simpleSeg uses a dilation strategy of segmentation, expanding out from the nucleus by a specified discSize
.
By setting discSize = 3
, simpleSeg dilates out from the nucleus by 3 pixels.
By setting sizeSelection = 20
, simpleSeg ensures that only cells with a size greater than 20 pixels will be used.
By setting transform = "sqrt"
, simpleSeg square root transforms each of the channels prior to segmentation.
By setting tissue = c("panCK", "CD45", "HH3")
, we specify a tissue mask which simpleSeg uses, filtering out all background noise outside the tissue mask. This is important as these are tumour cores, wand hence circular, so we’d want to ignore background noise which happens outside of the tumour core.
There are many other parameters that can be specified in simpleSeg (smooth
, watershed
, tolerance
, and ext
), and we encourage the user to select the best parameters which suit their biological context.
masks <- simpleSeg(images,
nucleus = c("HH3"),
pca = TRUE,
cellBody = "dilate",
discSize = 3,
sizeSelection = 20,
transform = "sqrt",
tissue = c("panCK", "CD45", "HH3"),
cores = nCores
)
The display
and colorLabels
functions in EBImage
make it very easy to examine the performance of the cell segmentation. The great thing about display
is that if used in an interactive session it is very easy to zoom in and out of the image.
EBImage::display(colorLabels(masks[[1]]))