For the demonstration of escape, we will use the example “pbmc_small” data from Seurat and also generate a
SingleCellExperiment object from it.
suppressPackageStartupMessages(library(escape)) suppressPackageStartupMessages(library(dittoSeq)) suppressPackageStartupMessages(library(SingleCellExperiment)) suppressPackageStartupMessages(library(Seurat)) suppressPackageStartupMessages(library(SeuratObject)) pbmc_small <- get("pbmc_small") pbmc_small <- DietSeurat(suppressMessages(UpdateSeuratObject(pbmc_small))) sce <- as.SingleCellExperiment(pbmc_small, assay = "RNA")
The first step in the process of performing gene set enrichment analysis is identifying the gene sets we would like to use. The function
getGeneSets() allows users to isolate a whole or multiple libraries from a list of GSEABase GeneSetCollection objects. We can do this for gene set collections from the built-in Molecular Signature Database by setting the parameter library equal to library/libraries of interest. For multiple libraries, just set library = c(“Library1”, “Library2”, etc).
- Individual pathways/gene sets can be isolated from the libraries selected, by setting gene.sets = the name(s) of the gene sets of interest.
- Subcategories of the invidual libaries can be selected using the subcategory parameter.
If the sequencing of the single-cell data is performed on a species other than “Homo sapiens”, make sure to use the species parameter in
getGeneSets() in order to get the correct gene nomenclature.
GS.hallmark <- getGeneSets(library = "H")
data("escape.gene.sets", package="escape") gene.sets <- escape.gene.sets
gene.sets <- list(Tcell_signature = c("CD2","CD3E","CD3D"), Myeloid_signature = c("SPI1","FCER1G","CSF1R"))
The next step is performing the enrichment on the RNA count data. The function
enrichIt() can handle either a matrix of raw count data or will pull that data directly from a SingleCellExperiment or Seurat object. The gene.sets parameter in the function is the GeneSets, either generated from
getGeneSets() or from the user. The enrichment scores will be calculated across all individual cells and groups is the n size to break the enrichment by while the cores is the number of cores to perform in parallel during the enrichment calculation too.
enrichIt() can utilize two distinct methods for quantification using the method parameter - either the “ssGSEA” method described by Barbie et al 2009 or “UCell” described by Andreatta and Carmona 2021.
To prevent issues with enrichment calculations for gene sets with a low number of genes represented in the count data of a single-cell object - min.size was instituted to remove gene sets with less than the indicated genes.
Important Note: This is computationally intensive and is highly dependent on the number of cells and the number of gene sets included.
ES.seurat <- enrichIt(obj = pbmc_small, gene.sets = GS.hallmark, groups = 1000, cores = 2, min.size = 5)
##  "Using sets of 1000 cells. Running 1 times." ## Setting parallel calculations through a SnowParam back-end ## with workers=2 and tasks=100. ## Estimating ssGSEA scores for 17 gene sets. ##  "Calculating ranks..." ##  "Calculating absolute values from ranks..."
ES.sce <- enrichIt(obj = sce, gene.sets = GS.hallmark, method = "UCell", groups = 1000, cores = 2, min.size = 5)
##  "Using sets of 1000 cells. Running 1 times."
We can then easily add these results back to our Seurat or SCE object.
## if working with a Seurat object pbmc_small <- Seurat::AddMetaData(pbmc_small, ES.seurat) ## if working with an SCE object met.data <- merge(colData(sce), ES.sce, by = "row.names", all=TRUE) row.names(met.data) <- met.data$Row.names met.data$Row.names <- NULL colData(sce) <- met.data
The easiest way to generate almost any visualization for single cell data is via dittoSeq, which is an extremely flexible visualization package for both bulk and single-cell RNA-seq data that works very well for both expression data and metadata. Better yet, it can handle both SingleCellExperiment and Seurat objects.
To keep things consistent, we’ll define a pleasing color scheme.
colors <- colorRampPalette(c("#0D0887FF","#7E03A8FF","#CC4678FF","#F89441FF","#F0F921FF"))
A simple way to approach visualizations for enrichment results is the heatmap, especially if you are using a number of gene sets or libraries.
dittoHeatmap(pbmc_small, genes = NULL, metas = names(ES.seurat), annot.by = "groups", fontsize = 7, cluster_cols = TRUE, heatmap.colors = colors(50))
A user can also produce a heatmap with select gene sets by providing specific names to the metas parameter. For example, we can isolated gene sets involved immune response.
dittoHeatmap(sce, genes = NULL, metas = c("HALLMARK_IL2_STAT5_SIGNALING", "HALLMARK_IL6_JAK_STAT3_SIGNALING", "HALLMARK_INFLAMMATORY_RESPONSE"), annot.by = "groups", fontsize = 7, heatmap.colors = colors(50))
Another way to visualize a subset of gene set enrichment would be to graph the distribution of enrichment using violin, jitter, boxplot, or ridgeplots. We can also compare between categorical variables using the group.by parameter.
multi_dittoPlot(sce, vars = c("HALLMARK_IL2_STAT5_SIGNALING", "HALLMARK_IL6_JAK_STAT3_SIGNALING", "HALLMARK_INFLAMMATORY_RESPONSE"), group.by = "groups", plots = c("jitter", "vlnplot", "boxplot"), ylab = "Enrichment Scores", theme = theme_classic() + theme(plot.title = element_text(size = 10)))