DESpace is an intuitive framework for identifying spatially variable (SV) genes (SVGs) via edgeR (Robinson, McCarthy, and Smyth 2009), one of the most common methods for performing differential expression analyses.
Based on pre-annotated spatial clusters, which summarize the spatial information, DESpace models gene expression with a negative binomial (NB) model, fitted via edgeR (Robinson, McCarthy, and Smyth 2009), with spatial clusters as covariates. SVGs are then identified by testing the significance of the spatial clusters.
Our approach assumes that the spatial structure can be summarized by spatial clusters, which should reproduce the key features of the tissue (e.g., white matter and layers in brain cortex). A significant test of these covariates indicates that space influences gene expression, hence identifying spatially variable genes.
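The idea of testing spatial clusters as covariates can be illustrated with a minimal base R sketch. This is not the DESpace implementation: a Poisson GLM is swapped in for the negative binomial model fitted via edgeR so that the example needs no additional packages, and the data, cluster labels, and the helper `lrt_pvalue` are made up for illustration.

```r
# Toy illustration of testing spatial clusters as covariates (base R only).
# Assumption: a Poisson GLM stands in for the NB model that edgeR would fit.
set.seed(1)

# Hypothetical data: 300 spots assigned to 3 spatial clusters
cluster <- factor(rep(c("WM", "Layer1", "Layer2"), each = 100))

# One gene whose mean depends on the cluster, one gene with a constant mean
mu_sv     <- c(WM = 20, Layer1 = 5, Layer2 = 5)[as.character(cluster)]
counts_sv <- rpois(length(cluster), lambda = mu_sv)
counts_ns <- rpois(length(cluster), lambda = 8)

# Likelihood ratio test: model with the cluster covariate vs intercept-only model
# (lrt_pvalue is a hypothetical helper, not a DESpace function)
lrt_pvalue <- function(counts, cluster) {
  full <- glm(counts ~ cluster, family = poisson())
  null <- glm(counts ~ 1, family = poisson())
  anova(null, full, test = "Chisq")[2, "Pr(>Chi)"]
}

p_sv <- lrt_pvalue(counts_sv, cluster)  # cluster-dependent gene
p_ns <- lrt_pvalue(counts_ns, cluster)  # gene with no spatial structure
```

A small p-value for the cluster covariate, as expected for the first simulated gene, indicates that expression varies across spatial clusters.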
Our model is flexible and robust, and is significantly faster than most SV methods. Furthermore, to the best of our knowledge, it is the only SV approach that allows:
- performing an SV test on each individual spatial cluster, hence identifying the key regions affected by spatial variability;
- jointly fitting multiple samples, targeting genes with consistent spatial patterns across biological replicates.
Below, we illustrate an example usage of the package.
DESpace is implemented as an R package within Bioconductor, which is the main venue for omics analyses, and we use various other Bioconductor packages (e.g., spatialLIBD and edgeR).
The DESpace package is available on Bioconductor and can be installed with the following commands:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("DESpace")
## Check that you have a valid Bioconductor installation
BiocManager::valid()
The development version of DESpace can also be installed from the Bioconductor-devel branch or from GitHub.
To access the R code used in the vignettes, type:
browseVignettes("DESpace")
Questions about DESpace should be reported as a new issue at BugReports.
To cite DESpace, type:
citation("DESpace")
Load R packages:
suppressMessages({
library(DESpace)
library(ggplot2)
library(SpatialExperiment)
})
As an example dataset, we consider a human dorsolateral pre-frontal cortex (DLPFC) spatial transcriptomics dataset from the 10x Genomics Visium platform, including three neurotypical adult donors (i.e., biological replicates), with four images per subject (Maynard 2021). The full dataset consists of 12 samples, which can be accessed via the spatialLIBD Bioconductor package.
Here, we consider a subset of the original data, consisting of three biological replicates: 1 image for each of the three brain subjects.
Initially, in Section 3 (Individual sample), we fit our approach on a single sample, whose data are stored in spe3, whereas all three samples will later be used jointly in Section 4 (Multiple samples).
# Connect to ExperimentHub
ehub <- ExperimentHub::ExperimentHub()
# Download the full real data (about 2.1 GB in RAM):
spe_all <- spatialLIBD::fetch_data(type = "spe", eh = ehub)
# Create three spe objects, one per sample:
spe1 <- spe_all[, colData(spe_all)$sample_id == '151507']
spe2 <- spe_all[, colData(spe_all)$sample_id == '151669']
spe3 <- spe_all[, colData(spe_all)$sample_id == '151673']
rm(spe_all)
# Select small set of random genes for faster runtime in this example
set.seed(123)
sel_genes <- sample(dim(spe1)[1],2000)
spe1 <- spe1[sel_genes,]
spe2 <- spe2[sel_genes,]
spe3 <- spe3[sel_genes,]
# For convenience, we use “gene names” instead of “gene ids”:
rownames(spe1) <- rowData(spe1)$gene_name
rownames(spe2) <- rowData(spe2)$gene_name
rownames(spe3) <- rowData(spe3)$gene_name
# Specify column names of spatial coordinates in colData(spe)
coordinates <- c("array_row", "array_col")
# Specify column names of spatial clusters in colData(spe)
spatial_cluster <- 'layer_guess_reordered'
The spatial tissues of each sample were manually annotated in the original manuscript (Maynard 2021), and spots were labeled into one of the following categories: white matter (WM) and layers 1 to 6.
The manual annotations are stored in column layer_guess_reordered of the colData, while columns array_col and array_row provide the spatial coordinates of spots.
# We select a subset of columns
keep_col <- c(coordinates,spatial_cluster,"expr_chrM_ratio","cell_count")
head(colData(spe3)[keep_col])
## DataFrame with 6 rows and 5 columns
## array_row array_col layer_guess_reordered expr_chrM_ratio
## <integer> <integer> <factor> <numeric>
## AAACAAGTATCTCCCA-1 50 102 Layer3 0.166351
## AAACAATCTACTAGCA-1 3 43 Layer1 0.122376
## AAACACCAATAACTGC-1 59 19 WM 0.114089
## AAACAGAGCGACTCCT-1 14 94 Layer3 0.242223
## AAACAGCTTTCAGAAG-1 43 9 Layer5 0.152174
## AAACAGGGTCTATATT-1 47 13 Layer6 0.155095
## cell_count
## <integer>
## AAACAAGTATCTCCCA-1 6
## AAACAATCTACTAGCA-1 16
## AAACACCAATAACTGC-1 5
## AAACAGAGCGACTCCT-1 2
## AAACAGCTTTCAGAAG-1 4
## AAACAGGGTCTATATT-1 6
Quality control (QC) procedures at the spot and gene level aim to remove both low-quality spots and lowly abundant genes. For QC, we follow the instructions from “Orchestrating Spatially Resolved Transcriptomics Analysis with Bioconductor” (OSTA). The library size (total UMI counts), number of expressed genes, ratio of mitochondrial chromosome (chrM) expression, and number of cells per spot are used to identify low-quality spots.
# Sample 1:
# Calculate per-spot QC metrics and store in colData
spe1 <- scuttle::addPerCellQC(spe1)
# Remove combined set of low-quality spots
spe1 <- spe1[, !(colData(spe1)$sum < 10 | # library size
colData(spe1)$detected < 10 | # number of expressed genes
colData(spe1)$expr_chrM_ratio > 0.30| # mitochondrial expression ratio
colData(spe1)$cell_count > 10)] # number of cells per spot
# Sample 2:
# Calculate per-spot QC metrics and store in colData
spe2 <- scuttle::addPerCellQC(spe2)
# Remove combined set of low-quality spots
spe2 <- spe2[, !(colData(spe2)$sum < 20 |
colData(spe2)$detected < 15 |
colData(spe2)$expr_chrM_ratio > 0.35|
colData(spe2)$cell_count > 8)]
# Sample 3:
# Calculate per-spot QC metrics and store in colData
spe3 <- scuttle::addPerCellQC(spe3)
# Remove combined set of low-quality spots
spe3 <- spe3[, !(colData(spe3)$sum < 25 |
colData(spe3)$detected < 25 |
colData(spe3)$expr_chrM_ratio > 0.3|
colData(spe3)$cell_count > 15)]
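The spot-level filter above applies the same four criteria to every sample, with sample-specific thresholds. As a minimal sketch, the logic could be factored into a single function; `flag_low_quality` is a hypothetical helper (not part of DESpace or scuttle), and the per-spot metrics below are made-up values stored in a plain data.frame so that the example runs without Bioconductor.

```r
# Hypothetical helper: flag low-quality spots from a data.frame of per-spot
# metrics, using the same four criteria as the QC code above.
flag_low_quality <- function(metrics, min_sum, min_detected,
                             max_chrM_ratio, max_cell_count) {
  metrics$sum < min_sum |                       # library size
    metrics$detected < min_detected |           # number of expressed genes
    metrics$expr_chrM_ratio > max_chrM_ratio |  # mitochondrial expression ratio
    metrics$cell_count > max_cell_count         # number of cells per spot
}

# Toy per-spot metrics for five spots (made-up values)
toy_metrics <- data.frame(
  sum             = c(500, 5, 800, 300, 900),
  detected        = c(200, 4, 350, 150, 400),
  expr_chrM_ratio = c(0.10, 0.12, 0.45, 0.20, 0.15),
  cell_count      = c(3, 2, 4, 12, 6)
)

# Apply the thresholds used for sample 1
low_quality <- flag_low_quality(toy_metrics, min_sum = 10, min_detected = 10,
                                max_chrM_ratio = 0.30, max_cell_count = 10)
toy_kept <- toy_metrics[!low_quality, ]
```

With the real objects, the same helper should be applicable to `as.data.frame(colData(spe1))`, with the resulting logical vector used to subset the columns of `spe1`; this usage is an assumption and has not been run against the DLPFC data.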
Then, we discard lowly abundant genes, i.e., genes detected in fewer than 20 spots.
# For each sample i:
for(i in seq_len(3)){
spe_i <- get(paste0("spe", i))
# Select QC threshold for lowly expressed genes: at least 20 non-zero spots:
qc_low_gene <- rowSums(assays(spe_i)$counts > 0) >= 20
# Remove lowly abundant genes
spe_i <- spe_i[qc_low_gene,]
assign(paste0("spe", i), spe_i)
message("Dimension of spe", i, ": ", dim(spe_i)[1], ", ", dim(spe_i)[2])
}
## Dimension of spe1: 847, 4174
## Dimension of spe2: 868, 3635
## Dimension of spe3: 908, 3601
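As an alternative to looking up and reassigning objects by name, the samples can be kept in a named list and filtered with `lapply`. The sketch below shows this pattern on toy data: `filter_genes` and `make_counts` are hypothetical helpers, and plain count matrices stand in for the spe objects.

```r
# Toy sketch: keep genes detected (count > 0) in at least `min_spots` spots.
# filter_genes is a hypothetical helper; matrices stand in for the spe objects.
filter_genes <- function(counts, min_spots) {
  keep <- rowSums(counts > 0) >= min_spots
  counts[keep, , drop = FALSE]
}

set.seed(42)
# Made-up sample: 4 genes x 30 spots
make_counts <- function() {
  rbind(
    geneA = rpois(30, 5),               # expressed in nearly every spot
    geneB = c(rep(1, 25), rep(0, 5)),   # detected in 25 spots
    geneC = c(rep(2, 10), rep(0, 20)),  # detected in 10 spots
    geneD = rep(0, 30)                  # never detected
  )
}
samples <- list(s1 = make_counts(), s2 = make_counts())

# Apply the same gene-level threshold (20 non-zero spots) to every sample
filtered <- lapply(samples, filter_genes, min_spots = 20)
n_genes  <- vapply(filtered, nrow, integer(1))
```

Keeping the samples in a list avoids `assign()` and makes it straightforward to apply further steps to all samples at once.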
We fit our approach to discover SVGs in an individual sample. In Section 4 (Multiple samples), we will show how to jointly model multiple replicates.
This framework relies on spatial clusters being accessible and successfully summarizing the primary spatial characteristics of the data. In most datasets, these spatial features are either accessible or can be easily generated with spatial clustering algorithms.
If manual annotations are provided (e.g., annotated by a pathologist), we can directly use those.
With the spe object that contains the coordinates of the spot-level data, we can visualize spatial clusters.
# View LIBD layers for one sample
CD <- as.data.frame(colData(spe3))
ggplot(CD,
aes(x=array_col,y=array_row,
color=factor(layer_guess_reordered))) +
geom_point() +
theme_void() + scale_y_reverse() +
theme(legend.position="bottom") +
labs(color = "", title = "Manually annotated spatial clusters")