You can install the latest version of spacexr from Bioconductor:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("spacexr")
Robust Cell Type Decomposition (RCTD) is a statistical method for decomposing cell type mixtures in spatial transcriptomics data. In this vignette, we will use a simulated dataset to demonstrate how you can run RCTD on spatial transcriptomics data and visualize your results.
We will use the term pixel, synonymous with spot or bead, to refer to any observation that measures gene expression at a fixed location. We do not use the term cell because, in spatial transcriptomics data, single pixels may contain mixtures of multiple cells. The purpose of RCTD is to estimate the proportions of different cell types in each pixel of a dataset.
library(ggplot2)
library(SpatialExperiment)
library(SummarizedExperiment)
library(spacexr)
Before running RCTD, we must first load in:
The reference data should be stored in a SummarizedExperiment object with:
Required components:
assay
: A matrix (or dgCMatrix
) of gene expression counts, with genes as
rows and cells as columns. Row names should be unique gene names, and column
names should be unique cell barcodes. Counts should be untransformed integers.
colData
: A factor column containing cell type annotations for each cell in
the reference. The column name is assumed to be “cell_type”, but another name
may be specified using the cell_type_col
parameter in createRctd
.
Optional components:
colData
: A numeric column named nUMI
, containing total UMI counts for
each cell. If not provided, nUMI
will be calculated as the column sums of the
counts matrix.In practice, your reference could be an annotated dataset from:
Here, we load counts and cell type annotations from the rctdSim
object, but in a real-world setting, the reference information may be
distributed across several files or R objects.
# Load simulated data.
data("rctdSim")
# Create SummarizedExperiment.
reference_se <- SummarizedExperiment(
assays = list(counts = rctdSim$reference_counts),
colData = rctdSim$reference_cell_types
)
# Examine reference. (optional)
print(dim(assay(reference_se))) # Gene expression matrix dimensions
#> [1] 750 75
table(colData(reference_se)$cell_type) # Number of occurrences of each cell type
#>
#> ct1 ct2 ct3
#> 25 25 25
In this vignette, we have a reference dataset of 750 genes for 75 cells. We
find that there are three cell types (ct1
, ct2
, and ct3
), each appearing
25 times in the reference.
Next, we will load the spatial transcriptomics data into a SpatialExperiment object, though we could use any SummarizedExperiment object with:
Required components:
assay
: A matrix (or dgCMatrix
) of gene expression counts, with genes as
rows and pixels as columns. Row names should be unique gene names, and column
names should be unique pixel barcodes. Counts should be untransformed integers.Optional components:
spatialCoords
: A matrix containing x and y coordinates for each pixel.
colData
: A numeric column named nUMI
, containing total UMI counts for
each pixel. If not provided, nUMI
will be calculated as the column sums of the
counts matrix.
Here, we load coordinates and counts from the rctdSim
object. Again,
in practice, this step will depend on how you store your spatial transcriptomics
data.
# Create SpatialExperiment.
spatial_spe <- SpatialExperiment(
assay = rctdSim$spatial_rna_counts,
spatialCoords = rctdSim$spatial_rna_coords
)
# Examine pixels. (optional)
print(dim(assay(spatial_spe))) # Gene expression matrix dimensions
#> [1] 750 12
ggplot(spatialCoords(spatial_spe), aes(x, y)) +
geom_point(size = 5) +
theme(axis.text = element_blank(), axis.ticks = element_blank()
)
We find that we have a dataset of 750 genes for 12 pixels, plotted above.
The ground truth for our cell type mixtures is given in the
SpatialExperiment object rctdSim$proportions_spe
.
(The ground truth has been stored in the same format as the RCTD output, which
will be discussed later.) Note that this ground truth will not be known in
practice—it is what we will need to estimate with RCTD.
unique(t(assay(rctdSim$proportions_spe)))
#> ct1 ct2 ct3
#> mx11 0.9 0.1 0.0
#> mx21 0.2 0.4 0.4
We find that there are two distinct cell type mixtures in our dataset:
ct1
and 10% ct2
.ct1
and 40% ct2
and ct3
each.We can visualize these mixtures with the plotAllWeights
function, which
plots each pixel as a pie chart of cell types. We will return to this function
later to visualize our RCTD results.
plotAllWeights(
rctdSim$proportions_spe,
r = 0.05, lwd = 0, title = "Ground Truth"
)
Running RCTD is a two-step process:
createRctd
function, which:
runRctd
.Before running RCTD, we perform some preprocessing with createRctd
. Some
important configuration options include:
gene_cutoff
, fc_cutoff
: Used for filtering genes in platform effect
normalization. gene_cutoff
filters for average expression and fc_cutoff
filters for log fold change across cell types.gene_cutoff_reg
, fc_cutoff_reg
: Similar to above, but for the RCTD step.UMI_min
, UMI_max
: Minimum and maximum UMI count per pixel.# Preprocess data
rctd_data <- createRctd(spatial_spe, reference_se)
Now we can run RCTD on the preprocessed data with the runRctd
function.
RCTD has three modes, each suited for different spatial technologies:
"doublet"
: Assigns 1-2 cell types per pixel. Best for high-resolution
technologies like Slide-seq and MERFISH."multi"
: Like doublet mode but can assign more cell types (up to
max_multi_types
). Best for lower-resolution technologies like 100-micron
Visium."full"
: No restrictions on number of cell types.The rctd_mode
argument is used to specify the mode in which the RCTD algorithm
runs. Other important configuration options include:
max_cores
: Number of cores used for parallel processing. We recommend
setting this to at least 4 to improve efficiency. If set to 1, parallel
processing is not used.max_multi_types
: Maximum number of cell types per pixel (multi mode only).For the purposes of demonstration, we’ll decompose the data into doublets in this vignette.
results_spe <- runRctd(rctd_data, rctd_mode = "doublet", max_cores = 1)
RCTD returns a SpatialExperiment object containing:
Three assays (one in full mode):
weights
: Cell type proportions restricted according to the specified
modeweights_unconfident
: Cell type proportions restricted according to the
specified mode, including unconfident predictions (not available in full
mode)weights_full
: Unrestricted cell type proportions (not available in full
mode, use weights
instead)Assays have cell types as rows and pixels as columns, with values representing the proportion (0 to 1) of each cell type in each pixel. Assay columns sum to 1 (except for rejected pixels, which sum to 0).
NOTE: Weights are transposed compared to the output of the original dmcable/spacexr package on GitHub.
Spatial coordinates for each pixel stored in spatialCoords
.
Additional colData
including:
spot_class
: Classification as singlet
, doublet_certain
,
doublet_uncertain
, or reject
first_type
, second_type
: Predicted cell typesmin_score
, singlet_score
for advanced
userscell_type_list
: List of cell types per pixelconf_list
: List of whether cell type predictions are confidentmin_score
for advanced usersWe can view the doublet weights for our data:
# Cell type mixture 1
assay(results_spe, "weights")[, 1:6]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx11 mx12 mx13 mx14 mx15 mx16
#> ct1 1 1 1 1 1 1
#> ct2 . . . . . .
#> ct3 . . . . . .
# Cell type mixture 2
assay(results_spe, "weights")[, 7:12]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#> mx21 mx22 mx23 mx24 mx25 mx26
#> ct1 . . . . . .
#> ct2 0.4935949 0.475943 0.4685428 0.4577326 0.5103718 0.4795314
#> ct3 0.5064051 0.524057 0.5314572 0.5422674 0.4896282 0.5204686
You can access the full, unrestricted weights using
assay(results_spe, "weights_full")
. We will visualize these weights later. You
may use assay(results_spe, "weights_unconfident")
to access the less confident
predictions (e.g., the second_type
weight for singlets and uncertain
doublets).
Below, we can view the results of RCTD deconvolution:
classification_df <- data.frame(
pixel = colnames(assay(results_spe)),
spot_class = colData(results_spe)$spot_class,
first_type = colData(results_spe)$first_type,
second_type = colData(results_spe)$second_type
)
classification_df
#> pixel spot_class first_type second_type
#> 1 mx11 singlet ct1 ct2
#> 2 mx12 singlet ct1 ct2
#> 3 mx13 singlet ct1 ct2
#> 4 mx14 singlet ct1 ct2
#> 5 mx15 singlet ct1 ct2
#> 6 mx16 singlet ct1 ct2
#> 7 mx21 doublet_certain ct2 ct3
#> 8 mx22 doublet_certain ct3 ct2
#> 9 mx23 doublet_certain ct3 ct2
#> 10 mx24 doublet_certain ct3 ct2
#> 11 mx25 doublet_certain ct2 ct3
#> 12 mx26 doublet_certain ct3 ct2
In particular, this data frame provides us with the following information:
spot_class
: A factor variable with the following levels of classification:
singlet
(1 cell type in the pixel)doublet_certain
(2 cell types in the pixel)doublet_uncertain
(2 cell types in the pixel, but only confident of 1)reject
(no prediction given for pixel)reject
pixels should not be used, and doublet_uncertain
pixels should only be used for applications that do not require knowledge of
all cell types in a pixel.first_type
: the first cell type predicted on the bead (for all spot_class
conditions except reject
).second_type
: the second cell type predicted on the bead for doublet
spot_class
conditions (not a confident prediction for doublet_uncertain
).We can see that RCTD generally succeeded in identifying the components of the
cell mixtures, although pixels with 90% ct1
and 10% ct2
were classified
as singlets. (As an exercise, try changing the doublet_threshold
parameter
in runRctd
so that these mixtures will be identified as doublets.)
We can visualize the results of our RCTD run with the plotAllWeights
function. Simply pass in the RCTD results (and some aesthetic arguments), and
the cell type proportions will be plotted as pie charts for each pixel.
plotAllWeights(results_spe, r = 0.05, lwd = 0, title = "Doublet Results")
Note that all mixtures with 90% ct1
(on the left) were classified as singlets,
so only ct1
is plotted. And for the other mixtures (on the right), only cell
types ct2
and ct3
are plotted (without any ct1
) due to the restrictions of
doublet mode. We can visualize the full, unrestricted proportions by passing in
the name of the relevant assay ("weights_full"
) to the plotAllWeights
function:
plotAllWeights(
results_spe, "weights_full", r = 0.05, lwd = 0, title = "Full Results"
)
This is very close to the ground truth!
Finally, we can visualize how the proportion of a specific cell type varies
across space with the plotCellTypeWeight
function. We show an example with
the dummy cell type ct1
.
plotCellTypeWeight(
results_spe, "ct1", "weights_full",
size = 5, stroke = 0.5, title = "\nDensity of Cell Type 1\n"
)
You are now ready to apply RCTD to your own spatial transcriptomics datasets!
sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] spacexr_0.99.3 SpatialExperiment_1.17.0
#> [3] SingleCellExperiment_1.29.2 SummarizedExperiment_1.37.0
#> [5] Biobase_2.67.0 GenomicRanges_1.59.1
#> [7] GenomeInfoDb_1.43.4 IRanges_2.41.3
#> [9] S4Vectors_0.45.4 BiocGenerics_0.53.6
#> [11] generics_0.1.3 MatrixGenerics_1.19.1
#> [13] matrixStats_1.5.0 ggplot2_3.5.1
#> [15] BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
#> [4] blob_1.2.4 filelock_1.0.3 fastmap_1.2.0
#> [7] BiocFileCache_2.15.1 tweenr_2.0.3 digest_0.6.37
#> [10] lifecycle_1.0.4 RSQLite_2.3.9 magrittr_2.0.3
#> [13] compiler_4.6.0 rlang_1.1.5 sass_0.4.9
#> [16] tools_4.6.0 yaml_2.3.10 knitr_1.50
#> [19] S4Arrays_1.7.3 labeling_0.4.3 bit_4.6.0
#> [22] curl_6.2.2 DelayedArray_0.33.6 abind_1.4-8
#> [25] withr_3.0.2 purrr_1.0.4 grid_4.6.0
#> [28] polyclip_1.10-7 colorspace_2.1-1 scales_1.3.0
#> [31] MASS_7.3-65 tinytex_0.56 cli_3.6.4
#> [34] rmarkdown_2.29 crayon_1.5.3 httr_1.4.7
#> [37] rjson_0.2.23 DBI_1.2.3 cachem_1.1.0
#> [40] ggforce_0.4.2 BiocManager_1.30.25 XVector_0.47.2
#> [43] vctrs_0.6.5 yulab.utils_0.2.0 Matrix_1.7-3
#> [46] jsonlite_2.0.0 bookdown_0.42 bit64_4.6.0-1
#> [49] magick_2.8.6 jquerylib_0.1.4 tidyr_1.3.1
#> [52] glue_1.8.0 gtable_0.3.6 UCSC.utils_1.3.1
#> [55] quadprog_1.5-8 munsell_0.5.1 tibble_3.2.1
#> [58] pillar_1.10.2 htmltools_0.5.8.1 GenomeInfoDbData_1.2.14
#> [61] R6_2.6.1 dbplyr_2.5.0 evaluate_1.0.3
#> [64] lattice_0.22-7 memoise_2.0.1 ggfun_0.1.8
#> [67] bslib_0.9.0 Rcpp_1.0.14 SparseArray_1.7.7
#> [70] scatterpie_0.2.4 xfun_0.52 fs_1.6.5
#> [73] pkgconfig_2.0.3