Contents

1 Installation

You can install the latest version of spacexr from Bioconductor:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spacexr")

2 Introduction

Robust Cell Type Decomposition (RCTD) is a statistical method for decomposing cell type mixtures in spatial transcriptomics data. In this vignette, we will use a simulated dataset to demonstrate how you can run RCTD on spatial transcriptomics data and visualize your results.

We will use the term pixel, synonymous with spot or bead, to refer to any observation that measures gene expression at a fixed location. We do not use the term cell because, in spatial transcriptomics data, single pixels may contain mixtures of multiple cells. The purpose of RCTD is to estimate the proportions of different cell types in each pixel of a dataset.

3 Setup

library(ggplot2)
library(SpatialExperiment)
library(SummarizedExperiment)

library(spacexr)

4 Data Preprocessing

Before running RCTD, we must first load in:

4.1 Reference

The reference data should be stored in a SummarizedExperiment object with:

Required components:

  1. assay: A matrix (or dgCMatrix) of gene expression counts, with genes as rows and cells as columns. Row names should be unique gene names, and column names should be unique cell barcodes. Counts should be untransformed integers.

  2. colData: A factor column containing cell type annotations for each cell in the reference. The column name is assumed to be “cell_type”, but another name may be specified using the cell_type_col parameter in createRctd.

Optional components:

  1. colData: A numeric column named nUMI, containing total UMI counts for each cell. If not provided, nUMI will be calculated as the column sums of the counts matrix.

In practice, your reference could be an annotated dataset from:

  • Single-nucleus RNA sequencing (snRNA-seq)
  • Single-cell RNA sequencing (scRNA-seq)
  • Cell type-specific bulk RNA sequencing

Here, we load counts and cell type annotations from the rctdSim object, but in a real-world setting, the reference information may be distributed across several files or R objects.

# Load simulated data.
data("rctdSim")

# Create SummarizedExperiment.
reference_se <- SummarizedExperiment(
    assays = list(counts = rctdSim$reference_counts),
    colData = rctdSim$reference_cell_types
)

# Examine reference. (optional)
print(dim(assay(reference_se))) # Gene expression matrix dimensions
#> [1] 750  75
table(colData(reference_se)$cell_type) # Number of occurrences of each cell type
#> 
#> ct1 ct2 ct3 
#>  25  25  25

In this vignette, we have a reference dataset of 750 genes for 75 cells. We find that there are three cell types (ct1, ct2, and ct3), each appearing 25 times in the reference.

4.2 Spatial Transcriptomics Data

Next, we will load the spatial transcriptomics data into a SpatialExperiment object, though we could use any SummarizedExperiment object with:

Required components:

  1. assay: A matrix (or dgCMatrix) of gene expression counts, with genes as rows and pixels as columns. Row names should be unique gene names, and column names should be unique pixel barcodes. Counts should be untransformed integers.

Optional components:

  1. spatialCoords: A matrix containing x and y coordinates for each pixel.

  2. colData: A numeric column named nUMI, containing total UMI counts for each pixel. If not provided, nUMI will be calculated as the column sums of the counts matrix.

Here, we load coordinates and counts from the rctdSim object. Again, in practice, this step will depend on how you store your spatial transcriptomics data.

# Create SpatialExperiment.
spatial_spe <- SpatialExperiment(
    assay = rctdSim$spatial_rna_counts,
    spatialCoords = rctdSim$spatial_rna_coords
)

# Examine pixels. (optional)
print(dim(assay(spatial_spe))) # Gene expression matrix dimensions
#> [1] 750  12
ggplot(spatialCoords(spatial_spe), aes(x, y)) + 
    geom_point(size = 5) + 
    theme(axis.text = element_blank(), axis.ticks = element_blank()
)


We find that we have a dataset of 750 genes for 12 pixels, plotted above.

4.2.1 Visualizing Ground Truth

The ground truth for our cell type mixtures is given in the SpatialExperiment object rctdSim$proportions_spe. (The ground truth has been stored in the same format as the RCTD output, which will be discussed later.) Note that this ground truth will not be known in practice—it is what we will need to estimate with RCTD.

unique(t(assay(rctdSim$proportions_spe)))
#>      ct1 ct2 ct3
#> mx11 0.9 0.1 0.0
#> mx21 0.2 0.4 0.4

We find that there are two distinct cell type mixtures in our dataset:

  • 90% ct1 and 10% ct2.
  • 20% ct1 and 40% ct2 and ct3 each.

We can visualize these mixtures with the plotAllWeights function, which plots each pixel as a pie chart of cell types. We will return to this function later to visualize our RCTD results.

plotAllWeights(
    rctdSim$proportions_spe,
    r = 0.05, lwd = 0, title = "Ground Truth"
)


5 Running RCTD

Running RCTD is a two-step process:

  1. Preprocess the data using the createRctd function, which:
    • Filters pixels and genes based on UMI counts and other thresholds.
    • Identifies differentially expressed genes.
    • Creates cell type profiles from the reference data.
  2. Run the deconvolution with runRctd.

5.1 Step 1: Preprocess Data

Before running RCTD, we perform some preprocessing with createRctd. Some important configuration options include:

  • gene_cutoff, fc_cutoff: Used for filtering genes in platform effect normalization. gene_cutoff filters for average expression and fc_cutoff filters for log fold change across cell types.
  • gene_cutoff_reg, fc_cutoff_reg: Similar to above, but for the RCTD step.
  • UMI_min, UMI_max: Minimum and maximum UMI count per pixel.
# Preprocess data
rctd_data <- createRctd(spatial_spe, reference_se)

5.2 Step 2: Run RCTD

Now we can run RCTD on the preprocessed data with the runRctd function.

RCTD has three modes, each suited for different spatial technologies:

  • "doublet": Assigns 1-2 cell types per pixel. Best for high-resolution technologies like Slide-seq and MERFISH.
  • "multi": Like doublet mode but can assign more cell types (up to max_multi_types). Best for lower-resolution technologies like 100-micron Visium.
  • "full": No restrictions on number of cell types.

The rctd_mode argument is used to specify the mode in which the RCTD algorithm runs. Other important configuration options include:

  • max_cores: Number of cores used for parallel processing. We recommend setting this to at least 4 to improve efficiency. If set to 1, parallel processing is not used.
  • max_multi_types: Maximum number of cell types per pixel (multi mode only).

For the purposes of demonstration, we’ll decompose the data into doublets in this vignette.

results_spe <- runRctd(rctd_data, rctd_mode = "doublet", max_cores = 1)

6 RCTD Results

RCTD returns a SpatialExperiment object containing:

6.1 Cell Type Weights

We can view the doublet weights for our data:

# Cell type mixture 1
assay(results_spe, "weights")[, 1:6]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>     mx11 mx12 mx13 mx14 mx15 mx16
#> ct1    1    1    1    1    1    1
#> ct2    .    .    .    .    .    .
#> ct3    .    .    .    .    .    .

# Cell type mixture 2
assay(results_spe, "weights")[, 7:12]
#> 3 x 6 sparse Matrix of class "dgCMatrix"
#>          mx21     mx22      mx23      mx24      mx25      mx26
#> ct1 .         .        .         .         .         .        
#> ct2 0.4935949 0.475943 0.4685428 0.4577326 0.5103718 0.4795314
#> ct3 0.5064051 0.524057 0.5314572 0.5422674 0.4896282 0.5204686

You can access the full, unrestricted weights using assay(results_spe, "weights_full"). We will visualize these weights later. You may use assay(results_spe, "weights_unconfident") to access the less confident predictions (e.g., the second_type weight for singlets and uncertain doublets).

6.2 Classifications

Below, we can view the results of RCTD deconvolution:

classification_df <- data.frame(
    pixel = colnames(assay(results_spe)),
    spot_class = colData(results_spe)$spot_class,
    first_type = colData(results_spe)$first_type,
    second_type = colData(results_spe)$second_type
)
classification_df
#>    pixel      spot_class first_type second_type
#> 1   mx11         singlet        ct1         ct2
#> 2   mx12         singlet        ct1         ct2
#> 3   mx13         singlet        ct1         ct2
#> 4   mx14         singlet        ct1         ct2
#> 5   mx15         singlet        ct1         ct2
#> 6   mx16         singlet        ct1         ct2
#> 7   mx21 doublet_certain        ct2         ct3
#> 8   mx22 doublet_certain        ct3         ct2
#> 9   mx23 doublet_certain        ct3         ct2
#> 10  mx24 doublet_certain        ct3         ct2
#> 11  mx25 doublet_certain        ct2         ct3
#> 12  mx26 doublet_certain        ct3         ct2

In particular, this data frame provides us with the following information:

  • spot_class: A factor variable with the following levels of classification:
    • singlet (1 cell type in the pixel)
    • doublet_certain (2 cell types in the pixel)
    • doublet_uncertain (2 cell types in the pixel, but only confident of 1)
    • reject (no prediction given for pixel)
    Typically, reject pixels should not be used, and doublet_uncertain pixels should only be used for applications that do not require knowledge of all cell types in a pixel.
  • first_type: the first cell type predicted on the bead (for all spot_class conditions except reject).
  • second_type: the second cell type predicted on the bead for doublet spot_class conditions (not a confident prediction for doublet_uncertain).

We can see that RCTD generally succeeded in identifying the components of the cell mixtures, although pixels with 90% ct1 and 10% ct2 were classified as singlets. (As an exercise, try changing the doublet_threshold parameter in runRctd so that these mixtures will be identified as doublets.)

7 Visualization

7.1 All Cell Types

We can visualize the results of our RCTD run with the plotAllWeights function. Simply pass in the RCTD results (and some aesthetic arguments), and the cell type proportions will be plotted as pie charts for each pixel.

plotAllWeights(results_spe, r = 0.05, lwd = 0, title = "Doublet Results")



Note that all mixtures with 90% ct1 (on the left) were classified as singlets, so only ct1 is plotted. And for the other mixtures (on the right), only cell types ct2 and ct3 are plotted (without any ct1) due to the restrictions of doublet mode. We can visualize the full, unrestricted proportions by passing in the name of the relevant assay ("weights_full") to the plotAllWeights function:

plotAllWeights(
    results_spe, "weights_full", r = 0.05, lwd = 0, title = "Full Results"
)



This is very close to the ground truth!

7.2 Single Cell Type

Finally, we can visualize how the proportion of a specific cell type varies across space with the plotCellTypeWeight function. We show an example with the dummy cell type ct1.

plotCellTypeWeight(
    results_spe, "ct1", "weights_full",
    size = 5, stroke = 0.5, title = "\nDensity of Cell Type 1\n"
)



You are now ready to apply RCTD to your own spatial transcriptomics datasets!

7.3 Session Information

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] spacexr_0.99.3              SpatialExperiment_1.17.0   
#>  [3] SingleCellExperiment_1.29.2 SummarizedExperiment_1.37.0
#>  [5] Biobase_2.67.0              GenomicRanges_1.59.1       
#>  [7] GenomeInfoDb_1.43.4         IRanges_2.41.3             
#>  [9] S4Vectors_0.45.4            BiocGenerics_0.53.6        
#> [11] generics_0.1.3              MatrixGenerics_1.19.1      
#> [13] matrixStats_1.5.0           ggplot2_3.5.1              
#> [15] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1        dplyr_1.1.4             farver_2.1.2           
#>  [4] blob_1.2.4              filelock_1.0.3          fastmap_1.2.0          
#>  [7] BiocFileCache_2.15.1    tweenr_2.0.3            digest_0.6.37          
#> [10] lifecycle_1.0.4         RSQLite_2.3.9           magrittr_2.0.3         
#> [13] compiler_4.6.0          rlang_1.1.5             sass_0.4.9             
#> [16] tools_4.6.0             yaml_2.3.10             knitr_1.50             
#> [19] S4Arrays_1.7.3          labeling_0.4.3          bit_4.6.0              
#> [22] curl_6.2.2              DelayedArray_0.33.6     abind_1.4-8            
#> [25] withr_3.0.2             purrr_1.0.4             grid_4.6.0             
#> [28] polyclip_1.10-7         colorspace_2.1-1        scales_1.3.0           
#> [31] MASS_7.3-65             tinytex_0.56            cli_3.6.4              
#> [34] rmarkdown_2.29          crayon_1.5.3            httr_1.4.7             
#> [37] rjson_0.2.23            DBI_1.2.3               cachem_1.1.0           
#> [40] ggforce_0.4.2           BiocManager_1.30.25     XVector_0.47.2         
#> [43] vctrs_0.6.5             yulab.utils_0.2.0       Matrix_1.7-3           
#> [46] jsonlite_2.0.0          bookdown_0.42           bit64_4.6.0-1          
#> [49] magick_2.8.6            jquerylib_0.1.4         tidyr_1.3.1            
#> [52] glue_1.8.0              gtable_0.3.6            UCSC.utils_1.3.1       
#> [55] quadprog_1.5-8          munsell_0.5.1           tibble_3.2.1           
#> [58] pillar_1.10.2           htmltools_0.5.8.1       GenomeInfoDbData_1.2.14
#> [61] R6_2.6.1                dbplyr_2.5.0            evaluate_1.0.3         
#> [64] lattice_0.22-7          memoise_2.0.1           ggfun_0.1.8            
#> [67] bslib_0.9.0             Rcpp_1.0.14             SparseArray_1.7.7      
#> [70] scatterpie_0.2.4        xfun_0.52               fs_1.6.5               
#> [73] pkgconfig_2.0.3