Here, we demonstrate BANKSY analysis on 10x Visium data of the human dorsolateral prefrontal cortex from Maynard et al (2018). The data comprise 12 samples obtained from 3 subjects, with manual annotation of the layers in each sample. We will focus on 4 of the 12 samples from subject 3, demonstrating multi-sample analysis with BANKSY.
library(Banksy)
library(SummarizedExperiment)
library(SpatialExperiment)
library(Seurat)
library(scater)
library(cowplot)
library(ggplot2)
We fetch the data for all 12 DLPFC samples with the spatialLIBD package. This might take awhile.
library(spatialLIBD)
library(ExperimentHub)
ehub <- ExperimentHub::ExperimentHub()
spe <- spatialLIBD::fetch_data(type = "spe", eh = ehub)
After the download is completed, we trim the SpatialExperiment object, retaining only the counts and some metadata such as the sample identifier and pathology annotations. This saves some memory.
imgData(spe) <- NULL
assay(spe, "logcounts") <- NULL
reducedDims(spe) <- NULL
rowData(spe) <- NULL
colData(spe) <- DataFrame(
sample_id = spe$sample_id,
clust_annotation = factor(
addNA(spe$layer_guess_reordered_short),
exclude = NULL, labels = seq(8)
),
in_tissue = spe$in_tissue,
row.names = colnames(spe)
)
invisible(gc())
Next, subset spe
to samples from the last subject (samples 151673
,
151674
, 151675
, 151676
). This stores each sample in its own
SpatialExperiment object, all placed in a list.
sample_names <- as.character(151673:151676)
spe_list <- lapply(sample_names, function(x) spe[, spe$sample_id == x])
rm(spe)
invisible(gc())
Using Seurat, we perform basic normalisation of the data, and select the top 2000 highly variable features from each sample. Other methods for normalisation and feature selection may also be used. We take the union of these features for downstream analysis.
#' Normalize data
seu_list <- lapply(spe_list, function(x) {
x <- as.Seurat(x, data = NULL)
NormalizeData(x, scale.factor = 5000, normalization.method = 'RC')
})
#' Compute HVGs
hvgs <- lapply(seu_list, function(x) {
VariableFeatures(FindVariableFeatures(x, nfeatures = 2000))
})
hvgs <- Reduce(union, hvgs)
#' Add data to SpatialExperiment and subset to HVGs
aname <- "normcounts"
spe_list <- Map(function(spe, seu) {
assay(spe, aname) <- GetAssayData(seu)
spe[hvgs,]
}, spe_list, seu_list)
rm(seu_list)
invisible(gc())
To run BANKSY across multiple samples, we first compute the BANKSY neighborhood
feature matrices for each sample separately. We use k_geom=6
corresponding to
the first-order neighbors in 10x Visium assays (k_geom=18
corresponding to
first and second-order neighbors may also be used).
compute_agf <- FALSE
k_geom <- 6
spe_list <- lapply(spe_list, computeBanksy, assay_name = aname,
compute_agf = compute_agf, k_geom = k_geom)
We then merge the samples to perform joint dimensional reduction and clustering:
spe_joint <- do.call(cbind, spe_list)
rm(spe_list)
invisible(gc())
When running multi-sample BANKSY PCA, the group
argument may be provided.
This specifies the grouping variable for the cells or spots across the samples.
Features belonging to cells or spots corresponding to each level of the
grouping variable will be z-scaled separately. In this case, sample_id
in
colData(spe_joint)
gives the grouping based on the sample of origin.
lambda <- 0.2
use_agf <- FALSE
spe_joint <- runBanksyPCA(spe_joint, use_agf = use_agf, lambda = lambda, group = "sample_id", seed = 1000)
Run UMAP on the BANKSY embedding:
spe_joint <- runBanksyUMAP(spe_joint, use_agf = use_agf, lambda = lambda, seed = 1000)
Finally, obtain cluster labels for spots across all 4 samples. We use
connectClusters
for visual comparison of the manual annotations and BANKSY
clusters.
res <- 0.7
spe_joint <- clusterBanksy(spe_joint, use_agf = use_agf, lambda = lambda, resolution = res, seed = 1000)
cnm <- sprintf("clust_M%s_lam%s_k50_res%s", as.numeric(use_agf), lambda, res)
spe_joint <- connectClusters(spe_joint)
Once joint clustering is performed, we split the samples into their own
SpatialExperiment
objects:
spe_list <- lapply(sample_names, function(x) spe_joint[, spe_joint$sample_id == x])
rm(spe_joint)
invisible(gc())
As an optional step, we smooth the cluster labels of each sample separately. This can be done if smooth spatial domains are expected in the biological sample or tissue in question.
spe_list <- lapply(spe_list, smoothLabels, cluster_names = cnm, k = 6L, verbose = FALSE)
names(spe_list) <- paste0("sample_", sample_names)
The raw and smoothed cluster labels are stored in the colData
slot of each
SingleCellExperiment
or SpatialExperiment
object.
#> DataFrame with 6 rows and 5 columns
#> sample_id clust_annotation in_tissue
#> <character> <factor> <logical>
#> AAACAAGTATCTCCCA-1 151673 3 TRUE
#> AAACAATCTACTAGCA-1 151673 1 TRUE
#> AAACACCAATAACTGC-1 151673 7 TRUE
#> AAACAGAGCGACTCCT-1 151673 3 TRUE
#> AAACAGCTTTCAGAAG-1 151673 5 TRUE
#> AAACAGGGTCTATATT-1 151673 6 TRUE
#> clust_M0_lam0.2_k50_res0.7 clust_M0_lam0.2_k50_res0.7_smooth
#> <factor> <factor>
#> AAACAAGTATCTCCCA-1 3 3
#> AAACAATCTACTAGCA-1 1 1
#> AAACACCAATAACTGC-1 7 7
#> AAACAGAGCGACTCCT-1 3 3
#> AAACAGCTTTCAGAAG-1 6 6
#> AAACAGGGTCTATATT-1 9 9
We can compare BANKSY clusters to pathology annotations using several cluster
comparison measures such as the adjusted Rand index (ARI) or normalized mutual
information (NMI) with compareClusters
. The function computes the selected
comparison measure between all pairs of cluster labels:
compareClusters(spe_list$sample_151673, func = 'ARI')
#> clust_annotation clust_M0_lam0.2_k50_res0.7
#> clust_annotation 1.000 0.535
#> clust_M0_lam0.2_k50_res0.7 0.535 1.000
#> clust_M0_lam0.2_k50_res0.7_smooth 0.554 0.883
#> clust_M0_lam0.2_k50_res0.7_smooth
#> clust_annotation 0.554
#> clust_M0_lam0.2_k50_res0.7 0.883
#> clust_M0_lam0.2_k50_res0.7_smooth 1.000
We evaluate the ARI and NMI for each sample:
ari <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "ARI")[, 1], n = 1)))
ari
#> sample_151673 sample_151674 sample_151675 sample_151676
#> 0.554 0.557 0.495 0.516
nmi <- sapply(spe_list, function(x) as.numeric(tail(compareClusters(x, func = "NMI")[, 1], n = 1)))
nmi
#> sample_151673 sample_151674 sample_151675 sample_151676
#> 0.667 0.672 0.636 0.635
Visualise pathology annotation and BANKSY cluster on spatial coordinates with the ggspavis package:
# Use scater:::.get_palette('tableau10medium')
pal <- c(
"#729ECE", "#FF9E4A", "#67BF5C", "#ED665D", "#AD8BC9",
"#A8786E", "#ED97CA", "#A2A2A2", "#CDCC5D", "#6DCCDA"
)
plot_bank <- lapply(spe_list, function(x) {
df <- cbind.data.frame(
clust=colData(x)[[sprintf("%s_smooth", cnm)]], spatialCoords(x))
ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
geom_point(size = 0.5) +
scale_color_manual(values = pal) +
theme_classic() +
theme(
legend.position = "none",
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank()) +
labs(title = "BANKSY clusters") +
coord_equal()
})
plot_anno <- lapply(spe_list, function(x) {
df <- cbind.data.frame(
clust=colData(x)[['clust_annotation']], spatialCoords(x))
ggplot(df, aes(x=pxl_row_in_fullres, y=pxl_col_in_fullres, col=clust)) +
geom_point(size = 0.5) +
scale_color_manual(values = pal) +
theme_classic() +
theme(
legend.position = "none",
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank()) +
labs(title = sprintf("Sample %s", x$sample_id[1])) +
coord_equal()
})
plot_list <- c(plot_anno, plot_bank)
plot_grid(plotlist = plot_list, ncol = 4, byrow = TRUE)
Visualize joint UMAPs for each sample:
umap_bank <- lapply(spe_list, function(x) {
plotReducedDim(x,
"UMAP_M0_lam0.2",
colour_by = sprintf("%s_smooth", cnm),
point_size = 0.5
) +
theme(legend.position = "none") +
labs(title = "BANKSY clusters")
})
umap_anno <- lapply(spe_list, function(x) {
plotReducedDim(x,
"UMAP_M0_lam0.2",
colour_by = "clust_annotation",
point_size = 0.5
) +
theme(legend.position = "none") +
labs(title = sprintf("Sample %s", x$sample_id[1]))
})
umap_list <- c(umap_anno, umap_bank)
plot_grid(plotlist = umap_list, ncol = 4, byrow = TRUE)
Vignette runtime:
#> Time difference of 3.050704 mins
sessionInfo()
#> R version 4.4.0 RC (2024-04-16 r86468)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ExperimentHub_2.13.0 AnnotationHub_3.13.0
#> [3] BiocFileCache_2.13.0 dbplyr_2.5.0
#> [5] spatialLIBD_1.15.4 cowplot_1.1.3
#> [7] scater_1.33.0 ggplot2_3.5.1
#> [9] harmony_1.2.0 Rcpp_1.0.12
#> [11] data.table_1.15.4 scran_1.33.0
#> [13] scuttle_1.15.0 Seurat_5.0.3
#> [15] SeuratObject_5.0.1 sp_2.1-4
#> [17] SpatialExperiment_1.15.0 SingleCellExperiment_1.27.0
#> [19] SummarizedExperiment_1.35.0 Biobase_2.65.0
#> [21] GenomicRanges_1.57.0 GenomeInfoDb_1.41.0
#> [23] IRanges_2.39.0 S4Vectors_0.43.0
#> [25] BiocGenerics_0.51.0 MatrixGenerics_1.17.0
#> [27] matrixStats_1.3.0 Banksy_1.1.0
#> [29] BiocStyle_2.33.0
#>
#> loaded via a namespace (and not attached):
#> [1] bitops_1.0-7 spatstat.sparse_3.0-3
#> [3] doParallel_1.0.17 httr_1.4.7
#> [5] RColorBrewer_1.1-3 tools_4.4.0
#> [7] sctransform_0.4.1 DT_0.33
#> [9] utf8_1.2.4 R6_2.5.1
#> [11] lazyeval_0.2.2 uwot_0.2.2
#> [13] withr_3.0.0 gridExtra_2.3
#> [15] progressr_0.14.0 cli_3.6.2
#> [17] spatstat.explore_3.2-7 fastDummies_1.7.3
#> [19] labeling_0.4.3 sass_0.4.9
#> [21] spatstat.data_3.0-4 ggridges_0.5.6
#> [23] pbapply_1.7-2 Rsamtools_2.21.0
#> [25] dbscan_1.1-12 aricode_1.0.3
#> [27] dichromat_2.0-0.1 sessioninfo_1.2.2
#> [29] parallelly_1.37.1 attempt_0.3.1
#> [31] maps_3.4.2 limma_3.61.0
#> [33] pals_1.8 RSQLite_2.3.6
#> [35] BiocIO_1.15.0 generics_0.1.3
#> [37] ica_1.0-3 spatstat.random_3.2-3
#> [39] dplyr_1.1.4 Matrix_1.7-0
#> [41] ggbeeswarm_0.7.2 fansi_1.0.6
#> [43] abind_1.4-5 lifecycle_1.0.4
#> [45] yaml_2.3.8 edgeR_4.3.0
#> [47] SparseArray_1.5.0 Rtsne_0.17
#> [49] paletteer_1.6.0 grid_4.4.0
#> [51] blob_1.2.4 promises_1.3.0
#> [53] dqrng_0.3.2 crayon_1.5.2
#> [55] miniUI_0.1.1.1 lattice_0.22-6
#> [57] beachmat_2.21.0 mapproj_1.2.11
#> [59] KEGGREST_1.45.0 magick_2.8.3
#> [61] pillar_1.9.0 knitr_1.46
#> [63] metapod_1.13.0 rjson_0.2.21
#> [65] future.apply_1.11.2 codetools_0.2-20
#> [67] leiden_0.4.3.1 glue_1.7.0
#> [69] vctrs_0.6.5 png_0.1-8
#> [71] spam_2.10-0 gtable_0.3.5
#> [73] rematch2_2.1.2 cachem_1.0.8
#> [75] xfun_0.43 S4Arrays_1.5.0
#> [77] mime_0.12 survival_3.6-4
#> [79] RcppHungarian_0.3 iterators_1.0.14
#> [81] tinytex_0.50 fields_15.2
#> [83] statmod_1.5.0 bluster_1.15.0
#> [85] fitdistrplus_1.1-11 ROCR_1.0-11
#> [87] nlme_3.1-164 bit64_4.0.5
#> [89] filelock_1.0.3 RcppAnnoy_0.0.22
#> [91] bslib_0.7.0 irlba_2.3.5.1
#> [93] vipor_0.4.7 KernSmooth_2.23-22
#> [95] colorspace_2.1-0 DBI_1.2.2
#> [97] tidyselect_1.2.1 bit_4.0.5
#> [99] compiler_4.4.0 curl_5.2.1
#> [101] BiocNeighbors_1.23.0 DelayedArray_0.31.0
#> [103] plotly_4.10.4 rtracklayer_1.65.0
#> [105] bookdown_0.39 scales_1.3.0
#> [107] lmtest_0.9-40 rappdirs_0.3.3
#> [109] stringr_1.5.1 digest_0.6.35
#> [111] goftest_1.2-3 spatstat.utils_3.0-4
#> [113] rmarkdown_2.26 benchmarkmeData_1.0.4
#> [115] RhpcBLASctl_0.23-42 XVector_0.45.0
#> [117] htmltools_0.5.8.1 pkgconfig_2.0.3
#> [119] sparseMatrixStats_1.17.0 highr_0.10
#> [121] fastmap_1.1.1 rlang_1.1.3
#> [123] htmlwidgets_1.6.4 UCSC.utils_1.1.0
#> [125] shiny_1.8.1.1 DelayedMatrixStats_1.27.0
#> [127] farver_2.1.1 jquerylib_0.1.4
#> [129] zoo_1.8-12 jsonlite_1.8.8
#> [131] BiocParallel_1.39.0 mclust_6.1.1
#> [133] config_0.3.2 RCurl_1.98-1.14
#> [135] BiocSingular_1.21.0 magrittr_2.0.3
#> [137] GenomeInfoDbData_1.2.12 dotCall64_1.1-1
#> [139] patchwork_1.2.0 munsell_0.5.1
#> [141] viridis_0.6.5 reticulate_1.36.1
#> [143] leidenAlg_1.1.3 stringi_1.8.3
#> [145] zlibbioc_1.51.0 MASS_7.3-60.2
#> [147] plyr_1.8.9 parallel_4.4.0
#> [149] listenv_0.9.1 ggrepel_0.9.5
#> [151] deldir_2.0-4 Biostrings_2.73.0
#> [153] sccore_1.0.5 splines_4.4.0
#> [155] tensor_1.5 locfit_1.5-9.9
#> [157] igraph_2.0.3 spatstat.geom_3.2-9
#> [159] RcppHNSW_0.6.0 reshape2_1.4.4
#> [161] ScaledMatrix_1.13.0 XML_3.99-0.16.1
#> [163] BiocVersion_3.20.0 evaluate_0.23
#> [165] golem_0.4.1 BiocManager_1.30.22
#> [167] foreach_1.5.2 httpuv_1.6.15
#> [169] RANN_2.6.1 tidyr_1.3.1
#> [171] purrr_1.0.2 polyclip_1.10-6
#> [173] benchmarkme_1.0.8 future_1.33.2
#> [175] scattermore_1.2 rsvd_1.0.5
#> [177] xtable_1.8-4 restfulr_0.0.15
#> [179] RSpectra_0.16-1 later_1.3.2
#> [181] viridisLite_0.4.2 tibble_3.2.1
#> [183] GenomicAlignments_1.41.0 memoise_2.0.1
#> [185] beeswarm_0.4.0 AnnotationDbi_1.67.0
#> [187] cluster_2.1.6 shinyWidgets_0.8.6
#> [189] globals_0.16.3