The imcdatasets package provides access to publicly available datasets generated using imaging mass cytometry (IMC) (???).
IMC is a technology that enables measurement of up to 50 markers from tissue sections at a resolution of 1 \(\mu m\) (???). In classical processing pipelines, such as the ImcSegmentationPipeline or steinbock, the multichannel images are segmented to generate cell masks. These masks are then used to extract single-cell features from the multichannel images.
Each dataset in imcdatasets is composed of three elements that can be retrieved separately:
1. Single-cell data in the form of a SingleCellExperiment or SpatialExperiment class object (named sce.rds).
2. Multichannel images in the form of a CytoImageList class object (named images.rds).
3. Cell segmentation masks in the form of a CytoImageList class object (named masks.rds).
The listDatasets() function returns all available datasets in imcdatasets, along with associated information. The FunctionCall column gives the name of the R function that loads the dataset.
datasets <- listDatasets()
datasets <- as.data.frame(datasets)
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)
FunctionCall | Species | Tissue | NumberOfCells | NumberOfImages | NumberOfChannels | Reference |
---|---|---|---|---|---|---|
Damond_2019_Pancreas() | Human | Pancreas | 252059 | 100 | 38 | (???) |
HochSchulz_2022_Melanoma() | Human | Metastatic melanoma | 325881 | 50 | 41 | (???) |
JacksonFischer_2020_BreastCancer() | Human | Primary breast tumour | 285851 | 100 | 42 | (???) |
Zanotelli_2020_Spheroids() | Human | Cell line spheroids | 229047 | 517 | 51 | (???) |
IMMUcan_2022_CancerExample() | Human | Primary tumor | 46825 | 14 | 40 | None |
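Because the listing converts to an ordinary data frame, it can be filtered with base R to shortlist datasets. The following is a minimal sketch only: pick_datasets is a hypothetical helper, not part of imcdatasets, and the small table is copied by hand from the listing above so that the example runs without downloading anything. In practice, one would presumably pass as.data.frame(listDatasets()) instead, assuming the count columns are numeric.

```r
# Hand-copied subset of the listing above (illustration only).
datasets <- data.frame(
    FunctionCall = c("Damond_2019_Pancreas()", "HochSchulz_2022_Melanoma()",
                     "JacksonFischer_2020_BreastCancer()",
                     "Zanotelli_2020_Spheroids()",
                     "IMMUcan_2022_CancerExample()"),
    NumberOfCells = c(252059, 325881, 285851, 229047, 46825),
    NumberOfImages = c(100, 50, 100, 517, 14),
    NumberOfChannels = c(38, 41, 42, 51, 40),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep datasets with enough channels and few enough cells.
pick_datasets <- function(datasets, min_channels = 0, max_cells = Inf) {
    keep <- datasets$NumberOfChannels >= min_channels &
        datasets$NumberOfCells <= max_cells
    datasets[keep, c("FunctionCall", "NumberOfCells", "NumberOfChannels")]
}

# Datasets with at least 40 channels and at most 300,000 cells
small <- pick_datasets(datasets, min_channels = 40, max_cells = 3e5)
small
```

The thresholds are arbitrary; the point is only that the FunctionCall column of the filtered table names the loader to call next.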
Users can import the datasets by calling a single function and specifying the type of data to retrieve. The following examples show how to access an example dataset linked to the IMMUcan project.
Importing single-cell expression data and metadata
sce <- IMMUcan_2022_CancerExample("sce")
sce
## class: SingleCellExperiment
## dim: 40 47794
## metadata(5): color_vectors cluster_codes SOM_codes delta_area
## filterSpatialContext
## assays(2): counts exprs
## rownames(40): MPO H3 ... DNA1 DNA2
## rowData names(17): channel metal ... ilastik deepcell
## colnames(47794): 1_1 1_2 ... 14_2844 14_2845
## colData names(43): sample_id ObjectNumber ... cell_x cell_y
## reducedDimNames(8): UMAP TSNE ... seurat UMAP_seurat
## mainExpName: IMMUcan_2022_CancerExample_v1
## altExpNames(0):
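The components listed in this summary (assays, rowData, colData) are reached through the standard SingleCellExperiment accessors. The sketch below builds a tiny toy object so that it runs without downloading the dataset; the marker names are borrowed from the output above, while the counts, the sample labels and the asinh transform are arbitrary choices made for illustration and are not claimed to match how the dataset was processed. The same accessor calls should apply to the sce object loaded above.

```r
library(SingleCellExperiment)

set.seed(1)
markers <- c("MPO", "H3", "CD8a", "CD68")
cells <- paste0("1_", 1:6)

# Toy count matrix: markers in rows, cells in columns
counts <- matrix(rpois(24, lambda = 5), nrow = 4,
                 dimnames = list(markers, cells))

toy <- SingleCellExperiment(
    assays = list(counts = counts, exprs = asinh(counts)),
    colData = DataFrame(sample_id = rep(c("s1", "s2"), each = 3))
)

assayNames(toy)                 # names of the stored assays
dim(assay(toy, "exprs"))        # markers x cells
table(colData(toy)$sample_id)   # number of cells per sample
rownames(toy)                   # marker (channel) names
```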
Importing multichannel images
images <- IMMUcan_2022_CancerExample("images")
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 40 channel(s)
## channelNames(40): MPO H3 SMA CD16 CD38 HLA_DR CD27 CD15 CD45RA CD163 B2M CD20 CD68 IDO1 CD3e LAG3 CD11c PD_1 PDGFRB CD7 GZMB PD_L1 TCF7 CD45RO FOXP3 ICOS CD8a CA9 CD33 Ki67 VISTA CD40 CD4 CD14 CDH1 CD303 CD206 c_PARP DNA1 DNA2
Importing cell segmentation masks
masks <- IMMUcan_2022_CancerExample("masks")
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 1 channel
On disk storage
Objects containing multichannel images and segmentation masks can also be stored on disk rather than in memory. They still need to be loaded into memory once before being written to disk. This takes longer than keeping the objects in memory but reduces memory requirements during downstream analysis.
To write images or masks to disk, set on_disk = TRUE and use h5FilesPath to specify a path where the images/masks will be stored as .h5 files:
# Create temporary location
cur_path <- tempdir()
masks <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
                                    h5FilesPath = cur_path)
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008
## Each image contains 1 channel
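When storing objects on disk, it can help to use a dedicated directory instead of the root of tempdir(), and to check afterwards which .h5 files were written. The following base R sketch shows one way to do this; list_h5 is a hypothetical helper (not part of imcdatasets), and the directory name imc_h5 is an arbitrary choice.

```r
# Dedicated sub-directory for the .h5 files (name chosen for illustration)
h5_dir <- file.path(tempdir(), "imc_h5")
dir.create(h5_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: summarise the .h5 files found in a directory
list_h5 <- function(path) {
    files <- list.files(path, pattern = "\\.h5$", full.names = TRUE)
    data.frame(
        file = basename(files),
        size_mb = file.size(files) / 1024^2,
        stringsAsFactors = FALSE
    )
}

# Returns an empty data frame until something has been written to h5_dir
list_h5(h5_dir)
```

Passing h5_dir as h5FilesPath in the call shown above, and then calling list_h5(h5_dir) again, should list whatever .h5 files were created.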
Additional information about each dataset is available in the help page:
?IMMUcan_2022_CancerExample
The metadata associated with a specific data object can be displayed as follows:
IMMUcan_2022_CancerExample(data_type = "sce", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "images", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "masks", metadata = TRUE)
The SingleCellExperiment class objects can be used for data analysis. For more information, please refer to the SingleCellExperiment package and to the Orchestrating Single-Cell Analysis with Bioconductor workflow.
The CytoImageList class objects can be used for plotting cell and pixel information. Some typical use cases are given below. For more information, please see the cytomapper package and the associated vignette.
Subsetting the images and masks
cur_images <- images[1:5]
cur_masks <- masks[1:5]
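Besides numeric indices, CytoImageList objects can be subset by image name, and individual channels can be extracted with getChannels() from cytomapper. The sketch below uses the small pancreasImages example object shipped with cytomapper, so that it runs without downloading an imcdatasets dataset; which images and channels are selected is arbitrary. Since the images and masks loaded above share the same names, subsetting both with the same name vector should keep them aligned.

```r
library(cytomapper)

# Small example CytoImageList shipped with cytomapper
data("pancreasImages")

# Subset by image name: first and last image
img_names <- names(pancreasImages)
sub_by_name <- pancreasImages[img_names[c(1, length(img_names))]]

# Keep only the first two channels of every image
chan <- channelNames(pancreasImages)[1:2]
sub_by_channel <- getChannels(pancreasImages, chan)

names(sub_by_name)
channelNames(sub_by_channel)
```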
Plotting pixel information
The images object can be used to display pixel-level data.
plotPixels(
    cur_images,
    colour_by = c("CD8a", "CD68", "CDH1"),
    bcg = list(
        CD8a = c(0, 4, 1),
        CD68 = c(0, 5, 1),
        CDH1 = c(0, 5, 1)
    )
)
Plotting cell information
The masks and sce objects can be combined to display cell-level data.
plotCells(
    cur_masks, object = sce,
    img_id = "image_number", cell_id = "cell_number",
    colour_by = c("CD8a", "CD68", "CDH1"),
    exprs_values = "exprs"
)
Outlining cells on images
Cell information can be displayed on top of images by combining the images, masks and sce objects.
plotPixels(
    cur_images, mask = cur_masks, object = sce,
    img_id = "image_number", cell_id = "cell_number",
    outline_by = "cell_type",
    colour_by = c("CD8a", "CD68", "CDH1"),
    bcg = list(
        CD8a = c(0, 5, 1),
        CD68 = c(0, 5, 1),
        CDH1 = c(0, 5, 1)
    )
)
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] imcdatasets_1.10.0 SpatialExperiment_1.12.0
## [3] cytomapper_1.14.0 EBImage_4.44.0
## [5] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [7] Biobase_2.62.0 GenomicRanges_1.54.0
## [9] GenomeInfoDb_1.38.0 IRanges_2.36.0
## [11] S4Vectors_0.40.0 BiocGenerics_0.48.0
## [13] MatrixGenerics_1.14.0 matrixStats_1.0.0
## [15] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.1.3 bitops_1.0-7
## [3] gridExtra_2.3 rlang_1.1.1
## [5] magrittr_2.0.3 svgPanZoom_0.3.4
## [7] shinydashboard_0.7.2 RSQLite_2.3.1
## [9] compiler_4.3.1 png_0.1-8
## [11] systemfonts_1.0.5 fftwtools_0.9-11
## [13] vctrs_0.6.4 pkgconfig_2.0.3
## [15] crayon_1.5.2 fastmap_1.1.1
## [17] dbplyr_2.3.4 magick_2.8.1
## [19] XVector_0.42.0 ellipsis_0.3.2
## [21] utf8_1.2.4 promises_1.2.1
## [23] rmarkdown_2.25 ggbeeswarm_0.7.2
## [25] purrr_1.0.2 bit_4.0.5
## [27] xfun_0.40 zlibbioc_1.48.0
## [29] cachem_1.0.8 jsonlite_1.8.7
## [31] blob_1.2.4 later_1.3.1
## [33] rhdf5filters_1.14.0 DelayedArray_0.28.0
## [35] interactiveDisplayBase_1.40.0 Rhdf5lib_1.24.0
## [37] BiocParallel_1.36.0 jpeg_0.1-10
## [39] tiff_0.1-11 terra_1.7-55
## [41] parallel_4.3.1 R6_2.5.1
## [43] bslib_0.5.1 RColorBrewer_1.1-3
## [45] jquerylib_0.1.4 Rcpp_1.0.11
## [47] bookdown_0.36 knitr_1.44
## [49] httpuv_1.6.12 Matrix_1.6-1.1
## [51] nnls_1.5 tidyselect_1.2.0
## [53] abind_1.4-5 yaml_2.3.7
## [55] viridis_0.6.4 codetools_0.2-19
## [57] curl_5.1.0 lattice_0.22-5
## [59] tibble_3.2.1 withr_2.5.1
## [61] KEGGREST_1.42.0 shiny_1.7.5.1
## [63] evaluate_0.22 BiocFileCache_2.10.0
## [65] Biostrings_2.70.1 ExperimentHub_2.10.0
## [67] filelock_1.0.2 pillar_1.9.0
## [69] BiocManager_1.30.22 generics_0.1.3
## [71] sp_2.1-1 RCurl_1.98-1.12
## [73] BiocVersion_3.18.0 ggplot2_3.4.4
## [75] munsell_0.5.0 scales_1.2.1
## [77] xtable_1.8-4 glue_1.6.2
## [79] tools_4.3.1 AnnotationHub_3.10.0
## [81] locfit_1.5-9.8 rhdf5_2.46.0
## [83] grid_4.3.1 AnnotationDbi_1.64.0
## [85] colorspace_2.1-0 GenomeInfoDbData_1.2.11
## [87] raster_3.6-26 beeswarm_0.4.0
## [89] HDF5Array_1.30.0 vipor_0.4.5
## [91] cli_3.6.1 rappdirs_0.3.3
## [93] fansi_1.0.5 S4Arrays_1.2.0
## [95] viridisLite_0.4.2 svglite_2.1.2
## [97] dplyr_1.1.3 gtable_0.3.4
## [99] sass_0.4.7 digest_0.6.33
## [101] SparseArray_1.2.0 rjson_0.2.21
## [103] htmlwidgets_1.6.2 memoise_2.0.1
## [105] htmltools_0.5.6.1 lifecycle_1.0.3
## [107] httr_1.4.7 mime_0.12
## [109] bit64_4.0.5