1 Introduction

The imcdatasets package provides access to publicly available datasets generated using imaging mass cytometry (IMC) (???).

IMC is a technology that enables measurement of up to 50 markers from tissue sections at a resolution of 1 \(\mu m\) (???). In classical processing pipelines, such as the ImcSegmentationPipeline or steinbock, the multichannel images are segmented to generate cell masks. These masks are then used to extract single-cell features from the multichannel images.

Each dataset in imcdatasets is composed of three elements that can be retrieved separately:
1. Single-cell data in the form of a SingleCellExperiment or SpatialExperiment class object (named sce.rds).
2. Multichannel images in the form of a CytoImageList class object (named images.rds).
3. Cell segmentation masks in the form of a CytoImageList class object (named masks.rds).

2 Available datasets

The listDatasets() function returns all datasets available in imcdatasets, along with associated information. The FunctionCall column gives the name of the R function that loads each dataset.

library(imcdatasets)

datasets <- listDatasets()
datasets <- as.data.frame(datasets)
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)
FunctionCall                        Species  Tissue                 NumberOfCells  NumberOfImages  NumberOfChannels  Reference
Damond_2019_Pancreas()              Human    Pancreas               252059         100             38                (???)
HochSchulz_2022_Melanoma()          Human    Metastatic melanoma    325881         50              41                (???)
JacksonFischer_2020_BreastCancer()  Human    Primary breast tumour  285851         100             42                (???)
Zanotelli_2020_Spheroids()          Human    Cell line spheroids    229047         517             51                (???)
IMMUcan_2022_CancerExample()        Human    Primary tumor          46825          14              40                None
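
Because the listing is an ordinary table, it can be filtered and summarised like any data frame, for example to pick a small dataset for a quick test. The sketch below does not call imcdatasets: it rebuilds a stand-in data frame from the values printed above, and the CellsPerImage column is added here purely for illustration (it is not part of the package output). With the package installed, one would presumably start from as.data.frame(listDatasets()) instead.

```r
# Stand-in for as.data.frame(listDatasets()); values copied from the listing above
datasets <- data.frame(
    FunctionCall = c("Damond_2019_Pancreas()", "HochSchulz_2022_Melanoma()",
                     "JacksonFischer_2020_BreastCancer()",
                     "Zanotelli_2020_Spheroids()",
                     "IMMUcan_2022_CancerExample()"),
    NumberOfCells = c(252059, 325881, 285851, 229047, 46825),
    NumberOfImages = c(100, 50, 100, 517, 14),
    NumberOfChannels = c(38, 41, 42, 51, 40),
    stringsAsFactors = FALSE
)

# Dataset with the fewest cells: a reasonable choice for quick experiments
smallest <- datasets[which.min(datasets$NumberOfCells), ]
smallest$FunctionCall

# Datasets measuring more than 40 channels
many_channels <- datasets[datasets$NumberOfChannels > 40, ]
many_channels$FunctionCall

# Illustrative derived column: average number of cells per image
datasets$CellsPerImage <- round(datasets$NumberOfCells / datasets$NumberOfImages)
datasets[, c("FunctionCall", "CellsPerImage")]
```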

3 Retrieving data

Users can import a dataset by calling a single function and specifying the type of data to retrieve. The following examples show how to access an example dataset linked to the IMMUcan project.

Importing single-cell expression data and metadata

sce <- IMMUcan_2022_CancerExample("sce")
sce
## class: SingleCellExperiment 
## dim: 40 47794 
## metadata(5): color_vectors cluster_codes SOM_codes delta_area
##   filterSpatialContext
## assays(2): counts exprs
## rownames(40): MPO H3 ... DNA1 DNA2
## rowData names(17): channel metal ... ilastik deepcell
## colnames(47794): 1_1 1_2 ... 14_2844 14_2845
## colData names(43): sample_id ObjectNumber ... cell_x cell_y
## reducedDimNames(8): UMAP TSNE ... seurat UMAP_seurat
## mainExpName: IMMUcan_2022_CancerExample_v1
## altExpNames(0):
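
Once loaded, the object can be queried with the standard SingleCellExperiment accessors. The following is a minimal sketch, assuming the assay, colData and reducedDim names printed above (exprs, sample_id, UMAP) and the CD8a marker listed among the channel names; it requires the imcdatasets and SingleCellExperiment packages and downloads the data on first use.

```r
library(imcdatasets)
library(SingleCellExperiment)

sce <- IMMUcan_2022_CancerExample("sce")

# Number of cells per image, using the sample_id column of colData
cells_per_image <- table(sce$sample_id)
head(cells_per_image)

# Marker-by-cell matrix stored in the "exprs" assay
exprs_mat <- assay(sce, "exprs")
summary(exprs_mat["CD8a", ])

# One of the stored low-dimensional embeddings (one row per cell)
umap <- reducedDim(sce, "UMAP")
head(umap)
```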

Importing multichannel images

images <- IMMUcan_2022_CancerExample("images")
images
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008 
## Each image contains 40 channel(s)
## channelNames(40): MPO H3 SMA CD16 CD38 HLA_DR CD27 CD15 CD45RA CD163 B2M CD20 CD68 IDO1 CD3e LAG3 CD11c PD_1 PDGFRB CD7 GZMB PD_L1 TCF7 CD45RO FOXP3 ICOS CD8a CA9 CD33 Ki67 VISTA CD40 CD4 CD14 CDH1 CD303 CD206 c_PARP DNA1 DNA2

Importing cell segmentation masks

masks <- IMMUcan_2022_CancerExample("masks")
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008 
## Each image contains 1 channel

On disk storage

Objects containing multichannel images and segmentation masks can also be stored on disk rather than in memory. They still need to be loaded into memory once in order to be written to disk, so the initial import takes longer than a purely in-memory import, but memory requirements during downstream analysis are reduced.

To write images or masks to disk, set on_disk = TRUE and specify a path where images/masks will be stored as .h5 files:

# Create temporary location
cur_path <- tempdir()

masks <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
    h5FilesPath = cur_path)
masks
## CytoImageList containing 14 image(s)
## names(14): Patient1_001 Patient1_002 Patient1_003 Patient2_001 Patient2_002 Patient2_003 Patient2_004 Patient3_001 Patient3_002 Patient3_003 Patient4_005 Patient4_006 Patient4_007 Patient4_008 
## Each image contains 1 channel
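
To check what the on-disk import produced, one can list the .h5 files in the target directory and compare the in-memory footprint of both variants. This is only a sketch: the variable names masks_disk and masks_mem are introduced here for illustration, the number of files written per image is not assumed, and object.size() gives a rough estimate of the R object rather than of the files on disk.

```r
library(imcdatasets)

cur_path <- tempdir()

# On-disk and in-memory versions of the same masks
masks_disk <- IMMUcan_2022_CancerExample(data_type = "masks", on_disk = TRUE,
    h5FilesPath = cur_path)
masks_mem <- IMMUcan_2022_CancerExample(data_type = "masks")

# .h5 files written to the chosen location
h5_files <- list.files(cur_path, pattern = "\\.h5$")
length(h5_files)

# Approximate size of each R object in memory
format(object.size(masks_mem), units = "MB")
format(object.size(masks_disk), units = "MB")
```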

4 Dataset info and metadata

Additional information about each dataset is available in the help page:

?IMMUcan_2022_CancerExample

The metadata associated with a specific data object can be displayed as follows:

IMMUcan_2022_CancerExample(data_type = "sce", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "images", metadata = TRUE)
IMMUcan_2022_CancerExample(data_type = "masks", metadata = TRUE)

5 Usage

The SingleCellExperiment class objects can be used for data analysis. For more information, please refer to the SingleCellExperiment package and to the Orchestrating Single-Cell Analysis with Bioconductor workflow.

The CytoImageList class objects can be used for plotting cell and pixel information. Some typical use cases are given below. For more information, please see the cytomapper package and the associated vignette.

Subsetting the images and masks

cur_images <- images[1:5]
cur_masks <- masks[1:5]
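
Besides positional indexing, images can be selected by name and restricted to a subset of channels, which keeps plotting code explicit about what is shown. The sketch below relies on the getImages(), getChannels() and channelNames() functions of cytomapper and on the image and channel names printed earlier; the names sub_images and sub_channels are introduced here for illustration.

```r
library(imcdatasets)
library(cytomapper)

images <- IMMUcan_2022_CancerExample("images")

# Select two images by name rather than by position
sub_images <- getImages(images, c("Patient1_001", "Patient2_001"))
names(sub_images)

# Keep only the channels needed for a given plot
sub_channels <- getChannels(images, c("CD8a", "CD68", "CDH1"))
channelNames(sub_channels)
```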

Plotting pixel information

The images object can be used to display pixel-level data.

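# plotPixels() is provided by the cytomapper package
library(cytomapper)

plotPixels(
    cur_images,
    colour_by = c("CD8a", "CD68", "CDH1"),
    # Per-channel background, contrast and gamma adjustments
    bcg = list(
        CD8a = c(0, 4, 1),
        CD68 = c(0, 5, 1),
        CDH1 = c(0, 5, 1)
    )
)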
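
The three numbers passed per channel in bcg adjust background, contrast and gamma. As an illustration only, the base R sketch below assumes that the adjustment adds the first value, multiplies by the second and raises the result to the power of the third; the helper apply_bcg() is hypothetical and is not part of cytomapper, whose documentation should be consulted for the exact behaviour.

```r
# Hypothetical helper mirroring the assumed adjustment:
# (intensity + background) * contrast, raised to the power gamma
apply_bcg <- function(x, bcg) {
    stopifnot(length(bcg) == 3)
    ((x + bcg[1]) * bcg[2])^bcg[3]
}

# Toy 2 x 2 "image" of pixel intensities
pixels <- matrix(c(0, 0.5, 1, 2), nrow = 2)

# Same setting as used for CD8a above: no background shift,
# four-fold contrast, gamma left unchanged
adjusted <- apply_bcg(pixels, c(0, 4, 1))
adjusted
```

Under this assumption, a setting such as c(0, 4, 1) simply scales intensities four-fold, which is why higher contrast values make dim markers easier to see.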