1 Introduction

The imcdatasets package provides access to publicly available datasets generated using imaging mass cytometry (IMC) (???).

IMC is a technology that enables measurement of up to 50 markers from tissue sections at a resolution of 1 µm (???). In classical processing pipelines, such as the ImcSegmentationPipeline or steinbock, the multichannel images are segmented to generate cell masks. These masks are then used to extract single-cell features from the multichannel images.

Each dataset in imcdatasets is composed of three elements that can be retrieved separately:
1. Single-cell data in the form of a SingleCellExperiment or SpatialExperiment class object (named sce.rds).
2. Multichannel images in the form of a CytoImageList class object (named images.rds).
3. Cell segmentation masks in the form of a CytoImageList class object (named masks.rds).

2 Available datasets

The listDatasets() function returns all available datasets in imcdatasets, along with associated information. The FunctionCall column gives the name of the R function that loads the dataset.

datasets <- listDatasets()
datasets <- as.data.frame(datasets)
datasets$FunctionCall <- sprintf("`%s`", datasets$FunctionCall)
knitr::kable(datasets)
| FunctionCall | Species | Tissue | NumberOfCells | NumberOfImages | NumberOfChannels | Reference |
|---|---|---|---|---|---|---|
| Damond_2019_Pancreas() | Human | Pancreas | 252059 | 100 | 38 | (???) |
| JacksonFischer_2020_BreastCancer() | Human | Primary breast tumour | 285851 | 100 | 42 | (???) |
| Zanotelli_2020_Spheroids() | Human | Cell line spheroids | 229047 | 517 | 51 | (???) |
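
Because the listing converts to a regular data frame, it can be filtered like any other table. The following is a minimal sketch that assumes the NumberOfChannels column is numeric, as the table above suggests; the threshold of 40 channels and the variable name large_panels are chosen purely for illustration.

```r
library(imcdatasets)

# List the available datasets and convert the result to a plain data.frame
datasets <- as.data.frame(listDatasets())

# Keep only datasets acquired with more than 40 channels
# (arbitrary threshold, used here only as an example)
large_panels <- datasets[datasets$NumberOfChannels > 40, ]

# Show a few informative columns for the remaining datasets
large_panels[, c("FunctionCall", "Tissue", "NumberOfChannels")]
```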

3 Retrieving data

Users can import the datasets by calling a single function and specifying the type of data to retrieve. The following examples highlight accessing the dataset provided by Damond, N. et al., A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry (???).

Importing single-cell expression data and metadata

sce <- Damond_2019_Pancreas("sce")
sce
## class: SingleCellExperiment 
## dim: 38 252059 
## metadata(0):
## assays(3): counts exprs quant_norm
## rownames(38): H3 SMA ... DNA1 DNA2
## rowData names(6): channel metal ... antibody_clone full_name
## colnames(252059): 138_1 138_2 ... 319_1149 319_1150
## colData names(28): cell_id image_name ... patient_ethnicity patient_BMI
## reducedDimNames(0):
## mainExpName: Damond_2019_Pancreas
## altExpNames(0):
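
Once loaded, the object can be explored with the standard SingleCellExperiment accessors. The sketch below relies only on names visible in the printed output above (the exprs assay and the image_name column of colData); the variable names exprs_mat and cells_per_image are illustrative.

```r
library(imcdatasets)
library(SingleCellExperiment)

sce <- Damond_2019_Pancreas("sce")

# Names of the assays stored in the object
assayNames(sce)

# Extract one assay as a markers x cells matrix
exprs_mat <- assay(sce, "exprs")
dim(exprs_mat)

# Per-cell metadata: count the number of cells in each image
cells_per_image <- table(sce$image_name)
head(cells_per_image)
```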

Importing multichannel images

images <- Damond_2019_Pancreas("images")
images
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34 
## Each image contains 38 channel(s)
## channelNames(38): H3 SMA INS CD38 CD44 PCSK2 CD99 CD68 MPO SLC2A1 CD20 AMY2A CD3e PPY PIN PD_1 GCG PDX1 SST SYP KRT19 CD45 FOXP3 CD45RA CD8a CA9 IAPP Ki67 NKX6_1 p_HH3 CD4 CD31 CDH1 PTPRN p_Rb cPARP_cCASP3 DNA1 DNA2
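
Individual channels can be inspected and extracted with accessors from the cytomapper package. This is a small sketch using channelNames() and getChannels(); the three channels selected are simply taken from the channel list printed above, and cur_channels is an illustrative variable name.

```r
library(imcdatasets)
library(cytomapper)

images <- Damond_2019_Pancreas("images")

# Inspect the first few channel names
head(channelNames(images))

# Keep only three channels of interest in every image
cur_channels <- getChannels(images, c("CDH1", "CD99", "H3"))
cur_channels
```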

Importing cell segmentation masks

masks <- Damond_2019_Pancreas("masks")
masks
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34 
## Each image contains 1 channel

On disk storage

Objects containing multichannel images and segmentation masks can also be stored on disk rather than in memory. They still need to be loaded into memory once before being written to disk. Storing them on disk takes longer than keeping them in memory but reduces memory requirements during downstream analysis.

To write images or masks to disk, set on_disk = TRUE and specify a path where images/masks will be stored as .h5 files:

# Create temporary location
cur_path <- tempdir()

masks <- Damond_2019_Pancreas(data_type = "masks", on_disk = TRUE,
    h5FilesPath = cur_path)
masks
## CytoImageList containing 100 image(s)
## names(100): E02 E03 E04 E05 E06 E07 E08 E09 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 E31 E32 E33 E34 G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 G14 G15 G16 G17 G18 G19 G20 G21 G22 G23 G24 G25 G26 G27 G28 G29 G30 G31 G32 G33 J01 J02 J03 J04 J05 J06 J07 J08 J09 J10 J11 J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 J29 J30 J31 J32 J33 J34 
## Each image contains 1 channel
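
To confirm that the masks were written to disk, the target directory can be listed. The sketch below is hedged: it uses a dedicated subdirectory (created with tempfile(), an illustrative choice) so that only files from this call are counted, and it does not assume any particular file naming beyond the .h5 extension mentioned above.

```r
library(imcdatasets)

# Create a dedicated, empty directory for the .h5 files
cur_path <- tempfile("imc_masks_")
dir.create(cur_path)

masks <- Damond_2019_Pancreas(data_type = "masks", on_disk = TRUE,
    h5FilesPath = cur_path)

# List the .h5 files that were written
h5_files <- list.files(cur_path, pattern = "\\.h5$")
length(h5_files)
head(h5_files)
```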

4 Dataset info and metadata

Additional information about each dataset is available in the help page:

?Damond_2019_Pancreas

The metadata associated with a specific data object can be displayed as follows:

Damond_2019_Pancreas(data_type = "sce", metadata = TRUE)
Damond_2019_Pancreas(data_type = "images", metadata = TRUE)
Damond_2019_Pancreas(data_type = "masks", metadata = TRUE)

5 Usage

The SingleCellExperiment class objects can be used for data analysis. For more information, please refer to the SingleCellExperiment package and to the Orchestrating Single-Cell Analysis with Bioconductor workflow.

The CytoImageList class objects can be used for plotting cell and pixel information. Some typical use cases are given below. For more information, please see the cytomapper package and the associated vignette.

Subsetting the images and masks

cur_images <- images[1:5]
cur_masks <- masks[1:5]
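
Images and masks can also be selected by name, which makes it easier to keep the two objects matched. The following sketch assumes that images and masks share the same names, as the printed output above indicates; the two image names are taken from that output and sel_names is an illustrative variable.

```r
library(imcdatasets)
library(cytomapper)

images <- Damond_2019_Pancreas("images")
masks <- Damond_2019_Pancreas("masks")

# Select the same two images in both objects by name
sel_names <- c("E02", "E03")
sel_images <- getImages(images, sel_names)
sel_masks <- getImages(masks, sel_names)

# Check that images and masks are in matching order
identical(names(sel_images), names(sel_masks))
```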

Plotting pixel information

The images object can be used to display pixel-level data.

plotPixels(
    cur_images,
    colour_by = c("CDH1", "CD99", "H3"),
    bcg = list(
        CD99 = c(0,2,1),
        CDH1 = c(0,8,1),
        H3 = c(0,5,1)
    )
)
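
In the call above, each bcg entry is a numeric vector of three values that, per the cytomapper documentation, set the background, contrast and gamma of the corresponding channel. As a hedged variation, the sketch below displays a single channel of a single image and sets its colour gradient through the colour argument of plotPixels(); the contrast value and the colour choice are illustrative only.

```r
library(imcdatasets)
library(cytomapper)

images <- Damond_2019_Pancreas("images")
cur_image <- images[1]

# Background, contrast and gamma for the channel to display
bcg_settings <- list(CDH1 = c(0, 8, 1))

# Plot one channel with a custom black-to-cyan gradient
plotPixels(
    cur_image,
    colour_by = "CDH1",
    bcg = bcg_settings,
    colour = list(CDH1 = c("black", "cyan"))
)
```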