0.1 Introduction

CITE-seq data provide RNA and surface protein counts for the same cells. This tutorial shows how MuData can be integrated into Bioconductor workflows to analyse CITE-seq data.

0.2 Installation

The most recent dev build can be installed from GitHub:

library(remotes)
remotes::install_github("ilia-kats/MuData")

A stable version of MuData will be available in future Bioconductor releases.

0.3 Loading libraries

library(MuData)
library(SingleCellExperiment)
library(MultiAssayExperiment)
library(CiteFuse)
library(scater)

library(rhdf5)

0.4 Loading data

We will use the CITE-seq data available in the CiteFuse Bioconductor package.

data("CITEseq_example", package = "CiteFuse")
lapply(CITEseq_example, dim)
#> $RNA
#> [1] 19521   500
#> 
#> $ADT
#> [1]  49 500
#> 
#> $HTO
#> [1]   4 500

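This dataset contains three matrices: one with RNA counts, one with antibody-derived tag (ADT) counts, and one with hashtag oligonucleotide (HTO) counts. Each matrix has one row per feature and one column per cell, so all three describe the same 500 cells.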

0.5 Processing count matrices

CITE-seq analysis workflows such as CiteFuse should be consulted for details. Below, we apply a few simple data transformations to demonstrate how their output can later be saved to an H5MU file.

Following the CiteFuse tutorial, we start by creating a SingleCellExperiment object from the three matrices:

sce_citeseq <- preprocessing(CITEseq_example)
sce_citeseq
#> class: SingleCellExperiment 
#> dim: 19521 500 
#> metadata(0):
#> assays(1): counts
#> rownames(19521): hg19_AL627309.1 hg19_AL669831.5 ... hg19_MT-ND6
#>   hg19_MT-CYB
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#>   GCTGCGAGTTGTGGCC
#> colData names(0):
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): ADT HTO
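To make the structure of this object more concrete, the following is a minimal sketch of how a comparable object could be assembled by hand from toy matrices. It is not how CiteFuse implements preprocessing; the toy data and the names toy and toy_sce are made up for illustration. RNA counts go into the main experiment, while ADT and HTO counts are attached as alternative experiments.

```r
library(SingleCellExperiment)

set.seed(1)

# Hypothetical toy data: 6 cells measured in three modalities.
cells <- paste0("cell", 1:6)
toy <- list(
  RNA = matrix(rpois(5 * 6, lambda = 5), nrow = 5,
               dimnames = list(paste0("gene", 1:5), cells)),
  ADT = matrix(rpois(3 * 6, lambda = 50), nrow = 3,
               dimnames = list(paste0("prot", 1:3), cells)),
  HTO = matrix(rpois(2 * 6, lambda = 20), nrow = 2,
               dimnames = list(paste0("tag", 1:2), cells))
)

# All modalities must describe the same cells in the same order.
stopifnot(
  identical(colnames(toy$RNA), colnames(toy$ADT)),
  identical(colnames(toy$RNA), colnames(toy$HTO))
)

# RNA counts form the main experiment ...
toy_sce <- SingleCellExperiment(assays = list(counts = toy$RNA))

# ... and the other modalities are stored as alternative experiments.
altExp(toy_sce, "ADT") <- SummarizedExperiment(assays = list(counts = toy$ADT))
altExp(toy_sce, "HTO") <- SummarizedExperiment(assays = list(counts = toy$HTO))

altExpNames(toy_sce)
```

Keeping the modalities in alternative experiments lets each of them have its own features and assays while sharing one set of cells.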

We will add a new assay with normalised RNA counts:

sce_citeseq <- scater::logNormCounts(sce_citeseq)
sce_citeseq  # new assay: logcounts
#> class: SingleCellExperiment 
#> dim: 19521 500 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(19521): hg19_AL627309.1 hg19_AL669831.5 ... hg19_MT-ND6
#>   hg19_MT-CYB
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#>   GCTGCGAGTTGTGGCC
#> colData names(1): sizeFactor
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(2): ADT HTO
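For intuition, the transformation can be sketched in base R on a toy matrix. This assumes the default behaviour of logNormCounts (library-size factors scaled to a mean of 1, followed by a log2 transform with a pseudocount of 1); the toy values and variable names are made up, and the actual function offers more options than shown here.

```r
# Toy counts: 3 genes (rows) x 4 cells (columns); values are made up.
toy_counts <- matrix(
  c(10, 0,  5,
    20, 4,  6,
     5, 1,  4,
    40, 8, 12),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library-size factors: total counts per cell, scaled to have mean 1.
lib_sizes <- colSums(toy_counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor,
# then log2-transform with a pseudocount of 1.
toy_logcounts <- log2(sweep(toy_counts, 2, size_factors, "/") + 1)

round(toy_logcounts, 2)
```

The size factors computed this way correspond to the sizeFactor column that appears in colData after normalisation.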

We will also add an assay with normalised counts to the ADT modality:

sce_citeseq <- CiteFuse::normaliseExprs(
  sce_citeseq, altExp_name = "ADT", transform = "log"
)
altExp(sce_citeseq, "ADT")  # new assay: logcounts
#> class: SummarizedExperiment 
#> dim: 49 500 
#> metadata(0):
#> assays(2): counts logcounts
#> rownames(49): B220 (CD45R) B7-H1 (PD-L1) ... TCRb TCRg
#> rowData names(0):
#> colnames(500): AAGCCGCGTTGTCTTT GATCGCGGTTATCGGT ... TTGGCAACACTAGTAC
#>   GCTGCGAGTTGTGGCC
#> colData names(0):

We will also generate reduced dimensions by running PCA on the normalised RNA counts:

sce_citeseq <- scater::runPCA(
  sce_citeseq, exprs_values = "logcounts", ncomponents = 20
)
scater::plotReducedDim(sce_citeseq, dimred = "PCA", 
                       by_exprs_values = "logcounts", colour_by = "CD27")
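The PCA coordinates are stored in the object alongside the assays and can be retrieved as a plain matrix. The sketch below illustrates this on a small simulated object rather than on the tutorial data; the toy counts and the names toy_sce and toy_pca are made up for illustration.

```r
library(SingleCellExperiment)
library(scater)

set.seed(42)

# Hypothetical toy counts: 50 genes x 30 cells.
toy_counts <- matrix(
  rpois(50 * 30, lambda = 5), nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:30))
)

toy_sce <- SingleCellExperiment(assays = list(counts = toy_counts))
toy_sce <- logNormCounts(toy_sce)            # adds the "logcounts" assay
toy_sce <- runPCA(toy_sce, ncomponents = 5)  # uses "logcounts" by default

# The embedding is stored under the name "PCA" ...
reducedDimNames(toy_sce)

# ... and can be extracted as a cells x components matrix.
toy_pca <- reducedDim(toy_sce, "PCA")
dim(toy_pca)
```

Each row of the extracted matrix corresponds to one cell, in the same order as the columns of the object.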