1 Introduction

As a SingleCellExperiment-derived package, MAST can easily be inserted into workflows with packages such as scran, scater, zinbwave, SCnorm and others. Moreover, subclassing SingleCellExperiment/SummarizedExperiment provides a flexible abstraction for the assay that contains the actual expression data. It can use sparse Matrix and HDF5 as backends to save memory.

To use MAST with such packages, you just need to upcast the SingleCellExperiment to MAST’s subclass SingleCellAssay with the function SceToSingleCellAssay that handles the coercion and checks the object for validity. Going the other direction, generally SingleCellAssays should work in packages that use SingleCellExperiment, but if in doubt you could down-cast with as(sca, 'SingleCellExperiment').

1.1 Log-transformation is expected in MAST

The main gotcha in all this is that some SingleCellExperiment-derived packages assume integer counts have been provided, while MAST assumes that log-transformed approximately scale-normalized data is provided. We find that MAST performs best with log-transformed, scale-normalized data that has been thresholded, such as \(\log_2(\text{transcripts per million} + 1)\).

We address this by:

  • testing for log-like data for objects up-cast to SingleCellAssay
  • explicitly naming the slot of the assay containing such putatively log-like data
  • by default operating on the slot with such log-like data

1.2 Examples

In what follows, we show an example of using scater to plot some QC metrics, SCnorm to normalize data, and, and conversion to a Seurat object.

2 From MAST to Scater

Scater McCarthy et al. (2017) is a package that provides functions for QC, normalization and visualization of single cell RNAseq data.

library(MAST)

knitr::opts_chunk$set(message = FALSE,error = FALSE,warning = FALSE)
data(maits, package='MAST')
unlog <- function(x) ceiling(2^x - 1)
sca_raw =  FromMatrix(t(maits$expressionmat), maits$cdat, maits$fdat)
## Assuming data assay in position 1, with name et is log-transformed.
assays(sca_raw)$counts = unlog(assay(sca_raw))
assayNames(sca_raw)

Here we make an object with assays counts and et. By default, MAST will operate on the et assay, but scran wants count-like data for some of its QC. The et data are log2 + 1 transcripts per million (TPM), as output by RSEM.

We could specify the assay name at creation with sca_raw = FromMatrix(list(logTPM = t(maits$expressionmat)), maits$cdat, maits$fdat) or rename an object that contains appropriately transformed data with assayNames(sca_raw) = c('logTPM', 'counts').

Before calling scater functionality, you might pause to consider if some features should belong in special control sets, such as mitochrondial genes, or spike-ins.

library(scater)
sca_raw = addPerCellQC(sca_raw)
plotColData(sca_raw, y="detected", x="total")

Evidently some features were filtered, so not all cells contain 1 million counts.

sca_raw = runPCA(sca_raw, ncomponents=5, exprs_values = 'et')
plotReducedDim(sca_raw, dimred = 'PCA', colour_by = 'condition')