MAST 1.33.0
As a SingleCellExperiment
-derived package, MAST
can easily be
inserted into workflows with packages such as
scran
, scater
, zinbwave
, SCnorm
and others. Moreover, subclassing SingleCellExperiment/SummarizedExperiment provides a flexible abstraction for the assay
that contains the actual expression data. It can use sparse Matrix
and HDF5 as backends to save memory.
To use MAST with such packages, you just need to upcast the SingleCellExperiment
to MAST’s subclass SingleCellAssay
with the function SceToSingleCellAssay
that handles the coercion and checks the object for validity. Going the other direction, generally SingleCellAssay
s should work in packages that use SingleCellExperiment
, but if in doubt you could down-cast with as(sca, 'SingleCellExperiment')
.
The main gotcha in all this is that some SingleCellExperiment-derived packages assume integer counts have been provided, while MAST assumes that log-transformed approximately scale-normalized data is provided. We find that MAST performs best with log-transformed, scale-normalized data that has been thresholded, such as \(\log_2(\text{transcripts per million} + 1)\).
We address this by:
SingleCellAssay
assay
containing such
putatively log-like dataIn what follows, we show an example of using scater
to plot some QC
metrics, SCnorm
to normalize data, and, and conversion
to a Seurat
object.
Scater McCarthy et al. (2017) is a package that provides functions for QC, normalization and visualization of single cell RNAseq data.
library(MAST)
knitr::opts_chunk$set(message = FALSE,error = FALSE,warning = FALSE)
data(maits, package='MAST')
unlog <- function(x) ceiling(2^x - 1)
sca_raw = FromMatrix(t(maits$expressionmat), maits$cdat, maits$fdat)
## Assuming data assay in position 1, with name et is log-transformed.
assays(sca_raw)$counts = unlog(assay(sca_raw))
assayNames(sca_raw)
Here we make an object with assays counts
and et
. By default,
MAST
will operate on the et
assay, but scran wants count-like data
for some of its QC. The et
data are log2 + 1 transcripts per
million (TPM), as output by RSEM.
We could specify the assay name at creation with sca_raw = FromMatrix(list(logTPM = t(maits$expressionmat)), maits$cdat, maits$fdat)
or rename an object that contains appropriately transformed data with
assayNames(sca_raw) = c('logTPM', 'counts')
.
Before calling scater
functionality, you might pause to
consider if some features should belong in special control
sets,
such as mitochrondial genes, or spike-ins.
library(scater)
sca_raw = addPerCellQC(sca_raw)
plotColData(sca_raw, y="detected", x="total")
Evidently some features were filtered, so not all cells contain 1 million counts.
sca_raw = runPCA(sca_raw, ncomponents=5, exprs_values = 'et')
plotReducedDim(sca_raw, dimred = 'PCA', colour_by = 'condition')