SingleR {SingleR}    R Documentation

Annotate scRNA-seq data

Description

Returns the best annotation for each cell in a test dataset, given a labelled reference dataset in the same feature space.

Usage

SingleR(
  test,
  ref,
  labels,
  method = c("single", "cluster"),
  clusters = NULL,
  genes = "de",
  de.method = "classic",
  de.n = NULL,
  de.args = list(),
  quantile = 0.8,
  fine.tune = TRUE,
  tune.thresh = 0.05,
  sd.thresh = 1,
  prune = TRUE,
  assay.type.test = "logcounts",
  assay.type.ref = "logcounts",
  check.missing = TRUE,
  BNPARAM = KmknnParam(),
  BPPARAM = SerialParam()
)

Arguments

test

A numeric matrix of single-cell expression values where rows are genes and columns are cells. Alternatively, a SummarizedExperiment object containing such a matrix.

ref

A numeric matrix of expression values where rows are genes and columns are reference samples (individual cells or bulk samples). Each row should be named with the gene name. In general, the expression values are expected to be log-transformed, see Details.

Alternatively, a SummarizedExperiment object containing such a matrix.

Alternatively, a list or List of SummarizedExperiment objects or numeric matrices containing multiple references, in which case the row names are expected to be the same across all objects.
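When ref is a list of references, their row names are expected to match. The helper below is hypothetical (not part of SingleR). It is a minimal base-R sketch of checking that expectation before calling SingleR(). It only compares row names, and it treats the same genes in a different order as a mismatch. SingleR's own validation may differ.

```r
# Hypothetical helper, not a SingleR function: TRUE if every reference
# in the list has exactly the same row names (same genes, same order).
refs_share_rownames <- function(refs) {
  rn <- lapply(refs, rownames)
  all(vapply(rn, identical, logical(1), rn[[1]]))
}

genes <- c("GENE_1", "GENE_2", "GENE_3")
ref1 <- matrix(0, nrow = 3, ncol = 2, dimnames = list(genes, NULL))
ref2 <- matrix(1, nrow = 3, ncol = 4, dimnames = list(genes, NULL))
ref3 <- ref2[c(2, 1, 3), ]  # same genes, different order

refs_share_rownames(list(ref1, ref2))  # TRUE
refs_share_rownames(list(ref1, ref3))  # FALSE
```

Because rownames() is generic, the same check also works on a list of SummarizedExperiment objects.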

labels

A character vector or factor of known labels for all samples in ref.

Alternatively, if ref is a list, labels should be a list of the same length. Each element should contain a character vector or factor specifying the label for the corresponding entry of ref.

method

String specifying whether annotation should be performed on single cells in test, or whether they should be aggregated into cluster-level profiles prior to annotation.

clusters

A character vector or factor of cluster identities for each cell in test. Only used if method="cluster".

genes, sd.thresh

Arguments controlling the genes that are used for annotation, see trainSingleR.

de.method

String specifying how DE genes should be detected between pairs of labels. Defaults to "classic", which sorts genes by their log-fold changes and takes the top de.n. Setting to "wilcox" or "t" will use the Wilcoxon rank sum test or Welch's t-test between labels, respectively, and take the top de.n upregulated genes per comparison.
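The "classic" mode can be illustrated in base R. This is a sketch under stated assumptions, not SingleR's code. It assumes that each label's profile is the row median of its log-expression values, and that the log-fold change between two labels is the difference of those profiles. The help text only says that genes are sorted by log-fold change. classic_markers() is a hypothetical name.

```r
# Hypothetical sketch of pairwise "classic" marker selection.
# Assumption: per-label profile = row medians of log-expression values,
# so a difference of profiles is a log-fold change.
classic_markers <- function(mat, labels, de.n) {
  ulab <- sort(unique(labels))
  prof <- vapply(ulab, function(l) apply(mat[, labels == l, drop = FALSE], 1, median),
                 numeric(nrow(mat)))
  rownames(prof) <- rownames(mat)
  # For each label 'a' and each other label 'b', keep the de.n genes
  # with the largest log-fold change of 'a' over 'b'.
  lapply(setNames(ulab, ulab), function(a) {
    others <- setdiff(ulab, a)
    lapply(setNames(others, others), function(b) {
      lfc <- prof[, a] - prof[, b]
      rownames(prof)[head(order(lfc, decreasing = TRUE), de.n)]
    })
  })
}

mat <- rbind(g1 = c(5, 5, 0, 0),
             g2 = c(0, 0, 4, 4),
             g3 = c(1, 1, 1, 1),
             g4 = c(2, 2, 1, 1))
labels <- c("A", "A", "B", "B")
m <- classic_markers(mat, labels, de.n = 2)
m$A$B  # "g1" "g4"
m$B$A  # "g2" "g3"
```

Note that the top de.n genes are taken regardless of sign. This is why g3 (log-fold change 0) appears for B over A in this toy case.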

de.n

An integer scalar specifying the number of DE genes to use when genes="de". If de.method="classic", defaults to 500 * (2/3) ^ log2(N) where N is the number of unique labels. Otherwise, defaults to 10.
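The default de.n formula can be evaluated directly. In the sketch below, default_de_n() is a hypothetical helper. The help text does not say how the non-integer result is handled, so rounding to the nearest integer is an assumption.

```r
# Hypothetical helper reproducing the documented default for de.n.
# Rounding to the nearest integer is an assumption, not stated above.
default_de_n <- function(n_labels, de.method = "classic") {
  if (de.method == "classic") {
    round(500 * (2/3)^log2(n_labels))
  } else {
    10
  }
}

default_de_n(c(2, 4, 5, 8))  # 333 222 195 148
default_de_n(5, "wilcox")    # 10
```

Each doubling of the number of labels multiplies the default by 2/3. This keeps the total number of marker genes from growing too quickly as more pairwise comparisons are added.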

de.args

Named list of additional arguments to pass to pairwiseTTests or pairwiseWilcox when de.method="t" or "wilcox", respectively.

quantile, fine.tune, tune.thresh, prune

Further arguments to pass to classifySingleR.

assay.type.test

An integer scalar or string specifying the assay of test containing the relevant expression matrix, if test is a SummarizedExperiment object.

assay.type.ref

An integer scalar or string specifying the assay of ref containing the relevant expression matrix, if ref is a SummarizedExperiment object.

check.missing

Logical scalar indicating whether rows should be checked for missing values (and if found, removed).
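The removal step can be sketched on a single matrix. drop_missing_rows() is hypothetical. It only shows the filtering idea, and it does not model how SingleR applies the check to test and ref internally.

```r
# Hypothetical helper: drop every row (gene) that contains at least one NA.
drop_missing_rows <- function(mat) {
  keep <- rowSums(is.na(mat)) == 0
  mat[keep, , drop = FALSE]
}

mat_na <- rbind(GENE_1 = c(1, 2),
                GENE_2 = c(NA, 3),
                GENE_3 = c(4, 5))
rownames(drop_missing_rows(mat_na))  # "GENE_1" "GENE_3"
```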

BNPARAM

A BiocNeighborParam object specifying the algorithm to use for building nearest neighbor indices.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed, if any.

Details

If method="single", this function is effectively just a convenient wrapper around trainSingleR and classifySingleR.

If method="cluster", per-cell profiles are summed to obtain per-cluster profiles and annotation is performed on these clusters.
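The per-cluster summation can be sketched with base R's rowsum(). aggregate_by_cluster() is hypothetical and works on a plain genes-by-cells matrix. SingleR's own code may use a different helper, and it operates on the assay selected by assay.type.test.

```r
# Hypothetical helper: sum the columns (cells) of a genes-by-cells matrix
# within each cluster. rowsum() sums rows sharing a group, so transpose
# so that cells become rows, then transpose back to genes x clusters.
aggregate_by_cluster <- function(mat, clusters) {
  t(rowsum(t(mat), group = clusters))
}

mat <- matrix(1:8, nrow = 2, dimnames = list(c("GENE_1", "GENE_2"), NULL))
res <- aggregate_by_cluster(mat, c("x", "y", "x", "y"))
res  # column x = 6, 8; column y = 10, 12
```

Each column of the result is one cluster-level profile. These profiles are then annotated in the same way as individual cells.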

The function will automatically restrict the analysis to the intersection of the genes available in both ref and test. If this intersection is empty (e.g., because the two datasets use different annotation in their row names), an error will be raised.
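The intersection step, and the error raised when it is empty, can be sketched as follows. restrict_to_shared_genes() and its error message are hypothetical and do not reproduce SingleR's actual code or error text.

```r
# Hypothetical helper: keep only genes present in both matrices.
restrict_to_shared_genes <- function(test, ref) {
  shared <- intersect(rownames(test), rownames(ref))
  if (length(shared) == 0L) {
    # Illustrative message, not SingleR's actual error text.
    stop("no genes in common between 'test' and 'ref'")
  }
  list(test = test[shared, , drop = FALSE],
       ref  = ref[shared, , drop = FALSE])
}

test_mat <- matrix(1, nrow = 3, ncol = 2, dimnames = list(paste0("GENE_", 1:3), NULL))
ref_mat  <- matrix(2, nrow = 3, ncol = 4, dimnames = list(paste0("GENE_", 2:4), NULL))
shared <- restrict_to_shared_genes(test_mat, ref_mat)
rownames(shared$ref)  # "GENE_2" "GENE_3"

# Row names from a different annotation scheme share nothing, so this errors.
ref_bad <- matrix(2, nrow = 2, ncol = 4, dimnames = list(c("ID_1", "ID_2"), NULL))
try(restrict_to_shared_genes(test_mat, ref_bad))
```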

ref can contain either single-cell or bulk data. For single-cell references, read the Note in ?trainSingleR.

Value

A DataFrame is returned containing the annotation statistics for each cell or cluster (row). This is identical to the output of classifySingleR.

Author(s)

Aaron Lun, based on code by Dvir Aran.

References

Aran D, Looney AP, Liu L et al. (2019). Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20, 163–172.

Examples

##############################
## Mocking up training data ##
##############################

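library(SingleR)
library(SummarizedExperiment)  # also makes DataFrame() available

Ngroups <- 5
Ngenes <- 1000
means <- matrix(rnorm(Ngenes*Ngroups), nrow=Ngenes)
means[1:900,] <- 0  # only the last 100 genes differ between labels
colnames(means) <- LETTERS[seq_len(Ngroups)]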

g <- rep(colnames(means), each=4)  # 4 reference samples per label
ref <- SummarizedExperiment(
    list(counts=matrix(rpois(Ngenes*length(g),
        lambda=10*2^means[,g]), ncol=length(g))),
    colData=DataFrame(label=g)
)
rownames(ref) <- sprintf("GENE_%s", seq_len(nrow(ref)))

ref <- scater::logNormCounts(ref)
trained <- trainSingleR(ref, ref$label)

###############################
## Mocking up some test data ##
###############################

N <- 100
g <- sample(colnames(means), N, replace=TRUE)  # true label of each test cell
test <- SummarizedExperiment(
    list(counts=matrix(rpois(Ngenes*N, lambda=2^means[,g]), ncol=N)),
    colData=DataFrame(cluster=g)
)

rownames(test) <- sprintf("GENE_%s", seq_len(nrow(test)))
test <- scater::logNormCounts(test)

###############################
## Performing classification ##
###############################

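# Per-cell annotation; 'g' holds the true labels of the test cells.
pred <- SingleR(test, ref, labels=ref$label)
table(predicted=pred$labels, truth=g)

# Per-cluster annotation; the mock clusters are the true labels,
# so the row names of 'pred2' (the cluster IDs) serve as the truth.
pred2 <- SingleR(test, ref, labels=ref$label,
    method="cluster", clusters=test$cluster)
table(predicted=pred2$labels, truth=rownames(pred2))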


[Package SingleR version 1.0.6 Index]