overlapExprs {scran} | R Documentation |
Compute the gene-specific overlap in expression profiles between two groups of cells.
## S4 method for signature 'ANY' overlapExprs(x, groups, design=NULL, residuals=FALSE, tol=1e-8, BPPARAM=SerialParam(), subset.row=NULL, lower.bound=NULL) ## S4 method for signature 'SingleCellExperiment' overlapExprs(x, ..., subset.row=NULL, lower.bound=NULL, assay.type="logcounts", get.spikes=FALSE)
x |
A numeric matrix of expression values, where each column corresponds to a cell and each row corresponds to an endogenous gene. Alternatively, a SingleCellExperiment object containing such a matrix. |
groups |
A vector of group assignments for all cells. |
design |
A numeric matrix containing blocking terms, i.e., uninteresting factors driving expression across cells. |
residuals |
A logical scalar indicating whether overlaps should be computed between residuals of a linear model. |
tol |
A numeric scalar specifying the tolerance with which ties are considered. |
BPPARAM |
A BiocParallelParam object to use in |
subset.row |
A logical, integer or character scalar indicating the rows of |
lower.bound |
A numeric scalar specifying the theoretical lower bound of values in |
... |
Additional arguments to pass to the matrix method. |
assay.type |
A string specifying which assay values to use, e.g., |
get.spikes |
A logical scalar specifying whether decomposition should be performed for spike-ins. |
For two groups of cells A and B, consider the distribution of expression values for gene X across those cells. The overlap proportion is defined as the probability that a randomly selected cell in A has a greater expression value of X than a randomly selected cell in B. Overlap proportions near 0 or 1 indicate that the expression distributions are well-separated. In particular, large proportions indicate that most cells of the first group (A) express the gene more highly than most cells of the second group (B).
This function computes, for each gene, the overlap proportions between all pairs of groups in groups
.
It is designed to complement findMarkers
, which reports the log-fold changes between groups.
This is useful for prioritizing candidate markers that are distinctive to one group or another, without needing to plot the expression values.
Expression values that are tied between groups are considered to be 50% likely to be greater in either group.
Thus, if all values were tied, the overlap proportion would be 0.5.
The tolerance with which ties are considered can be set by changing tol
.
By default, spike-in transcripts are ignored in overlapExprs,SingleCellExperiment-method
with get.spikes=FALSE
.
This is overridden by any non-NULL
value of subset.row
.
A named list of numeric matrices.
Each matrix corresponds to a group (A) in groups
and contains one row per gene in x
(or the subset specified by subset.row
).
Each column corresponds to another group (B) in groups
.
The matrix entries contain overlap proportions between groups A and B for each gene.
If the experiment has known (and uninteresting) factors of variation, these can be included in design
.
The approach used to remove these factors depends on the design matrix.
If there is only one factor in design
, the levels of the factor are defined as separate blocks.
Overlaps between groups are computed within each block, and a weighted mean (based on the number of cells in each block) of the overlaps is taken across all blocks.
This approach avoids the need for linear modelling and the associated assumptions regarding normality and correct model specification.
In particular, it avoids problems with breaking of ties when counts or expression values are converted to residuals.
However, it also makes less use of information, e.g., we ignore any blocks containing cells from only one group.
NA
proportions may be observed for a pair of groups if there is no block that contains cells from that pair.
For designs containing multiple factors or covariates, a linear model is fitted to the expression values with design
.
Overlap proportions are then computed using the residuals of the fitted model.
This approach is not ideal, requiring log-transformed x
and setting of lower.bound
- see ?correlatePairs
for a related discussion.
It can also be used for one-way layouts by setting residuals=TRUE
.
Aaron Lun
# Using the mocked-up data 'y2' from this example. example(computeSpikeFactors) y2 <- normalize(y2) groups <- sample(3, ncol(y2), replace=TRUE) out <- overlapExprs(y2, groups, subset.row=1:10)