fitTrendVar {scran} | R Documentation |
Fit a mean-dependent trend to the variances of the log-normalized expression values derived from count data.
fitTrendVar(means, vars, min.mean = 0.1, parametric = TRUE, nls.args = list(), ...)
means |
A numeric vector containing the mean log-expression value for each gene. |
vars |
A numeric vector containing the variance of log-expression values for each gene. |
min.mean |
A numeric scalar specifying the minimum mean to use for trend fitting. |
parametric |
A logical scalar indicating whether a parametric fit should be attempted. |
nls.args |
A list of parameters to pass to |
... |
Further arguments to pass to |
This function fits a mean-dependent trend to the variance of the log-normalized expression for the selected features.
The fitted trend can then be used to decompose the variance of each gene into biological and technical components,
as done in modelGeneVar
and modelGeneVarWithSpikes
.
If parametric=TRUE
, a non-linear curve of the form
y = ax/(x^n + b)
is fitted to the variances against the means using nls
.
Starting values and the number of iterations are automatically set if not explicitly specified in nls.args
.
weightedLowess
is then applied to the log-ratios of the variance to the fitted value for each gene.
The aim is to use the parametric curve to reduce the sharpness of the expected mean-variance relationship[for easier smoothing.
Conversely, the parametric form is not exact, so the smoothers will model any remaining trends in the residuals.
If parametric=FALSE
, smoothing is performed directly on the log-variances using weightedLowess
.
Genes with mean log-expression below min.mean
are not used in trend fitting.
This aims to remove the majority of low-abundance genes and preserve the sensitivity of span-based smoothers at moderate-to-high abundances.
It also protects against discreteness, which can interfere with estimation of the variability of the variance estimates and accurate scaling of the trend.
Filtering is applied on the mean log-expression to avoid introducing spurious trends at the filter boundary.
The default threshold is chosen based on the point at which discreteness is observed in variance estimates from Poisson-distributed counts.
For heterogeneous droplet data, a lower threshold of 0.001-0.01 may be more appropriate.
When extrapolating to values below the smallest observed mean (or min.mean
), the output function will approach zero as the mean approaches zero.
This reflects the fact that the variance should be zero at a log-expression of zero (assuming a pseudo-count of 1 was used).
When extrapolating to values above the largest observed mean, the output function will be set to the fitted value of the trend at the largest mean.
All fitting (with nls
and weightedLowess
) is performed by weighting each observation according to the inverse of the density of observations at the same mean.
This avoids problems with differences in the distribution of means that would otherwise favor good fits in highly dense intervals at the expense of sparser intervals.
Note that these densities are computed after filtering on min.mean
.
A named list is returned containing:
trend
:A function that returns the fitted value of the trend at any value of the mean.
std.dev
:A numeric scalar containing the robust standard deviation of the ratio of var
to the fitted value of the trend across all features used for trend fitting.
Aaron Lun
modelGeneVar
and modelGeneVarWithSpikes
, where this function is used.
library(scater) sce <- mockSCE() sce <- logNormCounts(sce) # Fitting a trend: library(DelayedMatrixStats) means <- rowMeans(logcounts(sce)) vars <- rowVars(logcounts(sce)) fit <- fitTrendVar(means, vars) # Comparing the two trend fits: plot(means, vars, pch=16, cex=0.5, xlab="Mean", ylab="Variance") curve(fit$trend(x), add=TRUE, col="dodgerblue", lwd=3)