batchelor 1.21.1
The batchelor package provides the batchCorrect() generic that dispatches on its PARAM argument.
Users writing code with batchCorrect() can easily switch from one method to another by simply changing the class of the object supplied as PARAM.
For example:
library(batchelor)
B1 <- matrix(rnorm(10000), ncol=50) # Batch 1
B2 <- matrix(rnorm(10000), ncol=50) # Batch 2
# Switching easily between batch correction methods.
m.out <- batchCorrect(B1, B2, PARAM=ClassicMnnParam())
f.out <- batchCorrect(B1, B2, PARAM=FastMnnParam(d=20))
r.out <- batchCorrect(B1, B2, PARAM=RescaleParam(pseudo.count=0))
Developers of other packages can extend this further by adding their batch correction methods to this dispatch system. This improves interoperability across packages by allowing users to easily experiment with different methods.
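To make the dispatch mechanism concrete, here is a minimal, self-contained sketch of the same pattern that uses only the methods package. The generic toyCorrect() and the classes ToyParam, CenterParam and IdentityParam are hypothetical stand-ins for batchCorrect() and the BatchelorParam hierarchy; they are not part of batchelor.

```r
library(methods)

# Hypothetical stand-ins, not part of batchelor: a list-like parameter base
# class (loosely mimicking BatchelorParam, which derives from SimpleList) and
# two subclasses that select different "methods".
setClass("ToyParam", contains = "list")
setClass("CenterParam", contains = "ToyParam")
setClass("IdentityParam", contains = "ToyParam")

# A generic that, like batchCorrect(), dispatches only on PARAM.
setGeneric("toyCorrect", function(..., PARAM) standardGeneric("toyCorrect"),
    signature = "PARAM")

# Centers each gene (row) within each batch, then combines the batches.
setMethod("toyCorrect", "CenterParam", function(..., PARAM) {
    batches <- list(...)
    do.call(cbind, lapply(batches, function(b) b - rowMeans(b)))
})

# Combines the batches without modification.
setMethod("toyCorrect", "IdentityParam", function(..., PARAM) {
    do.call(cbind, list(...))
})

toy1 <- matrix(rnorm(20), nrow = 4)  # 4 genes x 5 cells
toy2 <- matrix(rnorm(20), nrow = 4)  # 4 genes x 5 cells

# Only the class of PARAM differs between these two calls.
centered <- toyCorrect(toy1, toy2, PARAM = new("CenterParam"))
unchanged <- toyCorrect(toy1, toy2, PARAM = new("IdentityParam"))
```

As with batchCorrect(), switching methods only requires supplying a PARAM object of a different class. The input matrices are passed in exactly the same way.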
You will need to add Imports: batchelor, methods to your DESCRIPTION file.
You will also need to add import(methods), importFrom(batchelor, "batchCorrect") and importClassesFrom(batchelor, "BatchelorParam") to your NAMESPACE file.
Obviously, you will also need a function that implements your batch correction method. For demonstration purposes, we will use an identity function that simply returns the input values, combined into a single matrix (not a very good correction, but that's not the point right now). This is implemented like so:
noCorrect <- function(...)
# Takes a set of batches and returns them without modification.
{
    do.call(cbind, list(...))
}
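As a quick check of what noCorrect() returns, the sketch below applies it to two small, hypothetical toy matrices that share the same genes. The result is simply their column-wise combination.

```r
# noCorrect() as defined above, repeated so that this snippet runs on its own.
noCorrect <- function(...) {
    do.call(cbind, list(...))
}

# Two small toy batches with the same 3 genes (rows).
batch.a <- matrix(1:6, nrow = 3)   # 3 genes x 2 cells
batch.b <- matrix(7:15, nrow = 3)  # 3 genes x 3 cells
combined <- noCorrect(batch.a, batch.b)
dim(combined)
## [1] 3 5
```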
BatchelorParam subclass
We need to define a new BatchelorParam subclass that instructs the batchCorrect() generic to dispatch to our new method.
This is most easily done like so:
NothingParam <- setClass("NothingParam", contains="BatchelorParam")
Note that BatchelorParam itself is derived from a SimpleList, so objects of our new class can be modified with standard list operators like $.
nothing <- NothingParam()
nothing
## NothingParam of length 0
nothing$some_value <- 1
nothing
## NothingParam of length 1
## names(1): some_value
If no parameters are set, the default values in the function will be used (there are none here in noCorrect(), but presumably your function is more complex than that).
Additional slots can be specified in the class definition if there are important parameters that need to be manually specified by the user.
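Here is a rough sketch of the slot-based approach. The class declares a mandatory scale slot with a validity check. Its constructor has no default, so users must supply the value. To keep the sketch runnable without batchelor, it inherits from a hypothetical list-like stand-in, ParamStandIn, rather than from BatchelorParam. ScaleParam and scale are likewise illustrative names only.

```r
library(methods)

# Hypothetical list-like stand-in for BatchelorParam, so that this sketch runs
# without batchelor; a real class would use contains="BatchelorParam".
setClass("ParamStandIn", contains = "list")

# Hypothetical parameter class with an extra 'scale' slot that users must set.
setClass("ScaleParam", contains = "ParamStandIn",
    slots = c(scale = "numeric"),
    validity = function(object) {
        s <- object@scale
        if (length(s) != 1L || is.na(s) || s <= 0) {
            return("'scale' must be a single positive number")
        }
        TRUE
    })

# Constructor with no default for 'scale', so omitting it is an error.
ScaleParam <- function(scale) new("ScaleParam", scale = scale)

p <- ScaleParam(scale = 2)
p@scale
## [1] 2
```

A batchCorrect() method for such a class would then read the value with PARAM@scale.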
batchCorrect method
The batchCorrect() generic looks like this:
batchCorrect
## standardGeneric for "batchCorrect" defined from package "batchelor"
##
## function (..., batch = NULL, restrict = NULL, subset.row = NULL,
## correct.all = FALSE, assay.type = NULL, PARAM)
## standardGeneric("batchCorrect")
## <bytecode: 0x5594958443c0>
## <environment: 0x559495856b30>
## Methods may be defined for arguments: PARAM
## Use showMethods(batchCorrect) for currently available ones.
Any implemented method must accept one or more matrix-like objects of single-cell gene expression values, supplied through the ... argument.
Rows are assumed to be genes and columns are assumed to be cells.
If only one object is supplied, batch must be specified to indicate the batch to which each cell belongs.
Alternatively, one or more SingleCellExperiment objects can be supplied, containing the gene expression matrix in the assay.type assay.
These should not be mixed with matrix-like objects, i.e., if one object is a SingleCellExperiment, all objects should be SingleCellExperiments.
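The input rules above can be enforced with a few checks at the start of a method. The method further down relies on batchelor's own helpers for this. As a standalone illustration, here is a minimal base R sketch using a hypothetical helper, checkToyInputs(), that is not part of batchelor. The row-count check reflects the assumption that all batches describe the same genes.

```r
# Hypothetical helper (not part of batchelor) that enforces the input rules
# described above for the objects passed through '...'.
checkToyInputs <- function(batches, batch = NULL) {
    if (length(batches) == 0L) {
        stop("at least one object must be supplied")
    }
    # SingleCellExperiment inputs must not be mixed with matrix-like inputs.
    is.sce <- vapply(batches, inherits, what = "SingleCellExperiment",
        FUN.VALUE = TRUE)
    if (any(is.sce) && !all(is.sce)) {
        stop("cannot mix SingleCellExperiment and matrix-like objects")
    }
    if (length(batches) == 1L) {
        # A single object needs 'batch', with one entry per cell (column).
        if (is.null(batch)) {
            stop("'batch' must be specified when only one object is supplied")
        }
        if (length(batch) != ncol(batches[[1]])) {
            stop("'batch' must have one entry per cell")
        }
    } else {
        # Multiple objects are assumed to describe the same genes (rows).
        nrows <- vapply(batches, nrow, FUN.VALUE = 0L)
        if (length(unique(nrows)) != 1L) {
            stop("all objects must have the same number of genes")
        }
    }
    invisible(TRUE)
}
```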
The subset.row= argument specifies a subset of genes on which to perform the correction.
The correct.all= argument specifies whether corrected values should be returned for all genes, by “extrapolating” from the results for the genes that were used (if your method cannot support this option, setting it to TRUE should raise an error).
The expected output for each combination of these settings is described below.
The restrict= argument allows users to compute the correction using only a subset of cells in each batch (e.g., control cells).
The correction is then “extrapolated” to all cells in the batch, such that corrected values are returned for all cells (again, if your method cannot support this, any non-NULL value of restrict should raise an error).
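To illustrate what “extrapolating” from restricted cells can mean, here is a sketch of a hypothetical correction, centerWithRestrict(). It computes each gene's mean from the restricted cells only, then subtracts those means from every cell. For simplicity it handles a single matrix, with restrict given as column indices. A real method would have to apply this within each batch.

```r
# Hypothetical per-gene centering (not a batchelor method): gene means are
# estimated from the 'restrict' cells only, then subtracted from all cells.
centerWithRestrict <- function(mat, restrict = NULL) {
    use <- if (is.null(restrict)) seq_len(ncol(mat)) else restrict
    gene.means <- rowMeans(mat[, use, drop = FALSE])
    # Subtracting from every column "extrapolates" the correction to all cells.
    mat - gene.means
}

set.seed(1)
expr <- matrix(rnorm(12), nrow = 3)  # 3 genes x 4 cells
out <- centerWithRestrict(expr, restrict = c(1, 2))
dim(out)
## [1] 3 4
```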
Any implemented method must return a SingleCellExperiment where the first assay contains the corrected gene expression values.
Corrected values should be returned for all genes if correct.all=TRUE or subset.row=NULL.
If correct.all=FALSE and subset.row is not NULL, values should only be returned for the selected genes.
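The gene-selection rule can be written as a small helper. This hypothetical outputRows() returns the row indices for which corrected values should be reported. It assumes subset.row is an integer or logical vector; character subsets would additionally need the row names.

```r
# Hypothetical helper: which genes (row indices) a method should return,
# given the number of input genes, 'subset.row' and 'correct.all'.
outputRows <- function(ngenes, subset.row = NULL, correct.all = FALSE) {
    if (correct.all || is.null(subset.row)) {
        seq_len(ngenes)
    } else {
        seq_len(ngenes)[subset.row]
    }
}

outputRows(5, subset.row = c(2, 4))
## [1] 2 4
outputRows(5, subset.row = c(2, 4), correct.all = TRUE)
## [1] 1 2 3 4 5
```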
Cells should be reported in the same order in which they are supplied to the method.
With multiple objects, the cells are simply concatenated from successive objects in their specified order, i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.
If only a single object is supplied (with batch= specifying the batches), the original order of its cells should be preserved, even if the cells are grouped by batch during the correction.
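When a single object is split by batch, the combined result is grouped by batch and must be put back into the original cell order. The base R sketch below uses hypothetical data and does not reproduce batchelor's own divideIntoBatches(). It shows one way to compute such a reordering vector with order(). The per-cell batch labels need the same reordering.

```r
# Hypothetical data: one object holding 6 cells from two batches.
set.seed(2)
one.object <- matrix(rnorm(12), nrow = 2,
    dimnames = list(NULL, paste0("cell", 1:6)))
batch <- c("b", "a", "b", "a", "a", "b")

# Column indices of each batch; split() orders the groups by factor level.
by.batch <- split(seq_len(ncol(one.object)), batch)

# Process batch by batch, so the combined result is grouped by batch.
combined <- do.call(cbind,
    lapply(by.batch, function(i) one.object[, i, drop = FALSE]))
combined.labels <- rep(names(by.batch), lengths(by.batch))

# order() of the concatenated indices maps each original cell to its column
# in 'combined'; the batch labels must be reordered in the same way.
restore.order <- order(unlist(by.batch))
restored <- combined[, restore.order, drop = FALSE]
restored.labels <- combined.labels[restore.order]
```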
The output object should have row names equal to the row names of the input objects.
Column names should be equal to the concatenated column names of the input objects, unless all are NULL, in which case the column names in the output can be NULL.
In situations where some input objects have column names and others have NULL, the missing column names should be filled in with empty strings.
This is the expected behaviour when combining multiple matrices with cbind().
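Here is a quick base R check of this cbind() behaviour, using two small, hypothetical matrices:

```r
# Two hypothetical matrices describing the same two genes.
named <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("x", "y")))
unnamed <- matrix(5:8, nrow = 2)

# Missing column names are filled in with empty strings: c("x", "y", "", "")
colnames(cbind(named, unnamed))

# If no input has column names, the result has none either: NULL
colnames(cbind(unnamed, unnamed))
```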
In addition, the colData of the output should contain ‘batch’, a vector specifying the batch of origin for each cell.
Finally, we define a method that calls our noCorrect function while satisfying all of the above input/output requirements.
To be clear, it is not mandatory to lay out the code as shown below; this is simply one way that all the requirements can be satisfied.
We have used some internal batchelor functions for brevity; please contact us if you find these useful and want them to be exported.
setMethod("batchCorrect", "NothingParam", function(..., batch = NULL,
    restrict = NULL, subset.row = NULL, correct.all = FALSE,
    assay.type = "logcounts", PARAM)
{
    # checkBatchConsistency(), checkIfSCE() and divideIntoBatches() come from
    # batchelor; assay() is from SummarizedExperiment, DataFrame() and
    # normalizeSingleBracketSubscript() from S4Vectors, and
    # SingleCellExperiment() from SingleCellExperiment.
    batches <- list(...)
    checkBatchConsistency(batches)

    # Pulling out information from the SCE objects.
    is.sce <- checkIfSCE(batches)
    if (any(is.sce)) {
        batches[is.sce] <- lapply(batches[is.sce], assay, i=assay.type)
    }

    # Subsetting by 'batch', if only one object is supplied.
    do.split <- length(batches)==1L
    if (do.split) {
        divided <- divideIntoBatches(batches[[1]], batch=batch, restrict=restrict)
        batches <- divided$batches
        restrict <- divided$restricted
    }

    # Subsetting by row.
    # This is a per-gene "method", so correct.all=TRUE will ignore subset.row.
    # More complex methods will need to handle this differently.
    if (correct.all) {
        subset.row <- NULL
    } else if (!is.null(subset.row)) {
        subset.row <- normalizeSingleBracketSubscript(i=subset.row, x=batches[[1]])
        batches <- lapply(batches, "[", i=subset.row, , drop=FALSE)
    }

    # Don't really need to consider restrict!=NULL here, as this function
    # doesn't do anything with the cells anyway.
    output <- do.call(noCorrect, batches)

    # Batch of origin for each cell, in the column order of 'output'.
    ncells.per.batch <- vapply(batches, FUN=ncol, FUN.VALUE=0L)
    batch.names <- names(batches)
    if (is.null(batch.names)) {
        batch.names <- seq_along(batches)
    }
    batch.labels <- rep(batch.names, ncells.per.batch)

    # Reordering the output for correctness if it was previously split;
    # the batch labels must follow the same reordering.
    if (do.split) {
        d.reo <- divided$reorder
        output <- output[,d.reo,drop=FALSE]
        batch.labels <- batch.labels[d.reo]
    }

    SingleCellExperiment(list(corrected=output),
        colData=DataFrame(batch=batch.labels))
})
And it works (in a strictly programming sense, as the method itself does no correction at all):
n.out <- batchCorrect(B1, B2, PARAM=NothingParam())
n.out
## class: SingleCellExperiment
## dim: 200 100
## metadata(0):
## assays(1): corrected
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(1): batch
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Remember to export the new method as well as the NothingParam class and constructor.
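For a package containing the code above, the NAMESPACE might look roughly like the following sketch. The export directives cover the NothingParam constructor, class and method. The second group of imports is an assumption. It applies only because the example method calls functions from SummarizedExperiment, S4Vectors and SingleCellExperiment, which would then also need to be listed under Imports: in DESCRIPTION.

```r
# NAMESPACE (sketch)
import(methods)
importFrom(batchelor, "batchCorrect")
importClassesFrom(batchelor, "BatchelorParam")

# Only needed because the example method calls these functions.
importFrom(SummarizedExperiment, "assay")
importFrom(S4Vectors, "DataFrame", "normalizeSingleBracketSubscript")
importFrom(SingleCellExperiment, "SingleCellExperiment")

# Exports for the new constructor, class and method.
export(NothingParam)
exportClasses(NothingParam)
exportMethods(batchCorrect)
```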
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.33.2 ggplot2_3.5.1
## [3] scran_1.33.0 scuttle_1.15.1
## [5] scRNAseq_2.19.1 batchelor_1.21.1
## [7] SingleCellExperiment_1.27.2 SummarizedExperiment_1.35.1
## [9] Biobase_2.65.0 GenomicRanges_1.57.1
## [11] GenomeInfoDb_1.41.1 IRanges_2.39.1
## [13] S4Vectors_0.43.1 BiocGenerics_0.51.0
## [15] MatrixGenerics_1.17.0 matrixStats_1.3.0
## [17] knitr_1.48 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 magrittr_2.0.3
## [3] magick_2.8.3 ggbeeswarm_0.7.2
## [5] GenomicFeatures_1.57.0 gypsum_1.1.6
## [7] farver_2.1.2 rmarkdown_2.27
## [9] BiocIO_1.15.0 zlibbioc_1.51.1
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] Rsamtools_2.21.0 DelayedMatrixStats_1.27.1
## [15] RCurl_1.98-1.14 tinytex_0.51
## [17] htmltools_0.5.8.1 S4Arrays_1.5.3
## [19] AnnotationHub_3.13.0 curl_5.2.1
## [21] BiocNeighbors_1.23.0 Rhdf5lib_1.27.0
## [23] SparseArray_1.5.16 rhdf5_2.49.0
## [25] sass_0.4.9 alabaster.base_1.5.3
## [27] bslib_0.7.0 alabaster.sce_1.5.1
## [29] httr2_1.0.1 cachem_1.1.0
## [31] ResidualMatrix_1.15.1 GenomicAlignments_1.41.0
## [33] igraph_2.0.3 lifecycle_1.0.4
## [35] pkgconfig_2.0.3 rsvd_1.0.5
## [37] Matrix_1.7-0 R6_2.5.1
## [39] fastmap_1.2.0 GenomeInfoDbData_1.2.12
## [41] digest_0.6.36 colorspace_2.1-0
## [43] AnnotationDbi_1.67.0 dqrng_0.4.1
## [45] irlba_2.3.5.1 ExperimentHub_2.13.0
## [47] RSQLite_2.3.7 beachmat_2.21.3
## [49] labeling_0.4.3 filelock_1.0.3
## [51] fansi_1.0.6 httr_1.4.7
## [53] abind_1.4-5 compiler_4.4.1
## [55] withr_3.0.0 bit64_4.0.5
## [57] BiocParallel_1.39.0 viridis_0.6.5
## [59] DBI_1.2.3 highr_0.11
## [61] HDF5Array_1.33.3 alabaster.ranges_1.5.2
## [63] alabaster.schemas_1.5.0 rappdirs_0.3.3
## [65] DelayedArray_0.31.6 rjson_0.2.21
## [67] bluster_1.15.0 tools_4.4.1
## [69] vipor_0.4.7 beeswarm_0.4.0
## [71] glue_1.7.0 restfulr_0.0.15
## [73] rhdf5filters_1.17.0 grid_4.4.1
## [75] Rtsne_0.17 cluster_2.1.6
## [77] generics_0.1.3 gtable_0.3.5
## [79] ensembldb_2.29.0 metapod_1.13.0
## [81] BiocSingular_1.21.2 ScaledMatrix_1.13.0
## [83] utf8_1.2.4 XVector_0.45.0
## [85] ggrepel_0.9.5 BiocVersion_3.20.0
## [87] pillar_1.9.0 limma_3.61.2
## [89] dplyr_1.1.4 BiocFileCache_2.13.0
## [91] lattice_0.22-6 rtracklayer_1.65.0
## [93] bit_4.0.5 tidyselect_1.2.1
## [95] locfit_1.5-9.10 Biostrings_2.73.1
## [97] gridExtra_2.3 bookdown_0.40
## [99] ProtGenerics_1.37.0 edgeR_4.3.4
## [101] xfun_0.45 statmod_1.5.0
## [103] UCSC.utils_1.1.0 lazyeval_0.2.2
## [105] yaml_2.3.9 evaluate_0.24.0
## [107] codetools_0.2-20 tibble_3.2.1
## [109] alabaster.matrix_1.5.4 BiocManager_1.30.23
## [111] cli_3.6.3 munsell_0.5.1
## [113] jquerylib_0.1.4 Rcpp_1.0.12
## [115] dbplyr_2.5.0 png_0.1-8
## [117] XML_3.99-0.17 parallel_4.4.1
## [119] blob_1.2.4 AnnotationFilter_1.29.0
## [121] sparseMatrixStats_1.17.2 bitops_1.0-7
## [123] viridisLite_0.4.2 alabaster.se_1.5.3
## [125] scales_1.3.0 crayon_1.5.3
## [127] rlang_1.1.4 cowplot_1.1.3
## [129] KEGGREST_1.45.1