findMutualNN {batchelor} | R Documentation |
Find mutual nearest neighbors (MNN) across two data sets.
findMutualNN(data1, data2, k1, k2 = k1, BNPARAM = KmknnParam(), BPPARAM = SerialParam())
data1 |
A numeric matrix containing samples (e.g., cells) in the rows and variables/dimensions in the columns. |
data2 |
A numeric matrix like data1, with the same variables/dimensions in the columns. |
k1 |
Integer scalar specifying the number of neighbors to search for in data1. |
k2 |
Integer scalar specifying the number of neighbors to search for in data2. |
BNPARAM |
A BiocNeighborParam object specifying the neighbor search algorithm to use. |
BPPARAM |
A BiocParallelParam object specifying how parallelization should be performed. |
The concept of an MNN pair can be explained by considering cells in each of two data sets.
For each cell in data set 1, the set of k2 nearest cells in data set 2 is identified, based on the Euclidean distance in expression space.
For each cell in data set 2, the set of k1 nearest cells in data set 1 is similarly identified.
Two cells in different batches are considered to be MNNs if each cell is in the other's set.
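The definition above can be sketched as a brute-force search in base R. This is only an illustration of the concept, not the code used by findMutualNN (which delegates the search to the algorithm chosen via BNPARAM); the function name naive_mnn and the test matrices A and B are hypothetical.

```r
# Brute-force sketch of the MNN definition (illustrative only).
naive_mnn <- function(data1, data2, k1, k2 = k1) {
    n1 <- nrow(data1)
    n2 <- nrow(data2)

    # Squared Euclidean distances between every row of data1 and data2.
    d <- outer(rowSums(data1^2), rowSums(data2^2), "+") -
        2 * tcrossprod(data1, data2)

    # in2[i, j]: cell j of data2 is among the k2 nearest to cell i of data1.
    in2 <- matrix(FALSE, n1, n2)
    for (i in seq_len(n1)) in2[i, head(order(d[i, ]), k2)] <- TRUE

    # in1[i, j]: cell i of data1 is among the k1 nearest to cell j of data2.
    in1 <- matrix(FALSE, n1, n2)
    for (j in seq_len(n2)) in1[head(order(d[, j]), k1), j] <- TRUE

    # A pair is mutual only if each cell is in the other's set.
    hits <- which(in1 & in2, arr.ind = TRUE)
    list(first = unname(hits[, 1]), second = unname(hits[, 2]))
}

set.seed(1)
A <- matrix(rnorm(200), ncol = 5)  # 40 hypothetical cells
B <- matrix(rnorm(150), ncol = 5)  # 30 hypothetical cells
pairs <- naive_mnn(A, B, k1 = 5)
str(pairs)
```

The full distance matrix makes this quadratic in the number of cells, so it is only suitable for small inputs.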
The value of k can be interpreted as the minimum size of a subpopulation in each batch.
Larger values allow for more MNN pairs to be obtained, which improves the stability of batch correction in fastMNN and mnnCorrect.
Larger values also increase robustness against non-orthogonality, which would otherwise result in MNN pairs being detected on the “surface” of the distribution.
However, values of k should not be too large, as this would result in MNN pairs being inappropriately identified between biologically distinct populations.
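The effect of k on the number of pairs can be illustrated with a small base R sketch, assuming k1 = k2 = k and simulated data. The names X, Y and count_mnn are hypothetical, and the ranks are computed by brute force rather than with the package's search code.

```r
set.seed(42)
X <- matrix(rnorm(500), ncol = 5)  # 100 hypothetical cells, batch 1
Y <- matrix(rnorm(400), ncol = 5)  # 80 hypothetical cells, batch 2

# Pairwise Euclidean distances (rows: X cells, columns: Y cells).
n1 <- nrow(X)
n2 <- nrow(Y)
d <- as.matrix(dist(rbind(X, Y)))[seq_len(n1), n1 + seq_len(n2)]

# rank_in_Y[i, j]: rank of Y cell j among the neighbors of X cell i.
rank_in_Y <- t(apply(d, 1, rank))
# rank_in_X[i, j]: rank of X cell i among the neighbors of Y cell j.
rank_in_X <- apply(d, 2, rank)

# A pair is mutual for a given k if both ranks are at most k.
count_mnn <- function(k) sum(rank_in_Y <= k & rank_in_X <= k)
n_pairs <- vapply(c(1, 5, 10, 20), count_mnn, integer(1))
n_pairs
```

Because the nearest-neighbor sets for a smaller k are contained in those for a larger k, the count never decreases as k grows. Both batches here are drawn from the same distribution, so the sketch says nothing about pairs formed between distinct populations.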
A list containing the integer vectors first and second.
Corresponding entries in first and second specify an MNN pair of cells from data1 and data2, respectively.
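As a sketch of how the two vectors might be used together, the example below indexes the rows of both matrices in parallel. The list out is a hand-written stand-in with the same structure as the return value, not real findMutualNN output, and the matrices are simulated.

```r
set.seed(0)
data1 <- matrix(rnorm(50), ncol = 5)  # 10 hypothetical cells
data2 <- matrix(rnorm(40), ncol = 5)  # 8 hypothetical cells

# Hand-written stand-in for the return value (assumed for illustration).
out <- list(first = c(1L, 1L, 3L, 4L), second = c(2L, 5L, 2L, 7L))

# Entry p of each vector describes pair p, so index the rows in parallel.
paired1 <- data1[out$first, , drop = FALSE]
paired2 <- data2[out$second, , drop = FALSE]

# Per-pair difference vectors and Euclidean distances.
delta <- paired2 - paired1
pair_dist <- sqrt(rowSums(delta^2))

# A cell can take part in several pairs, such as cell 1 of data1 here.
partners <- table(out$first)
partners
```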
Aaron Lun
queryKNN for the underlying neighbor search code.
B1 <- matrix(rnorm(10000), ncol=50) # Batch 1
B2 <- matrix(rnorm(10000), ncol=50) # Batch 2
out <- findMutualNN(B1, B2, k1=20)
head(out$first)
head(out$second)