The bluster package provides a flexible and extensible framework for clustering in Bioconductor packages/workflows.
At its core is the clusterRows() generic that controls dispatch to different clustering algorithms.
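The following toy S4 sketch in base R makes the dispatch idea concrete. The class and generic names (ToyParam, ToyKmeansParam, ToyHclustParam, toyClusterRows) are hypothetical and are not bluster's own classes. The sketch only shows how a single generic can route the same call to different algorithms, depending on the class of the parameter object passed in.

```r
library(methods)

# Hypothetical virtual parent class standing in for a "clustering parameter" object.
setClass("ToyParam", representation("VIRTUAL"))
# Hypothetical subclasses, one per algorithm, each carrying its own settings.
setClass("ToyKmeansParam", contains = "ToyParam", slots = c(centers = "numeric"))
setClass("ToyHclustParam", contains = "ToyParam", slots = c(method = "character"))

# One generic; the method that runs depends on the class of BLUSPARAM.
setGeneric("toyClusterRows", function(x, BLUSPARAM) standardGeneric("toyClusterRows"))

# k-means branch: delegate to stats::kmeans() and return a factor of cluster IDs.
setMethod("toyClusterRows", c("matrix", "ToyKmeansParam"), function(x, BLUSPARAM) {
    factor(kmeans(x, centers = BLUSPARAM@centers)$cluster)
})

# Hierarchical branch: build the tree, then cut it at half its maximum height.
setMethod("toyClusterRows", c("matrix", "ToyHclustParam"), function(x, BLUSPARAM) {
    hcl <- hclust(dist(x), method = BLUSPARAM@method)
    factor(cutree(hcl, h = max(hcl$height) / 2))
})

# Usage on random data: the same call runs a different algorithm per parameter object.
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
km <- toyClusterRows(x, new("ToyKmeansParam", centers = 3))
hc <- toyClusterRows(x, new("ToyHclustParam", method = "complete"))
```

Keeping algorithm settings inside a parameter object means new algorithms can be added by defining a new class and method, without changing the calling code.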
We will demonstrate using single-cell RNA sequencing data from the scRNAseq package;
our aim is to cluster cells into cell populations based on their PC coordinates.
library(scRNAseq)
sce <- ZeiselBrainData()
# Trusting the authors' quality control, and going straight to normalization.
library(scuttle)
sce <- logNormCounts(sce)
# Feature selection based on highly variable genes.
library(scran)
dec <- modelGeneVar(sce)
hvgs <- getTopHVGs(dec, n=1000)
# Dimensionality reduction for work (PCA) and pleasure (UMAP).
set.seed(1000)
library(scater)
sce <- runPCA(sce, ncomponents=20, subset_row=hvgs)
sce <- runUMAP(sce, dimred="PCA")
mat <- reducedDim(sce, "PCA")
dim(mat)
##  3005 20
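The object being clustered is therefore a cells-by-PCs matrix. The base-R sketch below shows how such a matrix arises, under loud assumptions: the counts are simulated (not the Zeisel data), normalization is a crude library-size scaling followed by log2(x + 1), and genes are chosen by plain variance rather than the mean-variance model fitted by modelGeneVar(). It is an illustration of the data shape, not a replacement for the scuttle/scran/scater steps above.

```r
# Hypothetical simulated counts standing in for ZeiselBrainData(): 2000 genes x 300 cells.
set.seed(1000)
counts <- matrix(rpois(2000 * 300, lambda = 5), nrow = 2000)

# Crude library-size normalization and log-transform (pseudo-count of 1).
sf <- colSums(counts) / mean(colSums(counts))
logexpr <- log2(sweep(counts, 2, sf, "/") + 1)

# Crude feature selection: top 500 genes by plain variance (an assumption,
# NOT the biological/technical decomposition used by modelGeneVar()).
vars <- apply(logexpr, 1, var)
hvgs <- order(vars, decreasing = TRUE)[1:500]

# PCA with cells as rows (hence the transpose), keeping the first 20 PCs.
pca <- prcomp(t(logexpr[hvgs, ]), rank. = 20)
mat <- pca$x
dim(mat)  # one row per cell, one column per PC
```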
Our first algorithm is good old hierarchical clustering, as implemented using hclust() from the stats package.
By default, HclustParam() cuts the resulting dendrogram at half of its total height.
library(bluster)
hclust.out <- clusterRows(mat, HclustParam())
plotUMAP(sce, colour_by=I(hclust.out))
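For intuition, here is a minimal base-R sketch of the same idea: build a tree with hclust() and cut it at half the height of the final merge. It assumes Euclidean distances and complete linkage (the stats defaults) and uses a small simulated matrix with three separated groups in place of the real PC coordinates; it is not bluster's internal code.

```r
# Hypothetical stand-in for the PC matrix: 3 groups of 50 "cells" in 20 dimensions.
set.seed(42)
centers <- matrix(rnorm(3 * 20, sd = 10), nrow = 3)
mat <- centers[rep(1:3, each = 50), ] + matrix(rnorm(150 * 20), nrow = 150)

# Agglomerative clustering on Euclidean distances with complete linkage.
hcl <- hclust(dist(mat), method = "complete")

# Cut at half the height of the tallest merge; the root merge always lies
# above this height, so at least two clusters are produced.
clusters <- cutree(hcl, h = max(hcl$height) / 2)
table(clusters)
```

A height-based cut adapts the number of clusters to the data, whereas cutree(hcl, k = ...) would force a fixed number.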
Advanced users can achieve greater control of the procedure by passing more parameters to the HclustParam() constructor.
Here, we use Ward’s criterion for the agglomeration with a dynamic tree cut from the dynamicTreeCut package.
hp2 <- HclustParam(method="ward.D2", cut.dynamic=TRUE)
hp2
## class: HclustParam
## metric: euclidean
## method: ward.D2
## cut.fun: cutreeDynamic
## cut.params(0):
hclust.out <- clusterRows(mat, hp2)
plotUMAP(sce, colour_by=I(hclust.out))
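The Ward agglomeration itself is available directly in stats, as the sketch below shows on simulated data. One assumption to flag: to stay within base R, it swaps the dynamic tree cut for a static cut into a fixed number of clusters. A dynamic cut would instead pass the same tree to cutreeDynamic() from the dynamicTreeCut package.

```r
# Hypothetical stand-in for the PC matrix: 4 groups of 40 "cells" in 20 dimensions.
set.seed(42)
centers <- matrix(rnorm(4 * 20, sd = 10), nrow = 4)
mat <- centers[rep(1:4, each = 40), ] + matrix(rnorm(160 * 20), nrow = 160)

# "ward.D2" implements Ward's criterion and expects unsquared Euclidean
# distances as input (it squares them internally when merging clusters).
hcl <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")

# Static cut into exactly 4 clusters, standing in for the dynamic cut.
ward.clusters <- cutree(hcl, k = 4)
table(ward.clusters)
```

Note that the older "ward.D" option does not implement Ward's criterion on unsquared distances, which is why "ward.D2" is the usual choice when supplying a plain dist() object.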
Our next algorithm is \(k\)-means clustering, as implemented using the kmeans() function from the stats package.
This requires us to pass in the number of clusters, either as a number:
set.seed(100)
kmeans.out <- clusterRows(mat, KmeansParam(10))
plotUMAP(sce, colour_by=I(kmeans.out))
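Under the hood this amounts to a call to stats::kmeans() with the number of clusters as centers. The sketch below runs that call directly on a hypothetical random 300 x 20 matrix standing in for the PC coordinates. The nstart = 5 setting is our own choice for illustration and is not claimed to be what KmeansParam() uses.

```r
set.seed(100)
# Hypothetical stand-in for the PC matrix: 300 cells x 20 PCs of random noise.
mat <- matrix(rnorm(300 * 20), ncol = 20)

# A single number for centers means "find this many clusters";
# nstart > 1 reruns from several random starts and keeps the best solution.
km <- kmeans(mat, centers = 10, nstart = 5)
kmeans.out <- factor(km$cluster)
table(kmeans.out)

# Total within-cluster sum of squares, the quantity k-means tries to minimize.
km$tot.withinss
```

Because the result depends on the random starting centers, setting a seed (as in the call above) is needed for reproducible cluster assignments.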