The external R package igraph is required for the computation of the normalized mutual information to assess the results of the clustering.

library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(SIMLR)

We now run SIMLR as an example on an input dataset from Buettner, Florian, et al. “Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.” Nature biotechnology 33.2 (2015): 155-160. For this dataset we have a ground true of 3 cell populations, i.e., clusters.

set.seed(11111)
data(BuettnerFlorian)
example = SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, cores.ratio = 0)
## Computing the multiple Kernels.
## Performing network diffiusion.
## Iteration:  1 
## Iteration:  2 
## Iteration:  3 
## Iteration:  4 
## Iteration:  5 
## Iteration:  6 
## Iteration:  7 
## Iteration:  8 
## Iteration:  9 
## Iteration:  10 
## Iteration:  11
## Warning in SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, : Maybe
## you should set a larger value of c.
## Performing t-SNE.
## Epoch: Iteration # 100  error is:  0.1293406 
## Epoch: Iteration # 200  error is:  0.07376446 
## Epoch: Iteration # 300  error is:  0.05963893 
## Epoch: Iteration # 400  error is:  0.05958424 
## Epoch: Iteration # 500  error is:  0.05953258 
## Epoch: Iteration # 600  error is:  0.05948551 
## Epoch: Iteration # 700  error is:  0.05944139 
## Epoch: Iteration # 800  error is:  0.05940026 
## Epoch: Iteration # 900  error is:  0.05936119 
## Epoch: Iteration # 1000  error is:  0.05932415 
## Performing Kmeans.
## Performing t-SNE.
## Epoch: Iteration # 100  error is:  10.98628 
## Epoch: Iteration # 200  error is:  0.8364971 
## Epoch: Iteration # 300  error is:  0.6978426 
## Epoch: Iteration # 400  error is:  0.5367967 
## Epoch: Iteration # 500  error is:  0.419038 
## Epoch: Iteration # 600  error is:  0.4244949 
## Epoch: Iteration # 700  error is:  0.4438908 
## Epoch: Iteration # 800  error is:  0.437931 
## Epoch: Iteration # 900  error is:  0.383611 
## Epoch: Iteration # 1000  error is:  0.5312541

We now compute the normalized mutual information between the inferred clusters by SIMLR and the true ones. This measure with values in [0,1], allows us to assess the performance of the clustering with higher values reflecting better performance.

nmi_1 = compare(BuettnerFlorian$true_labs[,1], example$y$cluster, method="nmi")
print(nmi_1)
## [1] 0.888298

As a further understanding of the results, we now visualize the cell populations in a plot.

plot(example$ydata, 
    col = c(topo.colors(BuettnerFlorian$n_clust))[BuettnerFlorian$true_labs[,1]], 
    xlab = "SIMLR component 1",
    ylab = "SIMLR component 2",
    pch = 20,
    main="SIMILR 2D visualization for BuettnerFlorian")
Visualization of the 3 cell populations retrieved by SIMLR on the dataset by Florian, et al.

Figure 1: Visualization of the 3 cell populations retrieved by SIMLR on the dataset by Florian, et al

We also run SIMLR feature ranking on the same inputs to get a rank of the key genes with the related pvalues.

set.seed(11111)
ranks = SIMLR_Feature_Ranking(A=BuettnerFlorian$results$S,X=BuettnerFlorian$in_X)
## 1 
## 2 
## 3 
## 4 
## 5 
## 6 
## 7 
## 8 
## 9 
## 10 
## 11 
## 12 
## 13 
## 14 
## 15 
## 16 
## 17 
## 18 
## 19 
## 20 
## 21 
## 22 
## 23 
## 24 
## 25 
## 26 
## 27 
## 28 
## 29 
## 30 
## 31 
## 32 
## 33 
## 34 
## 35 
## 36 
## 37 
## 38 
## 39 
## 40 
## 41 
## 42 
## 43 
## 44 
## 45 
## 46 
## 47 
## 48 
## 49 
## 50 
## 51 
## 52 
## 53 
## 54 
## 55 
## 56 
## 57 
## 58 
## 59 
## 60 
## 61 
## 62 
## 63 
## 64 
## 65 
## 66 
## 67 
## 68 
## 69 
## 70 
## 71 
## 72 
## 73 
## 74 
## 75 
## 76 
## 77 
## 78 
## 79 
## 80 
## 81 
## 82 
## 83 
## 84 
## 85 
## 86 
## 87 
## 88 
## 89 
## 90 
## 91 
## 92 
## 93 
## 94 
## 95 
## 96 
## 97 
## 98 
## 99 
## 100
head(ranks$pval)
## [1] 1.086748e-128  1.189327e-90  5.504924e-80  4.652359e-75  5.593957e-73
## [6]  3.373056e-69
head(ranks$aggR)
## [1] 5701 1689 7549   57 2653 8081

Similarly, we show an example for SIMLR large scale on an input dataset being a reduced version of the dataset provided in Buettner, Zeisel, Amit, et al. “Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.” Science 347.6226 (2015): 1138-1142. For this dataset we have a ground true of 9 cell populations, i.e., clusters. But we should notice that here we use for computational reasons a reduced version of the data, so please refer to the original publication for the full data.

set.seed(11111)
data(ZeiselAmit)
example_large_scale = SIMLR_Large_Scale(X = ZeiselAmit$in_X, c = ZeiselAmit$n_clust, kk = 10)
## Performing fast PCA.
## Performing k-nearest neighbour search.
## Computing the multiple Kernels.
## Performing the iterative procedure  5  times.
## Iteration:  1 
## Iteration:  2 
## Iteration:  3 
## Iteration:  4 
## Iteration:  5 
## Performing Kmeans.
## Performing t-SNE.
## The main loop will be now performed with a maximum of 300 iterations.
## Performing iteration 1.
## Performing iteration 2.
## Performing iteration 3.
## Performing iteration 4.
## Performing iteration 5.
## Performing iteration 6.
## Performing iteration 7.
## Performing iteration 8.
## Performing iteration 9.
## Performing iteration 10.
## Performing iteration 11.
## Performing iteration 12.
## Performing iteration 13.
## Performing iteration 14.
## Performing iteration 15.
## Performing iteration 16.
## Performing iteration 17.
## Performing iteration 18.
## Performing iteration 19.
## Performing iteration 20.
## Performing iteration 21.
## Performing iteration 22.
## Performing iteration 23.
## Performing iteration 24.
## Performing iteration 25.
## Performing iteration 26.
## Performing iteration 27.
## Performing iteration 28.
## Performing iteration 29.
## Performing iteration 30.
## Performing iteration 31.
## Performing iteration 32.
## Performing iteration 33.
## Performing iteration 34.
## Performing iteration 35.
## Performing iteration 36.
## Performing iteration 37.
## Performing iteration 38.
## Performing iteration 39.
## Performing iteration 40.
## Performing iteration 41.
## Performing iteration 42.
## Performing iteration 43.
## Performing iteration 44.
## Performing iteration 45.
## Performing iteration 46.
## Performing iteration 47.
## Performing iteration 48.
## Performing iteration 49.
## Performing iteration 50.
## Performing iteration 51.
## Performing iteration 52.
## Performing iteration 53.
## Performing iteration 54.
## Performing iteration 55.
## Performing iteration 56.
## Performing iteration 57.
## Performing iteration 58.
## Performing iteration 59.
## Performing iteration 60.
## Performing iteration 61.
## Performing iteration 62.
## Performing iteration 63.
## Performing iteration 64.
## Performing iteration 65.
## Performing iteration 66.
## Performing iteration 67.
## Performing iteration 68.
## Performing iteration 69.
## Performing iteration 70.
## Performing iteration 71.
## Performing iteration 72.
## Performing iteration 73.
## Performing iteration 74.
## Performing iteration 75.
## Performing iteration 76.
## Performing iteration 77.
## Performing iteration 78.
## Performing iteration 79.
## Performing iteration 80.
## Performing iteration 81.
## Performing iteration 82.
## Performing iteration 83.
## Performing iteration 84.
## Performing iteration 85.
## Performing iteration 86.
## Performing iteration 87.
## Performing iteration 88.
## Performing iteration 89.
## Performing iteration 90.
## P