SIMLR 1.33.0
The external R package igraph is required for the computation of the normalized mutual information to assess the results of the clustering.
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
library(SIMLR)
We now run SIMLR as an example on an input dataset from Buettner, Florian, et al. “Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.” Nature biotechnology 33.2 (2015): 155-160. For this dataset we have a ground true of 3 cell populations, i.e., clusters.
set.seed(11111)
data(BuettnerFlorian)
example = SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, cores.ratio = 0)
## Computing the multiple Kernels.
## Performing network diffiusion.
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 6
## Iteration: 7
## Iteration: 8
## Iteration: 9
## Iteration: 10
## Iteration: 11
## Warning in SIMLR(X = BuettnerFlorian$in_X, c = BuettnerFlorian$n_clust, : Maybe
## you should set a larger value of c.
## Performing t-SNE.
## Epoch: Iteration # 100 error is: 0.1293406
## Epoch: Iteration # 200 error is: 0.07376446
## Epoch: Iteration # 300 error is: 0.05963893
## Epoch: Iteration # 400 error is: 0.05958424
## Epoch: Iteration # 500 error is: 0.05953258
## Epoch: Iteration # 600 error is: 0.05948551
## Epoch: Iteration # 700 error is: 0.05944139
## Epoch: Iteration # 800 error is: 0.05940026
## Epoch: Iteration # 900 error is: 0.05936119
## Epoch: Iteration # 1000 error is: 0.05932415
## Performing Kmeans.
## Performing t-SNE.
## Epoch: Iteration # 100 error is: 10.98628
## Epoch: Iteration # 200 error is: 0.8364971
## Epoch: Iteration # 300 error is: 0.6978426
## Epoch: Iteration # 400 error is: 0.5367967
## Epoch: Iteration # 500 error is: 0.419038
## Epoch: Iteration # 600 error is: 0.4244949
## Epoch: Iteration # 700 error is: 0.4438908
## Epoch: Iteration # 800 error is: 0.437931
## Epoch: Iteration # 900 error is: 0.383611
## Epoch: Iteration # 1000 error is: 0.5312541
We now compute the normalized mutual information between the inferred clusters by SIMLR and the true ones. This measure with values in [0,1], allows us to assess the performance of the clustering with higher values reflecting better performance.
nmi_1 = compare(BuettnerFlorian$true_labs[,1], example$y$cluster, method="nmi")
print(nmi_1)
## [1] 0.888298
As a further understanding of the results, we now visualize the cell populations in a plot.
plot(example$ydata,
col = c(topo.colors(BuettnerFlorian$n_clust))[BuettnerFlorian$true_labs[,1]],
xlab = "SIMLR component 1",
ylab = "SIMLR component 2",
pch = 20,
main="SIMILR 2D visualization for BuettnerFlorian")
Figure 1: Visualization of the 3 cell populations retrieved by SIMLR on the dataset by Florian, et al
We also run SIMLR feature ranking on the same inputs to get a rank of the key genes with the related pvalues.
set.seed(11111)
ranks = SIMLR_Feature_Ranking(A=BuettnerFlorian$results$S,X=BuettnerFlorian$in_X)
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32
## 33
## 34
## 35
## 36
## 37
## 38
## 39
## 40
## 41
## 42
## 43
## 44
## 45
## 46
## 47
## 48
## 49
## 50
## 51
## 52
## 53
## 54
## 55
## 56
## 57
## 58
## 59
## 60
## 61
## 62
## 63
## 64
## 65
## 66
## 67
## 68
## 69
## 70
## 71
## 72
## 73
## 74
## 75
## 76
## 77
## 78
## 79
## 80
## 81
## 82
## 83
## 84
## 85
## 86
## 87
## 88
## 89
## 90
## 91
## 92
## 93
## 94
## 95
## 96
## 97
## 98
## 99
## 100
head(ranks$pval)
## [1] 1.086748e-128 1.189327e-90 5.504924e-80 4.652359e-75 5.593957e-73
## [6] 3.373056e-69
head(ranks$aggR)
## [1] 5701 1689 7549 57 2653 8081
Similarly, we show an example for SIMLR large scale on an input dataset being a reduced version of the dataset provided in Buettner, Zeisel, Amit, et al. “Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.” Science 347.6226 (2015): 1138-1142. For this dataset we have a ground true of 9 cell populations, i.e., clusters. But we should notice that here we use for computational reasons a reduced version of the data, so please refer to the original publication for the full data.
set.seed(11111)
data(ZeiselAmit)
example_large_scale = SIMLR_Large_Scale(X = ZeiselAmit$in_X, c = ZeiselAmit$n_clust, kk = 10)
## Performing fast PCA.
## Performing k-nearest neighbour search.
## Computing the multiple Kernels.
## Performing the iterative procedure 5 times.
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Performing Kmeans.
## Performing t-SNE.
## The main loop will be now performed with a maximum of 300 iterations.
## Performing iteration 1.
## Performing iteration 2.
## Performing iteration 3.
## Performing iteration 4.
## Performing iteration 5.
## Performing iteration 6.
## Performing iteration 7.
## Performing iteration 8.
## Performing iteration 9.
## Performing iteration 10.
## Performing iteration 11.
## Performing iteration 12.
## Performing iteration 13.
## Performing iteration 14.
## Performing iteration 15.
## Performing iteration 16.
## Performing iteration 17.
## Performing iteration 18.
## Performing iteration 19.
## Performing iteration 20.
## Performing iteration 21.
## Performing iteration 22.
## Performing iteration 23.
## Performing iteration 24.
## Performing iteration 25.
## Performing iteration 26.
## Performing iteration 27.
## Performing iteration 28.
## Performing iteration 29.
## Performing iteration 30.
## Performing iteration 31.
## Performing iteration 32.
## Performing iteration 33.
## Performing iteration 34.
## Performing iteration 35.
## Performing iteration 36.
## Performing iteration 37.
## Performing iteration 38.
## Performing iteration 39.
## Performing iteration 40.
## Performing iteration 41.
## Performing iteration 42.
## Performing iteration 43.
## Performing iteration 44.
## Performing iteration 45.
## Performing iteration 46.
## Performing iteration 47.
## Performing iteration 48.
## Performing iteration 49.
## Performing iteration 50.
## Performing iteration 51.
## Performing iteration 52.
## Performing iteration 53.
## Performing iteration 54.
## Performing iteration 55.
## Performing iteration 56.
## Performing iteration 57.
## Performing iteration 58.
## Performing iteration 59.
## Performing iteration 60.
## Performing iteration 61.
## Performing iteration 62.
## Performing iteration 63.
## Performing iteration 64.
## Performing iteration 65.
## Performing iteration 66.
## Performing iteration 67.
## Performing iteration 68.
## Performing iteration 69.
## Performing iteration 70.
## Performing iteration 71.
## Performing iteration 72.
## Performing iteration 73.
## Performing iteration 74.
## Performing iteration 75.
## Performing iteration 76.
## Performing iteration 77.
## Performing iteration 78.
## Performing iteration 79.
## Performing iteration 80.
## Performing iteration 81.
## Performing iteration 82.
## Performing iteration 83.
## Performing iteration 84.
## Performing iteration 85.
## Performing iteration 86.
## Performing iteration 87.
## Performing iteration 88.
## Performing iteration 89.
## Performing iteration 90.
## P