M3C {M3C} | R Documentation |
This is the M3C core function, a reference-based consensus clustering algorithm. The basic idea is to use a multi-core enabled Monte Carlo simulation to generate a null distribution of stability scores. The Monte Carlo simulation maintains the feature correlation structure of the input data. The real stability scores are then compared against this null distribution, and an empirical p value is calculated for every value of K to test the null hypothesis K=1. The Relative Cluster Stability Index (RCSI), which compares the real score against the reference mean, is derived as a metric for selecting K.
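The comparison of a real stability score with a null distribution can be sketched in base R. This is a minimal illustration, not M3C's internal code: the vectors `ref_pac` and `real_pac` are hypothetical, the p value uses the usual Monte Carlo estimator with a +1 correction, and the log-ratio against the reference mean is only an assumed form for an RCSI-like index (this page does not give the formula). Lower scores are taken to mean more stable clustering.

```r
set.seed(42)

# Hypothetical null distribution: 100 stability (PAC-like) scores for one
# value of K, as would come from clustering simulated reference data sets.
ref_pac <- runif(100, min = 0.3, max = 0.6)

# Hypothetical score from the real data for the same K (lower = more stable).
real_pac <- 0.15

# Empirical p value: share of reference scores at least as extreme (as low)
# as the real score, with +1 in numerator and denominator so it is never 0.
p_emp <- (sum(ref_pac <= real_pac) + 1) / (length(ref_pac) + 1)
p_emp  # 1/101, because no reference score is as low as 0.15

# Assumed RCSI-like index: log ratio of the reference mean to the real score.
# Positive values mean the real data are more stable than the reference mean.
rcsi <- log10(mean(ref_pac)) - log10(real_pac)
rcsi
```

Repeating this for every K from 2 to `maxK` gives one p value and one index per K, which is the shape of result the description refers to.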
M3C(mydata, montecarlo = TRUE, cores = 1, iters = 100, maxK = 10, des = NULL, ref_method = c("reverse-pca", "chol"), repsref = 100, repsreal = 100, clusteralg = c("pam", "km", "spectral", "hc"), distance = "euclidean", pacx1 = 0.1, pacx2 = 0.9, printres = FALSE, printheatmaps = FALSE, showheatmaps = FALSE, seed = NULL, removeplots = FALSE, dend = FALSE, silent = FALSE, doanalysis = FALSE, analysistype = c("survival", "kw", "chi"), variable = NULL)
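The description states that the simulated reference data keep the feature correlation structure of the input. One generic way to achieve this, in the spirit of the "chol" option of `ref_method`, is to multiply independent Gaussian noise by a Cholesky factor of the feature covariance matrix. The sketch below is an assumption about the general technique, not a copy of what M3C does internally; the toy matrix `x` and all other names are hypothetical.

```r
set.seed(1)

# Toy input in the orientation M3C expects: features as rows, samples as columns.
n_feat <- 5
n_samp <- 60
shared <- rnorm(n_samp)
x <- rbind(shared + rnorm(n_samp, sd = 0.3),  # features 1 and 2 are correlated
           shared + rnorm(n_samp, sd = 0.3),
           rnorm(n_samp),
           rnorm(n_samp),
           rnorm(n_samp))

# Feature-by-feature covariance (5 x 5). It must be positive definite for
# chol(), which here requires more samples than features.
S <- cov(t(x))
U <- chol(S)  # upper triangular factor with t(U) %*% U equal to S

# Independent standard normal noise, one column per simulated sample.
z <- matrix(rnorm(n_feat * n_samp), nrow = n_feat)

# Reference data: a single Gaussian cluster (K = 1) whose features have
# covariance S in expectation.
ref <- t(U) %*% z

# The correlation between features 1 and 2 should be close in both data sets.
cor(x[1, ], x[2, ])
cor(ref[1, ], ref[2, ])
```

Because the reference data contain no cluster structure by construction, stability scores computed on them describe what to expect under the null hypothesis K=1.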
mydata |
Data frame or matrix: contains the data, with samples as columns and features as rows |
montecarlo |
Logical flag: whether to run the Monte Carlo simulation or not (recommended: TRUE) |
cores |
Numerical value: how many cores to split the Monte Carlo simulation over |
iters |
Numerical value: how many Monte Carlo iterations to perform (default: 100, recommended: 5-200) |
maxK |
Numerical value: the maximum number of clusters to test for, K (default: 10) |
des |
Data frame: contains annotation data for the input data for automatic reordering |
ref_method |
Character string: refers to which reference method to use (recommended: leaving as default) |
repsref |
Numerical value: how many resampling reps to use for reference (default: 100, recommended: 100-250) |
repsreal |
Numerical value: how many resampling reps to use for real data (default: 100, recommended: 100-250) |
clusteralg |
String: dictates which inner clustering algorithm to use for M3C |
distance |
String: dictates which distance metric to use for M3C (recommended: leaving as default) |
pacx1 |
Numerical value: the 1st x co-ordinate for calculating the PAC score from the CDF (default: 0.1) |
pacx2 |
Numerical value: the 2nd x co-ordinate for calculating the PAC score from the CDF (default: 0.9) |
printres |
Logical flag: whether to print all results into current directory |
printheatmaps |
Logical flag: whether to print all the heatmaps into current directory |
showheatmaps |
Logical flag: whether to show the heatmaps on screen |
seed |
Numerical value: fixes the random seed so that results can be repeated (for example, seed = 123) |
removeplots |
Logical flag: whether to remove all plots |
dend |
Logical flag: whether to compute the dendrogram and p values for the optimal K or not |
silent |
Logical flag: whether to suppress messages or not |
doanalysis |
Logical flag: whether to analyse the clinical variable supplied (univariate only) |
analysistype |
Character string: which kind of statistical analysis to perform on the data: survival, Kruskal-Wallis (kw), or chi-squared (chi) |
variable |
Character string: if not doing survival analysis, the name of the dependent variable (column name) in the data frame |
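The `pacx1` and `pacx2` arguments set the two x co-ordinates on the CDF of consensus values between which the PAC score is measured. A minimal base R sketch of that calculation follows, assuming the score is the CDF at `pacx2` minus the CDF at `pacx1` over all unique sample pairs. The function `pac_score` and the toy matrices are hypothetical, and the exact handling of the interval boundaries inside M3C may differ.

```r
# Hypothetical helper: difference of the empirical CDF of consensus values
# at x2 and x1. Values near 0 mean most pairs are clearly together or apart.
pac_score <- function(consensus, x1 = 0.1, x2 = 0.9) {
  pairs <- consensus[lower.tri(consensus)]  # each sample pair counted once
  cdf <- ecdf(pairs)
  cdf(x2) - cdf(x1)
}

# Perfectly stable toy consensus matrix: two clean clusters of two samples.
stable <- matrix(c(1, 1, 0, 0,
                   1, 1, 0, 0,
                   0, 0, 1, 1,
                   0, 0, 1, 1), nrow = 4, byrow = TRUE)
pac_score(stable)     # 0: every pair has consensus 0 or 1

# Fully ambiguous toy matrix: every pair co-clusters half of the time.
ambiguous <- matrix(0.5, nrow = 4, ncol = 4)
diag(ambiguous) <- 1
pac_score(ambiguous)  # 1: every pair falls between x1 and x2
```

Narrowing the window, for example `x1 = 0.2` and `x2 = 0.8`, counts fewer pairs as ambiguous and therefore lowers the score for the same consensus matrix.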
A list containing: 1) the stability results, 2) all the output data (another list), and 3) the reference stability scores (see the vignette for details on how to access them easily)
library(M3C)
res <- M3C(mydata, cores = 1, iters = 100, ref_method = 'reverse-pca',
           montecarlo = TRUE, printres = FALSE, maxK = 10, showheatmaps = FALSE,
           repsreal = 100, repsref = 100, printheatmaps = FALSE, seed = 123,
           des = desx)