CAM {CAMTHC} | R Documentation |
This function performs a fully unsupervised computational deconvolution to identify marker genes that define each of the multiple subpopulations, and estimate the proportions of these subpopulations in the mixture tissues as well as their respective expression profiles.
CAM(data, K = NULL, corner.strategy = 2, dim.rdc = 10, thres.low = 0.05, thres.high = 0.95, cluster.method = c("K-Means", "apcluster"), cluster.num = 50, MG.num.thres = 20, lof.thres = 0.02, cores = NULL)
data |
Matrix of mixture expression profiles. A data frame, SummarizedExperiment or ExpressionSet object will be internally coerced into a matrix. Each row is a gene and each column is a sample. Data should be in non-log linear space with non-negative numerical values (i.e. >= 0). Missing values are not supported. All-zero rows will be removed internally. |
K |
The candidate subpopulation number(s), e.g. K = 2:8. |
corner.strategy |
The method to find corners of convex hull. 1: minimum sum of margin-of-errors; 2: minimum sum of reconstruction errors. The default is 2. |
dim.rdc |
Reduced data dimension; should not be less than the maximum candidate K. |
thres.low |
The lower bound of the percentage of genes to keep for CAM with ranked norm. The value should be between 0 and 1. The default is 0.05. |
thres.high |
The upper bound of the percentage of genes to keep for CAM with ranked norm. The value should be between 0 and 1. The default is 0.95. |
cluster.method |
The method to do clustering.
The default "K-Means" will use |
cluster.num |
The number of clusters; should be much larger than K. The default is 50. |
MG.num.thres |
Clusters with fewer genes than MG.num.thres will be treated as outliers. The default is 20. |
lof.thres |
Remove local outlier using |
cores |
The number of system cores for parallel computing. If not provided, one core for each element in K will be invoked. A value of zero disables parallel computing. |
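The input requirements listed for `data` can be checked before calling the function. The following is a minimal base-R sketch, assuming a hypothetical data frame `expr_log2` that was stored on a log2(x + 1) scale; the object name, the values, and the transform are illustrative assumptions, not part of the package.

```r
# Hypothetical input: rows are genes, columns are samples, values on a
# log2(x + 1) scale (an assumption made for this illustration only)
expr_log2 <- data.frame(
  s1 = c(8.1, 0, 5.3, 10.2),
  s2 = c(7.9, 0, 6.0,  9.8),
  s3 = c(8.4, 0, 5.1, 10.5),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Coerce to a matrix and undo the assumed log2(x + 1) transform so the
# values are back in non-log linear space
expr <- as.matrix(expr_log2)
expr <- 2^expr - 1

# Checks mirroring the documented requirements:
# numeric, no missing values, non-negative
stopifnot(is.numeric(expr), !anyNA(expr), all(expr >= 0))

# Optional: drop all-zero rows up front (the documentation states that
# CAM also removes them internally)
expr <- expr[rowSums(expr) > 0, , drop = FALSE]
```

After these steps `expr` has genes in rows and samples in columns, which is the orientation described for `data`.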
This function includes three necessary steps to decompose a matrix of mixture expression profiles: data preprocessing, marker gene cluster search, and matrix decomposition. They are implemented in CAMPrep, CAMMGCluster and CAMASest, respectively. More details can be found in the help document of each function.
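To picture the kind of matrix being decomposed, the sketch below simulates a small mixture in base R under a linear mixing assumption. The names `S`, `A` and `X` and all values are hypothetical and chosen only for illustration; the sketch does not call any package function.

```r
# Hypothetical toy example: 3 subpopulations, 6 genes, 4 mixture samples
set.seed(1)
K <- 3
n_genes <- 6
n_samples <- 4

# S: genes x K subpopulation-specific expression profiles
# (non-negative, linear scale)
S <- matrix(rexp(n_genes * K, rate = 0.01), nrow = n_genes, ncol = K)

# Make gene 1 an idealized marker gene: expressed in subpopulation 1 only
S[1, ] <- c(100, 0, 0)

# A: samples x K mixing proportions; each row sums to 1
A <- matrix(runif(n_samples * K), nrow = n_samples, ncol = K)
A <- A / rowSums(A)

# X: genes x samples mixture matrix, the orientation expected for `data`
X <- S %*% t(A)

# The mixture profile of the marker gene is proportional to the
# proportions of subpopulation 1 across samples
marker_profile <- X[1, ] / sum(X[1, ])
prop_profile   <- A[, 1] / sum(A[, 1])
```

In this idealized setting the normalized mixture profile of a marker gene matches the normalized proportion vector of its subpopulation, which gives some intuition for why marker genes are informative about the proportions.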
For this function, you need to specify the range of possible subpopulation numbers and the percentage of low/high-expressed genes to be removed. Typically, 30% ~ 50% of low-expressed genes can be removed from gene expression data. The removal of high-expressed genes has much less impact on results, and is usually set to 0% ~ 10%.
This function can also analyze other molecular expression data, such as proteomics data. Far fewer low-expressed proteins need to be removed, e.g. 0% ~ 10%, due to the limited number of proteins without missing values.
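The effect of the low/high thresholds can be sketched in base R. This is an illustration only: it assumes that genes are ranked by the Euclidean norm of their expression vector and that the thresholds act as quantile cutoffs, which may differ from what CAMPrep actually does. The matrix `X` and the helper variables are hypothetical.

```r
set.seed(111)

# Hypothetical non-negative expression matrix: 1000 genes x 10 samples
X <- matrix(rexp(1000 * 10, rate = 0.01), nrow = 1000)

thres.low  <- 0.30   # remove the 30% of genes with the smallest norm
thres.high <- 0.95   # remove the 5% of genes with the largest norm

# Assumption: genes are ranked by the Euclidean norm of their row
gene_norm <- sqrt(rowSums(X^2))

# Assumption: the thresholds are applied as quantile cutoffs on that norm
cutoffs <- quantile(gene_norm, probs = c(thres.low, thres.high))
keep <- gene_norm >= cutoffs[1] & gene_norm <= cutoffs[2]

X_kept <- X[keep, , drop = FALSE]

# Fraction of genes retained, roughly thres.high - thres.low
mean(keep)
```

Under these assumptions, thres.low = 0.30 and thres.high = 0.95 correspond to removing about 30% of low-expressed and 5% of high-expressed genes, in line with the ranges suggested above.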
An object of class "CAMObj" containing the following components:
PrepResult |
An object of class " |
MGResult |
A list of " |
ASestResult |
A list of " |
#obtain data
data(ratMix3)
data <- ratMix3$X

#set seed to generate reproducible results
set.seed(111)

#CAM with known subpopulation number
rCAM <- CAM(data, K = 3, dim.rdc = 3, thres.low = 0.30, thres.high = 0.95)
#A larger dim.rdc can improve performance but increase time complexity

#CAM with a range of subpopulation number
rCAM <- CAM(data, K = 2:5, dim.rdc = 10, thres.low = 0.30, thres.high = 0.95)

#Use "apcluster" to aggregate gene vectors in CAM
rCAM <- CAM(data, K = 2:5, dim.rdc = 10, thres.low = 0.30, thres.high = 0.95,
            cluster.method = 'apcluster')