entropy {BioQC} | R Documentation |
These functions calculate Shannon entropy and related concepts, including diversity, specificity, and specialization. They can be used to quantify gene expression profiles.
entropy(vector)
entropyDiversity(mat, norm=FALSE)
entropySpecificity(mat, norm=FALSE)
sampleSpecialization(mat, norm=TRUE)
vector |
A vector of numbers or characters. The discrete probability of each item is calculated, and the Shannon entropy is returned. |
mat |
A matrix (usually an expression matrix), with genes (features) in rows and samples in columns. |
norm |
Logical value. If set to |
Shannon entropy can be used as a measure of gene expression specificity, as well as of tissue diversity and specialization. See the reference below.
We use 2 as the base for the entropy calculation, because in this base the unit of entropy is the bit.
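As a minimal sketch of this calculation in base R (not the BioQC source; `shannon_entropy_bits` is a hypothetical name), the discrete probability of each distinct item can be estimated from its frequency and plugged into the Shannon formula with base-2 logarithms:

```r
# Hypothetical helper for illustration only; it is not part of BioQC.
# Assumption: the probability of an item is its relative frequency in x.
shannon_entropy_bits <- function(x) {
  counts <- as.numeric(table(x))   # occurrences of each distinct item
  p <- counts / sum(counts)        # discrete probability of each item
  -sum(p * log2(p))                # base 2, so the result is in bits
}

shannon_entropy_bits(1:9)        # nine equally likely items: equals log2(9)
shannon_entropy_bits(rep(1, 9))  # a single repeated item: 0 bits
```

Every `p` produced by `table` is strictly positive, so the sketch never evaluates `0 * log2(0)`.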
entropy returns one entropy value. entropyDiversity and sampleSpecialization each return a vector as long as the number of columns of the input matrix. entropySpecificity returns a vector as long as the number of rows of the input matrix, namely the specificity scores of the genes.
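To illustrate why the results have these lengths, here is a rough base-R sketch of the three matrix-level measures as defined in Martinez and Reyes-Valdes (2008). It is an assumption that BioQC follows these definitions exactly. The `sketch_*` names and the helpers are hypothetical, and the effect of `norm` is not modelled. The sketch also assumes every gene has nonzero expression in at least one sample.

```r
# Illustration only: written from the definitions in the cited paper,
# not from the BioQC source. All function names here are hypothetical.

# x * log2(x), with the convention that 0 * log2(0) is 0
xlog2x <- function(x) ifelse(x > 0, x * log2(x), 0)

# p_ij: relative frequency of gene i within sample j (columns sum to 1)
rel_freq <- function(mat) sweep(mat, 2, colSums(mat), "/")

# Diversity of sample j: H_j = -sum_i p_ij * log2(p_ij); one value per column
sketch_diversity <- function(mat) {
  -colSums(xlog2x(rel_freq(mat)))
}

# Specificity of gene i: S_i = mean_j (p_ij / p_i) * log2(p_ij / p_i),
# where p_i is the mean of p_ij over samples; one value per row
sketch_specificity <- function(mat) {
  p <- rel_freq(mat)
  ratio <- p / rowMeans(p)   # row i is divided by p_i
  rowMeans(xlog2x(ratio))
}

# Specialization of sample j: delta_j = sum_i p_ij * S_i; one value per column
sketch_specialization <- function(mat) {
  colSums(rel_freq(mat) * sketch_specificity(mat))
}

myMat <- rbind(c(3, 4, 5), c(6, 6, 6), c(0, 2, 4))
length(sketch_diversity(myMat))       # number of columns
length(sketch_specificity(myMat))     # number of rows
length(sketch_specialization(myMat))  # number of columns
```

Under these definitions, a gene whose relative frequency is the same in every sample has specificity 0, and a gene expressed in only one of t samples reaches the maximum, log2(t).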
Jitao David Zhang <jitao_david.zhang@roche.com>
Martinez and Reyes-Valdes (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. PNAS 105(28):9709–9714
myVec0 <- 1:9
entropy(myVec0) ## log2(9)
myVec1 <- rep(1, 9)
entropy(myVec1)

myMat <- rbind(c(3,4,5), c(6,6,6), c(0,2,4))
entropySpecificity(myMat)
entropySpecificity(myMat, norm=TRUE)
entropyDiversity(myMat)
entropyDiversity(myMat, norm=TRUE)
sampleSpecialization(myMat)
sampleSpecialization(myMat, norm=TRUE)

myRandomMat <- matrix(runif(1000), ncol=20)
entropySpecificity(myRandomMat)
entropySpecificity(myRandomMat, norm=TRUE)
entropyDiversity(myRandomMat)
entropyDiversity(myRandomMat, norm=TRUE)
sampleSpecialization(myRandomMat)
sampleSpecialization(myRandomMat, norm=TRUE)