compare_motifs {universalmotif} | R Documentation |
Compare motifs using four available metrics: Pearson correlation coefficient (Pietrokovski 1996), Euclidean distance (Choi et al. 2004), Sandelin-Wasserman similarity (Sandelin and Wasserman 2004), and Kullback-Leibler divergence (Roepcke et al. 2005).
compare_motifs(motifs, compare.to, db.scores, use.freq = 1, use.type = "PPM", method = "MPCC", tryRC = TRUE, min.overlap = 6, min.mean.ic = 0.5, relative_entropy = FALSE, normalise.scores = FALSE, max.p = 0.01, max.e = 10, progress = TRUE, BP = FALSE)
motifs |
See |
compare.to |
|
db.scores |
|
use.freq |
|
use.type |
|
method |
|
tryRC |
|
min.overlap |
|
min.mean.ic |
|
relative_entropy |
|
normalise.scores |
|
max.p |
|
max.e |
|
progress |
|
BP |
|
Comparisons are calculated between two motifs at a time. All possible alignments are scored, and the best score is reported. Scores are calculated per position and summed, unless the 'mean' version of the specific metric is chosen. With a similarity metric, summed scores favour comparisons between longer motifs, because more aligned positions contribute to the total; with a distance metric, summed scores favour comparisons between shorter motifs. Using the 'mean' version of a metric avoids this length bias.
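The alignment search described above can be sketched in base R as a sliding, ungapped comparison that keeps the best summed or mean score. This is an illustration under stated assumptions, not the package's implementation: best_alignment(), one_hot() and the min_overlap argument (loosely mirroring the documented min.overlap) are hypothetical, only overlapping columns are scored, and reverse complements (tryRC) are not tried.

```r
# Hypothetical helper: one-hot probability matrix (letters x positions)
# built from a DNA string.
one_hot <- function(seq, alphabet = c("A", "C", "G", "T")) {
  chars <- strsplit(seq, "")[[1]]
  m <- vapply(chars, function(ch) as.numeric(alphabet == ch),
              numeric(length(alphabet)))
  rownames(m) <- alphabet
  m
}

# Hypothetical sketch: slide m2 along m1 without gaps, score only the
# overlapping columns with col_score(), and keep the best total.
# 'offset' is the shift of m2 relative to m1: column j of m2 sits under
# column j + offset of m1.
best_alignment <- function(m1, m2, col_score, higher_is_better = TRUE,
                           use_mean = TRUE, min_overlap = 1) {
  n1 <- ncol(m1)
  n2 <- ncol(m2)
  best <- list(score = NA_real_, offset = NA_integer_)
  for (offset in (-(n2 - 1)):(n1 - 1)) {
    i1 <- seq_len(n1)
    i2 <- i1 - offset
    keep <- i2 >= 1 & i2 <= n2
    # Assumption: alignments with too few overlapping columns are skipped.
    if (sum(keep) < min_overlap) next
    scores <- mapply(function(a, b) col_score(m1[, a], m2[, b]),
                     i1[keep], i2[keep])
    total <- if (use_mean) mean(scores) else sum(scores)
    better <- is.na(best$score) ||
      (higher_is_better && total > best$score) ||
      (!higher_is_better && total < best$score)
    if (better) best <- list(score = total, offset = offset)
  }
  best
}

# Usage with the per-position Sandelin-Wasserman similarity.
sw_col <- function(a, b) 2 - sum((a - b)^2)
m1 <- one_hot("GACGT")
m2 <- one_hot("ACG")
best_alignment(m1, m2, sw_col, min_overlap = 3)  # offset 1, mean score 2
```

With the default min_overlap = 1 and mean scoring, the single matching G column at offset -2 already reaches the maximum mean of 2 and is found first, so it ties the full ACG match; averaging only rewards full-length matches once short overlaps are excluded.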
PCC: Pearson correlation coefficient
Per position:
PCC = sum((col1 - mean(col1)) * (col2 - mean(col2))) / sqrt(sum((col1 - mean(col1))^2) * sum((col2 - mean(col2))^2))
MPCC: Mean PCC
MPCC = mean(PCC)
EUCL: Euclidean distance
Per position:
EUCL = sqrt(sum((col1 - col2)^2))
MEUCL: Mean EUCL
MEUCL = sum(EUCL) / ncol(alignment)
SW: Sandelin-Wasserman similarity
Per position:
SW = 2 - sum((col1 - col2)^2)
MSW: Mean SW
MSW = mean(SW)
KL: Kullback-Leibler divergence
Per position:
KL = 0.5 * sum(col1 * log(col1/col2) + col2 * log(col2/col1))
MKL: Mean Kullback-Leibler divergence
MKL = mean(KL)
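The per-position formulas above translate directly into base R. The sketch below is a self-contained illustration, not the package's internal code: the function names pcc(), eucl(), sw(), kl() and score_aligned() are hypothetical, the Pearson correlation mean-centres each column, the Euclidean distance is left unscaled (so it spans 0 to sqrt(2) for probability columns), and the pseudocount in kl() is an assumption, since the text does not say how zero probabilities are handled.

```r
# Per-position metrics for two probability columns with the same letter order.

# PCC: Pearson correlation; each column is mean-centred first.
# Returns NaN if either column is constant (e.g. uniform 0.25).
pcc <- function(col1, col2) {
  a <- col1 - mean(col1)
  b <- col2 - mean(col2)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# EUCL: Euclidean distance, between 0 and sqrt(2) for probability columns.
eucl <- function(col1, col2) sqrt(sum((col1 - col2)^2))

# SW: Sandelin-Wasserman similarity, between 0 and 2 for probability columns.
sw <- function(col1, col2) 2 - sum((col1 - col2)^2)

# KL: symmetrised Kullback-Leibler divergence.
# Assumption: a small pseudocount keeps log() finite when a column has zeros.
kl <- function(col1, col2, pseudo = 1e-6) {
  p <- (col1 + pseudo) / sum(col1 + pseudo)
  q <- (col2 + pseudo) / sum(col2 + pseudo)
  0.5 * sum(p * log(p / q) + q * log(q / p))
}

# Score two already-aligned matrices (letters x positions) column by column,
# then sum (PCC, EUCL, SW, KL) or average (MPCC, MEUCL, MSW, MKL).
score_aligned <- function(m1, m2, metric, use_mean = TRUE) {
  stopifnot(identical(dim(m1), dim(m2)))
  per_pos <- vapply(seq_len(ncol(m1)),
                    function(j) metric(m1[, j], m2[, j]), numeric(1))
  if (use_mean) mean(per_pos) else sum(per_pos)
}

x <- c(0.7, 0.1, 0.1, 0.1)
y <- c(0.1, 0.2, 0.3, 0.4)
pcc(x, y)                              # same value as cor(x, y)
eucl(c(1, 0, 0, 0), c(0, 1, 0, 0))     # sqrt(2), the maximum distance
score_aligned(cbind(x, y), cbind(x, y), sw, use_mean = FALSE)  # 4
```

Passing the metric as a function keeps the summed and mean variants in one place; the mean is simply the sum divided by the number of aligned columns, as in MEUCL = sum(EUCL) / ncol(alignment).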
To note regarding p-values: p-values are pre-computed using the make_DBscores function. If db.scores is not given, a set of internal p-values precomputed from the JASPAR2018 CORE motifs is used. These precalculated scores depend on the lengths of the motifs being compared. This accounts for the fact that comparing a small motif with a larger one tends to yield higher scores: the larger motif offers more possible alignments, so the chance of finding a high-scoring one by chance is greater.
The default p-values have been precalculated for regular DNA motifs; they are of little use for motifs with a different number of alphabet letters, or for comparisons that use the multifreq slot.
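The length dependence can be made concrete with a generic empirical p-value looked up per pair of motif lengths. This is only an illustration of the idea: null_key(), empirical_p(), the table layout, the add-one correction and the null scores are all hypothetical, and the sketch does not reproduce how make_DBscores actually computes its p-values.

```r
# Hypothetical layout: one vector of background (null) comparison scores per
# pair of motif lengths, keyed as "len1xlen2".
null_key <- function(len1, len2) paste0(len1, "x", len2)

# Empirical p-value of an observed score against the length-matched null.
# For similarity metrics a larger score is more extreme; for distance metrics
# (lower_is_better = TRUE) a smaller score is.
empirical_p <- function(score, len1, len2, null_table,
                        lower_is_better = FALSE) {
  null <- null_table[[null_key(len1, len2)]]
  if (is.null(null)) stop("no null scores for lengths ", null_key(len1, len2))
  hits <- if (lower_is_better) sum(null <= score) else sum(null >= score)
  # Add-one correction keeps the estimate strictly above zero.
  (1 + hits) / (1 + length(null))
}

# Made-up null scores for illustration: comparing against a longer motif
# yields higher background scores.
null_table <- list(
  "6x6"  = c(0.1, 0.2, 0.3, 0.4),
  "6x20" = c(0.2, 0.4, 0.6, 0.8)
)
empirical_p(0.35, 6, 6, null_table)   # 0.4
empirical_p(0.35, 6, 20, null_table)  # 0.8
```

The same raw score is less significant when the second motif is longer, which is why the lookup is keyed on both motif lengths rather than on the score alone.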
matrix if compare.to is missing; data.frame otherwise.
PCC: 0 represents complete distance, >0 similarity.
MPCC: 0 represents complete distance, 1 complete similarity.
EUCL: 0 represents complete similarity, >0 distance.
MEUCL: 0 represents complete similarity, sqrt(2) complete distance.
SW: 0 represents complete distance, >0 similarity.
MSW: 0 represents complete distance, 2 complete similarity.
KL: 0 represents complete similarity, >0 distance.
MKL: 0 represents complete similarity, >0 distance.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
Choi I, Kwon J, Kim S (2004). “Local feature frequency profile: a method to measure structural similarity in proteins.” PNAS, 101, 3797–3802.
Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon J, van der Lee R, Bessy A, Cheneby J, Kulkarni SR, Tan G, Baranasic D, Arenillas D, Sandelin A, Vandepoele K, Lenhard B, Ballester B, Wasserman W, Parcy F, Mathelier A (2018). “JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework.” Nucleic Acids Research, 46, D260-D266.
Pietrokovski S (1996). “Searching databases of conserved sequence regions by aligning protein multiple-alignments.” Nucleic Acids Research, 24, 3836–3845.
Roepcke S, Grossmann S, Rahmann S, Vingron M (2005). “T-Reg Comparator: an analysis tool for the comparison of position weight matrices.” Nucleic Acids Research, 33, W438–W441.
Sandelin A, Wasserman W (2004). “Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics.” Journal of Molecular Biology, 338(2), 207–215.
convert_motifs(), motif_tree(), view_motifs()
library(universalmotif)
motif1 <- create_motif()
motif2 <- create_motif()
motif1vs2 <- compare_motifs(list(motif1, motif2), method = "MPCC")
## to get a dist object:
as.dist(1 - motif1vs2)