motif_pvalue {universalmotif} | R Documentation |
For calculating p-values/logodds scores for any number of motifs.
motif_pvalue(motifs, score, pvalue, bkg.probs, use.freq = 1, k = 6, progress = TRUE, BP = FALSE)
motifs |
See |
score |
|
pvalue |
|
bkg.probs |
|
use.freq |
|
k |
|
progress |
|
BP |
|
Calculating p-values for motifs can be very computationally intensive. This is due to how p-values must be calculated: for a given score, all possible sequences which score equal or higher must be found, and the probability for each of these sequences (based on background probabilities) summed. For a DNA motif of length 10, the number of possible unique sequences is 4^10 = 1,048,576. Finding all possible sequences higher than a given score can be done very efficiently and quickly with a branch-and-bound algorithm, but as the motif length increases this calculation becomes impractical. To get around this, the p-value calculation can be approximated.
In order to calculate p-values for longer motifs, this function uses the
approximation proposed by Hartmann et al. (2013), where
the motif is subset, p-values calculated for the subsets, and finally
combined for a total p-value. The smaller the size of the subsets, the
faster the calculation; but also, the bigger the approximation. This can be
controlled by setting k
. In fact, for smaller motifs (< 13 positions)
calculating exact p-values can be done individually in reasonable time by
setting k = 12
.
To calculate a score based on a given p-value, the function simply guesses different scores until it finds one which when used to calculate a p-value, returns a p-value reasonably close to the given p-value. For low p-values, this usually only takes a couple of guesses.
Note that the approximation is mostly on the conservative side. The higher
the number of motif subsets, the worse the approximation. Furthermore,
the speed of motif_pvalue()
increases with a decreasing p-value.
numeric
A vector of scores/p-values.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
Hartmann H, Guthöhrlein E, Siebert M, and S. Luehr J. Söding (2013). “P-value-based regulatory motif discovery using positional weight matrices.” Genome Research, 23, 181–194.
data(examplemotif) ## p-value/score calculations are performed using the PWM version of the ## motif; these calculations do not work if any -Inf values are present examplemotif["pseudocount"] <- 1 # or examplemotif <- BiocGenerics::normalize(examplemotif) ## get a minimum score based on a p-value motif_pvalue(examplemotif, pvalue = 0.001) ## get the probability of a particular sequence hit motif_pvalue(examplemotif, score = 0) ## the calculations can be performed for multiple motifs motif_pvalue(list(examplemotif, examplemotif), pvalue = c(0.001, 0.0001))