scan_sequences {universalmotif} | R Documentation |
For sequences of any alphabet, scan them using the PWM matrices of a set of input motifs.
scan_sequences(motifs, sequences, threshold = 0.001, threshold.type = "pvalue", RC = FALSE, use.freq = 1, verbose = 1, progress = TRUE, BP = FALSE)
motifs |
See |
sequences |
|
threshold |
|
threshold.type |
|
RC |
|
use.freq |
|
verbose |
|
progress |
|
BP |
|
Similar to Biostrings::matchPWM()
, the scanning method uses
logodds scoring. (To see the scoring matrix for any motif, simply
run convert_type(motif, "PWM")
; for a multifreq
scoring
matrix: apply(motif["multifreq"]$
2, 2, ppm_to_pwm)
). In order
to score a sequence, at each position within a sequence of length equal
to the length of the motif, the scores for each base are summed. If the
score sum is above the desired threshold, it is kept.
If threshold.type = 'logodds'
, then to calculate the minimum
allowed score the maximum possible score for a motif is multiplied
by the value set by threshold
. To determine the maximum
possible score a motif (of type PWM), run
sum(apply(motif['motif'], 2, max))
.
If threshold.type = 'pvalue'
, then threshold logodds scores are
generated using motif_pvalue()
.
Note: memory usage increases exponentially with increasing k
.
data.frame
with each row representing one hit; if the input
sequences are DNAStringSet
or
RNAStringSet
, then an
additional column with the strand is included.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
Pagès H, Aboyoun P, Gentleman R, DebRoy S (2018). Biostrings: Efficient manipulation of biological strings. R package version 2.48.0.
add_multifreq()
, Biostrings::matchPWM()
,
enrich_motifs()
, motif_pvalue()
## any alphabet can be used ## Not run: set.seed(1) alphabet <- paste(c(letters), collapse = "") motif <- create_motif("hello", alphabet = alphabet) sequences <- create_sequences(alphabet, seqnum = 1000, seqlen = 100000) scan_sequences(motif, sequences) ## End(Not run)