add_multifreq {universalmotif} | R Documentation |
If the original sequences are available for a particular motif, then they can be used to generate higher-order PPM matrices.
add_multifreq(motif, sequences, add.k = 2:3, RC = FALSE, threshold = 0.01, threshold.type = "logodds", motifs.perseq = 1)
motif |
See |
sequences |
|
add.k |
|
RC |
|
threshold |
|
threshold.type |
|
motifs.perseq |
|
At each position in the motif, the probability of each k-let
starting at that position is calculated. Only k-lets that fall
entirely within the motif are considered; this means that the
final k-let probability matrix will have k - 1 fewer columns than
the motif (for example, ncol - 1 columns for 2-lets).
Calculating k-let probabilities for the missing columns would be
trivial, however, as you would only need the background frequencies.
Since these would not be useful for scan_sequences(), they are not
calculated.
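As a rough illustration of the per-position k-let calculation described above, the following base R sketch counts k-lets in a set of pre-aligned, equal-length sequences. It is not the code add_multifreq() uses internally: klet_probs() is a hypothetical helper written for this example, and the toy sequences are made up. In this sketch a k-let must fit entirely inside the alignment, so sequences of width w yield w - k + 1 columns.

```r
# Hypothetical helper (not part of universalmotif): per-position k-let
# probabilities from equal-length, pre-aligned sequences.
klet_probs <- function(seqs, k, alphabet = c("A", "C", "G", "T")) {
  stopifnot(length(unique(nchar(seqs))) == 1, k >= 1, k <= nchar(seqs[1]))
  # Enumerate all n^k possible k-lets over the alphabet.
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  klets <- sort(do.call(paste0, grid))
  # A k-let starting at position i covers i..(i + k - 1), so only
  # width - k + 1 start positions fit inside the alignment.
  n.start <- nchar(seqs[1]) - k + 1
  out <- matrix(0, nrow = length(klets), ncol = n.start,
                dimnames = list(klets, seq_len(n.start)))
  for (i in seq_len(n.start)) {
    observed <- substr(seqs, i, i + k - 1)
    counts <- table(factor(observed, levels = klets))
    out[, i] <- as.vector(counts) / length(seqs)
  }
  out
}

seqs <- c("ACGT", "ACGA", "TCGT", "ACTT")  # made-up aligned sites
m2 <- klet_probs(seqs, k = 2)
dim(m2)       # 16 rows (4^2) and 3 columns (4 - 2 + 1)
m2["AC", 1]   # fraction of sequences starting with "AC"
```

Each column of the result sums to 1, since every sequence contributes exactly one k-let per start position.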
Currently add_multifreq() does not try to stay faithful to the default
motif matrix when generating multifreq matrices. This means that if the
sequences used for training are completely different from the actual
motif, the multifreq matrices will be as well. However this is only really
a problem if you supply add_multifreq() with a set of sequences of the
same length as the motif; in this case add_multifreq() is forced to
create the multifreq matrices from these sequences. Otherwise
add_multifreq() will scan the input sequences for the motif and use the
best matches to construct the multifreq matrices.
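The idea of scanning longer sequences and keeping the best match from each can be sketched in base R as follows. This is only an illustration under simplifying assumptions, not the scanning code add_multifreq() actually runs: best_window() is a hypothetical helper, the log-odds matrix is made up, and the threshold, RC and motifs.perseq arguments are ignored here.

```r
# Hypothetical helper (not part of universalmotif): return the
# highest-scoring window of `seq` under a log-odds matrix `pwm`
# whose rows are named by letter.
best_window <- function(seq, pwm) {
  w <- ncol(pwm)
  stopifnot(nchar(seq) >= w)
  starts <- seq_len(nchar(seq) - w + 1)
  windows <- substring(seq, starts, starts + w - 1)
  scores <- vapply(windows, function(win) {
    chars <- strsplit(win, "")[[1]]
    # Pick one score per motif position: row = letter, column = position.
    sum(pwm[cbind(match(chars, rownames(pwm)), seq_len(w))])
  }, numeric(1))
  windows[which.max(scores)]
}

# Made-up log-odds matrix favouring the site "ACG".
pwm <- matrix(-1, nrow = 4, ncol = 3,
              dimnames = list(c("A", "C", "G", "T"), NULL))
pwm["A", 1] <- 2
pwm["C", 2] <- 2
pwm["G", 3] <- 2

seqs <- c("TTACGTT", "ACGTTTT", "TTTTACC")
hits <- vapply(seqs, best_window, character(1), pwm = pwm,
               USE.NAMES = FALSE)
hits
```

The resulting hits all have the same width as the motif, so they could then be treated like a set of aligned sites when counting k-lets.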
This 'multifreq' representation is only really useful within the
universalmotif environment. Despite this, if you wish it can be
preserved in text using write_motifs().
Note: the number of rows for each k-let matrix is n^k, with n being the number of letters in the alphabet being used. This means that the size of the k-let matrix can become quite large as k increases. For example, representing a DNA motif of length 10 as a 10-let would require a matrix with 1,048,576 rows (though at this point, if what you want is to search for exact sequence matches, the motif format itself is not very useful).
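The growth in matrix size can be checked with a few lines of base R. This is plain arithmetic on the n^k formula from the note, assuming a four-letter DNA alphabet; it does not call any universalmotif function.

```r
n <- 4                       # letters in the DNA alphabet
k <- 1:10                    # k-let sizes to tabulate
rows <- n^k                  # rows needed for each k-let matrix
data.frame(k = k, rows = rows)
rows[10]                     # 4^10 = 1048576, as quoted in the note
```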
A universalmotif object with a filled multifreq slot.
Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca
scan_sequences(), convert_motifs(), write_motifs()
sequences <- create_sequences(seqlen = 10)
motif <- create_motif()
motif.trained <- add_multifreq(motif, sequences, add.k = 2:4)
## peek at the 2-let matrix:
motif.trained["multifreq"]$`2`