compoundPoissonDist {motifcounter} | R Documentation |
This function approximates the distribution of the number of motif hits that emerges from a random DNA sequence of a given length.
compoundPoissonDist(seqlen, overlap, method = "kopp")
seqlen |
Integer-valued vector that defines the lengths of the
individual sequences. For a given DNAStringSet,
this information can be retrieved using |
overlap |
An Overlap object. |
method |
String that defines which method shall be invoked: 'pape' or 'kopp' (see description). Default: method = 'kopp'. |
The distribution can be determined in two alternative ways:
A re-implemented version of the algorithm that was described in Pape et al. Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands. 2008 can be invoked using method='pape'. The main purpose of this implementation concerns benchmarking an improved approximation. In contrast to the original model, this implementation can be used with general order-d Markov models.
We provide an improved compound Poisson approximation that
uses more appropriate statistical assumptions concerning
overlapping motif hits and that can be used with order-d
background models as well. The improved version is used by default
with method='kopp'.
Note: Only method='kopp' supports the computation
of the distribution of the number of motif hits w.r.t. scanning
a single DNA strand (see probOverlapHit
).
List containing
Distribution of the number of hits
# Load sequences seqfile = system.file("extdata", "seq.fasta", package = "motifcounter") seqs = Biostrings::readDNAStringSet(seqfile) # Load motif motiffile = system.file("extdata", "x31.tab", package = "motifcounter") motif = t(as.matrix(read.table(motiffile))) # Load background model bg = readBackground(seqs, 1) # Use 100 individual sequences of length 150 bp each seqlen = rep(150, 100) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = TRUE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist) # Compute overlapping probabilities # for scanning the forward DNA strand only op = motifcounter:::probOverlapHit(motif, bg, singlestranded = FALSE) # Computes the compound Poisson distribution dist = motifcounter:::compoundPoissonDist(seqlen, op) #plot(1:length(dist$dist)-1, dist$dist)