inferPolyASites {TAPseq} | R Documentation |
Infer polyA sites from 10X, Drop-seq or similar 3' enriched sequencing data. Simple function that looks for peaks in read coverage to estimate potential polyA sites. Default parameters are chosen because they work reasonably well with the example data, but they should typically be empirically selected by verifying the output.
inferPolyASites( genes, bam, polyA_downstream = 100, min_cvrg = 0, wdsize = 200, by = 1, extend_downstream = 0, perc_threshold = 0.9, parallel = FALSE )
genes |
|
bam |
Path to .bam file containing aligned reads used for polyA site estimation. |
polyA_downstream |
(numeric) How far downstream of a peak in coverage are polyA sites expected? Somewhat depends on input DNA fragment size. (default: 100). |
min_cvrg |
(numeric) Minimal coverage for peaks to be considered for polyA site estimation (default: 0). |
wdsize |
(numeric) Window size to estimate sequencing coverage along transcripts (default: 200). |
by |
(numeric) Steps in basepairs in which the sliding window should be moved along transcripts to estimate smooth coverage (default: 1). |
extend_downstream |
(numeric) To which amount should transcript annotations be extended downstream when estimating polyA sites (default: 0). A reasonable value (e.g. 100-200 bp) allows to account for polyA sites that fall a few basepairs downstream of terminal exons. |
perc_threshold |
(numeric) Only sequencing coverage peaks within |
parallel |
(logical) Triggers parallel computing using the
|
A GRanges
object containing coordinates of
estimated polyadenylation sites.
library(GenomicRanges) # protein-coding exons of genes within chr11 region data("chr11_genes") target_genes <- split(chr11_genes, f = chr11_genes$gene_name) # subset of target genes for quick example target_genes <- target_genes[18:27] # bam file containing aligned Drop-seq reads dropseq_bam <- system.file("extdata", "chr11_k562_dropseq.bam", package = "TAPseq") # infer polyA sites for all target genes with adjusted parameters. parameter values depend on the # input data and at this stage it's best to try different settings and check the results polyA_sites <- inferPolyASites(target_genes, bam = dropseq_bam, polyA_downstream = 50, wdsize = 100, min_cvrg = 1, parallel = TRUE)