Introduction

Visualization of next generation sequencing (NGS) data at various genomic features on a genome-wide scale provides an effective way of exploring and communicating experimental results on one hand, yet poses as a tremendous challenge on the other hand, due to the huge amount of data to be processed. Existing software tools like deeptools, ngs.plot, CoverageView and metagene2, while having attractive features and perform reasonably well in relatively simple scenarios, like plotting coverage profiles of fixed genomic loci or regions, have serious limitations in terms of efficiency and flexibility. For instance, deeptools requires 3 steps (3 sub-programs to be run) to generate plots from input files: first, convert .bam files to .bigwig format; second, compute coverage matrix; and last, plot genomic profiles. Huge amount of intermediate data are generated along the way and additional efforts have to be made to integrate these 3 closely related steps. All of them focus on plotting signals within genomic regions or around genomic loci, but not within or around combinations of genomic features. None of them have the capability of performing statistical tests on the data displayed in the profile plots.

To meet the diverse needs of experimental biologists, we have developed GenomicPlot using rich resources available on the R platform (particularly, the Bioconductor). Our GenomicPlot has the following major features:

  • Generating genomic (with introns) or metagenomic (without introns) plots around gene body and its upstream and downstream regions, the gene body can be further segmented into 5ā€™ UTR, CDS and 3ā€™ UTR
  • Plotting genomic profiles around the start and end of genomic features (like exons or introns), or custom defined genomic regions
  • Plotting distance between sample peaks and genomic features, or distance from one set of peaks to another set of peaks
  • Plotting peak annotation statistics (distribution in different type of genes, and in different parts of genes)
  • Plotting peak overlaps as Venn diagrams
  • All profile line plots have error bands
  • Random features can be generated and plotted to serve as contrast to real features
  • Statistical analysis results on user defined regions are plotted along with the profile plots

Installation

The following packages are prerequisites:

GenomicRanges (>= 1.46.1), GenomicFeatures, Rsamtools, ggplot2 (>= 3.3.5), tidyr, rtracklayer (>= 1.54.0), plyranges (>= 1.14.0), dplyr (>= 1.0.8), cowplot (>= 1.1.1), VennDiagram, ggplotify, GenomeInfoDb, IRanges, ComplexHeatmap, RCAS (>= 1.20.0), scales (>= 1.2.0), GenomicAlignments (>= 1.30.0), edgeR, forcats, circlize, viridis, ggsignif (>= 0.6.3), ggsci (>= 2.9), genomation (>= 1.26.0), ggpubr

You can install the current release version from Bioconductor:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GenomicPlot")

or the development version from Github with:

if (!require("remotes", quietly = TRUE))
   install.packages("remotes")
remotes::install_github("shuye2009/GenomicPlot", 
                        build_manual = TRUE, 
                        build_vignettes = TRUE)

Core functions

0.1 Plot gene/metagene with 5ā€™UTR, CDS and 3ā€™UTR

The lengths of each part of the genes are prorated based on the median length of 5ā€™UTR, CDS and 3ā€™UTR of protein-coding genes in the genome. The total length (including upstream and downstream extensions) are divided into the specified number of bins. Subsets of genes can be plotted as overlays for comparison.

suppressPackageStartupMessages(library(GenomicPlot, quietly = TRUE))
## Warning: replacing previous import 'Biostrings::pattern' by 'grid::pattern'
## when loading 'genomation'
txdb <- AnnotationDbi::loadDb(system.file("extdata", "txdb.sql", 
                                          package = "GenomicPlot"))
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
data(gf5_meta)

queryfiles <- system.file("extdata", "treat_chr19.bam", 
                          package = "GenomicPlot")
names(queryfiles) <- "clip_bam"
inputfiles <- system.file("extdata", "input_chr19.bam", 
                          package = "GenomicPlot")
names(inputfiles) <- "clip_input"

bamimportParams <- setImportParams(
  offset = -1, fix_width = 0, fix_point = "start", norm = TRUE,
  useScore = FALSE, outRle = TRUE, useSizeFactor = FALSE, genome = "hg19"
)

plot_5parts_metagene(
  queryFiles = queryfiles,
  gFeatures_list = list("metagene" = gf5_meta),
  inputFiles = inputfiles,
  scale = FALSE,
  verbose = FALSE,
  transform = NA,
  smooth = TRUE,
  stranded = TRUE,
  outPrefix = NULL,
  importParams = bamimportParams,
  heatmap = TRUE,
  rmOutlier = 0,
  nc = 2
)
## 565 [set_seqinfo]
## 418 [set_seqinfo]
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## 
## [draw_matrix_heatmap] finished!
## 
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## 
## [draw_matrix_heatmap] finished!
## 
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## 
## [draw_matrix_heatmap] finished!
## 
## [draw_region_profile] started ...
## 
## [draw_region_profile] finished!
## 
## [draw_region_profile] started ...
## 
## [draw_region_profile] finished!
## 
## [draw_region_profile] started ...
## 
## [draw_region_profile] finished!
## 
## [draw_region_profile] started ...
## 
## [draw_region_profile] finished!

0.2 Plot along the ranges of genomic features

Signal profiles along with heatmaps in genomic features or user defined genomic regions provided through a .bed file or narrowPeak file can be plotted. Multiple samples (.bam files) and multiple set of regions (.bed file) can be overlayed on the same figure, or can be output as various combinations. When input files (for input samples) are available, ratio-over-input are displayed as well. Statistical comparisons between samples or between features can be plotted as boxplots or barplots of meansĀ±SE.

centerfiles <- system.file("extdata", "test_chip_peak_chr19.narrowPeak", 
                           package = "GenomicPlot")
names(centerfiles) <- c("NarrowPeak")
queryfiles <- c(
  system.file("extdata", "chip_treat_chr19.bam", package = "GenomicPlot")
)
names(queryfiles) <- c("chip_bam")
inputfiles <- c(
  system.file("extdata", "chip_input_chr19.bam", package = "GenomicPlot")
)
names(inputfiles) <- c("chip_input")

chipimportParams <- setImportParams(
  offset = 0, fix_width = 150, fix_point = "start", norm = TRUE,
  useScore = FALSE, outRle = TRUE, useSizeFactor = FALSE, genome = "hg19"
)

plot_region(
  queryFiles = queryfiles,
  centerFiles = centerfiles,
  inputFiles = inputfiles,
  nbins = 100,
  heatmap = TRUE,
  scale = FALSE,
  regionName = "narrowPeak",
  importParams = chipimportParams,
  verbose = FALSE,
  fiveP = -500,
  threeP = 500,
  smooth = TRUE,
  transform = NA,
  stranded = TRUE,
  outPrefix = NULL,
  Ylab = "Coverage/base/peak",
  rmOutlier = 0,
  nc = 2
)
## [set_seqinfo]
## [set_seqinfo]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## [draw_region_profile] started ...
## [draw_region_profile] finished!
## [draw_region_profile] started ...
## [draw_region_profile] finished!
## [draw_region_profile] started ...
## [draw_region_profile] finished!
## [draw_region_profile] started ...
## [draw_region_profile] finished!
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## [draw_region_profile] started ...
## [draw_region_profile] finished!

0.3 Plot genomic loci (start, end or center of a feature)

Difference in signal intensity within specific regions around the reference loci can be tested, and the test statistics can be plotted as boxplot and barplot of meanĀ±SE. The test can be done among loci or among samples.

centerfiles <- c(
  system.file("extdata", "test_clip_peak_chr19.bed", package = "GenomicPlot"),
  system.file("extdata", "test_chip_peak_chr19.bed", package = "GenomicPlot")
)
names(centerfiles) <- c("iCLIPPeak", "SummitPeak")
queryfiles <- c(
  system.file("extdata", "chip_treat_chr19.bam", package = "GenomicPlot")
)
names(queryfiles) <- c("chip_bam")
inputfiles <- c(
  system.file("extdata", "chip_input_chr19.bam", package = "GenomicPlot")
)
names(inputfiles) <- c("chip_input")

plot_locus(
  queryFiles = queryfiles,
  centerFiles = centerfiles,
  ext = c(-500, 500),
  hl = c(-100, 100),
  shade = TRUE,
  smooth = TRUE,
  importParams = chipimportParams,
  binSize = 10,
  refPoint = "center",
  Xlab = "Center",
  inputFiles = inputfiles,
  stranded = TRUE,
  scale = FALSE,
  outPrefix = NULL,
  verbose = FALSE,
  transform = NA,
  rmOutlier = 0,
  Ylab = "Coverage/base/peak",
  statsMethod = "wilcox.test",
  heatmap = TRUE,
  nc = 2
)
## [set_seqinfo]
## [set_seqinfo]
## [set_seqinfo]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## [check_constraints]
## [set_seqinfo]
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## 
## [draw_matrix_heatmap] finished!
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## 75% of values are not unique, heatmap may not show
##                     signals effectively
## [draw_matrix_heatmap] finished!
## 
## [draw_matrix_heatmap] finished!
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished
## [draw_locus_profile] started ...
## [draw_locus_profile] finished