1 Basics

1.1 Install derfinderPlot

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. derfinderPlot is a R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install derfinderPlot by using the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("derfinderPlot")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

1.2 Required knowledge

derfinderPlot is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. A derfinderPlot user is not expected to deal with those packages directly but will need to be familiar with derfinder and for some plots with ggbio.

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

1.3 Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the derfinder or derfinderPlot tags and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.

1.4 Citing derfinderPlot

We hope that derfinderPlot will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("derfinderPlot")
## To cite package 'derfinderPlot' in publications use:
## 
##   Collado-Torres L, Jaffe AE, Leek JT (2017). _derfinderPlot: Plotting
##   functions for derfinder_. doi:10.18129/B9.bioc.derfinderPlot
##   <https://doi.org/10.18129/B9.bioc.derfinderPlot>,
##   https://github.com/leekgroup/derfinderPlot - R package version
##   1.39.0, <http://www.bioconductor.org/packages/derfinderPlot>.
## 
##   Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B,
##   Irizarry RA, Leek JT, Jaffe AE (2017). "Flexible expressed region
##   analysis for RNA-seq with derfinder." _Nucl. Acids Res._.
##   doi:10.1093/nar/gkw852 <https://doi.org/10.1093/nar/gkw852>,
##   <http://nar.oxfordjournals.org/content/early/2016/09/29/nar.gkw852>.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.

2 Introduction to derfinderPlot

derfinderPlot (Collado-Torres, Jaffe, and Leek, 2017) is an addon package for derfinder (Collado-Torres, Nellore, Frazee, Wilks, Love, Langmead, Irizarry, Leek, and Jaffe, 2017) with functions that allow you to visualize the results.

While the functions in derfinderPlot assume you generated the data with derfinder, they can be used with other GRanges objects properly formatted.

The functions in derfinderPlot are:

  • plotCluster() is a tailored ggbio (Yin, Cook, and Lawrence, 2012) plot that shows all the regions in a cluster (defined by distance). It shows the base-level coverage for each sample as well as the mean for each group. If these regions overlap any known gene, the gene and the transcript annotation is displayed.
  • plotOverview() is another tailored ggbio (Yin, Cook, and Lawrence, 2012) plot showing an overview of the whole genome. This plot can be useful to observe if the regions are clustered in a subset of a chromosome. It can also be used to check whether the regions match predominantly one part of the gene structure (for example, 3’ overlaps).
  • plotRegionCoverage() is a fast plotting function using R base graphics that shows the base-level coverage for each sample inside a specific region of the genome. If the region overlaps any known gene or intron, the information is displayed. Optionally, it can display the known transcripts. This function is most likely the easiest to use with GRanges objects from other packages.

3 Example

As an example, we will analyze a small subset of the samples from the BrainSpan Atlas of the Human Brain (BrainSpan, 2011) publicly available data.

We first load the required packages.

## Load libraries
suppressPackageStartupMessages(library("derfinder"))
library("derfinderData")
library("derfinderPlot")
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

3.1 Analyze data

For this example, we created a small table with the relevant phenotype data for 12 samples: 6 from fetal samples and 6 from adult samples. We chose at random a brain region, in this case the primary auditory cortex (core) and for the example we will only look at data from chromosome 21. Other variables include the age in years and the gender. The data is shown below.

library("knitr")
## Get pheno table
pheno <- subset(brainspanPheno, structure_acronym == "A1C")

## Display the main information
p <- pheno[, -which(colnames(pheno) %in% c(
    "structure_acronym",
    "structure_name", "file"
))]
rownames(p) <- NULL
kable(p, format = "html", row.names = TRUE)
gender lab Age group
1 M HSB114.A1C -0.5192308 fetal
2 M HSB103.A1C -0.5192308 fetal
3 M HSB178.A1C -0.4615385 fetal
4 M HSB154.A1C -0.4615385 fetal
5 F HSB150.A1C -0.5384615 fetal
6 F HSB149.A1C -0.5192308 fetal
7 F HSB130.A1C 21.0000000 adult
8 M HSB136.A1C 23.0000000 adult
9 F HSB126.A1C 30.0000000 adult
10 M HSB145.A1C 36.0000000 adult
11 M HSB123.A1C 37.0000000 adult
12 F HSB135.A1C 40.0000000 adult

We can load the data from derfinderData (Collado-Torres, Jaffe, and Leek, 2024) by first identifying the paths to the BigWig files with derfinder::rawFiles() and then loading the data with derfinder::fullCoverage().

## Determine the files to use and fix the names
files <- rawFiles(system.file("extdata", "A1C", package = "derfinderData"),
    samplepatt = "bw", fileterm = NULL
)
names(files) <- gsub(".bw", "", names(files))

## Load the data from disk
system.time(fullCov <- fullCoverage(files = files, chrs = "chr21"))
## 2024-05-01 17:02:48.816067 fullCoverage: processing chromosome chr21
## 2024-05-01 17:02:48.835168 loadCoverage: finding chromosome lengths
## 2024-05-01 17:02:48.869017 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB103.bw
## 2024-05-01 17:02:49.118028 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB114.bw
## 2024-05-01 17:02:49.332489 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB123.bw
## 2024-05-01 17:02:49.48678 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB126.bw
## 2024-05-01 17:02:49.593085 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB130.bw
## 2024-05-01 17:02:49.722038 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB135.bw
## 2024-05-01 17:02:49.841276 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB136.bw
## 2024-05-01 17:02:49.947943 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB145.bw
## 2024-05-01 17:02:50.07354 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB149.bw
## 2024-05-01 17:02:50.208837 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB150.bw
## 2024-05-01 17:02:50.314966 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB154.bw
## 2024-05-01 17:02:50.462467 loadCoverage: loading BigWig file /home/biocbuild/bbs-3.20-bioc/R/site-library/derfinderData/extdata/A1C/HSB178.bw
## 2024-05-01 17:02:50.602069 loadCoverage: applying the cutoff to the merged data
## 2024-05-01 17:02:50.635172 filterData: originally there were 48129895 rows, now there are 48129895 rows. Meaning that 0 percent was filtered.
##    user  system elapsed 
##   1.789   0.080   1.890

Alternatively, since the BigWig files are publicly available from BrainSpan (see here), we can extract the relevant coverage data using derfinder::fullCoverage(). Note that as of rtracklayer 1.25.16 BigWig files are not supported on Windows: you can find the fullCov object inside derfinderData to follow the examples.

## Determine the files to use and fix the names
files <- pheno$file
names(files) <- gsub(".A1C", "", pheno$lab)

## Load the data from the web
system.time(fullCov <- fullCoverage(files = files, chrs = "chr21"))

Once we have the base-level coverage data for all 12 samples, we can construct the models. In this case, we want to find differences between fetal and adult samples while adjusting for gender and a proxy of the library size.

## Get some idea of the library sizes
sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1)
## 2024-05-01 17:02:50.739177 sampleDepth: Calculating sample quantiles
## 2024-05-01 17:02:50.885258 sampleDepth: Calculating sample adjustments
## Define models
models <- makeModels(sampleDepths,
    testvars = pheno$group,
    adjustvars = pheno[, c("gender")]
)

Next, we can find candidate differentially expressed regions (DERs) using as input the segments of the genome where at least one sample has coverage greater than 3. In this particular example, we chose a low theoretical F-statistic cutoff and used 20 permutations.

## Filter coverage
filteredCov <- lapply(fullCov, filterData, cutoff = 3)
## 2024-05-01 17:02:52.245942 filterData: originally there were 48129895 rows, now there are 90023 rows. Meaning that 99.81 percent was filtered.
## Perform differential expression analysis
suppressPackageStartupMessages(library("bumphunter"))
system.time(results <- analyzeChr(
    chr = "chr21", filteredCov$chr21,
    models, groupInfo = pheno$group, writeOutput = FALSE,
    cutoffFstat = 5e-02, nPermute = 20, seeds = 20140923 + seq_len(20)
))
## 2024-05-01 17:02:53.332025 analyzeChr: Pre-processing the coverage data
## 2024-05-01 17:02:55.067197 analyzeChr: Calculating statistics
## 2024-05-01 17:02:55.07143 calculateStats: calculating the F-statistics
## 2024-05-01 17:02:55.260165 analyzeChr: Calculating pvalues
## 2024-05-01 17:02:55.260554 analyzeChr: Using the following theoretical cutoff for the F-statistics 5.31765507157871
## 2024-05-01 17:02:55.261727 calculatePvalues: identifying data segments
## 2024-05-01 17:02:55.269578 findRegions: segmenting information
## 2024-05-01 17:02:55.297214 findRegions: identifying candidate regions
## 2024-05-01 17:02:55.34926 findRegions: identifying region clusters
## 2024-05-01 17:02:55.500463 calculatePvalues: calculating F-statistics for permutation 1 and seed 20140924
## 2024-05-01 17:02:55.645181 findRegions: segmenting information
## 2024-05-01 17:02:55.679358 findRegions: identifying candidate regions
## 2024-05-01 17:02:55.738465 calculatePvalues: calculating F-statistics for permutation 2 and seed 20140925
## 2024-05-01 17:02:55.880789 findRegions: segmenting information
## 2024-05-01 17:02:55.904266 findRegions: identifying candidate regions
## 2024-05-01 17:02:55.954381 calculatePvalues: calculating F-statistics for permutation 3 and seed 20140926
## 2024-05-01 17:02:56.096744 findRegions: segmenting information
## 2024-05-01 17:02:56.120451 findRegions: identifying candidate regions
## 2024-05-01 17:02:56.163547 calculatePvalues: calculating F-statistics for permutation 4 and seed 20140927
## 2024-05-01 17:02:56.316421 findRegions: segmenting information
## 2024-05-01 17:02:56.340203 findRegions: identifying candidate regions
## 2024-05-01 17:02:56.382797 calculatePvalues: calculating F-statistics for permutation 5 and seed 20140928
## 2024-05-01 17:02:56.535313 findRegions: segmenting information
## 2024-05-01 17:02:56.574726 findRegions: identifying candidate regions
## 2024-05-01 17:02:56.616917 calculatePvalues: calculating F-statistics for permutation 6 and seed 20140929
## 2024-05-01 17:02:56.765579 findRegions: segmenting information
## 2024-05-01 17:02:56.789125 findRegions: identifying candidate regions
## 2024-05-01 17:02:56.831018 calculatePvalues: calculating F-statistics for permutation 7 and seed 20140930
## 2024-05-01 17:02:56.994289 findRegions: segmenting information
## 2024-05-01 17:02:57.018025 findRegions: identifying candidate regions
## 2024-05-01 17:02:57.059249 calculatePvalues: calculating F-statistics for permutation 8 and seed 20140931
## 2024-05-01 17:02:57.207436 findRegions: segmenting information
## 2024-05-01 17:02:57.239752 findRegions: identifying candidate regions
## 2024-05-01 17:02:57.28156 calculatePvalues: calculating F-statistics for permutation 9 and seed 20140932
## 2024-05-01 17:02:57.426466 findRegions: segmenting information
## 2024-05-01 17:02:57.449796 findRegions: identifying candidate regions
## 2024-05-01 17:02:57.491729 calculatePvalues: calculating F-statistics for permutation 10 and seed 20140933
## 2024-05-01 17:02:57.641703 findRegions: segmenting information
## 2024-05-01 17:02:57.665217 findRegions: identifying candidate regions
## 2024-05-01 17:02:57.706583 calculatePvalues: calculating F-statistics for permutation 11 and seed 20140934
## 2024-05-01 17:02:57.860141 findRegions: segmenting information
## 2024-05-01 17:02:57.883783 findRegions: identifying candidate regions
## 2024-05-01 17:02:57.925738 calculatePvalues: calculating F-statistics for permutation 12 and seed 20140935
## 2024-05-01 17:02:58.085876 findRegions: segmenting information
## 2024-05-01 17:02:58.109545 findRegions: identifying candidate regions
## 2024-05-01 17:02:58.151442 calculatePvalues: calculating F-statistics for permutation 13 and seed 20140936
## 2024-05-01 17:02:58.314391 findRegions: segmenting information
## 2024-05-01 17:02:58.337799 findRegions: identifying candidate regions
## 2024-05-01 17:02:58.38324 calculatePvalues: calculating F-statistics for permutation 14 and seed 20140937
## 2024-05-01 17:02:58.537818 findRegions: segmenting information
## 2024-05-01 17:02:58.562051 findRegions: identifying candidate regions
## 2024-05-01 17:02:58.604797 calculatePvalues: calculating F-statistics for permutation 15 and seed 20140938
## 2024-05-01 17:02:58.764947 findRegions: segmenting information
## 2024-05-01 17:02:58.788674 findRegions: identifying candidate regions
## 2024-05-01 17:02:58.831265 calculatePvalues: calculating F-statistics for permutation 16 and seed 20140939
## 2024-05-01 17:02:58.997966 findRegions: segmenting information
## 2024-05-01 17:02:59.021931 findRegions: identifying candidate regions
## 2024-05-01 17:02:59.065704 calculatePvalues: calculating F-statistics for permutation 17 and seed 20140940
## 2024-05-01 17:02:59.220143 findRegions: segmenting information
## 2024-05-01 17:02:59.243892 findRegions: identifying candidate regions
## 2024-05-01 17:02:59.28664 calculatePvalues: calculating F-statistics for permutation 18 and seed 20140941
## 2024-05-01 17:02:59.438694 findRegions: segmenting information
## 2024-05-01 17:02:59.472284 findRegions: identifying candidate regions
## 2024-05-01 17:02:59.515346 calculatePvalues: calculating F-statistics for permutation 19 and seed 20140942
## 2024-05-01 17:02:59.663523 findRegions: segmenting information
## 2024-05-01 17:02:59.687637 findRegions: identifying candidate regions
## 2024-05-01 17:02:59.729747 calculatePvalues: calculating F-statistics for permutation 20 and seed 20140943
## 2024-05-01 17:02:59.883767 findRegions: segmenting information
## 2024-05-01 17:02:59.907793 findRegions: identifying candidate regions
## 2024-05-01 17:02:59.973227 calculatePvalues: calculating the p-values
## 2024-05-01 17:03:00.042626 analyzeChr: Annotating regions
## No annotationPackage supplied. Trying org.Hs.eg.db.
## Loading required package: org.Hs.eg.db
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Getting TSS and TSE.
## Getting CSS and CSE.
## Getting exons.
## Annotating genes.
## ...
##    user  system elapsed 
##  57.755   0.988  58.745
## Quick access to the results
regions <- results$regions$regions

## Annotation database to use
suppressPackageStartupMessages(library("TxDb.Hsapiens.UCSC.hg19.knownGene"))
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

3.2 plotOverview()

Now that we have obtained the main results using derfinder, we can proceed to visualizing the results using derfinderPlot. The easiest to use of all the functions is plotOverview() which takes a set of regions and annotation information produced by bumphunter::matchGenes().

Figure 1 shows the candidate DERs colored by whether their q-value was less than 0.10 or not.

## Q-values overview
plotOverview(regions = regions, annotation = results$annotation, type = "qval")
## 2024-05-01 17:03:52.265027 plotOverview: assigning chromosome lengths from hg19!
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.
Location of the DERs in the genome. This plot is was designed for many chromosomes but only one is shown here for simplicity.

Figure 1: Location of the DERs in the genome
This plot is was designed for many chromosomes but only one is shown here for simplicity.

Figure 2 shows the candidate DERs colored by the type of gene feature they are nearest too.

## Annotation overview
plotOverview(
    regions = regions, annotation = results$annotation,
    type = "annotation"
)
## 2024-05-01 17:03:54.144088 plotOverview: assigning chromosome lengths from hg19!
## Scale for x is already present.
## Adding another scale for x, which will replace the existing scale.