Abstract
Important: NxtIRFcore will be replaced by SpliceWiz, available on Bioconductor version 3.16 onwards.

Intron Retention (IR) is a form of alternative splicing whereby the intron is retained (i.e. not spliced) in the final messenger RNA. Although many bioinformatics tools are available to quantitate other forms of alternative splicing, dedicated tools to quantify Intron Retention are limited. Quantifying IR requires not only measurement of spliced transcripts (often using mapped splice junction reads), but also measurement of the coverage of the putative retained intron. The latter requires adjustment for the fact that many introns contain repetitive regions as well as other RNA-expressing elements.

IRFinder corrects for many of these complexities; however, its dependencies on Linux and STAR limit its wider usage. Also, IRFinder does not calculate other forms of splicing besides IR. Finally, IRFinder produces text-based output, requiring an established understanding of the data produced in order to interpret its results.

NxtIRF overcomes the above limitations. Firstly, NxtIRF incorporates the IRFinder C++ routines, allowing users to run the IRFinder algorithm in the R/Bioconductor environment on multiple platforms. NxtIRF is a full pipeline that quantifies IR (and other alternative splicing) events, organises the data and produces relevant visualisation. Additionally, NxtIRF offers an interactive graphical interface that allows users to explore the data. NxtIRFcore is the command-line version of NxtIRF.

Version 1.6.0

NxtIRFcore will no longer be supported after Bioconductor version 3.16. Its full functionality (plus heaps more) is replaced by SpliceWiz, which will be available on Bioconductor version 3.16 onwards.
This section provides instructions for installation and a quick working example to demonstrate the important functions of NxtIRF. NxtIRFcore is the command line utility for NxtIRF.
For detailed explanations of each step shown here, refer to chapter 2: “Explaining the NxtIRF workflow” in this vignette. For a list of ready-made “recipes” for typical uses of NxtIRF on real datasets, refer to chapter 3: “NxtIRF cookbook”.
To install NxtIRFcore, start R (version “4.1”) and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("NxtIRFcore")
(Optional) For macOS users, make sure OpenMP libraries are installed correctly. We recommend users follow this guide, but the quickest way to get started is to install libomp via brew:
brew install libomp
A NxtIRF reference requires a genome FASTA file (containing genome nucleotide sequences) and a gene annotation GTF file (preferably from Ensembl or Gencode).
NxtIRF provides an example genome and gene annotation which can be accessed via the NxtIRFdata package installed with NxtIRF:
# Provides the path to the example genome:
chrZ_genome()
#> [1] "/home/biocbuild/bbs-3.17-bioc/R/site-library/NxtIRFdata/extdata/genome.fa"
# Provides the path to the example gene annotation:
chrZ_gtf()
#> [1] "/home/biocbuild/bbs-3.17-bioc/R/site-library/NxtIRFdata/extdata/transcripts.gtf"
Using these two files, we construct a NxtIRF reference as follows:
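The reference-building step described above can be sketched as follows. This is a minimal sketch, assuming the `BuildReference()` function from NxtIRFcore with its `reference_path`, `fasta` and `gtf` arguments; the output folder name `Reference` is an assumption chosen for illustration. The `requireNamespace()` guard only serves to let the snippet run harmlessly where the package is not installed.

```r
# Assumption: the reference is written to a temporary folder named "Reference".
ref_path = file.path(tempdir(), "Reference")

# Build the reference only if NxtIRFcore (and its data package) is available.
if (requireNamespace("NxtIRFcore", quietly = TRUE)) {
    NxtIRFcore::BuildReference(
        reference_path = ref_path,
        fasta = NxtIRFdata::chrZ_genome(),  # example genome FASTA
        gtf = NxtIRFdata::chrZ_gtf()        # example gene annotation GTF
    )
}
```

Later steps need the same `ref_path`, so it is worth keeping in a variable rather than retyping the path.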
NxtIRF provides an example set of 6 BAM files to demonstrate its use via this vignette.
Firstly, retrieve the BAM files from ExperimentHub using the NxtIRF helper function NxtIRF_example_bams(). This makes a copy of the BAM files to the temporary directory:
bams = NxtIRF_example_bams()
bams
#> sample path
#> 1 02H003 /tmp/Rtmp6YyBwO/02H003.bam
#> 2 02H025 /tmp/Rtmp6YyBwO/02H025.bam
#> 3 02H026 /tmp/Rtmp6YyBwO/02H026.bam
#> 4 02H033 /tmp/Rtmp6YyBwO/02H033.bam
#> 5 02H043 /tmp/Rtmp6YyBwO/02H043.bam
#> 6 02H046 /tmp/Rtmp6YyBwO/02H046.bam
Finally, run NxtIRF/IRFinder as follows:
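A minimal sketch of this step, assuming the `IRFinder()` function from NxtIRFcore with `bamfiles`, `sample_names`, `reference_path` and `output_path` arguments. The folder names `Reference` and `IRFinder_output` are assumptions for illustration, and the call is guarded so that it only runs when the package is installed and a reference already exists at `ref_path`.

```r
# Assumptions: `ref_path` holds a previously built NxtIRF reference, and
# `irf_path` is an output folder name chosen here for illustration.
ref_path = file.path(tempdir(), "Reference")
irf_path = file.path(tempdir(), "IRFinder_output")

# Run only when NxtIRFcore is installed and the reference folder exists.
if (requireNamespace("NxtIRFcore", quietly = TRUE) && dir.exists(ref_path)) {
    bams = NxtIRFcore::NxtIRF_example_bams()  # data frame: sample, path
    NxtIRFcore::IRFinder(
        bamfiles = bams$path,
        sample_names = bams$sample,
        reference_path = ref_path,
        output_path = irf_path
    )
}
```

One output set per BAM file is expected in `irf_path`, named after the corresponding entry in `sample_names`.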
First, collate the IRFinder output files using the helper function Find_IRFinder_Output(). This creates a 3-column data frame with sample name, IRFinder gzipped text output, and COV files. Compile these output files into a single experiment:
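The collation step might look like the sketch below. It assumes the `CollateData()` function from NxtIRFcore with `Experiment`, `reference_path` and `output_path` arguments; the three folder names are assumptions for illustration, and the guard skips the calls when the package or the IRFinder output folder is missing.

```r
# Assumed locations of the reference, the IRFinder output, and the collated data.
ref_path = file.path(tempdir(), "Reference")
irf_path = file.path(tempdir(), "IRFinder_output")
nxtirf_path = file.path(tempdir(), "NxtIRF_output")

# Run only when NxtIRFcore is installed and IRFinder output is present.
if (requireNamespace("NxtIRFcore", quietly = TRUE) && dir.exists(irf_path)) {
    # Data frame of sample names with their IRFinder output and COV files
    expr = NxtIRFcore::Find_IRFinder_Output(irf_path)
    NxtIRFcore::CollateData(
        Experiment = expr,
        reference_path = ref_path,
        output_path = nxtirf_path
    )
}
```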
The NxtSE is a data structure that inherits from SummarizedExperiment.
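As a rough sketch, an NxtSE object can be imported from the collated data and annotated like any SummarizedExperiment. This assumes the `MakeSE()` function from NxtIRFcore and a collated data folder named `NxtIRF_output`. The assignment of the first three samples to condition "A" and the last three to "B" is an assumption made here so that a `condition` column exists for the contrasts that follow.

```r
# Assumed location of the collated data.
nxtirf_path = file.path(tempdir(), "NxtIRF_output")

# Hypothetical annotation for the six example samples (3 x "A", then 3 x "B").
condition = rep(c("A", "B"), each = 3)

# Run only when NxtIRFcore is installed and collated data is present.
if (requireNamespace("NxtIRFcore", quietly = TRUE) && dir.exists(nxtirf_path)) {
    se = NxtIRFcore::MakeSE(nxtirf_path)
    # NxtSE inherits SummarizedExperiment, so colData() applies as usual
    SummarizedExperiment::colData(se)$condition = condition
}
```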
The code below contrasts condition B against condition A:
# Requires limma to be installed:
require("limma")
res_limma = limma_ASE(
    se = se,
    test_factor = "condition",
    test_nom = "B",
    test_denom = "A"
)

# Requires DESeq2 to be installed:
require("DESeq2")
res_deseq = DESeq_ASE(
    se = se,
    test_factor = "condition",
    test_nom = "B",
    test_denom = "A"
)

# Requires DoubleExpSeq to be installed:
require("DoubleExpSeq")
res_DES = DoubleExpSeq_ASE(
    se = se,
    test_factor = "condition",
    test_nom = "B",
    test_denom = "A"
)
Filter by visibly-different events:
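The idea of keeping only visibly different events can be illustrated with base R on a toy table. The column names `AvgPSI_A` and `AvgPSI_B`, the event names, the values and the 0.05 cutoff are all assumptions for illustration; they are not taken from real output.

```r
# Toy stand-in for a differential analysis result table (hypothetical values).
res_toy = data.frame(
    EventName = c("event1", "event2", "event3"),
    AvgPSI_A = c(0.10, 0.50, 0.80),
    AvgPSI_B = c(0.12, 0.70, 0.60),
    stringsAsFactors = FALSE
)

# Keep events whose average inclusion differs by more than 0.05 between conditions.
res_toy.filtered = subset(res_toy, abs(AvgPSI_A - AvgPSI_B) > 0.05)
res_toy.filtered$EventName
```

If the result of limma_ASE() carries comparable per-condition average columns, the same subset() pattern applied to res_limma would give the res_limma.filtered table used in the plots below.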
Plot individual samples:
p = Plot_Coverage(
    se = se,
    Event = res_limma.filtered$EventName[1],
    tracks = colnames(se)[c(1,2,4,5)]
)
#> Warning: In subset.data.frame(reduced, type = "intron") :
#> extra argument 'type' will be disregarded
as_egg_ggplot(p)
Display the plotly interactive version of the coverage plot (not shown here)
Plot by condition:
p = Plot_Coverage(
    se = se,
    Event = res_limma.filtered$EventName[1],
    tracks = c("A", "B"),
    condition = "condition",
    stack_tracks = TRUE,
    t_test = TRUE
)
#> Warning: In subset.data.frame(reduced, type = "intron") :
#> extra argument 'type' will be disregarded
as_egg_ggplot(p)
#> Warning: Removed 270 rows containing missing values (`geom_line()`).