What is omicplotR?

omicplotR is an R package containing a Shiny app used to visually explore omic datasets, where the input is a table of read counts from high-throughput sequencing runs. It integrates the ALDEx2 package [1] for compositional analysis of differential abundance. omicplotR is intended to facilitate exploring high-throughput sequencing datasets by providing a graphical user interface for users with and without experience in R.

Introduction

High-throughput sequencing (HTS) instruments generate a number of reads that is constrained by the capacity of the sequencing instrument itself and does not represent the absolute number of DNA molecules in a sample. For example, an Illumina NextSeq can deliver up to 400 million single-end reads, whereas an Illumina MiSeq can deliver only up to 15 million single-end reads [2]. Data of this type, constrained by an arbitrary or constant sum, are referred to as compositional data, and high-throughput sequencing data must be treated as such [3]. See ALDEx2 for more information.
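As a minimal sketch of why the total read count carries no information, the base R snippet below scales one hypothetical sample to two sequencing depths and compares the results. The feature names, proportions, and depths are made up for illustration, and the centred log-ratio (clr) is shown only as one transform commonly used for compositional data, not as a description of what the app computes internally.

```r
# Hypothetical proportions of three features in a single sample.
true_proportions <- c(featureA = 0.70, featureB = 0.25, featureC = 0.05)

# The same sample sequenced at two arbitrary depths (assumed values).
shallow_run <- true_proportions * 1e4
deep_run    <- true_proportions * 1e6

# Closure: divide by the total so that each sample sums to 1.
closure <- function(x) x / sum(x)

# Raw counts differ 100-fold, but the closed proportions are identical.
same_proportions <- isTRUE(all.equal(closure(shallow_run), closure(deep_run)))

# Centred log-ratio: log of each part relative to the geometric mean.
clr <- function(x) log(x) - mean(log(x))

# The clr values do not depend on sequencing depth either.
same_clr <- isTRUE(all.equal(clr(shallow_run), clr(deep_run)))

print(c(same_proportions = same_proportions, same_clr = same_clr))
```

Only the ratios between features survive a change in depth, which is why analyses built on ratios are preferred over analyses of raw counts.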

Although several R packages exist for exploring high-throughput sequencing data, they are typically command-line based, which presents a barrier for users without significant command-line or scripting experience. omicplotR was created to facilitate the exploratory phase of high-throughput sequencing data analysis by automatically generating basic exploratory plots with adjustable features and filters.

This vignette provides an overview of the R package omicplotR and the input requirements. A tutorial for each component of the Shiny app is available on the wiki: https://github.com/dgiguer/omicplotR/wiki. omicplotR was developed for several types of HTS datasets, including RNA-Seq, meta-RNA-Seq, and 16S rRNA gene sequencing, and in principle can be used for nearly any type of HTS data that consists of a table of counts per feature for each sample.

Features

omicplotR provides a graphical user interface using the Shiny package for the following visualizations for HTS data:

  • Compositional Principal Component Analysis (PCA) biplots
  • Dendrograms
  • Stacked barplots of relative taxonomic abundance
  • Compositional differential abundance analysis

Additional features include:

  • Filtering count tables per sample or feature by counts
  • Filtering data into groups using metadata to compare subgroups within the data
  • Colouring PCA biplots using metadata (continuous, by quartile, or categorical)
  • Generating effect plots between groups according to metadata using ALDEx2
  • Interactive effect plots to visualize differences between and within groups
  • Plot pre-calculated ALDEx2 tables and colour points by rownames for large datasets

Installation and example

Install the latest release of omicplotR using BiocManager. Make sure that R, ALDEx2, and the other dependencies are up to date; omicplotR requires R version 3.5 or later. The most up-to-date development version is on the dev branch at www.github.com/dgiguer/omicplotr/.

First, install and load the omicplotR package; all other dependencies will be loaded automatically. Calling omicplotr.run() then launches the Shiny app in your default browser. For this vignette, we will be using the example data and metadata provided. Example data and metadata are accessible by data(otu_table) and data(metadata). They are also available as .txt files in ~/omicplotR/shiny-app/.

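# Install BiocManager only if it is missing, then install omicplotR
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("omicplotR")

# Load the package and launch the Shiny app in the default browser
library(omicplotR)
omicplotr.run()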

After launching the Shiny app, click the ‘Input data’ tab to get started.

Input data

The ‘Data’ tab on the sidebar panel allows you to choose your own data and metadata by clicking ‘Browse’. To follow along with this vignette, please click the ‘Example data’ tab on the sidebar panel, and click the checkbox for the ‘Vaginal dataset’. This dataset, which includes associated metadata, is from a study that characterized the changes in the vaginal microbiome following antibiotic and probiotic treatment by 16S rRNA gene sequencing [4]. Return to the ‘Data’ tab on the sidebar panel to view the data and metadata by clicking ‘Show data’ and ‘Show metadata’. The tabs on the main panel allow you to switch between displaying your data and metadata tables.

Figure 1: Screenshot of input data page. The ‘Example data’ tab on the sidebar panel provides access to the provided datasets within the Shiny app.

Data

When choosing your own dataset, the input requirements are as follows. For both metadata and data, each sample name and each feature name (for example, an operational taxonomic unit, or OTU) must be unique. An example of an appropriately formatted data file is shown in Figure 2.

  1. The data file must be a tab delimited .txt (this is an option when you click ‘Save as’ from Excel, or when writing to a table in R).
  2. The first column must contain feature/OTU identifiers. In this case, they are labelled as numbers.
  3. The first row must contain sample identifiers.
  4. The last column may contain taxonomic level information, but it is not required. If present, it must be labelled exactly ‘taxonomy’. The taxonomy column must have at least four levels, separated by a semicolon or colon.
  5. The data table must have all blank rows removed (this may require you to check in a text editor like Notepad++ or Atom before using the app). Check this especially if you are using a Windows-based computer, since Windows uses different line endings.

Figure 2: Example data. If taxonomy column is present, it must use the column name ‘taxonomy’. Image taken from modified version of Vaginal dataset.
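The data file requirements listed above can be checked before uploading. The following base R sketch writes a small made-up count table as a tab-delimited .txt file, reads it back, and tests each requirement. The counts, sample names, and taxonomy strings are hypothetical, and the checks are an illustration of the rules in this vignette rather than the validation the app itself performs.

```r
# Hypothetical count table: rows are features (OTUs), columns are samples,
# with an optional 'taxonomy' column placed last.
counts <- data.frame(
  sample1  = c(120, 0, 35),
  sample2  = c(98, 14, 0),
  taxonomy = c("Bacteria;Firmicutes;Bacilli;Lactobacillales",
               "Bacteria;Actinobacteria;Actinobacteria;Bifidobacteriales",
               "Bacteria;Bacteroidetes;Bacteroidia;Bacteroidales"),
  row.names = c("0", "1", "2"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited .txt file; col.names = NA adds an empty header cell
# above the row names so that the header lines up with the data columns.
path <- tempfile(fileext = ".txt")
write.table(counts, path, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back, using the first column as feature identifiers.
tab <- read.table(path, header = TRUE, sep = "\t", row.names = 1,
                  check.names = FALSE, stringsAsFactors = FALSE)

# Feature and sample identifiers must be unique.
stopifnot(!anyDuplicated(rownames(tab)), !anyDuplicated(colnames(tab)))

# If present, 'taxonomy' must be the last column and have at least four
# levels separated by a semicolon or colon.
if ("taxonomy" %in% colnames(tab)) {
  stopifnot(tail(colnames(tab), 1) == "taxonomy")
  n_levels <- lengths(strsplit(tab$taxonomy, "[;:]"))
  stopifnot(all(n_levels >= 4))
}
```

Because stopifnot() halts at the first violated rule, a table that passes silently satisfies every check in the sketch.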

Your metadata file must follow a similar format. An example of an appropriate metadata file is shown in Figure 3.

  1. Must be a tab delimited .txt (this is an option when you click ‘Save as’ from Excel).
  2. The first column must contain sample identifiers. The sample identifiers must be identical to those in the data file; however, they are not required to be in the same order.
  3. The first row must contain phenotypic information, or descriptions of each variable.
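
The requirement that sample identifiers match between the two files, regardless of order, can be sketched in base R as follows. The sample names and metadata values are hypothetical, and the reordering step is included only to show how the two tables could be aligned by hand; it is not a statement about how the app handles ordering.

```r
# Hypothetical sample identifiers taken from the header of a count table.
data_samples <- c("sample1", "sample2", "sample3")

# Hypothetical metadata: one row per sample, deliberately in another order.
sample_info <- data.frame(
  treatment = c("probiotic", "antibiotic", "antibiotic"),
  age       = c(34, 27, 41),
  row.names = c("sample3", "sample1", "sample2")
)

# Identifiers must match as sets; their order is irrelevant.
ids_match <- setequal(rownames(sample_info), data_samples)

# Identifiers must also be unique within each file.
ids_unique <- !anyDuplicated(rownames(sample_info)) &&
  !anyDuplicated(data_samples)

# Optional: reorder the metadata rows to follow the data columns.
sample_info_aligned <- sample_info[data_samples, , drop = FALSE]

print(c(ids_match = ids_match, ids_unique = ids_unique))
```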