GEOexplorer

Guy Hunt

April 19, 2022

1. Introduction

GEOexplorer is a web server and R package for the exploration, generation, and analysis of transcriptomic datasets. GEOexplorer can be used for the exploration, integration, and harmonization of different datasets available in GEO or uploaded by the user. GEOexplorer is based on widely used and validated protocols and enables users to take full advantage of the great availability of high-throughput data both from in-house experiments and publicly available databases. Additionally, GEOexplorer does not require programming proficiency or in-depth statistical knowledge to use.

GEOexplorer enables users to:

2. Accessing the GEOexplorer Web Server

GEOexplorer can be accessed on the following link.

3. Alternatively, Installing the GEOexplorer R package

GEOexplorer can be installed as an R package from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("GEOexplorer")

Or the latest version can be downloaded from GitHub:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("guypwhunt/GEOexplorer")

4. Launching the GEOexplorer R package

GEOexplorer can be launched via the two steps below:

Step 1: Load the package

library(GEOexplorer)
#> Loading required package: shiny
#> Loading required package: limma
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:limma':
#> 
#>     plotMA
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: plotly
#> Loading required package: ggplot2
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout
#> Loading required package: enrichR
#> Welcome to enrichR
#> Checking connection ...
#> Enrichr ... Connection is Live!
#> FlyEnrichr ... Connection is Live!
#> WormEnrichr ... Connection is Live!
#> YeastEnrichr ... Connection is Live!
#> FishEnrichr ... Connection is Live!
#> OxEnrichr ... Connection is Live!
#> Warning: replacing previous import 'shiny::dataTableOutput' by
#> 'DT::dataTableOutput' when loading 'GEOexplorer'
#> Warning: replacing previous import 'shiny::renderDataTable' by
#> 'DT::renderDataTable' when loading 'GEOexplorer'
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)

Step 2: Launch the GEOexplorer web application.

loadApp()
#> Warning: Navigation containers expect a collection of
#> `bslib::nav_panel()`/`shiny::tabPanel()`s and/or
#> `bslib::nav_menu()`/`shiny::navbarMenu()`s. Consider using `header` or `footer`
#> if you wish to place content above (or below) every panel's contents.

5. Tutorial

GEOexplorer splits gene expression analysis into three distinct processes. The first process is exploratory data analysis, which aims to gain an overall understanding of the gene expression dataset. The second process is differential gene expression analysis, which aims to identify the genes that are statistically upregulated or downregulated between two groups. The final process is gene enrichment analysis, which aims to provide the biological context of the differentially expressed genes.

a. GEOexplorer Structure

GEOexplorer contains several tabs, each of which serves a distinct purpose. These tabs will be described in the subsections below.

I. Home Tab

The home tab contains all the widgets to perform gene expression analysis.

II. About Tab

The about tab contains information about GEOexplorer including links to additional documentation.

III. Workflow Tab

The workflow tab provides a high-level overview of the workflow used by GEOexplorer.

IV. Tutorial Tab

The tutorial tab provides a step-by-step guide on how to use GEOexplorer.

V. GEO Search Tab

The GEO search tab allows you to search the GEO database for relevant gene expression datasets to analyse.

VI. Example Datasets Tab

The example datasets tab provides the following:

* An example microarray GEO dataset
* A gene expression file template
* An experimental file template
* A microarray gene expression dataset
* A microarray experimental conditions file
* An RNA-seq gene expression dataset
* An RNA-seq experimental conditions file

b. Loading GEO Datasets into GEOexplorer

In this tutorial, we will explore the GEO RNA-seq dataset GSE93939. This dataset contains the gene expression profiles of oculomotor and spinal motor neurons. Oculomotor motor neurons are resilient to degeneration in the lethal motor neuron disease amyotrophic lateral sclerosis (ALS). Therefore, comparing the gene expression profiles of oculomotor neurons to those of spinal motor neurons may indicate the protective mechanisms of oculomotor neurons. There are two ways to automatically load GEO datasets into GEOexplorer: using the GEO search functionality or manually inputting a GEO accession code.

I. Searching for a GEO Dataset

Step 1: Navigate to the GEO Search tab.

Step 2: Input the keywords you wish to search for. The keywords used in this tutorial are RNA-seq of laser captured oculomotor, cervical and lumbar spinal motor.

Step 3: Select the number of results to display.

Step 4: Click the Search button. A table containing the results will be loaded.

Step 5: Check if the file is processable by GEOexplorer. Due to the variability in the format of GEO RNA-seq datasets, not all GEO RNA-seq datasets can automatically be loaded into GEOexplorer. If the dataset cannot be automatically loaded into GEOexplorer, the user will have to download the dataset and format it into a count matrix as per the template on the Example Datasets tab. However, nearly all GEO microarray datasets can automatically be loaded into GEOexplorer.

Step 6: Click Load Dataset for the gene expression dataset you wish to load. GEOexplorer will attempt to load the dataset from GEO.
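
For readers who prefer to script a comparable search, the keyword search in the steps above can be approximated outside the app with the NCBI E-utilities esearch endpoint queried against the GEO DataSets database (db=gds). This is only a sketch: the helper name build_geo_search_url is hypothetical, it only builds the request URL without sending it, and whether the GEO Search tab uses this endpoint internally is an assumption.

```r
# Hypothetical helper (not part of GEOexplorer): builds an NCBI E-utilities
# esearch URL that queries the GEO DataSets database ("gds").
build_geo_search_url <- function(keywords, n_results = 10) {
  stopifnot(is.character(keywords), length(keywords) == 1, n_results >= 1)
  base_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  paste0(
    base_url,
    "?db=gds",
    # Percent-encode spaces, commas, and other reserved characters
    "&term=", utils::URLencode(keywords, reserved = TRUE),
    "&retmax=", as.integer(n_results)
  )
}

# Keywords and result count mirror Steps 2 and 3 of this tutorial
search_url <- build_geo_search_url(
  "RNA-seq of laser captured oculomotor, cervical and lumbar spinal motor",
  n_results = 10
)
print(search_url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response lists matching GEO record identifiers rather than the formatted table shown in the app.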

II. Using a GEO Accession Code

If you already know the GEO accession code of the dataset you wish to analyse, you can perform the following steps.

Step 1: Navigate to the Home tab.

Step 2: Select if you want to analyse multiple datasets or a single dataset. In this example, we will analyse a single dataset.

Step 3: Select the data source. In this example, we will source the dataset directly from GEO.

Step 4: Select if the dataset contains microarray or RNA-seq data.

Step 5: Input the GEO accession code you wish to analyse. GEOexplorer will attempt to load the dataset from GEO. The GEO accession code used in this tutorial is GSE93939.
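
Outside the app, a GEO series can also be fetched from the R console. The sketch below assumes the Bioconductor package GEOquery is installed and uses its getGEO() function; the helper names is_gse_accession and fetch_geo_series are hypothetical, and this is not necessarily how GEOexplorer loads datasets internally. The download function is defined but not called, because it needs an internet connection.

```r
# Hypothetical helper: checks that a string looks like a GEO series accession
# (the letters "GSE" followed by digits) before any download is attempted.
is_gse_accession <- function(x) {
  is.character(x) && length(x) == 1 && grepl("^GSE[0-9]+$", x)
}

# Hypothetical wrapper around GEOquery::getGEO(). It needs the GEOquery
# package and an internet connection, so it is defined here but not called.
fetch_geo_series <- function(accession) {
  if (!is_gse_accession(accession)) {
    stop("Not a valid GEO series accession: ", accession)
  }
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("Please install the GEOquery package from Bioconductor.")
  }
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects,
  # one per platform linked to the series
  GEOquery::getGEO(accession, GSEMatrix = TRUE)
}

# The accession code used in this tutorial passes the format check
valid_code <- is_gse_accession("GSE93939")
invalid_code <- is_gse_accession("93939")
print(c(valid_code, invalid_code))
```

To download the tutorial dataset, call fetch_geo_series("GSE93939") in a session with network access.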

c. Performing Exploratory Data Analysis

After loading your dataset into GEOexplorer, performing exploratory data analysis is very similar for GEO datasets and user-uploaded datasets, as well as for microarray and RNA-seq datasets.

I. Checking if RNA-seq Datasets Contain Transformed Data

For GEOexplorer to perform differential gene expression analysis, RNA-seq datasets must contain the raw counts rather than transformed counts. This step is not required for analysing microarray datasets.

Step 1: If the GEO accession code is linked to multiple datasets, please select the platform linked to the dataset you wish to analyse.

Step 2: Select not to apply log transformation. This is important as we want to analyse the non-transformed data.

Step 3: Select not to apply counts per million transformation. This is important as we want to analyse the non-transformed data.

Step 4: Click Analyse.

After clicking “Analyse”, exploratory data analysis will be performed and the results can be reviewed.

Step 5: Click on the Dataset Information tab.

Step 6: Click on the Gene Expression Dataset sub-tab.

Step 7: Review the dataset. If any values have non-zero decimal places or are negative, this indicates the RNA-seq dataset has already been transformed and should not be used for differential gene expression analysis.

Step 8: Click on the Exploratory Data Analysis tab.

Step 9: Click on the Expression Density Plot sub-tab.

Step 10: Review the expression density plot. The plot should look similar to the image below. If the plot contains normally distributed density curves with a bell-shaped pattern, it indicates the RNA-seq dataset has already been transformed and should not be used for differential gene expression analysis.

Step 11: Click on the Box-and-Whisker Plot sub-tab.

Step 12: Review the box-and-whisker plot. The plot should look similar to the image below, with no values below 0. If the plot contains negative values or median-centred values, it indicates the RNA-seq dataset has already been transformed and should not be used for differential gene expression analysis.

Note: Whilst transformed datasets should not be used for differential gene expression analysis, they can be used for exploratory data analysis.
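
The visual checks in Steps 7 to 12 can also be expressed as a small numeric test. The sketch below is a minimal illustration in base R, assuming a count matrix with genes in rows and samples in columns; the helper name check_raw_counts and the toy values are hypothetical and are not taken from GSE93939 or from GEOexplorer's own code.

```r
# Hypothetical helper: flags signs that a count matrix (genes in rows,
# samples in columns) has already been transformed.
check_raw_counts <- function(counts) {
  values <- as.matrix(counts)
  values <- values[!is.na(values)]
  has_negative <- any(values < 0)               # raw counts are never negative
  has_fraction <- any(values != round(values))  # raw counts are whole numbers
  list(
    has_negative = has_negative,
    has_fraction = has_fraction,
    looks_raw = !has_negative && !has_fraction
  )
}

# Toy data for illustration only
raw_counts <- matrix(
  c(0, 12, 250, 3, 0, 87),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
# A log-transformed and shifted version of the same data
log_counts <- log2(raw_counts + 1) - 2

raw_result <- check_raw_counts(raw_counts)
log_result <- check_raw_counts(log_counts)
print(raw_result$looks_raw)
print(log_result$looks_raw)

# Per-sample minimum and median, two of the values a box-and-whisker plot shows
sample_summary <- apply(raw_counts, 2, function(x) c(min = min(x), median = median(x)))
print(sample_summary)
```

A matrix that passes this check is not guaranteed to contain raw counts (rounded transformed values would also pass), so the plots described above remain worth reviewing.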

II. Reviewing the Results of Exploratory Data Analysis

After checking whether the RNA-seq dataset contains transformed data, you can continue performing exploratory data analysis.

Step 1: You can set GEOexplorer to perform log transformation or auto-detect if log transformation should be applied.

Step 2: You can set GEOexplorer to perform counts per million transformation. This setting is only available for RNA-seq datasets. For microarray datasets, you can instead select whether to fill in missing values using KNN imputation.

Step 3: Click Analyse to rerun exploratory data analysis.
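
To make the two transformations in Steps 1 and 2 concrete, the sketch below shows one common way to compute counts per million and a log2 transformation in base R. The function names, the toy data, and the pseudocount of 1 are assumptions made for illustration; GEOexplorer may compute these values differently (for example, with normalisation factors or a different pseudocount).

```r
# Counts per million: divide each sample (column) by its library size,
# then scale to one million reads
counts_per_million <- function(counts) {
  library_sizes <- colSums(counts)
  sweep(counts, 2, library_sizes, "/") * 1e6
}

# log2 transformation; the pseudocount avoids log2(0) and its default
# value of 1 is an assumption made for this sketch
log2_transform <- function(values, pseudocount = 1) {
  log2(values + pseudocount)
}

# Toy data for illustration only (genes in rows, samples in columns)
raw_counts <- matrix(
  c(10, 30, 60, 200, 300, 500),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

cpm_values <- counts_per_million(raw_counts)
log_cpm <- log2_transform(cpm_values)

print(cpm_values)
print(round(log_cpm, 2))
```

After the counts per million step, every sample sums to one million, which removes differences in sequencing depth between samples before they are compared.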