1 Overview

SBGNview is a tool set for pathway based data visalization, integration and analysis. SBGNview is similar and complementary to the widely used Pathview(Luo and Brouwer 2013), with the following key features:

  • Pathway definition by the widely adopted Systems Biology Graphical Notation (SBGN);

  • Supports multiple major pathway databases beyond KEGG (Reactome, MetaCyc, SMPDB, PANTHER, METACROP etc) and user defined pathways;

  • Covers 5,200 reference pathways and over 3,000 species by default;

  • Extensive graphics controls, including glyph and edge attributes, graph layout and sub-pathway highlight;

  • SBGN pathway data manipulation, processing, extraction and analysis.

2 Citation

Please cite the following papers when using this open-source package. This will help the project and our team:

Luo W, Brouwer C. Pathview: an R/Biocondutor package for pathway-based data integration and visualization. Bioinformatics, 2013, 29(14):1830-1831, doi: 10.1093/bioinformatics/btt285

3 Installation and quick start

Please see the Quick Start tutorial for installation instructions and quick start examples.

4 Getting started

SBGNview is the main function of the package. It takes two major types of inputs: user data (gene and/or compound data) and SBGN pathway data (pathway IDs or SBGN-ML files). SBGNview parses the SBGN pathway data, extracts glyph (node) and arc (edge) data from SBGN-ML file, maps and integrates the user data to the glyphs and renders the SBGN graph in SVG formatwith mapped data as pseudo-colors. Currently it maps gene/protein omics data to “macromolecule” glyphs and maps compound omics data to “simple chemical” glyphs. The SBGNview function returns a SBGNview object, which contains data to recreate the rendered SBGN graph or further modified it. Please see the documentation of function SBGNview for more details.

4.1 Load the SBGNview package, together with pathview and other dependencies

library(SBGNview)

4.2 Accessing help and documentation

Use ls("package:SBGNview") to list all available functions in SBGNview. To learn more about how to use the functions and access their manuals, use help() or ?. For example, ?”SBGNview” or help("+.SBGNview").

ls("package:SBGNview")
##  [1] "SBGNview"         "changeDataId"     "changeIds"        "downloadSbgnFile"
##  [5] "findPathways"     "highlightArcs"    "highlightNodes"   "highlightPath"   
##  [9] "loadMappingTable" "outputFile"       "outputFile<-"     "renderSbgn"      
## [13] "sbgn.gsets"       "sbgnNodes"

4.3 SBGN pathway file (SBGN-ML)

SBGN pathways are defined in a special XML format (SBGN-ML file). SBGN-ML files describe the pathway elements (molecules and their reactions) and the graph layout (positions and appearances of the pathway elements). There are two major data types ement in SBGN-ML files: 1. node data (in tag “glyph”), such as node location, width, hight and node class (macromolecule, simple chemical etc.). 2. edge data (in tag “arc”), such as arc class, start node and end node. For more details, see: https://github.com/sbgn/sbgn/wiki/SBGN_ML

4.3.1 Two tiers of support and two scenarios with SBGN based pathways

SBGNview provides two tiers of support for SBGN based pathway data, which corresponds to the two common SBGNview scenarios:

  • Scenarios 1 or Tier 1{#ourCollection}, direct and deep support to a core collection of pathway data from 5 major pathway databases including Reactome, SMPDB, PANTHER, MetaCyc and MetaCrop, or using SBGNview’s pathway collection. Each of these databases covers up to thousands of reference pathways and species (Table 1). SBGNview provides diagram optimization and ID mapping on these pathway databases. In other words, ve prepared a full collection of high quality sbgn pathways from these databases, which can be directly used in data analysis. The pathway data in SBGN-ML format are stored in data sbgn.xmls (included in package SBGNview.data), which are effectively indexed/accessed by SBGNview pathway IDs. We’ve also generated ID mapping between SBGN-ML glyph IDs and common molecule IDs, so that user data can be easily mapped to these pathways.

  • Scenarios 2 or Tier 2, indirect support to raw SBGN pathway data from pathway databases, collections and users’ custom pathway data. Many major pathway databases provide SBGN-ML files, including Pathway Commons, Reactome and MetaCrop. Their data can be downloaded and directly used by SBGNview. Other databases (e.g. PANTHER and BioCyc) use SBML or BioPAX formats, which can be converted to SBGN-ML format using third party tools. Note Pathway Commons collect pathways from Reactome, PANTHER, SMPDB and other primary databases. These data can be used in SBGNview, except diagrams may not be optimized for data visualization, and users need to provided ID mapping or make sure the glyph IDs are commonly use molecule IDs.

Note the core collection of pathway databases with tier 1 support includes thousands of pathways and species, which should cover most of the users needs in pathway analysis and data visualization. We are taking the first the default route, i.e. to use the SBGNview’s core collection.

4.3.1.1 Load SBGNview’s SBGN-ML pathway data collection

Because a large amount of pathway data is loaded into R, this step can take up to several second.

data("sbgn.xmls")

4.3.1.2 Information about the SBGN-ML pathway data collection

We can check the information of collected pathways using pathways.info and number of pathways collected using pathways.stats.

data("pathways.info", "pathways.stats")
head(pathways.info)
##                                            file.name       database
## 1 http___identifiers.org_panther.pathway_P00001.sbgn pathwayCommons
## 2 http___identifiers.org_panther.pathway_P00002.sbgn pathwayCommons
## 3 http___identifiers.org_panther.pathway_P00003.sbgn pathwayCommons
## 4 http___identifiers.org_panther.pathway_P00004.sbgn pathwayCommons
## 5 http___identifiers.org_panther.pathway_P00005.sbgn pathwayCommons
## 6 http___identifiers.org_panther.pathway_P00006.sbgn pathwayCommons
##                                             uri    sub.database pathway.id
## 1 http://identifiers.org/panther.pathway/P00001 panther.pathway     P00001
## 2 http://identifiers.org/panther.pathway/P00002 panther.pathway     P00002
## 3 http://identifiers.org/panther.pathway/P00003 panther.pathway     P00003
## 4 http://identifiers.org/panther.pathway/P00004 panther.pathway     P00004
## 5 http://identifiers.org/panther.pathway/P00005 panther.pathway     P00005
## 6 http://identifiers.org/panther.pathway/P00006 panther.pathway     P00006
##                                  pathway.name macromolecule.ID.type
## 1   Adrenaline and noradrenaline biosynthesis        pathwayCommons
## 2 Alpha adrenergic receptor signaling pathway        pathwayCommons
## 3 Alzheimer disease-amyloid secretase pathway        pathwayCommons
## 4        Alzheimer disease-presenilin pathway        pathwayCommons
## 5                                Angiogenesis        pathwayCommons
## 6                 Apoptosis signaling pathway        pathwayCommons
##   simple.chemical.ID.type
## 1          pathwayCommons
## 2          pathwayCommons
## 3          pathwayCommons
## 4          pathwayCommons
## 5          pathwayCommons
## 6          pathwayCommons
pathways.stats
##          database    sub.database Freq
## 1        MetaCrop        MetaCrop   62
## 5         MetaCyc         MetaCyc 2518
## 9  pathwayCommons panther.pathway  149
## 12 pathwayCommons        reactome 1746
## 15 pathwayCommons           smpdb  725

4.3.1.3 Search for pathways by keywords

SBGNview can search for pathways by keyword and automatically download SBGN-ML files.

pathways <- findPathways(c("bile acid","bile salt"))
head(pathways)
pathways.local.file <- downloadSbgnFile(pathways$pathway.id[1:3])
pathways.local.file

By default findPathways searches for keywords in pathway names. It can also search by different ID types

pathways <- findPathways(c("tp53","Trp53"),keyword.type = "SYMBOL")
head(pathways)
pathways <- findPathways(c("K04451","K10136"),keyword.type = "KO")
head(pathways)

4.3.1.4 Different layout for the same pathway

Researchers may have different tastes for a “good looking” layout. We have created different layouts for each pathway. User can download them from pre-generated SBGN-ML files and try.

4.3.2 Custom SBGN-ML files

USers can also define their own pathways and create custom SBGN-ML files from scratch. Several tools like Newt editor can let the user draw a pathway diagram and save it as SBGN-ML file. The tools may also able to generate a primitive pathway layout. But these layouts are usually not desirable for data visualization with too many node-node overlaps and edge-node crossings. Therefore, we recommend the users to stick to SBGNview core collection.

5 Basic visualization

SBGNview can visualize a range of omics data, including both gene (or transcript, protein, enzyme) data and compound (or metabolite, chemical, small molecules) data.

5.1 Visualize gene data

Gene/protein related data will be mapped to “macromolecule” nodes or glyphs on SBGN maps. Most of online databases or resources have their unique glyph ID types and it is likely different from the ID type in the gene data. We often need to map the omics IDs to the node IDs in the SBGN-ML file. In the quick start example (“Adrenaline and noradrenaline biosynthesis” pathway), the SBGN-ML file uses pathwayCommons IDs for gene/protein nodes, whereas the omics dataset uses Entrez gene IDs. The function SBGNview can automatically map common ID types such as ENTREZ, UniProt etc. to nodes in our pre-generated SBGN-ML files as shown in the quick start example. We can also do it manually using function changeDataId, which is called by SBGNview internally to do ID mapping. Supported ID type pairs can be found in data(mapped.ids). changeDataId uses pre-generated mapping tables or pathview to do the mapping. If the input-output ID type pair is not in data(mapped.ids) or can’t be mapped by SBGNview automatically, user needs to provide the mapping table explicitly using the “id.mapping.table” argument.

Firt, load the preprocessed demo gene data.

data("gse16873.d")
head(gse16873.d)

Let’s change the IDs in the gene data manually as an demo.

# select a subset of data for later use
gse16873 <- gse16873.d[,1:3]
# gene data to demonstrate ID convertion
gene.data <- gse16873
head(gene.data[,1:2])
##                DCIS_1      DCIS_2
## 10000     -0.30764480 -0.14722769
## 10001      0.41586805 -0.33477259
## 10002      0.19854925  0.03789588
## 10003     -0.23155297 -0.09659311
## 100048912 -0.04490724 -0.05203146
## 10004     -0.08756237 -0.05027725
gene.data <- changeDataId(data.input.id = gene.data,
                          input.type = "entrez",
                          output.type = "pathwayCommons",
                          mol.type = "gene",
                          sum.method = "sum")

When multiple genes are mapped to the same glyph ID, we use the sum of their values (sum.method = “sum”).

head(gene.data[,1:2])
##                              DCIS_1     DCIS_2
## Protein100049:@:PWY-5188 0.02357413 0.35777714
## Protein100090:@:PWY-5188 0.02357413 0.35777714
## Protein100099:@:PWY-5188 0.02357413 0.35777714
## Protein100143:@:PWY-5424 0.23976217 0.03226769
## Protein100158:@:PWY-5188 0.02357413 0.35777714
## Protein100294:@:PWY-5188 0.23614703 0.52414811

Now we run SBGNview, the main function to map and render gene data on SBGN map.

SBGNview.obj <- SBGNview(gene.data = gene.data,
                         input.sbgn = "P00001",
                         output.file = "test_output",
                         gene.id.type = "pathwayCommons",
                         output.formats =  c("png", "pdf", "ps"))
SBGNview.obj

SBGNview will always generate a .svg file. Other formats can be added also. In this example, three additional files (pdf, ps, png) will be created in the same folder.