This package is designed for reactome pathway-based analysis. Reactome is an open-source, open access, manually curated and peer-reviewed pathway database.
If you use Reactome1 in published research, please cite:
G Yu, QY He*. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Molecular BioSystems 2016, 12(2):477-479. doi:[10.1039/C5MB00663E](http://dx.doi.org/10.1039/C5MB00663E)
Currently ReactomePA supports several model organisms, including ‘celegans’, ‘fly’, ‘human’, ‘mouse’, ‘rat’, ‘yeast’ and ‘zebrafish’. The input gene ID should be Entrez gene ID. We recommend using clusterProfiler::bitr
to convert biological IDs. For more detail, please refer to bitr: Biological Id TranslatoR.
Enrichment analysis is a widely used approach to identify biological themes. Here, we implement hypergeometric model to assess whether the number of selected genes associated with reactome pathway is larger than expected. The p values were calculated based the hypergeometric model2.
library(ReactomePA)
data(geneList)
de <- names(geneList)[abs(geneList) > 1.5]
head(de)
## [1] "4312" "8318" "10874" "55143" "55388" "991"
x <- enrichPathway(gene=de,pvalueCutoff=0.05, readable=T)
head(as.data.frame(x))
## ID Description GeneRatio BgRatio
## 68877 68877 Mitotic Prometaphase 25/248 100/6748
## 69278 69278 Cell Cycle, Mitotic 49/248 407/6748
## 2500257 2500257 Resolution of Sister Chromatid Cohesion 23/248 92/6748
## 1640170 1640170 Cell Cycle 54/248 496/6748
## 5663220 5663220 RHO GTPases Activate Formins 21/248 102/6748
## 68886 68886 M Phase 30/248 235/6748
## pvalue p.adjust qvalue
## 68877 8.570808e-15 4.473962e-12 3.951593e-12
## 69278 4.338551e-14 1.132362e-11 1.000150e-11
## 2500257 1.072091e-13 1.589955e-11 1.404316e-11
## 1640170 1.218356e-13 1.589955e-11 1.404316e-11
## 5663220 7.285769e-11 7.606343e-09 6.718246e-09
## 68886 1.545857e-09 1.344895e-07 1.187869e-07
## geneID
## 68877 CDCA8/CDC20/CENPE/CCNB2/NDC80/NCAPH/SKA1/CENPM/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/BIRC5/NCAPG/AURKB/CCNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1
## 69278 CDC45/CDCA8/MCM10/CDC20/FOXM1/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/RRM2/UBE2C/SKA1/NEK2/CENPM/CENPN/CCNA2/CDK1/ERCC6L/MAD2L1/GINS1/KIF18A/CDT1/BIRC5/NCAPG/AURKB/GINS2/KIF20A/AURKA/CCNB1/MCM5/PTTG1/MCM2/KIF2C/CDC25A/CDC6/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/CCNE1/ORC6/ORC1/TAOK1
## 2500257 CDCA8/CDC20/CENPE/CCNB2/NDC80/SKA1/CENPM/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/CCNB1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1
## 1640170 CDC45/CDCA8/MCM10/CDC20/FOXM1/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/RRM2/UBE2C/HJURP/SKA1/NEK2/CENPM/CENPN/CCNA2/CDK1/ERCC6L/MAD2L1/GINS1/KIF18A/CDT1/BIRC5/NCAPG/AURKB/GINS2/CHEK1/KIF20A/AURKA/CCNB1/MCM5/PTTG1/LMNB1/MCM2/KIF2C/CDC25A/CDC6/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/RAD51/CCNE1/ORC6/ORC1/OIP5/TAOK1
## 5663220 CDCA8/CDC20/CENPE/NDC80/SKA1/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/BIRC5/AURKB/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/TAOK1/EVL
## 68886 CDCA8/CDC20/KIF23/CENPE/CCNB2/NDC80/NCAPH/UBE2C/SKA1/CENPM/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/BIRC5/NCAPG/AURKB/KIF20A/CCNB1/PTTG1/KIF2C/PLK1/BUB1B/ZWINT/CENPU/SPC25/CENPI/ESPL1/TAOK1
## Count
## 68877 25
## 69278 49
## 2500257 23
## 1640170 54
## 5663220 21
## 68886 30
For calculation/parameter details, please refer to the vignette of DOSE3..
Pathway analysis using NGS data (eg, RNA-Seq and ChIP-Seq) can be performed by linking coding and non-coding regions to coding genes via ChIPseeker package, which can annotates genomic regions to their nearest genes, host genes, and flanking genes respectivly. In addtion, it provides a function, seq2gene, that simultaneously considering host genes, promoter region and flanking gene from intergenic region that may under control via cis-regulation. This function maps genomic regions to genes in a many-to-many manner and facilitate functional analysis. For more details, please refer to ChIPseeker4.
We implement barplot, dotplot enrichment map and category-gene-network for visualization. It is very common to visualize the enrichment result in bar or pie chart. We believe the pie chart is misleading and only provide bar chart.
barplot(x, showCategory=8)
dotplot(x, showCategory=15)
Enrichment map can be viusalized by enrichMap:
enrichMap(x, layout=igraph::layout.kamada.kawai, vertex.label.cex = 1)
In order to consider the potentially biological complexities in which a gene may belong to multiple annotation categories, we developed cnetplot function to extract the complex association between genes and diseases.
cnetplot(x, categorySize="pvalue", foldChange=geneList)
We have developed an R
package clusterProfiler5 for comparing biological themes among gene clusters. ReactomePA works fine with clusterProfiler and can compare biological themes at reactome pathway perspective.
require(clusterProfiler)
data(gcSample)
res <- compareCluster(gcSample, fun="enrichPathway")
plot(res)
A common approach in analyzing gene expression profiles was identifying differential expressed genes that are deemed interesting. The enrichPathway function we demonstrated previously were based on these differential expressed genes. This approach will find genes where the difference is large, but it will not detect a situation where the difference is small, but evidenced in coordinated way in a set of related genes. Gene Set Enrichment Analysis (GSEA)6 directly addressed this limitation. All genes can be used in GSEA; GSEA aggregates the per gene statistics across genes within a gene set, therefore making it possible to detect situations where all genes in a predefined set change in a small but coordinated way. For algorithm details, please refer to the vignette of DOSE3.
y <- gsePathway(geneList, nPerm=1000,
minGSSize=120, pvalueCutoff=0.2,
pAdjustMethod="BH", verbose=FALSE)
res <- as.data.frame(y)
head(res)
## ID Description setSize enrichmentScore
## 1474244 1474244 Extracellular matrix organization 228 -0.4459249
## 2467813 2467813 Separation of Sister Chromatids 130 0.6620115
## 73894 73894 DNA Repair 125 0.4140755
## 68882 68882 Mitotic Anaphase 138 0.6585014
## 2555396 2555396 Mitotic Metaphase and Anaphase 139 0.6608000
## 5357801 5357801 Programmed Cell Death 142 0.4017868
## NES pvalue p.adjust qvalues rank
## 1474244 -1.847583 0.001367989 0.01859296 0.01243057 1824
## 2467813 2.814801 0.003048780 0.01859296 0.01243057 1768
## 73894 1.742587 0.003086420 0.01859296 0.01243057 2643
## 68882 2.803159 0.003246753 0.01859296 0.01243057 1768
## 2555396 2.808080 0.003278689 0.01859296 0.01243057 1768
## 5357801 1.718243 0.003289474 0.01859296 0.01243057 2516
## leading_edge
## 1474244 tags=33%, list=15%, signal=29%
## 2467813 tags=42%, list=14%, signal=36%
## 73894 tags=37%, list=21%, signal=29%
## 68882 tags=41%, list=14%, signal=35%
## 2555396 tags=41%, list=14%, signal=36%
## 5357801 tags=37%, list=20%, signal=30%
## core_enrichment
## 1474244 3910/3371/1291/3791/1301/4238/7450/3685/80781/1280/1306/4314/3675/8425/977/4054/7042/3912/4322/1278/1511/4060/30008/1277/22795/10516/81578/1293/2247/1295/58494/8076/2192/1281/83700/50509/4319/1290/1513/11096/2202/4313/2199/3693/10536/1294/11117/3339/1462/1289/1292/3908/4016/3909/6678/1296/633/2331/7043/3913/1300/2200/1634/7177/1287/3679/2006/7373/1307/1311/1308/652/4148/4239
## 2467813 55143/991/1062/10403/11065/220134/79019/55839/54821/4085/81930/332/9212/9232/11004/5347/701/11130/79682/57405/2491/9700/1058/699/1063/5688/5709/55055/5698/5693/5713/79980/9735/5721/5691/5685/5690/5684/5885/5686/5695/10213/23198/54908/6396/5699/5714/5702/5905/3619/5708/55166/5692/10393
## 73894 51514/55215/9156/9636/5888/9768/2175/2237/5984/5111/2178/5427/5435/4436/5982/2956/7398/5424/7374/2072/7353/2189/5983/5440/4683/9100/142/9246/3014/9978/5425/5433/27033/7415/6119/672/6996/675/1642/5591/10973/3978/51008/5985/328/5437
## 68882 55143/991/1062/10403/11065/220134/79019/55839/54821/4085/81930/332/9212/9232/11004/5347/701/11130/79682/57405/2491/9700/1058/699/1063/5688/5709/55055/5698/5693/5713/79980/9735/5721/5691/5685/5690/5684/5885/5686/5695/10213/23198/54908/6396/5699/5714/7443/8815/5702/5905/3619/5708/55166/5692/10393
## 2555396 55143/991/1062/10403/11065/220134/79019/55839/54821/4085/81930/332/9212/9232/11004/5347/701/11130/79682/57405/2491/9700/1058/699/1063/5688/5709/26271/55055/5698/5693/5713/79980/9735/5721/5691/5685/5690/5684/5885/5686/5695/10213/23198/54908/6396/5699/5714/7443/8815/5702/5905/3619/5708/55166/5692/10393
## 5357801 3002/4001/5688/5317/5709/1869/330/5698/51765/5693/5713/3148/5721/2810/5691/54205/578/5685/637/5690/5684/5686/5695/7027/10213/10059/7534/23198/5588/63967/999/5699/5714/929/5702/5708/8772/5692/5704/1830/7186/9414/356/5683/9500/5694/10018/1612/5366/10134/5718/5682
enrichMap(y)
gseaplot(y, geneSetID = "1280215")
In ReactomePA, we also implemented viewPathway to visualized the pathway.
viewPathway("E2F mediated regulation of DNA replication", readable=TRUE, foldChange=geneList)
More documents can be found on the project website, https://guangchuangyu.github.io/ReactomePA.
Here is the output of sessionInfo()
on the system on which this document was compiled:
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ReactomePA_1.18.1 DOSE_3.0.4 org.Hs.eg.db_3.4.0
## [4] AnnotationDbi_1.36.0 IRanges_2.8.1 S4Vectors_0.12.0
## [7] Biobase_2.34.0 BiocGenerics_0.20.0 BiocStyle_2.2.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 plyr_1.8.4 tools_3.3.1
## [4] digest_0.6.10 RSQLite_1.0.0 evaluate_0.10
## [7] tibble_1.2 gtable_0.2.0 graph_1.52.0
## [10] fastmatch_1.0-4 igraph_1.0.1 DBI_0.5-1
## [13] yaml_2.1.13 fgsea_1.0.1 gridExtra_2.2.1
## [16] stringr_1.1.0 knitr_1.15 rappdirs_0.3.1
## [19] grid_3.3.1 qvalue_2.6.0 data.table_1.9.6
## [22] BiocParallel_1.8.1 GOSemSim_2.0.1 rmarkdown_1.1
## [25] reactome.db_1.58.0 reshape2_1.4.2 GO.db_3.4.0
## [28] DO.db_2.9 ggplot2_2.1.0 magrittr_1.5
## [31] splines_3.3.1 scales_0.4.1 htmltools_0.3.5
## [34] assertthat_0.1 graphite_1.20.1 colorspace_1.2-7
## [37] labeling_0.3 stringi_1.1.2 munsell_0.4.3
## [40] chron_2.3-47
1. Yu, G. & He, Q.-Y. ReactomePA: An r/Bioconductor package for reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).
2. Boyle, E. I. et al. GO::TermFinder–open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes. Bioinformatics (Oxford, England) 20, 3710–3715 (2004).
3. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An r/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).
4. Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: An r/Bioconductor package for chIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
5. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an r package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).
6. Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550 (2005).