Gene Ontology

Zuguang Gu ( z.gu@dkfz.de )

2024-02-06

Gene Ontology (GO) is the most widely used bio-ontology. On Bioconductor, there are standard packages for GO (GO.db) and organism-specific GO annotation packages (org.*.db). In simona, the helper function create_ontology_DAG_from_GO_db() uses these standard Bioconductor GO packages to construct a DAG object automatically.

Create the GO DAG object

GO has three namespaces (or ontologies): biological process (BP), molecular function (MF) and cellular component (CC). The three GO namespaces are mutually exclusive, so a DAG is built for one namespace at a time, and the first argument of create_ontology_DAG_from_GO_db() is the GO namespace.

library(simona)
dag = create_ontology_DAG_from_GO_db("BP")
dag
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.18.0 
##   27597 terms / 55036 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 1.99
##   Avg number of children: 1.88
##   Aspect ratio: 358:1 (based on the longest distance from root)
##                 771.78:1 (based on the shortest distance from root)
##   Relations: is_a, part_of
## 
## With the following columns in the metadata data frame:
##   id, name, definition
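Because each call covers a single namespace, an analysis that spans all of GO needs one DAG per namespace. The following is a minimal sketch of that pattern, assuming the GO.db package is installed; the variable names namespaces, dag_list and roots are only illustrative.

```r
library(simona)

# Build one DAG per GO namespace; `dag_list` is an illustrative name
namespaces = c("BP", "MF", "CC")
dag_list = lapply(namespaces, function(ns) create_ontology_DAG_from_GO_db(ns))
names(dag_list) = namespaces

# Each namespace has its own root term
roots = sapply(dag_list, dag_root)
roots
```

Keeping the three DAGs in a named list makes it easy to apply the same downstream step to each namespace with lapply().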

There are three main GO relations: “is_a”, “part_of” and “regulates”. “regulates” has two child relation types in GO: “negatively_regulates” and “positively_regulates”. So when “regulates” is selected, the two child relation types are automatically selected. By default only “is_a” and “part_of” are selected.

You can set a subset of relation types with the argument relations.

create_ontology_DAG_from_GO_db("BP", relations = c("part of", "regulates"))  # "part_of" is also OK
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.18.0 
##   27597 terms / 63527 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 2.30
##   Avg number of children: 2.19
##   Aspect ratio: 277.13:1 (based on the longest distance from root)
##                 997.5:1 (based on the shortest distance from root)
##   Relations: is_a, negatively_regulates, part_of, positively_regulates,
##              regulates
##   Relation types may have hierarchical relations.
## 
## With the following columns in the metadata data frame:
##   id, name, definition

“is_a” is always selected because it is the primary semantic relation type. So if you only want to include the “is_a” relation, you can assign an empty vector to relations:

create_ontology_DAG_from_GO_db("BP", relations = character(0)) # or NULL, NA

Or you can apply dag_filter() after the DAG is generated.

dag = create_ontology_DAG_from_GO_db("BP")
dag_filter(dag, relations = "is_a")
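To see what the filter does, the number of relations can be compared before and after filtering. This is a small sketch that assumes GO.db is installed and that dag_n_relations() is available in the installed simona version; dag_is_a is an illustrative variable name.

```r
library(simona)

dag = create_ontology_DAG_from_GO_db("BP")

# Keep only the "is_a" links; `dag_is_a` is an illustrative name
dag_is_a = dag_filter(dag, relations = "is_a")

# Dropping the "part_of" links should reduce the number of relations
dag_n_relations(dag)
dag_n_relations(dag_is_a)
```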

Add gene annotation

Gene annotation can be set with the argument org_db. The value is an OrgDb object of the corresponding organism. The primary gene ID type in the org.*.db package is used internally (which is normally the Entrez ID type).

library(org.Hs.eg.db)
dag = create_ontology_DAG_from_GO_db("BP", org_db = org.Hs.eg.db)
dag
## An ontology_DAG object:
##   Source: GO BP / GO.db package 3.18.0 
##   27597 terms / 55036 relations
##   Root: GO:0008150 
##   Terms: GO:0000001, GO:0000002, GO:0000003, GO:0000011, ...
##   Max depth: 18 
##   Avg number of parents: 1.99
##   Avg number of children: 1.88
##   Aspect ratio: 358:1 (based on the longest distance from root)
##                 771.78:1 (based on the shortest distance from root)
##   Relations: is_a, part_of
##   Annotations: 18870 items
##                291, 1890, 4205, 4358, ...
## 
## With the following columns in the metadata data frame:
##   id, name, definition

For standard organism packages on Bioconductor, the OrgDb object always has the same name as the package, so the name of the organism package can also be passed to org_db:

create_ontology_DAG_from_GO_db("BP", org_db = "org.Hs.eg.db")

Similarly, if the analysis is applied on mouse, the mouse organism package can be set to org_db. If the mouse organism package is not installed yet, it will be installed automatically.

create_ontology_DAG_from_GO_db("BP", org_db = "org.Mm.eg.db")

Genes that are annotated to GO terms can be obtained by term_annotations(). Note that genes annotated to offspring terms are automatically merged into each term, so the result includes both direct and inherited annotations.

term_annotations(dag, c("GO:0000002", "GO:0000012"))
## $`GO:0000002`
##  [1] "291"    "1890"   "4205"   "4358"   "4976"   "9361"   "10000"  "55186" 
##  [9] "80119"  "84275"  "92667"  "1763"   "142"    "7157"   "9093"   "7156"  
## [17] "6240"   "50484"  "2021"   "11232"  "83667"  "5428"   "6742"   "56652" 
## [25] "201973"
## 
## $`GO:0000012`
##  [1] "1161"      "2074"      "3981"      "7141"      "7515"      "23411"    
##  [7] "54840"     "55775"     "200558"    "100133315"
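The merging from offspring terms is easier to see on a toy ontology than on GO. The sketch below builds a four-term DAG with create_ontology_DAG() and a hypothetical annotation list; the term names (a, b, c, d) and gene names (g0 to g3) are made up for illustration.

```r
library(simona)

# Toy ontology: a is the root, b and c are children of a, d is a child of b
parents  = c("a", "a", "b")
children = c("b", "c", "d")

# Direct annotations only (hypothetical gene names)
annotation = list(
    a = "g0",
    b = "g1",
    c = "g3",
    d = "g2"
)
dag_toy = create_ontology_DAG(parents, children, annotation = annotation)

# b also receives g2 from its offspring d; a receives the genes of all terms
anno = term_annotations(dag_toy, c("a", "b", "d"))
anno
```

The leaf term d keeps only its direct annotation, while each ancestor accumulates the genes of all of its offspring.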

Meta data frame

There are additional meta columns attached to the DAG object. They can be accessed by mcols().

head(mcols(dag))
##                    id                             name
## GO:0000001 GO:0000001        mitochondrion inheritance
## GO:0000002 GO:0000002 mitochondrial genome maintenance
## GO:0000003 GO:0000003                     reproduction
## GO:0000011 GO:0000011              vacuole inheritance
## GO:0000012 GO:0000012       single strand break repair
## GO:0000017 GO:0000017        alpha-glucoside transport
##                                                                                                                                                                                                                                                                                                   definition
## GO:0000001                                                                                                           The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton.
## GO:0000002                                                                                                                                                 The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome.
## GO:0000003                                                                                                                                                                      The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms.
## GO:0000011                                                                                                                                                        The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton.
## GO:0000012                                                                                                                                                      The repair of single strand breaks in DNA. Repair of such breaks is mediated by the same enzyme systems as are used in base excision repair.
## GO:0000017 The directed movement of alpha-glucosides into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Alpha-glucosides are glycosides in which the sugar group is a glucose residue, and the anomeric carbon of the bond is in an alpha configuration.

The additional information of GO terms is from the GO.db package. The row order of the meta data frame is the same as in dag_all_terms(dag).
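Since the rows follow the order of dag_all_terms(dag), the meta data frame can be indexed by position or by term ID. A minimal sketch, assuming GO.db is installed; S4Vectors is attached here only to mirror the session shown in this document, and meta is an illustrative variable name.

```r
library(simona)
library(S4Vectors)  # attached in the session below; assumed to make mcols() visible

dag = create_ontology_DAG_from_GO_db("BP")
meta = mcols(dag)

# The id column should follow the order of dag_all_terms()
same_order = all(meta$id == dag_all_terms(dag))
same_order

# Row names are GO IDs, so a single term can be looked up directly
meta["GO:0000002", "name"]
```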

Session info

sessionInfo()
## R version 4.3.2 Patched (2023-11-13 r85521)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] org.Hs.eg.db_3.18.0  AnnotationDbi_1.64.1 IRanges_2.36.0      
## [4] S4Vectors_0.40.2     Biobase_2.62.0       BiocGenerics_0.48.1 
## [7] igraph_2.0.1.1       simona_1.0.10        knitr_1.45          
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.42.0         circlize_0.4.15         shape_1.4.6            
##  [4] rjson_0.2.21            xfun_0.41               bslib_0.6.1            
##  [7] GlobalOptions_0.1.2     bitops_1.0-7            vctrs_0.6.5            
## [10] tools_4.3.2             parallel_4.3.2          Polychrome_1.5.1       
## [13] RSQLite_2.3.5           highr_0.10              cluster_2.1.6          
## [16] blob_1.2.4              pkgconfig_2.0.3         RColorBrewer_1.1-3     
## [19] scatterplot3d_0.3-44    GenomeInfoDbData_1.2.11 lifecycle_1.0.4        
## [22] compiler_4.3.2          Biostrings_2.70.2       codetools_0.2-19       
## [25] ComplexHeatmap_2.18.0   clue_0.3-65             GenomeInfoDb_1.38.5    
## [28] httpuv_1.6.14           htmltools_0.5.7         sass_0.4.8             
## [31] RCurl_1.98-1.14         yaml_2.3.8              later_1.3.2            
## [34] crayon_1.5.2            jquerylib_0.1.4         GO.db_3.18.0           
## [37] ellipsis_0.3.2          cachem_1.0.8            iterators_1.0.14       
## [40] foreach_1.5.2           mime_0.12               digest_0.6.34          
## [43] fastmap_1.1.1           grid_4.3.2              colorspace_2.1-0       
## [46] cli_3.6.2               magrittr_2.0.3          promises_1.2.1         
## [49] bit64_4.0.5             rmarkdown_2.25          XVector_0.42.0         
## [52] httr_1.4.7              matrixStats_1.2.0       bit_4.0.5              
## [55] png_0.1-8               GetoptLong_1.0.5        memoise_2.0.1          
## [58] shiny_1.8.0             evaluate_0.23           doParallel_1.0.17      
## [61] rlang_1.1.3             Rcpp_1.0.12             xtable_1.8-4           
## [64] glue_1.7.0              DBI_1.2.1               xml2_1.3.6             
## [67] jsonlite_1.8.8          R6_2.5.1                zlibbioc_1.48.0