This short vignette shows how to get started with the HOG data available in the OmaDB package.
Hierarchical orthologous groups (HOGs) are sets of genes defined with respect to a particular taxonomic range of interest[1]. Each HOG groups genes that have descended from a single common ancestral gene in that taxonomic range.
HOGs hold a lot of useful information and have applications in many contexts, including inference of gene function, the study of gene evolution dynamics, and comparative genomics. Each HOG has a taxonomic range. Within that range, a HOG can branch into constructs known as subHOGs, which arise from gene duplication events.
HOGs can be retrieved either by their HOG id or by one of their members. Let's say we are interested in the gene HUMAN22168, whose HOG can be accessed directly using getHOG("HOG:0273533.1b"). The example response object is explored below.
## $hog_id
## [1] "HOG:0273533.1b"
##
## $level
## [1] "Sarcopterygii"
##
## $levels_url
## [1] "https://omabrowser.org/api/hog/HOG:0273533.1b/?level=Sarcopterygii"
##
## $members_url
## [1] "https://omabrowser.org/api/hog/HOG:0273533.1b/members/?level=Sarcopterygii"
##
## $alternative_levels
## X..i..
## 1 Chlorocebus sabaeus
## 2 Carnivora
## 3 Gorilla gorilla gorilla
## 4 Sauria
## 5 Tetrapoda
## 6 Microcebus murinus
## 7 Hominoidea
## 8 Haplorrhini
## 9 Chiroptera
## 10 Mustela putorius furo
## 11 Pelodiscus sinensis
## 12 Catarrhini
## 13 Pongo abelii
## 14 Laurasiatheria
## 15 Oryctolagus cuniculus
## 16 Eutheria
## 17 Strepsirrhini
## 18 Pan troglodytes
## 19 Lagomorpha
## 20 Mammalia
## 21 Theria
## 22 Xenopus tropicalis
## 23 Sorex araneus
## 24 Dasypus novemcinctus
## 25 Macaca mulatta
## 26 Hominidae
## 27 Metatheria
## 28 Homininae
## 29 Cavia porcellus
## 30 Boreoeutheria
## 31 Insectivora
## 32 Equus caballus
## 33 Otolemur garnettii
## 34 Canis lupus familiaris
## 35 Glires
## 36 Ictidomys tridecemlineatus
## 37 Simiiformes
## 38 Tarsius syrichta
## 39 Archelosauria
## 40 Cercopithecinae
## 41 Euarchontoglires
## 42 Callithrix jacchus
## 43 Pteropus vampyrus
## 44 Papio anubis
## 45 Primates
## 46 Caniformia
## 47 Nomascus leucogenys
## 48 Homo sapiens
## 49 Amniota
## 50 Sarcophilus harrisii
## 51 Ailuropoda melanoleuca
## 52 Xenarthra
## 53 Myotis lucifugus
## 54 Rodentia
## 55 Monodelphis domestica
##
## $roothog_id
## [1] 273533
##
## $parent_hogs
## hog_id levels_url
## 1 HOG:0273533 https://omabrowser.org/api/hog/HOG:0273533/
## members_url
## 1 https://omabrowser.org/api/hog/HOG:0273533/members/
##
## $children_hogs
## hog_id levels_url
## 1 HOG:0273533.1b.1b https://omabrowser.org/api/hog/HOG:0273533.1b.1b/
## 2 HOG:0273533.1b.1a https://omabrowser.org/api/hog/HOG:0273533.1b.1a/
## members_url
## 1 https://omabrowser.org/api/hog/HOG:0273533.1b.1b/members/
## 2 https://omabrowser.org/api/hog/HOG:0273533.1b.1a/members/
## [1] "hog_id : character"
## [1] "level : character"
## [1] "levels_url : URL"
## [1] "members_url : URL"
## [1] "alternative_levels : data.frame"
## [1] "roothog_id : integer"
## [1] "parent_hogs : data.frame"
## [1] "children_hogs : data.frame"
## [1] 1 3
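# hog is the response object returned by getHOG() above
parent_hogs = getAttribute(hog, 'parent_hogs')
parent_hog_id = parent_hogs[['hog_id']]  # select the column by its name, as a string
children_hogs = getAttribute(hog, 'children_hogs')
children_hogs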
## hog_id levels_url
## 1 HOG:0273533.1b.1b https://omabrowser.org/api/hog/HOG:0273533.1b.1b/
## 2 HOG:0273533.1b.1a https://omabrowser.org/api/hog/HOG:0273533.1b.1a/
## members_url
## 1 https://omabrowser.org/api/hog/HOG:0273533.1b.1b/members/
## 2 https://omabrowser.org/api/hog/HOG:0273533.1b.1a/members/
From the output above, we can see that HOG:0273533.1b is a subHOG of its parent HOG:0273533 and that it splits further into two child subHOGs, HOG:0273533.1b.1a and HOG:0273533.1b.1b. We can investigate the taxonomic level at which this split occurred by looking at the root levels of the child subHOGs.
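The parent and child relationships in the printed tables can also be read off the ids themselves. The sketch below is a minimal illustration in base R, assuming that each subHOG id is its parent's id plus one dot-separated suffix, as the `parent_hogs` and `children_hogs` tables above suggest. The helpers `hog_parent_id()` and `hog_depth()` are hypothetical names introduced here and are not part of OmaDB.

```r
# Assumption: a subHOG id is formed by appending one ".<suffix>" to its parent id,
# as seen in the parent_hogs / children_hogs output above.

# Hypothetical helper: drop the last suffix to obtain the parent id
# (returns NA for a root HOG, which has no suffix).
hog_parent_id <- function(hog_id) {
  parts <- strsplit(hog_id, ".", fixed = TRUE)[[1]]
  if (length(parts) < 2) return(NA_character_)
  paste(parts[-length(parts)], collapse = ".")
}

# Hypothetical helper: number of subHOG suffixes below the root HOG.
hog_depth <- function(hog_id) {
  length(strsplit(hog_id, ".", fixed = TRUE)[[1]]) - 1L
}

hog_id   <- "HOG:0273533.1b"
children <- c("HOG:0273533.1b.1b", "HOG:0273533.1b.1a")

parent <- hog_parent_id(hog_id)   # "HOG:0273533"
depth  <- hog_depth(hog_id)       # 1

# Both child ids should point back to the HOG we started from.
children_ok <- all(vapply(children, hog_parent_id, character(1)) == hog_id)

print(parent)
print(depth)
print(children_ok)
```

String parsing like this is only a convenience for quick checks. The `parent_hogs` and `children_hogs` attributes returned by the API remain the authoritative source for the hierarchy.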
We have just detected a gene duplication, so it would be interesting to see whether any gene differentiation has followed as a consequence. We can check this by looking at the member protein annotations for each subHOG and performing a GO enrichment analysis on them with the Bioconductor package topGO.
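Before running a full enrichment analysis, a plain set comparison can give a first impression of whether the two subHOGs carry different annotations. The sketch below uses base R only and is a set comparison, not an enrichment test. The term labels are placeholders invented for illustration and are not real OMA annotations. In practice each vector would hold the GO term ids collected from the member proteins of the corresponding subHOG.

```r
# Placeholder annotations (NOT real data): one character vector of GO term
# labels per subHOG. Replace these with terms gathered from the member proteins.
go_terms <- list(
  "HOG:0273533.1b.1a" = c("GO:term_A", "GO:term_B", "GO:term_C"),
  "HOG:0273533.1b.1b" = c("GO:term_B", "GO:term_C", "GO:term_D")
)

# Terms annotated in every subHOG.
shared <- Reduce(intersect, go_terms)

# Terms annotated in one subHOG but in none of the others.
private <- lapply(names(go_terms), function(id) {
  others <- unlist(go_terms[names(go_terms) != id], use.names = FALSE)
  setdiff(go_terms[[id]], others)
})
names(private) <- names(go_terms)

print(shared)
print(private)
```

Terms that are private to one subHOG are candidates for functional differentiation after the duplication, but only a proper enrichment analysis against a background gene set can say whether the difference is statistically meaningful.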