CTDquerier 1.5.0
The Comparative Toxicogenomics Database (CTDbase; http://ctdbase.org) is a public resource for toxicogenomic information manually curated from the peer-reviewed scientific literature, providing key information about the interactions of environmental chemicals with gene products and their effect on human disease [1][2]. CTDbase is offered to the public through a web-based interface that includes basic and advanced query options to access data for sequences, references, and toxic agents, and a platform for analysing sequences.
CTDquerier R package
CTDquerier is an R package that allows R users to download basic data from CTDbase about genes, chemicals and diseases. Once the user's input is validated, the package queries CTDbase and downloads the information related to the given input from the other modules.
CTDquerier can be installed from Bioconductor using BiocManager. To install CTDquerier, run the following commands in an R session:
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("CTDquerier")
Once installed, CTDquerier should be loaded by running the following command:
library( CTDquerier )
CTDquerier has three main functions, one for each type of input: genes, chemicals or diseases. Table 1 indicates the function to use to query CTDbase for each input.
Input | Function |
---|---|
Genes | query_ctd_gene |
Chemicals | query_ctd_chem |
Diseases | query_ctd_dise |
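The mapping in Table 1 can be written as a small lookup. This is only an illustrative sketch: the helper name ctd_query_function is hypothetical and is not part of CTDquerier; it simply returns the name of the function listed in Table 1 for a given input type.

```r
# Hypothetical helper (not part of CTDquerier): return the name of the
# query function that Table 1 lists for a given type of input.
ctd_query_function <- function(input = c("genes", "chemicals", "diseases")) {
  input <- match.arg(input)
  switch(input,
         genes     = "query_ctd_gene",
         chemicals = "query_ctd_chem",
         diseases  = "query_ctd_dise")
}

ctd_query_function("chemicals")

# With CTDquerier loaded, the returned name could then be resolved and called,
# for example (assuming the function accepts a `terms` argument):
#   query_fun <- match.fun(ctd_query_function("genes"))
#   ctd_genes <- query_fun(terms = c("APOE", "APOA1"))
```

Returning the function name rather than the function itself keeps the sketch runnable without the package installed.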
The functions that query CTDbase rely on a set of functions that download the specific vocabulary of each input. Table 2 shows the functions used to download each vocabulary and to load it into R. This process is transparent to the user, since it is encapsulated in each of the query functions.
Input | Load Function | Download Function |
---|---|---|
Genes | load_ctd_gene | download_ctd_genes |
Chemicals | load_ctd_chem | download_ctd_chem |
Diseases | load_ctd_dise | download_ctd_dise |
The three main functions of CTDquerier return CTDdata objects. These objects can be used to plot the information available in CTDbase with plot. Moreover, the information from CTDbase can be extracted as data.frame-like tables using the method get_table. Both the plot and get_table methods need an argument index_name that indicates the table to be plotted or extracted. Table 3 shows the possible options for index_name depending on the query performed, along with the possible representations of each table.
Accessor | Genes | Chemicals | Diseases |
---|---|---|---|
gene interactions | heat-map/network | heat-map | |
chemical interactions | heat-map | heat-map | |
diseases | heat-map | heat-map | |
gene-gene interactions | heat-map/network | | |
kegg pathways | network | heat-map | network |
go terms | network | heat-map | |
To query CTDbase for a given gene or set of genes, we use the function query_ctd_gene:
args( query_ctd_gene )
## function (terms, verbose = FALSE)
## NULL
The argument terms must be filled with the list of genes of interest. The argument filename is filled with the name that the table with the specific vocabulary from CTDbase for genes will receive. The function checks whether this file already exists and, if so, uses the local version. The argument mode is used to download the vocabulary file (for more information, check download.file from the utils package). Note that filename and mode are not part of the signature printed above; in this version the vocabulary file appears to be handled through a cache, as the messages in the example that follows show. Finally, the argument verbose will show relevant messages about the querying process if it is set to TRUE.
A typical gene-query follows:
ctd_genes <- query_ctd_gene(
    terms = c( "APOE", "APOEB", "APOE2", "APOE3", "APOE4", "APOA1", "APOA5" ) )
## Warning in .get_cache(): /home/biocbuild/.cache/CTDQuery
## Using temporary cache /tmp/RtmpRHEOpp/BiocFileCache
## Warning in .get_cache(): /home/biocbuild/.cache/CTDQuery
## Using temporary cache /tmp/RtmpRHEOpp/BiocFileCache
## 1/tmp/RtmpRHEOpp/BiocFileCache/592453443d9_CTD_genes.tsv.gz
## Warning in load_ctd_gene(): 1/tmp/RtmpRHEOpp/BiocFileCache/
## 592453443d9_CTD_genes.tsv.gz
## 1/tmp/RtmpRHEOpp/BiocFileCache/592453443d9_CTD_genes.tsv.gz
## Warning in load_ctd_gene(): 1/tmp/RtmpRHEOpp/BiocFileCache/
## 592453443d9_CTD_genes.tsv.gz
## Warning in query_ctd_gene(terms = c("APOE", "APOEB", "APOE2", "APOE3",
## "APOE4", : 2/7 terms were dropped.
ctd_genes
## Object of class 'CTDdata'
## -------------------------
## . Type: GENE
## . Length: 5
## . Items: APOE, ..., APOA5
## . Diseases: 2215 ( 5089 / 5715 )
## . Gene-gene interactions: 192 ( 232 )
## . Gene-chemical interactions: 607 ( 1539 )
## . KEGG pathways: 59 ( 59 )
## . GO terms: 353 ( 355 )
As can be seen, query_ctd_gene reports the number of terms used in the query and the number of terms lost in the process. To know exactly which terms were found in CTDbase and which were lost, we use the method get_terms.
get_terms( ctd_genes )
## $found
## [1] "APOE" "APOEB" "APOE2" "APOA1" "APOA5"
##
## $lost
## [1] "APOE3" "APOE4"
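The result of get_terms can be checked programmatically before continuing with an analysis. The following is a minimal sketch that uses a plain list with the same shape and values as the output shown above, rather than a live query, so it runs without access to CTDbase; with a real object, the first assignment would be replaced by get_terms( ctd_genes ).

```r
# Stand-in for get_terms( ctd_genes ): a list with the same shape and
# values as the output printed above (no query is performed here).
terms <- list(
  found = c("APOE", "APOEB", "APOE2", "APOA1", "APOA5"),
  lost  = c("APOE3", "APOE4")
)

n_total <- length(terms$found) + length(terms$lost)

# Report the terms that were not matched in the CTDbase vocabulary.
if (length(terms$lost) > 0) {
  message(length(terms$lost), "/", n_total, " terms were not found: ",
          paste(terms$lost, collapse = ", "))
}
```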
Now that the information about the genes of interest has been downloaded from CTDbase, we can access it using the method get_table. This method gives access to different tables depending on the origin of the object. For a CTDdata object created from genes, the accessible tables are:
Table | Available | Accessors |
---|---|---|
Gene Interactions | NO | "gene interactions" |
Chemicals Interactions | YES | "chemical interactions" |
Diseases | YES | "diseases" |
Gene-Gene Interactions | YES | "gene-gene interactions" |
Pathways (KEGG) | YES | "kegg pathways" |
GO (Gene Ontology Terms) | YES | "go terms" |
An example of how to extract one of these tables follows:
get_table( ctd_genes , index_name = "diseases" )[ 1:2, 1:3 ]
## DataFrame with 2 rows and 3 columns
## Disease.Name Disease.ID Direct.Evidence
## <character> <character> <character>
## 1 Chemical and Drug Induced Liver Injury MESH:D056486 marker/mechanism
## 2 Atherosclerosis MESH:D050197 marker/mechanism
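Once a table has been extracted, it can be handled with ordinary R subsetting. The sketch below is an assumption-laden illustration: it builds a small data.frame containing only the two rows and three columns printed above, as a stand-in for as.data.frame( get_table( ctd_genes, index_name = "diseases" ) ), and then keeps the rows whose Direct.Evidence is "marker/mechanism".

```r
# Stand-in for as.data.frame( get_table( ctd_genes, index_name = "diseases" ) ),
# restricted to the two rows and three columns printed above.
diseases <- data.frame(
  Disease.Name    = c("Chemical and Drug Induced Liver Injury", "Atherosclerosis"),
  Disease.ID      = c("MESH:D056486", "MESH:D050197"),
  Direct.Evidence = c("marker/mechanism", "marker/mechanism"),
  stringsAsFactors = FALSE
)

# Keep only the associations annotated as marker/mechanism.
curated <- diseases[diseases$Direct.Evidence == "marker/mechanism", ]

# Disease identifiers of the retained rows.
unique(curated$Disease.ID)
```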
The information stored in each table can be seen in the following code, where the column names of each table are shown:
colnames( get_table( ctd_genes, index_name = "chemical interactions" ) )
## [1] "Chemical.Name" "Chemical.ID" "CAS.RN"
## [4] "Interaction" "Interaction.Actions" "Reference.Count"
## [7] "Organism.Count" "GeneSymbol" "GeneID"
colnames( get_table( ctd_genes, index_name = "diseases" ) )
## [1] "Disease.Name" "Disease.ID" "Direct.Evidence"
## [4] "Inference.Network" "Inference.Score" "Reference.Count"
## [7] "GeneSymbol" "GeneID"
colnames( get_table( ctd_genes, index_name = "gene-gene interactions" ) )
## [1] "Source.Gene.Symbol" "Source.Gene.ID" "Target.Gene.Symbol"
## [4] "Target.Gene.ID" "Source.Organism" "Target.Organism"
## [7] "Assay" "Interaction.Type" "Throughput"
## [10] "Reference.Authors" "Reference.Citation" "PubMed.ID"
## [13] "GeneSymbol" "GeneID"
colnames( get_table( ctd_genes, index_name = "kegg pathways" ) )
## [1] "Pathway" "Pathway.ID" "GeneSymbol" "GeneID"
colnames( get_table( ctd_genes, index_name = "go terms" ) )
## [1] "Ontology" "Qualifiers" "GO.Term.Name"
## [4] "GO.Term.ID" "Organisms..Evidence." "GeneSymbol"
## [7] "GeneID"
CTDdata Objects
The generic plot function follows the same mechanism as get_table. Using the argument index_name we select the table to plot. Then, the arguments subset.gene and subset.* (where * stands for chemicals, diseases, pathways and go) allow filtering the X-axis and Y-axis. Depending on the table to be plotted, the argument field.score can be used to select the field to be plotted (it can take the values "Inference" or "Reference"). The argument filter.score can then be used to filter the entries of the table. Finally, the argument max.length shortens the labels to a maximum number of characters.
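As a rough illustration of how these arguments combine, the sketch below collects them in a list and only calls plot if an object named ctd_genes exists in the session. The specific values (the two gene symbols, a max.length of 30) are illustrative assumptions, and whether every argument applies to a given table should be checked against the package documentation.

```r
# Arguments described above, gathered in a list. The values are illustrative
# assumptions, not recommendations from the package authors.
plot_args <- list(
  index_name   = "diseases",            # table to plot
  subset.gene  = c("APOE", "APOA1"),   # restrict the gene axis
  field.score  = "Inference",           # plot the inference score
  filter.score = 115,                   # threshold used to filter entries
  max.length   = 30                     # shorten long labels
)

# Only attempt the plot when a CTDdata object from a gene query is available.
if (exists("ctd_genes")) {
  do.call(plot, c(list(ctd_genes), plot_args))
}
```

Keeping the arguments in a list makes it easy to reuse the same filters across several tables by changing only index_name.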
The following plot shows the number of references that cite the association between the APOE-like genes and chemicals.
plot( ctd_genes, index_name = "chemical interactions", filter.score = 3 )
The next plot shows the inference score that associates the APOE-like genes with diseases according to CTDbase.
plot( ctd_genes, index_name = "diseases", filter.score = 115 )
The plot to explore the gene-gene interactions is based on a network representation. The genes from the original set are dark-coloured, while the other genes are light-coloured.
plot( ctd_genes, index_name = "gene-gene interactions",
    representation = "network", main = "APOE-like gene-gene interactions" )