1 Introduction

Several functions are available for creating an EpiTxDb object. The most universal are makeEpiTxDb and makeEpiTxDbFromGRanges: makeEpiTxDb uses four data.frames as input, whereas makeEpiTxDbFromGRanges is a wrapper for information that is already available as a GRanges object.

The other functions are makeEpiTxDbFromRMBase and makeEpiTxDbFromtRNAdb, which make data from the RMBase v2.0 database (Xuan et al. 2017; Sun et al. 2015) or the tRNAdb (Jühling et al. 2009; Sprinzl and Vassilenko 2005) available as EpiTxDb objects. However, before creating your own EpiTxDb objects, have a look at the resources already available for H. sapiens, M. musculus and S. cerevisiae.

Additional metadata can be provided to all functions as a separate data.frame. The data.frame must have two columns, name and value.
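As a minimal sketch of the expected shape, the metadata table can be built in base R and checked before it is passed on. The entries and the helper is_valid_metadata are hypothetical and serve only as illustration; only the two column names come from the description above.

```r
# Hypothetical metadata entries; only the column names `name` and `value`
# are prescribed.
meta <- data.frame(name  = c("test", "source"),
                   value = c("Yes", "manual curation"),
                   stringsAsFactors = FALSE)

# Illustrative helper (not part of EpiTxDb): TRUE if `x` is a data.frame
# whose columns are exactly `name` and `value`, in that order.
is_valid_metadata <- function(x) {
  is.data.frame(x) && identical(colnames(x), c("name", "value"))
}

is_valid_metadata(meta)
```

Checking the shape up front keeps a malformed table from surfacing as an error only when the database is being created.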

library(GenomicRanges)
library(EpiTxDb)

2 makeEpiTxDb and makeEpiTxDbFromGRanges

Creating an EpiTxDb object from a GRanges object is straightforward.

gr <- GRanges(seqnames = "test",
              ranges = IRanges::IRanges(1,1),
              strand = "+",
              DataFrame(mod_id = 1L,
                        mod_type = "Am",
                        mod_name = "Am_1"))
etdb <- makeEpiTxDbFromGRanges(gr, metadata = data.frame(name = "test",
                                                         value = "Yes"))
## Creating EpiTxDb object ... done
etdb
## EpiTxDb object:
## # Db type: EpiTxDb
## # Supporting package: EpiTxDb
## # test: Yes
## # Nb of modifications: 1
## # Db created by: EpiTxDb package from Bioconductor
## # Creation time: 2021-05-19 17:36:46 -0400 (Wed, 19 May 2021)
## # EpiTxDb version at creation time: 1.4.0
## # RSQLite version at creation time: 2.2.7
## # DBSCHEMAVERSION: 1.0
metadata(etdb)

Additional data can be provided via the metadata columns of the GRanges object. For supported columns have a look at ?makeEpiTxDb or ?makeEpiTxDbFromGRanges.
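For comparison, the same single modification could be passed to makeEpiTxDb as a data.frame instead of a GRanges object. This is a sketch only: the column names of the modification table are written from memory of ?makeEpiTxDb and should be checked against that help page, and the object name etdb_df is arbitrary. The remaining three data.frames are assumed to be optional and are left out.

```r
# Modification table mirroring the GRanges example: one Am at position 1
# on the "+" strand of the sequence "test".
# Assumption: these column names match those listed in ?makeEpiTxDb.
mod <- data.frame(mod_id     = 1L,
                  mod_type   = "Am",
                  mod_name   = "Am_1",
                  mod_start  = 1L,
                  mod_end    = 1L,
                  mod_strand = "+",
                  sn_id      = 1L,
                  sn_name    = "test",
                  stringsAsFactors = FALSE)

# Metadata with the two required columns.
meta <- data.frame(name = "test", value = "Yes", stringsAsFactors = FALSE)

# Build the database only if the EpiTxDb package is installed.
if (requireNamespace("EpiTxDb", quietly = TRUE)) {
  etdb_df <- EpiTxDb::makeEpiTxDb(modifications = mod, metadata = meta)
  print(etdb_df)
}
```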

3 makeEpiTxDbFromtRNAdb

The tRNAdb can be accessed via the tRNAdbImport package using its RNA database. The result is returned as a ModRNAStringSet object, from which the modifications can be extracted using separate().

The only input required is a valid organism name as returned by listAvailableOrganismsFromtRNAdb().

etdb <- makeEpiTxDbFromtRNAdb("Saccharomyces cerevisiae")
## Loading data from tRNAdb ...
## Assembling data ...
## Creating EpiTxDb object ... done
etdb
## EpiTxDb object:
## # Db type: EpiTxDb
## # Supporting package: EpiTxDb
## # Nb of modifications: 557
## # Db created by: EpiTxDb package from Bioconductor
## # Creation time: 2021-05-19 17:37:20 -0400 (Wed, 19 May 2021)
## # EpiTxDb version at creation time: 1.4.0
## # RSQLite version at creation time: 2.2.7
## # DBSCHEMAVERSION: 1.0

For additional information have a look at ?makeEpiTxDbFromtRNAdb. The result returned from the tRNAdb is also available as a GRanges object if gettRNAdbDataAsGRanges() is used.

4 makeEpiTxDbFromRMBase

Analogous to the example above, makeEpiTxDbFromRMBase() downloads the data from RMBase v2.0. Three inputs are required, organism, genome and modtype, which have to be valid according to the functions listAvailableOrganismsFromRMBase(), listAvailableGenomesFromRMBase() and listAvailableModFromRMBase().

etdb <- makeEpiTxDbFromRMBase(organism = "Saccharomyces cerevisiae",
                              genome = "sacCer3",
                              modtype = "m1A")
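Before a call like the one above, the three inputs could be checked against the listing functions. The sketch below uses a hypothetical helper, check_choice, which is not part of EpiTxDb. It also assumes that listAvailableGenomesFromRMBase() takes the organism and that listAvailableModFromRMBase() takes the organism and genome as arguments, which should be confirmed in the package documentation.

```r
# Illustrative helper (not part of EpiTxDb): stop with a readable message
# if `value` is not among the allowed `choices`.
check_choice <- function(value, choices, what) {
  if (!(value %in% choices)) {
    stop("'", value, "' is not an available ", what, call. = FALSE)
  }
  invisible(TRUE)
}

organism <- "Saccharomyces cerevisiae"
genome   <- "sacCer3"
modtype  <- "m1A"

# The listing functions query RMBase, so this part needs EpiTxDb and a
# network connection; it is skipped in non-interactive sessions.
# Assumption: argument lists of the listing functions are as shown here.
if (interactive() && requireNamespace("EpiTxDb", quietly = TRUE)) {
  check_choice(organism, EpiTxDb::listAvailableOrganismsFromRMBase(),
               "organism")
  check_choice(genome, EpiTxDb::listAvailableGenomesFromRMBase(organism),
               "genome")
  check_choice(modtype, EpiTxDb::listAvailableModFromRMBase(organism, genome),
               "modification type")
}
```

Validating in this order mirrors the dependency between the inputs: the available genomes depend on the organism, and the available modification types depend on both.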

Internally, the files are cached using the BiocFileCache package and passed to makeEpiTxDbFromRMBaseFiles(), which can also be used with locally stored files. The results used to create the EpiTxDb object are processed from these files via the getRMBaseDataAsGRanges() function.

5 Session info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] EpiTxDb_1.4.0        Modstrings_1.8.0     Biostrings_2.60.0   
##  [4] XVector_0.32.0       AnnotationDbi_1.54.0 Biobase_2.52.0      
##  [7] GenomicRanges_1.44.0 GenomeInfoDb_1.28.0  IRanges_2.26.0      
## [10] S4Vectors_0.30.0     BiocGenerics_0.38.0  BiocStyle_2.20.0    
## 
## loaded via a namespace (and not attached):
##  [1] MatrixGenerics_1.4.0        httr_1.4.2                 
##  [3] sass_0.4.0                  bit64_4.0.5                
##  [5] jsonlite_1.7.2              bslib_0.2.5.1              
##  [7] assertthat_0.2.1            BiocManager_1.30.15        
##  [9] BiocFileCache_2.0.0         blob_1.2.1                 
## [11] GenomeInfoDbData_1.2.6      Rsamtools_2.8.0            
## [13] yaml_2.2.1                  progress_1.2.2             
## [15] lattice_0.20-44             pillar_1.6.1               
## [17] RSQLite_2.2.7               glue_1.4.2                 
## [19] digest_0.6.27               Structstrings_1.8.0        
## [21] colorspace_2.0-1            tRNA_1.10.0                
## [23] Matrix_1.3-3                htmltools_0.5.1.1          
## [25] XML_3.99-0.6                pkgconfig_2.0.3            
## [27] biomaRt_2.48.0              bookdown_0.22              
## [29] zlibbioc_1.38.0             purrr_0.3.4                
## [31] scales_1.1.1                BiocParallel_1.26.0        
## [33] tibble_3.1.2                KEGGREST_1.32.0            
## [35] ggplot2_3.3.3               generics_0.1.0             
## [37] ellipsis_0.3.2              cachem_1.0.5               
## [39] SummarizedExperiment_1.22.0 GenomicFeatures_1.44.0     
## [41] magrittr_2.0.1              crayon_1.4.1               
## [43] memoise_2.0.0               evaluate_0.14              
## [45] fansi_0.4.2                 xml2_1.3.2                 
## [47] tools_4.1.0                 prettyunits_1.1.1          
## [49] hms_1.1.0                   matrixStats_0.58.0         
## [51] BiocIO_1.2.0                lifecycle_1.0.0            
## [53] stringr_1.4.0               munsell_0.5.0              
## [55] tRNAdbImport_1.10.0         DelayedArray_0.18.0        
## [57] compiler_4.1.0              jquerylib_0.1.4            
## [59] rlang_0.4.11                grid_4.1.0                 
## [61] RCurl_1.98-1.3              rjson_0.2.20               
## [63] rappdirs_0.3.3              bitops_1.0-7               
## [65] rmarkdown_2.8               gtable_0.3.0               
## [67] restfulr_0.0.13             DBI_1.1.1                  
## [69] curl_4.3.1                  R6_2.5.0                   
## [71] GenomicAlignments_1.28.0    knitr_1.33                 
## [73] dplyr_1.0.6                 rtracklayer_1.52.0         
## [75] fastmap_1.1.0               bit_4.0.4                  
## [77] utf8_1.2.1                  filelock_1.0.2             
## [79] stringi_1.6.2               Rcpp_1.0.6                 
## [81] vctrs_0.3.8                 png_0.1-7                  
## [83] dbplyr_2.1.1                tidyselect_1.1.1           
## [85] xfun_0.23

References

Jühling, Frank, Mario Mörl, Roland K. Hartmann, Mathias Sprinzl, Peter F. Stadler, and Joern Pütz. 2009. “tRNAdb 2009: Compilation of tRNA Sequences and tRNA Genes.” Nucleic Acids Research 37: D159–D162. https://doi.org/10.1093/nar/gkn772.

Sprinzl, Mathias, and Konstantin S. Vassilenko. 2005. “Compilation of tRNA Sequences and Sequences of tRNA Genes.” Nucleic Acids Research 33: D139–D140. https://doi.org/10.1093/nar/gki012.

Sun, Wen-Ju, Jun-Hao Li, Shun Liu, Jie Wu, Hui Zhou, Liang-Hu Qu, and Jian-Hua Yang. 2015. “RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data.” Nucleic Acids Research 44 (D1): D259–D265. https://doi.org/10.1093/nar/gkv1036.

Xuan, Jia-Jia, Wen-Ju Sun, Peng-Hui Lin, Ke-Ren Zhou, Shun Liu, Ling-Ling Zheng, Liang-Hu Qu, and Jian-Hua Yang. 2017. “RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data.” Nucleic Acids Research 46 (D1): D327–D334. https://doi.org/10.1093/nar/gkx934.