txdbmaker 1.4.2
The txdbmaker package provides functions to make TxDb
objects from genomic annotation provided by the UCSC Genome Browser
(https://genome.ucsc.edu/), Ensembl (https://ensembl.org/),
BioMart (http://www.biomart.org/), or directly from a GFF or GTF file.
In this document we will quickly demonstrate the use of these functions.
Note that the package also provides a lower-level utility, makeTxDb(),
for creating TxDb objects from data directly supplied by the user.
Please refer to its man page (?makeTxDb) for more information.
See the vignette in the GenomicFeatures package for an
introduction to TxDb objects.
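The example below is a minimal sketch of the lower-level makeTxDb() route. It builds a TxDb from two small data frames supplied by hand. The transcripts, the chromosome name and the coordinates are made-up toy values.

The column names follow our reading of ?makeTxDb:

- required transcript columns: tx_id, tx_chrom, tx_strand, tx_start, tx_end
- required splicing columns: tx_id, exon_rank, exon_start, exon_end

Please check the man page for the optional inputs (gene, CDS and chromosome information).

```r
library(txdbmaker)

## Toy, made-up annotation: two transcripts on a hypothetical "chr1".
## Required transcript columns: tx_id, tx_chrom, tx_strand, tx_start, tx_end
## (tx_name is optional and gives the transcripts readable names).
tx_df <- data.frame(
    tx_id     = 1:2,
    tx_name   = c("tx1", "tx2"),
    tx_chrom  = "chr1",
    tx_strand = c("-", "+"),
    tx_start  = c(1, 2001),
    tx_end    = c(999, 2199)
)

## Required splicing columns: tx_id, exon_rank, exon_start, exon_end.
## tx1 has a single exon; tx2 has two exons separated by an intron.
splicings_df <- data.frame(
    tx_id      = c(1L, 2L, 2L),
    exon_rank  = c(1L, 1L, 2L),
    exon_start = c(1, 2001, 2131),
    exon_end   = c(999, 2085, 2199)
)

toy_txdb <- makeTxDb(tx_df, splicings_df)
transcripts(toy_txdb)
exonsBy(toy_txdb, by = "tx", use.names = TRUE)
```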
txdbmaker package
Install the package with:
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("txdbmaker")
Then load it with:
suppressPackageStartupMessages(library(txdbmaker))
makeTxDbFromUCSC
The function makeTxDbFromUCSC downloads UCSC
Genome Bioinformatics transcript tables (e.g. knownGene,
refGene, ensGene) for a genome build (e.g.
mm9, hg19). Use the supportedUCSCtables
utility function to get the list of tables known to work with
makeTxDbFromUCSC.
supportedUCSCtables(genome="mm9")
## tablename track composite_track
## 1 acembly AceView Genes <NA>
## 2 augustusGene AUGUSTUS <NA>
## 3 ccdsGene CCDS <NA>
## 4 ensGene Ensembl Genes <NA>
## 5 exoniphy Exoniphy <NA>
## 6 geneid Geneid Genes <NA>
## 7 genscan Genscan Genes <NA>
## 8 knownGene UCSC Genes <NA>
## 9 knownGeneOld4 Old UCSC Genes <NA>
## 10 nscanGene N-SCAN <NA>
## 11 pseudoYale60 Yale Pseudo60 <NA>
## 12 refGene RefSeq Genes <NA>
## 13 sgpGene SGP Genes <NA>
## 14 transcriptome Transcriptome <NA>
## 15 vegaPseudoGene Vega Pseudogenes Vega Genes
## 16 vegaGene Vega Protein Genes Vega Genes
## 17 xenoRefGene Other RefSeq <NA>
mm9KG_txdb <- makeTxDbFromUCSC(genome="mm9", tablename="knownGene")
## Download the knownGene table ... OK
## Download the knownToLocusLink table ... OK
## Extract the 'transcripts' data frame ... OK
## Extract the 'splicings' data frame ... OK
## Download and preprocess the 'chrominfo' data frame ... OK
## Prepare the 'metadata' data frame ... OK
## Make the TxDb object ...
## Warning in .makeTxDb_normarg_chrominfo(chrominfo): genome version information
## is not available for this TxDb object
## OK
mm9KG_txdb
## TxDb object:
## # Db type: TxDb
## # Supporting package: GenomicFeatures
## # Data source: UCSC
## # Genome: mm9
## # Organism: Mus musculus
## # Taxonomy ID: 10090
## # UCSC Table: knownGene
## # UCSC Track: UCSC Genes
## # Resource URL: https://genome.ucsc.edu/
## # Type of Gene ID: Entrez Gene ID
## # Full dataset: yes
## # miRBase build ID: NA
## # Nb of transcripts: 55419
## # Db created by: txdbmaker package from Bioconductor
## # Creation time: 2025-06-23 23:39:55 -0400 (Mon, 23 Jun 2025)
## # txdbmaker version at creation time: 1.4.2
## # RSQLite version at creation time: 2.4.1
## # DBSCHEMAVERSION: 1.2
See ?makeTxDbFromUCSC for more information.
makeTxDbFromBiomart
Retrieve data from BioMart by passing the data set (and, optionally, the mart) to
the makeTxDbFromBiomart function (not all BioMart
data sets are currently supported):
mmusculusEnsembl <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")
As with the makeTxDbFromUCSC function, the
makeTxDbFromBiomart function has a
circ_seqs argument that defaults to using the contents
of the DEFAULT_CIRC_SEQS vector. And just as for UCSC
sources, there is a helper function,
getChromInfoFromBiomart, that shows how the
chromosomes are named in a given data set.
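As a minimal sketch, the code below lists the sequence names used by the Ensembl mouse data set before a TxDb is built. It has the following assumptions:

- It needs network access to the Ensembl BioMart server.
- It relies on the function's default mart, which we assume to be the main Ensembl gene mart.
- The exact columns of the returned data frame may differ between versions, so we only look at its first rows.

```r
library(txdbmaker)

## Needs network access to the Ensembl BioMart server.
## Returns a data frame describing the sequences (chromosomes, scaffolds,
## mitochondrial genome, ...) of the chosen data set.
mm_chrominfo <- getChromInfoFromBiomart(dataset = "mmusculus_gene_ensembl")
head(mm_chrominfo)
```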
Using the makeTxDbFromBiomart or
makeTxDbFromUCSC functions can take a while and
may also require considerable bandwidth, as these functions have to download and
then assemble a database from their respective sources. Most users
will not want to repeat this step every time.
Instead, we suggest that you save your annotation objects and label
them with an appropriate time stamp to facilitate reproducible
research.
See ?makeTxDbFromBiomart for more information.
makeTxDbFromEnsembl
The makeTxDbFromEnsembl function creates a TxDb object
for a given organism by importing the genomic locations of its transcripts,
exons, CDS, and genes from an Ensembl database.
See ?makeTxDbFromEnsembl for more information.
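The sketch below assumes network access to Ensembl's public MySQL server. It uses Saccharomyces cerevisiae only because its annotation is small; that organism choice is ours, not the document's.

```r
library(txdbmaker)

## Needs network access to Ensembl's public MySQL server (queried via RMariaDB).
## With 'release' left unset, the current Ensembl release is used; pass an
## explicit release number if the result must be reproducible.
sc_txdb <- makeTxDbFromEnsembl("Saccharomyces cerevisiae",
                               server = "ensembldb.ensembl.org")
sc_txdb
```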
makeTxDbFromGFF
You can also extract transcript information from either GFF3 or GTF
files by using the makeTxDbFromGFF function.
Usage is similar to makeTxDbFromBiomart and
makeTxDbFromUCSC.
See ?makeTxDbFromGFF for more information.
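To illustrate this without any download, the sketch below writes a toy GTF file to a temporary location and imports it. The file holds one made-up transcript, t1, with two exons on chr1; the gene, transcript and chromosome names are hypothetical.

Because the file contains only exon rows, this example relies on makeTxDbFromGFF inferring the transcript's extent from its exons.

```r
library(txdbmaker)

## Hypothetical toy data: one transcript ("t1") with two exons on "chr1",
## written as a minimal tab-separated GTF file.
attrs <- 'gene_id "g1"; transcript_id "t1";'
gtf_lines <- c(
    paste("chr1", "toy", "exon", 100, 200, ".", "+", ".", attrs, sep = "\t"),
    paste("chr1", "toy", "exon", 300, 400, ".", "+", ".", attrs, sep = "\t")
)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(gtf_lines, gtf_file)

## No 'transcript' rows are present, so the transcript is assembled from
## its exon rows (grouped by transcript_id).
toy_gtf_txdb <- makeTxDbFromGFF(gtf_file, format = "gtf",
                                dataSource = "toy GTF example")
toy_tx <- transcripts(toy_gtf_txdb)
toy_exons <- exonsBy(toy_gtf_txdb, by = "tx", use.names = TRUE)
toy_tx
```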
Saving and Loading a TxDb Object
Once a TxDb object has been created, it can be saved
to avoid the time and bandwidth costs of recreating it and to make it
possible to reproduce results with identical genomic feature data at a
later date. Since TxDb objects are backed by a
SQLite database, the save format is a SQLite database file (which
could be accessed from programs other than R if desired). Note that
it is not possible to serialize a TxDb object using
R’s save function.
saveDb(mm9KG_txdb, file="mm9KG_txdb.sqlite")
A saved TxDb object can later be re-initialized
from the .sqlite file with loadDb:
mm9KG_txdb <- loadDb("mm9KG_txdb.sqlite")
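The earlier advice suggests labeling saved annotations with a time stamp. One possible way to do that is to put the date in the file name and reuse the snapshot when it already exists.

cachedTxDb() below is a hypothetical helper, not part of txdbmaker. It only combines saveDb() and loadDb() as used above, and the first build needs network access to UCSC.

```r
library(txdbmaker)

## Hypothetical helper, not part of txdbmaker: reuse today's dated snapshot
## of a TxDb if it exists; otherwise build it, save it with saveDb(), and
## reload it from disk with loadDb().
cachedTxDb <- function(prefix, build) {
    path <- sprintf("%s_%s.sqlite", prefix, format(Sys.Date(), "%Y-%m-%d"))
    if (!file.exists(path))
        saveDb(build(), file = path)
    loadDb(path)
}

## The UCSC download runs only if no snapshot dated today exists in the
## working directory; older dated files are left untouched.
mm9KG_txdb <- cachedTxDb("mm9KG_txdb", function()
    makeTxDbFromUCSC(genome = "mm9", tablename = "knownGene"))
```

To reproduce an analysis later, load the specific dated file directly with loadDb() instead of rebuilding it.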
makeTxDbPackageFromUCSC and makeTxDbPackageFromBiomart
It is often much more convenient to just make an annotation package
out of your annotations. If you find that this is the case,
then you should consider the convenience functions
makeTxDbPackageFromUCSC and
makeTxDbPackageFromBiomart. These functions are similar
to makeTxDbFromUCSC and
makeTxDbFromBiomart except that they take the
extra step of wrapping the database up into an annotation
package for you. This package can then be installed and used like
the standard TxDb packages found in the Bioconductor
repository.
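The following is a minimal sketch of building such a package, assuming network access to UCSC. The genome and table (sacCer2, ensGene) are chosen only because they are small; check them with supportedUCSCtables(genome="sacCer2") first. The version, maintainer and author values are placeholders to replace with your own.

```r
library(txdbmaker)

## Needs network access to UCSC. The version, maintainer and author values
## are placeholders; the package source is written to a temporary directory.
pkg_dir <- tempdir()
makeTxDbPackageFromUCSC(version = "0.01",
                        maintainer = "Your Name <you@example.org>",
                        author = "Your Name <you@example.org>",
                        destDir = pkg_dir,
                        genome = "sacCer2",
                        tablename = "ensGene")

## The generated package directory name starts with "TxDb."
list.files(pkg_dir, pattern = "^TxDb\\.")
```

The generated source package can then be installed like any local package, for example with install.packages(<path to the package directory>, repos = NULL, type = "source").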
Session information
sessionInfo()
## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] txdbmaker_1.4.2 GenomicFeatures_1.60.0 AnnotationDbi_1.70.0
## [4] Biobase_2.68.0 GenomicRanges_1.60.0 GenomeInfoDb_1.44.0
## [7] IRanges_2.42.0 S4Vectors_0.46.0 BiocGenerics_0.54.0
## [10] generics_0.1.4 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.76.0 bitops_1.0-9
## [7] fastmap_1.2.0 RCurl_1.98-1.17
## [9] BiocFileCache_2.16.0 GenomicAlignments_1.44.0
## [11] XML_3.99-0.18 digest_0.6.37
## [13] timechange_0.3.0 lifecycle_1.0.4
## [15] KEGGREST_1.48.1 RSQLite_2.4.1
## [17] magrittr_2.0.3 compiler_4.5.1
## [19] rlang_1.1.6 sass_0.4.10
## [21] progress_1.2.3 tools_4.5.1
## [23] yaml_2.3.10 rtracklayer_1.68.0
## [25] knitr_1.50 prettyunits_1.2.0
## [27] S4Arrays_1.8.1 bit_4.6.0
## [29] curl_6.4.0 DelayedArray_0.34.1
## [31] xml2_1.3.8 abind_1.4-8
## [33] BiocParallel_1.42.1 grid_4.5.1
## [35] biomaRt_2.64.0 SummarizedExperiment_1.38.1
## [37] cli_3.6.5 rmarkdown_2.29
## [39] crayon_1.5.3 httr_1.4.7
## [41] rjson_0.2.23 DBI_1.2.3
## [43] cachem_1.1.0 stringr_1.5.1
## [45] parallel_4.5.1 BiocManager_1.30.26
## [47] XVector_0.48.0 restfulr_0.0.15
## [49] matrixStats_1.5.0 vctrs_0.6.5
## [51] Matrix_1.7-3 jsonlite_2.0.0
## [53] bookdown_0.43 hms_1.1.3
## [55] bit64_4.6.0-1 jquerylib_0.1.4
## [57] glue_1.8.0 codetools_0.2-20
## [59] lubridate_1.9.4 stringi_1.8.7
## [61] BiocIO_1.18.0 UCSC.utils_1.4.0
## [63] tibble_3.3.0 pillar_1.10.2
## [65] rappdirs_0.3.3 htmltools_0.5.8.1
## [67] GenomeInfoDbData_1.2.14 R6_2.6.1
## [69] dbplyr_2.5.0 httr2_1.1.2
## [71] evaluate_1.0.4 lattice_0.22-7
## [73] RMariaDB_1.3.4 png_0.1-8
## [75] Rsamtools_2.24.0 memoise_2.0.1
## [77] bslib_0.9.0 SparseArray_1.8.0
## [79] xfun_0.52 MatrixGenerics_1.20.0
## [81] pkgconfig_2.0.3