Identify Different Architectures of Sequence Elements
Bioconductor version: Development (3.21)
seqArchR enables unsupervised discovery of _de novo_ clusters of sequences that share a characteristic architecture, defined either by position-specific motifs or by the composition of stretches of nucleotides, e.g., CG-richness. seqArchR does _not_ require the user to specify the number of clusters, the length of any individual motif, or the distance between motifs when they occur in pairs or groups; it detects these directly from the data. seqArchR uses non-negative matrix factorization (NMF) as its backbone and employs a chunking-based iterative procedure that enables efficient processing of large sequence collections. Wrapper functions are provided for visualizing cluster architectures as sequence logos.
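To make the NMF idea concrete, the following is a minimal base-R sketch, not seqArchR's own implementation. It assumes a hypothetical toy set of six equal-length DNA sequences, one-hot encodes them, and factorizes the resulting matrix with plain multiplicative updates; the helper names `one_hot` and `nmf_mu` are invented for illustration and are not part of the package.

```r
# Illustrative sketch only: this is NOT how seqArchR is implemented internally.
# Hypothetical toy data: two groups of sequences with different shared motifs.
seqs <- c("ACGTATATAA", "CCGTATATAC", "GCGTATATAG",
          "GGCGCGACGT", "GGCGCGTCGA", "GGCGCGCCGC")

# One-hot encode a sequence: four 0/1 values (A, C, G, T) per position.
one_hot <- function(s, alphabet = c("A", "C", "G", "T")) {
  chars <- strsplit(s, "")[[1]]
  as.vector(vapply(chars, function(ch) as.numeric(alphabet == ch),
                   numeric(length(alphabet))))
}

# Features in rows (position x nucleotide), sequences in columns.
V <- vapply(seqs, one_hot, numeric(4 * nchar(seqs[1])))

# Basic NMF via multiplicative updates (Frobenius loss): V ~ W %*% H.
nmf_mu <- function(V, k, n_iter = 200, eps = 1e-9, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(V) * k), nrow(V), k)  # basis vectors ("architectures")
  H <- matrix(runif(k * ncol(V)), k, ncol(V))  # per-sequence coefficients
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  list(W = W, H = H)
}

fit <- nmf_mu(V, k = 2)

# Assign each sequence to the basis vector with the largest coefficient.
clusters <- apply(fit$H, 2, which.max)
print(unname(clusters))
```

In this sketch the number of factors `k` is fixed by hand; seqArchR, as described above, determines the number of clusters from the data, so the example only illustrates the factorization step.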
To install this package, start R (version "4.5") and enter:
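if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("seqArchR")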
Example usage of _seqArchR_ on simulated DNA sequences | HTML | R Script |
Reference Manual | | |
NEWS | Text | |
LICENSE | Text | |
biocViews | Clustering, DNASeq, DimensionReduction, FeatureExtraction, GeneRegulation, Genetics, MathematicalBiology, MotifDiscovery, Software, SystemsBiology, Transcriptomics |
Version | 1.11.0 |
In Bioconductor since | BioC 3.15 (R-4.2) (3 years) |
License | GPL-3 | file LICENSE |
Depends | R (>= 4.2.0) |
Imports | utils, graphics, cvTools (>= 0.3.2), MASS, Matrix, methods, stats, cluster, matrixStats, fpc, cli, prettyunits, reshape2 (>= 1.4.3), reticulate (>= 1.22), BiocParallel, Biostrings, grDevices, ggplot2 (>= 3.1.1), ggseqlogo (>= 0.1) |
System Requirements | Python (>= 3.5), scikit-learn (>= 0.21.2), packaging |
URL | |
Bug Reports | |
Suggests | cowplot, hopach (>= 2.42.0), BiocStyle, knitr (>= 1.22), rmarkdown (>= 1.12), testthat (>= 3.0.2), covr, vdiffr (>= 0.3.0) |
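Because the package lists Python, scikit-learn, and packaging as system requirements, it can help to confirm that they are visible from R before running an analysis. The following is a minimal sketch that assumes the reticulate package is used to locate Python; the variable names `needed` and `status` are illustrative, and the check does not verify the minimum versions given in the table.

```r
# Sketch: check whether the Python modules named under System Requirements
# can be found from R. Version constraints are NOT checked here.
needed <- c("sklearn", "packaging")  # scikit-learn is imported as "sklearn"
status <- setNames(rep(NA, length(needed)), needed)

if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  # TRUE/FALSE per module, depending on whether Python can import it.
  status <- vapply(needed, reticulate::py_module_available, logical(1))
} else {
  message("reticulate or a usable Python installation was not found.")
}

print(status)
```

An `NA` entry means the check could not be run at all, whereas `FALSE` means Python was found but the module was not.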
