Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R in a tidy data format. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.
BiocPkgTools 1.24.0
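Functionality includes access to Bioconductor build reports, download statistics (from Bioconductor and Anaconda), package metadata drawn from DESCRIPTION files, and an interactive package explorer.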
The Bioconductor build reports are available online as HTML pages.
However, they are not easy to process computationally. The biocBuildReport
function does some heroic parsing of the HTML to produce a tidy data.frame
for further processing in R.
library(BiocPkgTools)
head(biocBuildReport())
## # A tibble: 6 × 12
## pkg author version git_last_commit git_last_commit_date pkgType Deprecated
## <chr> <chr> <chr> <chr> <dttm> <chr> <lgl>
## 1 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## 2 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## 3 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## 4 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## 5 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## 6 ABSSeq Wentao… 1.59.0 dc09894 2024-04-30 10:35:52 bioc FALSE
## # ℹ 5 more variables: PackageStatus <chr>, node <chr>, stage <chr>,
## # result <chr>, bioc_version <chr>
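Each package appears to have one row per build node and stage, so a common next step is to collapse the table to one row per failing package. The sketch below relies only on the pkg, node, stage, and result columns shown above. The helper name failing_packages is hypothetical, and the default status string "ERROR" is an assumption; check unique(report$result) on real data before relying on it.

```r
library(dplyr)

# Hypothetical helper: collapse a build-report table (as returned by
# biocBuildReport()) to one row per package that has at least one row whose
# `result` is in `statuses`. The "ERROR" default is an assumed status string.
failing_packages <- function(report, statuses = "ERROR") {
  report %>%
    filter(result %in% statuses) %>%
    group_by(pkg) %>%
    summarise(
      n_failures = n(),                                   # matching rows
      nodes = paste(sort(unique(node)), collapse = ", "),  # affected nodes
      stages = paste(sort(unique(stage)), collapse = ", "),# affected stages
      .groups = "drop"
    ) %>%
    arrange(desc(n_failures))
}

# Usage (requires network access):
# failing_packages(biocBuildReport())
```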
Because developers may want a quick view of their own packages, the simple
function problemPage produces an HTML report of the build status of packages
whose author matches a regular expression supplied to the authorPattern
argument. By default, only “problem” build statuses (ERROR, WARNING) are
reported.
problemPage(authorPattern = "V.*Carey")
In a similar fashion, maintainers of packages with many downstream dependents
may wish to check that a change they introduced hasn’t suddenly broken a large
number of them. The dependsOn argument produces a summary report of the
packages that “depend on” the given package.
problemPage(dependsOn = "limma")
When run in an interactive environment, the problemPage function opens a
browser window for user interaction. Note that to include all of your package
results, not just the broken ones, you can simply specify includeOK = TRUE.
Bioconductor supplies download statistics for all packages. The
biocDownloadStats function retrieves all available download statistics for
every Software, Annotation Data, and Experiment Data package. The results
are returned as a tidy data.frame for further analysis.
head(biocDownloadStats())
## # A tibble: 6 × 7
## pkgType Package Year Month Nb_of_distinct_IPs Nb_of_downloads Date
## <chr> <chr> <int> <chr> <int> <int> <date>
## 1 software a4 2024 Jan 75 320 2024-01-01
## 2 software a4 2024 Feb 85 245 2024-02-01
## 3 software a4 2024 Mar 156 296 2024-03-01
## 4 software a4 2024 Apr 247 577 2024-04-01
## 5 software a4 2024 May 108 510 2024-05-01
## 6 software a4 2024 Jun 79 811 2024-06-01
The download statistics reported are for all available versions of a
package. There are no separate, publicly available statistics broken down by
version.
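Since the table holds one row per package and month, yearly totals need only a single grouped summary. This is a minimal sketch using the pkgType, Package, Year, and Nb_of_downloads columns shown above; the function name yearly_downloads is hypothetical. It deliberately does not sum Nb_of_distinct_IPs, because the same address can appear in several months, so a sum of monthly counts would overstate the number of distinct IPs.

```r
library(dplyr)

# Hypothetical helper: turn the monthly rows of biocDownloadStats() into
# yearly download totals per package. Assumes each row covers one month.
yearly_downloads <- function(stats) {
  stats %>%
    group_by(pkgType, Package, Year) %>%
    summarise(downloads = sum(Nb_of_downloads, na.rm = TRUE), .groups = "drop") %>%
    arrange(Package, Year)
}

# Usage (requires network access):
# yearly_downloads(biocDownloadStats())
```

Note that the most recent year will usually be partial, so its total is not directly comparable with earlier years.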
The majority of Bioconductor Software packages are also available through other
channels such as Anaconda, which also provides download statistics for packages
installed from its repositories. Access to these counts is provided by the
anacondaDownloadStats function:
head(anacondaDownloadStats())
## # A tibble: 6 × 7
## Package Year Month Nb_of_distinct_IPs Nb_of_downloads repo Date
## <chr> <chr> <chr> <int> <dbl> <chr> <date>
## 1 ABAData 2018 Apr NA 8 Anaconda 2018-04-01
## 2 ABAData 2018 Aug NA 5 Anaconda 2018-08-01
## 3 ABAData 2018 Dec NA 133 Anaconda 2018-12-01
## 4 ABAData 2018 Jul NA 6 Anaconda 2018-07-01
## 5 ABAData 2018 Jun NA 18 Anaconda 2018-06-01
## 6 ABAData 2018 Mar NA 13 Anaconda 2018-03-01
Note that Anaconda does not provide counts of distinct IP addresses, so the Nb_of_distinct_IPs column is NA; it is included only for compatibility with the Bioconductor count tables.
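The two tables differ in column types (Year is an integer in one and character in the other; Nb_of_downloads is integer in one and double in the other) and in how they label their source (pkgType versus repo), so they need light alignment before being stacked. The sketch below, built around the hypothetical helper downloads_by_source, keeps only Package, Date, and Nb_of_downloads from each table and tags every row with its source. The two tables may cover different time spans, so totals are only comparable over a shared date range; the optional since argument is a hypothetical convenience for that.

```r
library(dplyr)

# Hypothetical helper: stack Bioconductor and Anaconda download tables and
# total downloads per package and source. `since` (optional Date) drops
# months before that date so both sources cover a comparable period.
downloads_by_source <- function(bioc, anaconda, since = NULL) {
  combined <- bind_rows(
    bioc %>%
      mutate(source = "Bioconductor",
             Nb_of_downloads = as.numeric(Nb_of_downloads)) %>%
      select(Package, Date, source, Nb_of_downloads),
    anaconda %>%
      mutate(source = repo,  # "Anaconda" in the output shown above
             Nb_of_downloads = as.numeric(Nb_of_downloads)) %>%
      select(Package, Date, source, Nb_of_downloads)
  )
  if (!is.null(since)) {
    combined <- filter(combined, Date >= since)
  }
  combined %>%
    group_by(Package, source) %>%
    summarise(downloads = sum(Nb_of_downloads, na.rm = TRUE), .groups = "drop")
}

# Usage (requires network access):
# downloads_by_source(biocDownloadStats(), anacondaDownloadStats(),
#                     since = as.Date("2024-01-01"))
```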
The R DESCRIPTION file contains a plethora of information about package
authors, dependencies, versions, and more. In a repository such as Bioconductor,
these details are available in bulk for all included packages. The biocPkgList
function returns a data.frame with one row per package. A great deal of
information is available, as the column names of the result show.
bpi = biocPkgList()
colnames(bpi)
## [1] "Package" "Version"
## [3] "Depends" "Imports"
## [5] "Suggests" "License"
## [7] "MD5sum" "NeedsCompilation"
## [9] "Title" "Description"
## [11] "biocViews" "Author"
## [13] "Maintainer" "git_url"
## [15] "git_branch" "git_last_commit"
## [17] "git_last_commit_date" "Date/Publication"
## [19] "source.ver" "win.binary.ver"
## [21] "mac.binary.big-sur-x86_64.ver" "mac.binary.big-sur-arm64.ver"
## [23] "vignettes" "vignetteTitles"
## [25] "hasREADME" "hasNEWS"
## [27] "hasINSTALL" "hasLICENSE"
## [29] "Rfiles" "importsMe"
## [31] "dependencyCount" "URL"
## [33] "VignetteBuilder" "Archs"
## [35] "suggestsMe" "LinkingTo"
## [37] "SystemRequirements" "dependsOnMe"
## [39] "BugReports" "Enhances"
## [41] "Video" "linksToMe"
## [43] "License_restricts_use" "PackageStatus"
## [45] "License_is_FOSS" "OS_type"
## [47] "organism"
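Some numeric-looking fields are stored as text; the column types printed by head(bpi) in this section list dependencyCount as <chr>. It therefore has to be converted before packages can be ranked by it. A minimal sketch, with the hypothetical helper heaviest_dependencies:

```r
library(dplyr)

# Hypothetical helper: rank packages by the dependencyCount field of a
# biocPkgList() table. The field is character, so convert it first;
# unparseable or missing values become NA and are sorted last by arrange().
heaviest_dependencies <- function(pkgs, n = 10) {
  pkgs %>%
    mutate(dependencyCount = suppressWarnings(as.integer(dependencyCount))) %>%
    arrange(desc(dependencyCount)) %>%
    select(Package, dependencyCount) %>%
    slice_head(n = n)
}

# Usage (requires network access):
# heaviest_dependencies(biocPkgList(), n = 20)
```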
Some of the variables, such as Depends, Imports, Suggests, and biocViews, are
parsed into list columns.
head(bpi)
## # A tibble: 6 × 47
## Package Version Depends Imports Suggests License MD5sum NeedsCompilation Title
## <chr> <chr> <list> <list> <list> <chr> <chr> <chr> <chr>
## 1 ABSSeq 1.59.0 <chr> <chr> <chr> GPL (>… ead1e… no "ABS…
## 2 ABarray 1.73.0 <chr> <chr> <chr> GPL 5b457… no "Mic…
## 3 ACE 1.23.0 <chr> <chr> <chr> GPL-2 a3d57… no "Abs…
## 4 ACME 2.61.0 <chr> <chr> <chr> GPL (>… b239c… yes "Alg…
## 5 ADAM 1.21.0 <chr> <chr> <chr> GPL (>… 10bcb… yes "ADA…
## 6 ADAMgui 1.21.0 <chr> <chr> <chr> GPL (>… 3378f… no "Act…
## # ℹ 38 more variables: Description <chr>, biocViews <list>, Author <list>,
## # Maintainer <list>, git_url <chr>, git_branch <chr>, git_last_commit <chr>,
## # git_last_commit_date <chr>, `Date/Publication` <chr>, source.ver <chr>,
## # win.binary.ver <chr>, `mac.binary.big-sur-x86_64.ver` <chr>,
## # `mac.binary.big-sur-arm64.ver` <chr>, vignettes <list>,
## # vignetteTitles <list>, hasREADME <chr>, hasNEWS <chr>, hasINSTALL <chr>,
## # hasLICENSE <chr>, Rfiles <list>, importsMe <list>, dependencyCount <chr>, …
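List columns such as biocViews can be filtered with an ordinary predicate applied element by element. As a sketch, the hypothetical helper below keeps packages whose biocViews entry contains a given term. It assumes each element is a character vector of terms (or NA when a package has none), which matches the <list> type shown above but is worth checking on real data; "RNASeq" in the usage line is just one example of a biocViews term.

```r
library(dplyr)

# Hypothetical helper: return Package and Title for rows whose biocViews
# list element contains `term`. With a single `term`, `term %in% views`
# is always one TRUE/FALSE value, even when `views` is NA or NULL.
filter_by_view <- function(pkgs, term) {
  pkgs %>%
    filter(vapply(biocViews, function(views) term %in% views, logical(1))) %>%
    select(Package, Title)
}

# Usage (requires network access):
# filter_by_view(biocPkgList(), "RNASeq")
```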
As a simple example of how these columns can be used, the following extracts
the importsMe column to find the packages that import the GEOquery package.
library(dplyr)
bpi = biocPkgList()
# The importsMe list column holds the packages that import each package
bpi %>%
  filter(Package == "GEOquery") %>%
  pull(importsMe) %>%
  unlist()
## [1] NA
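The NA printed above suggests that an empty reverse-dependency field is stored as NA rather than as a zero-length vector, so counting entries with lengths() would report 1 for such packages. A minimal sketch that counts only non-missing entries instead, using the hypothetical helper reverse_import_counts:

```r
library(dplyr)

# Hypothetical helper: count the packages listed in each row's importsMe
# list element, treating NA placeholders (and NULL) as zero importers,
# then sort packages by that count.
reverse_import_counts <- function(pkgs) {
  pkgs %>%
    mutate(n_importsMe = vapply(importsMe, function(x) sum(!is.na(x)), integer(1))) %>%
    select(Package, n_importsMe) %>%
    arrange(desc(n_importsMe))
}

# Usage (requires network access):
# reverse_import_counts(biocPkgList())
```

The same pattern applies to the other reverse-dependency columns, such as dependsOnMe, suggestsMe, and linksToMe.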
For the end user of Bioconductor, an analysis often starts with finding a
package or set of packages that perform required tasks or are tailored to a
specific operation or data type. The biocExplore() function implements an
interactive bubble visualization with filtering based on biocViews terms.
Bubbles are sized by download statistics, and tooltips and detail-on-click
capabilities are included. To start a local session:
biocExplore()