BiocCheck
Summary
library(BiocCheck)
BiocCheck encapsulates Bioconductor package guidelines and best practices, analyzing packages and reporting three categories of issues: ERROR, WARNING, and NOTE.
(BiocCheck will continue past an ERROR, so it is possible to have more than one, but it will exit with an error code if run from the OS command line.)
Using BiocCheck
BiocCheck is meant to run within R on a directory containing an R package, or a source tarball (.tar.gz file):
BiocCheck("<packageDirOrTarball>")
BiocCheck takes options which can be seen with ?BiocCheck.
Note that the --new-package option is turned on in the Single Package Builder (SPB) during the new package submission process.
When should BiocCheck be run
BiocCheck should always be run after R CMD check.
Note that BiocCheck is not a replacement for R CMD check; it is complementary. It should be run after R CMD check completes successfully.
BiocCheck can also be run via GitHub Actions, a continuous integration system on GitHub. This service allows automatic testing of R packages in a controlled build environment. See the biocthis package for more details.
Installing BiocCheck
BiocCheck should be installed as follows:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("BiocCheck")
Interpreting BiocCheck output
Actual BiocCheck output is shown below in bold.
Checking for deprecated package usage…
Can be disabled with --no-check-deprecated.
At present, this looks to see whether your package has a dependency on the multicore package (ERROR).
Our recommendation is to use BiocParallel. Note that ‘fork’ clusters do not provide any gain from parallelizing code on Windows. Socket clusters work on all operating systems.
Also checks for packages listed as deprecated in the release and devel versions of Bioconductor (ERROR).
Checking for remote package usage…
Can be disabled with --no-check-remotes.
Bioconductor only allows dependencies that are hosted on CRAN or Bioconductor. The use of Remotes: in the DESCRIPTION to specify a unique remote location is not allowed.
Checking for ‘LazyData: true’ usage…
For packages that include data, we recommend not including LazyData: TRUE. This rarely proves to be a good thing. In our experience it only slows down the loading of packages with large data (NOTE).
Can be disabled with --no-check-version-num and --no-check-R-ver.
Checking version number…
Checks that the version in the tarball name matches the Version: field in your DESCRIPTION file. If it doesn’t, it usually means you did not build the tarball with R CMD build (ERROR).
New packages should have a 99 ‘y’ version in the x.y.z versioning scheme (ERROR). Package versions starting with a non-zero value will get flagged with a warning. Typical new package submissions start with a zero ‘x’ version (e.g., 0.99.*; WARNING). This is only done if the --new-package option is supplied. A nonzero ‘x’ will only be accepted if the package was pre-released or published under such a version.
If you specify an R version in the Depends: field of your DESCRIPTION file, BiocCheck checks to make sure that the R version specified matches the version currently used in Bioconductor. This helps to prevent mixing of Bioconductor release and devel versions (esp. when R versions differ), which is a frequent source of confusion and errors (NOTE).
For more information on package versions, see the Version Numbering HOWTO.
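The new-package version rules above can be sketched in a few lines of base R. This is only an illustration: classify_new_version() is a hypothetical helper, not a BiocCheck function, and it assumes a plain dot-separated x.y.z version string.

```r
# Hypothetical helper (not part of BiocCheck): apply the new-package
# version rules described above to a dot-separated x.y.z string.
classify_new_version <- function(version) {
    if (!grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", version)) {
        return("ERROR: version is not of the form x.y.z")
    }
    parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
    if (parts[2] != 99L) {
        return("ERROR: 'y' should be 99 for a new package")
    }
    if (parts[1] != 0L) {
        return("WARNING: 'x' is usually 0 for a new package")
    }
    "OK"
}

classify_new_version("0.99.0")
classify_new_version("1.99.0")
classify_new_version("0.1.0")
```

The messages returned here are invented for the sketch; the wording of the real BiocCheck output may differ.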
Can be disabled with --no-check-pkg-size and --no-check-file-size.
Checking package size
Checks that the package size meets Bioconductor requirements. The current package size limit is 5 MB for Software packages. Experiment Data and Annotation packages are excluded from this check. This check is only run if checking a source tarball (ERROR).
Checking individual file sizes
The current size limit for all individual files is 5 MB. Checks inspect both package-wide files and data files found in the data, inst/extdata, and data-raw folders (WARNING).
It may be necessary to remove large files from your Git history; see Remove Large Data Files and Clean Git Tree.
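To look for oversized files before running the check, something like the following base R sketch may help. find_large_files() is a hypothetical helper, and reading 5 MB as 5 * 1024^2 bytes is an assumption; BiocCheck may compute the limit differently.

```r
# Hypothetical helper (not part of BiocCheck): list files in a package
# directory that exceed a size limit. The default of 5 * 1024^2 bytes is
# an assumed reading of the 5 MB limit.
find_large_files <- function(pkg_dir, limit_bytes = 5 * 1024^2) {
    files <- list.files(pkg_dir, recursive = TRUE, full.names = TRUE)
    sizes <- file.size(files)
    files[!is.na(sizes) & sizes > limit_bytes]
}

# Demonstration on a throwaway directory with one tiny file
demo_dir <- tempfile("examplePkg")
dir.create(file.path(demo_dir, "data"), recursive = TRUE)
writeLines("small", file.path(demo_dir, "data", "small.txt"))
find_large_files(demo_dir)
```

On a real package, pass the package root as pkg_dir; the result is a character vector of offending paths, empty when nothing exceeds the limit.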
These can be disabled with the --no-check-bioc-views option, which might be useful when checking non-Bioconductor packages (since biocViews is a concept unique to Bioconductor).
Checking biocViews…
Checks that a biocViews field is present in the DESCRIPTION file (ERROR).
Further biocViews checks can report an ERROR or a WARNING.
Uses the recommendBiocViews() function from biocViews to automatically suggest some biocViews for your package.
More information about biocViews is available in the Using biocViews HOWTO.
The Bioconductor Build System (BBS) is our nightly build system and it has certain requirements. Packages which don’t meet these requirements can be silently skipped by BBS, so it’s important to make sure that every package meets the requirements.
Can be disabled with --no-check-bbs.
Checking build system compatibility…
Checking for blank lines in DESCRIPTION…
Checks to make sure there are no blank lines in the DESCRIPTION file (ERROR).
Checking if DESCRIPTION is well formatted…
Checks if the DESCRIPTION can be parsed with read.dcf (ERROR).
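The effect of a blank line can be seen with read.dcf() directly. The sketch below writes two throwaway DESCRIPTION files (the package name and field values are made up) and compares how they are parsed; it does not reproduce BiocCheck's own implementation.

```r
# Write two throwaway DESCRIPTION files to the temporary directory.
good <- tempfile("DESCRIPTION_good")
writeLines(
    c("Package: examplePkg", "Version: 0.99.0", "Title: An Example"),
    good
)
blank <- tempfile("DESCRIPTION_blank")
writeLines(
    c("Package: examplePkg", "", "Version: 0.99.0"),
    blank
)

# read.dcf() treats a blank line as a record separator, so the second
# file is split into two records (one matrix row per record).
nrow(read.dcf(good))
nrow(read.dcf(blank))
```

A well-formed DESCRIPTION should parse as a single record, so more than one row is a quick sign that a stray blank line is present.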
Checking Description: field length…
Checks that the Description field in the DESCRIPTION file has a minimum number of characters (WARNING if less than 50), words (WARNING if less than 20), and sentences (NOTE if less than 3).
Checking for whitespace in DESCRIPTION field names…
Checks to make sure there is no whitespace in DESCRIPTION file field names (ERROR).
Checking that Package field matches dir/tarball name…
Checks to make sure that the Package field of the DESCRIPTION file matches the directory or tarball name (ERROR).
Checking for Version field…
Checks to make sure a Version field is present in the DESCRIPTION file (ERROR).
Checking for valid maintainer…
Checks to make sure the DESCRIPTION file has a valid Authors@R field which resolves to a valid Maintainer (ERROR).
A valid Authors@R field consists of:
A valid R object of class person.
A person with the cre (creator) role.
A family or given name defined for that person.
Suggests that the maintainer provide an ORCID iD in the Authors@R field as an argument in the person function, e.g., comment = c(ORCID = ...) (NOTE).
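As an illustration, an Authors@R value that meets these points could look like the sketch below. The names and email address are invented, and the ORCID iD shown is the fictitious example identifier from ORCID's own documentation, used here purely as a placeholder.

```r
# Illustrative Authors@R value: all names and the email are made up, and
# the ORCID iD is a documented placeholder, not a real maintainer's iD.
authors <- c(
    person(
        given = "Jane", family = "Doe",
        email = "jane.doe@example.org",
        role = c("aut", "cre"),
        comment = c(ORCID = "0000-0002-1825-0097")
    ),
    person(given = "John", family = "Smith", role = "aut")
)

# Count the people holding the 'cre' (creator/maintainer) role.
n_cre <- sum(vapply(
    unclass(authors),
    function(p) "cre" %in% p$role,
    logical(1)
))
n_cre
```

In a DESCRIPTION file the same expression goes after the Authors@R: field name.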
Checks that the License: in the DESCRIPTION file does not restrict use, e.g., to academic-use only (ERROR). Licenses are compared to R’s internal database provided at $R_HOME/share/licenses/license.db and read with read.dcf. Licenses not listed in the database or with spelling deviations, e.g., GPL-3.0 vs GPL-3, are flagged with a NOTE. A NOTE is also generated if the license is not a valid SPDX license identifier (with the exception of those already in the database file) or if the license cannot be verified in the database. A NOTE is also generated if the License: field is malformed, or the database cannot be located. We also recommend that developers browse choosealicense to find a suitable license for their package, as well as the SPDX License List website.
The use of == in the DESCRIPTION file is flagged (ERROR).
Checks the DESCRIPTION file to see whether recommended fields, i.e., ‘URL’, ‘Date’ and ‘BugReports’, are populated (NOTE). The Date field is checked for the format YYYY-MM-DD.
Checks the dependencies listed in the Depends: and Imports: fields; if none (WARNING).
Checks the .Rbuildignore file (ERROR).
Checks that the <package_name>.BiocCheck folder, a byproduct of running BiocCheck(".") locally, does not get included in the package directory (ERROR).
Vignettes are built by R CMD build; therefore, the inst/doc folder is not needed (ERROR).
Can be disabled with --no-check-vignettes.
Checking vignette directory…
Checks that the vignettes directory exists (ERROR).
Checks that the vignettes directory only contains vignette sources (.Rmd, .Rnw, .Rrst, .Rhtml, *.Rtex) (ERROR).
For .Rnw vignettes, if any are found, suggests RMarkdown (.Rmd) vignettes instead (WARNING).
Checks whether the number of eval=FALSE chunks is more than 50% of the total (WARNING).
Checks for use of eval=FALSE. The majority of vignette code is expected to be evaluated (WARNING).
Checks for BiocInstaller code (WARNING).
Checks for use of sessionInfo() or session_info() for reproducibility (NOTE).
Further vignette checks can report an ERROR or a WARNING.
Checking whether vignette is built with ‘R CMD build’…
).Checking whether vignette is built with ‘R CMD build’…
Only run when --build-output-file is specified.
Analyzes the output of R CMD build to see if vignettes are built. It simply looks for a line that starts:
* creating vignettes ...
If this line is not present, it means R has not detected that a vignette needs to be built (ERROR).
If you have vignette sources yet still get this message, there could be several causes:
Missing or invalid VignetteBuilder line in the DESCRIPTION file.
Missing or invalid VignetteEngine line in the vignette source.
See knitr’s package vignette page, or the Non-Sweave vignettes section of “Writing R Extensions” for more information.
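The line search described above is simple enough to try on saved build output. In this sketch the build log lines are invented for illustration, and vignettes_built() is a hypothetical helper rather than a BiocCheck function.

```r
# Hypothetical helper: TRUE if any line of the saved 'R CMD build' output
# starts with the marker that signals vignette creation.
vignettes_built <- function(lines) {
    any(startsWith(lines, "* creating vignettes ..."))
}

# Invented build output; in practice use readLines("<build output file>")
with_vignette <- c(
    "* preparing 'examplePkg':",
    "* installing the package to build vignettes",
    "* creating vignettes ... OK",
    "* building 'examplePkg_0.99.0.tar.gz'"
)
without_vignette <- c(
    "* preparing 'examplePkg':",
    "* building 'examplePkg_0.99.0.tar.gz'"
)

vignettes_built(with_vignette)
vignettes_built(without_vignette)
```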
Can be disabled with --no-check-library-calls and --no-check-install-self.
Checks for use of functions that install or update packages (NOTE). This list currently includes the use of install, install.packages, update.packages or biocLite.
It is not necessary to call library() or require() on your own package within code in the R directory or in man page examples (ERROR). In these contexts, your package is already loaded.
Can be disabled with --no-check-coding-practices.
Checking coding practices…
Checks to see whether certain programming practices are found in the R directory.
We recommend that vapply() be used instead of sapply(). Problems arise when the X argument to sapply() has length 0; the return type is then a list() rather than a vector or array (NOTE).
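A short base R illustration of the zero-length case (the doubling function is arbitrary):

```r
empty <- integer(0)

# sapply() cannot infer a return type from zero results and gives a list
from_sapply <- sapply(empty, function(i) i * 2)

# vapply() is told the return type up front and keeps it
from_vapply <- vapply(empty, function(i) i * 2, numeric(1))

class(from_sapply)
class(from_vapply)
```

Code that later calls sum() or indexes the result numerically behaves consistently only with the vapply() version.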
We recommend that seq_len() or seq_along() be used instead of 1:.... This is because the case 1:0 creates the sequence c(1, 0), which may be an unexpected or unwanted result (NOTE).
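The difference shows up as soon as a loop bound is zero, as in this minimal sketch:

```r
n <- 0

# 1:n counts down from 1 to 0, so the loop body runs twice
iterations_colon <- 0L
for (i in 1:n) iterations_colon <- iterations_colon + 1L

# seq_len(n) is empty when n is 0, so the loop body never runs
iterations_seq_len <- 0L
for (i in seq_len(n)) iterations_seq_len <- iterations_seq_len + 1L

c(colon = iterations_colon, seq_len = iterations_seq_len)
```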
Single colon typos are checked for when a user inputs ‘package:function’ instead of using double colons (‘::’) to import a function (ERROR).
Users should not download data from external hosting platforms. This means avoiding references to major platforms such as GitHub, GitLab, and BitBucket. The reason is the same as for not importing GitHub packages: external data can be unstable and not well maintained. Maintainers should re-use data already available in Bioconductor or contribute an ExperimentHub, AnnotationHub or similar package (ERROR).
A package should not download files at the time of loading or attaching, i.e., using library. Using download.file and download should be avoided and when found, an ERROR will be emitted.
paste and paste0 function calls within signaling functions such as message, warning, and stop are redundant and should be avoided (NOTE). paste calls with the collapse argument are ignored.
When notifying users, message should be used. When cat and print are used, users will get a note saying that these should only be used in show methods for classes (NOTE).
message, warn*, and error keywords should not be included in signal condition functions: message, warning, and stop. This is redundant and should be avoided (NOTE).
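The redundancy of paste0() inside message() can be seen by comparing the text of the two conditions. The function names below are made up for the illustration.

```r
n_files <- 3

# Flagged style: paste0() inside message() is redundant
notify_paste <- function() message(paste0("Processed ", n_files, " files"))

# Preferred: message() concatenates its arguments itself
notify_plain <- function() message("Processed ", n_files, " files")

# Capture the condition text instead of printing it
capture_message <- function(f) {
    tryCatch(f(), message = function(m) conditionMessage(m))
}

identical(capture_message(notify_paste), capture_message(notify_plain))
```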
It is preferable to use the assignment arrow (‘<-’) over the equals assignment (‘=’) for clarity and legibility of the code. Any use of = for assignment will be flagged with a NOTE.
New submissions should not use any .Deprecated or .Defunct function calls (WARNING). Existing packages should evolve .Deprecated and .Defunct functions after a Bioconductor release according to the package guidelines.
Checking for T… Checking for F…
It is bad practice to use T and F for TRUE and FALSE. This is because T and F are ordinary variables whose value can be altered, leading to unexpected results, whereas the value of TRUE and FALSE cannot be changed (WARNING).
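A small sketch of how this goes wrong; is_enabled() is a made-up function that relies on T as a default value:

```r
# A function that (unwisely) uses T as a default argument
is_enabled <- function(flag = T) isTRUE(flag)

T <- 0                       # an ordinary assignment redefines T
overridden <- is_enabled()   # the default now evaluates to 0
rm(T)                        # remove the override again
restored <- is_enabled()

# TRUE is a reserved word: trying to assign to it is an error
reserved_error <- inherits(
    tryCatch(eval(parse(text = "TRUE <- 0")), error = function(e) e),
    "error"
)

c(overridden = overridden, restored = restored, reserved = reserved_error)
```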
Avoid class membership checks with class() / is() and == / !=. Developers should use is(x, 'class') for S4 classes (WARNING).
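The following sketch shows why equality tests on class() miss inheritance. The classes ExampleBase and ExampleDerived are defined only for this illustration.

```r
library(methods)

# Two throwaway S4 classes, one extending the other
setClass("ExampleBase", representation(x = "numeric"))
setClass("ExampleDerived", contains = "ExampleBase")

obj <- new("ExampleDerived", x = 1)

# Comparing class() with == only sees the most specific class
equality_check <- any(class(obj) == "ExampleBase")

# is() follows the inheritance hierarchy
is_check <- is(obj, "ExampleBase")

c(equality = equality_check, is = is_check)
```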
Use system2() over system(). ‘system2’ is a more portable and flexible interface than ‘system’ (NOTE).
Use of set.seed() in R code. set.seed() should not be called directly inside R functions. The user should always be able to choose the seed and know when it is being set (WARNING).
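One way to keep the seed under the user's control is to leave it out of the function entirely, as in this sketch (both function names are invented):

```r
# Discouraged: the function resets the random number generator itself,
# so every call returns the same "random" values
sample_fixed <- function(n) {
    set.seed(42)
    runif(n)
}

# Preferred: the function only draws; the caller decides about the seed
sample_free <- function(n) runif(n)

set.seed(2024)
first <- sample_free(3)
set.seed(2024)
second <- sample_free(3)

identical(first, second)
```

Results stay reproducible, but only because the caller asked for it.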
Checking parsed R code in R directory, examples, vignettes…
BiocCheck parses the code in your package’s R directory, and in evaluated man page and vignette examples to look for various symbols, which result in issues of varying severity.
BiocCheck checks for direct slot access (via @ or slot()) to S4 objects in vignette and example code. This code should always use accessors to interact with S4 classes. Since you may be using S4 classes (which don’t provide accessors) from another package, the severity is only NOTE. But if the S4 object is defined in your package, it’s mandatory to write accessors for it and to use them (instead of direct slot access) in all vignette and example code (NOTE).
browser() causes the command-line R debugger to be invoked, and should not be used in production code (though it’s OK to wrap such calls in a conditional that evaluates to TRUE if some debugging option is set) (WARNING).
Use of install() calls is bad practice. A separation between analysis and configuration tasks keeps code modular and reproducible (ERROR).
Use of <<- is bad practice. It can over-write user-defined symbols, and introduces non-linear paths of evaluation that are difficult to debug (NOTE).
Use of the Sys.setenv function is flagged (ERROR).
Use of suppressWarnings and suppressMessages is problematic as it usually indicates a larger underlying issue with the fragility of the package codebase (NOTE).
Can be disabled with --no-check-function-len.
Checking function lengths…
BiocCheck prints an informative message about the length (in lines) of your five longest functions (this includes functions in your R directory and in evaluated man page and vignette examples).
If there are functions longer than 50 lines, BiocCheck outputs a NOTE.
You may want to consider breaking up long functions into smaller ones. This is
a basic refactoring technique that results in code that’s easier to
read, debug, test, reuse, and maintain.
Can be disabled with --no-check-man-doc.
Checking man page documentation…
It can be handy to generate man page skeletons with prompt() and/or RStudio. These skeletons contain comments that look like this:
%% ~~ A concise (1-5 lines) description of the dataset. ~~
BiocCheck asks you to remove such comments (NOTE).
Every man page must have a non-empty \value section (WARNING).
Checking exported objects have runnable examples…
BiocCheck looks at all man pages which document exported objects and lists the ones that don’t contain runnable examples (either because there is no examples section or because its examples are tagged with dontrun or donttest). Runnable examples are a key part of literate programming and help ensure that your code does what you say it does.
If too few man pages have runnable examples, this is an ERROR. Otherwise, BiocCheck lists the missing ones and asks you to add runnable examples to them (NOTE).
Checks for use of dontrun or donttest. Use of these tags is not recommended and should be justified (NOTE). If an exception is made, the recommended usage is donttest over dontrun (NOTE), as donttest requires valid R code.
Can be disabled with --no-check-news.
Checking package NEWS…
BiocCheck looks to see if there is a valid NEWS file either in the ‘inst’ directory or in the top-level directory of your package, and checks whether it is properly formatted (NOTE).
The location and format of the NEWS file must be consistent with ?news, meaning the file can be one of the following four options:
inst/NEWS.Rd
./NEWS.md
./NEWS
inst/NEWS
NEWS files are a good way to keep users up-to-date on changes to your package. Excerpts from properly formatted NEWS files will be included in Bioconductor release announcements to tell users what has changed in your package in the last release. In order for this to happen, your NEWS file must be formatted in a specific way; you may want to consider using an inst/NEWS.Rd file instead as the format is more well-defined. A malformatted NEWS file results in a WARNING.
More information on NEWS files is available in the help topic ?news.
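A quick way to see which of the four accepted locations a package uses is sketched below. find_news() is a hypothetical helper, and the package directory is a throwaway created just for the demonstration.

```r
# Hypothetical helper: report which of the accepted NEWS locations exist
find_news <- function(pkg_dir) {
    candidates <- c("inst/NEWS.Rd", "NEWS.md", "NEWS", "inst/NEWS")
    candidates[file.exists(file.path(pkg_dir, candidates))]
}

# Throwaway package directory with a single NEWS.md file
demo_pkg <- tempfile("examplePkg")
dir.create(demo_pkg)
writeLines("# examplePkg 0.99.0", file.path(demo_pkg, "NEWS.md"))

find_news(demo_pkg)
```

This only checks location; whether the file is properly formatted is a separate question that the sketch does not address.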
Can be disabled with --no-check-unit-tests.
Checking unit tests…
We strongly recommend unit tests, though we do not at present require them. For more on what unit tests are, why they are helpful, and how to implement them, read our Unit Testing HOWTO.
At present we just check to see whether unit tests are present, and if not, urge you to add them (NOTE).
Checking skip_on_bioc() in tests…
Can be disabled with --no-check-skip-bioc-tests.
Looks for calls that skip tests in the Bioconductor environment (NOTE).
Can be disabled with --no-check-formatting.
Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source…
There is no 100% correct way to format code. These checks adhere to the Bioconductor Style Guide (NOTE).
We think it’s important to avoid very long lines in code. Note that some text editors do not wrap text automatically, requiring horizontal scrolling in order to read it. Also note that R syntax is very flexible and whitespace can be inserted almost anywhere in an expression, making it easy to break up long lines.
These checks are run against not just R code, but the DESCRIPTION and NAMESPACE files as well as man pages and vignette source files. All of these files allow long lines to be broken up.
The output of this check includes the first 6 offending lines of code; see more with BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf).
There are several helpful packages that can be used for formatting of R code to particular coding standards, such as formatR and styler, as well as the “Reformat code” button in RStudio Desktop. Each solution has its advantages, though styler works with roxygen2 examples and is actively maintained. You can re-format your code using styler as shown below:
## Install styler if necessary
if (!requireNamespace("styler", quietly = TRUE)) {
    install.packages("styler")
}
## Automatically re-format the R code in your package
styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
If you are working with RStudio Desktop, you can also use the “Reformat code” button, which will help you break long lines of code. Alternatively, use formatR, though beware that it can break valid R code involving both types of quotation marks (" and ') and does not support re-formatting roxygen2 examples. In general, it is best to version control your code before applying any automatic re-formatting solution, and to implement unit tests to verify that your code runs as intended after you re-format it.
Checking if package already exists in CRAN…
This can be disabled with the --no-check-CRAN option. A package with the same name (case differences are ignored) cannot exist on CRAN. Packages submitted to Bioconductor must be removed from CRAN before the next Bioconductor release (WARNING).
Checking if new package already exists in Bioconductor…
Only run if the --new-package flag is turned on. A package with the same name (case differences are ignored) cannot exist in Bioconductor (ERROR).
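A case-insensitive name comparison of the kind described for both checks can be sketched as follows. name_taken() is a hypothetical helper and the list of existing names is a small hand-written sample, not a query against CRAN or Bioconductor.

```r
# Hypothetical helper: compare a candidate name against existing package
# names, ignoring differences in case
name_taken <- function(name, existing) {
    tolower(name) %in% tolower(existing)
}

# Hand-written sample of existing names, for illustration only
existing <- c("GenomicRanges", "BiocCheck", "biocViews")

name_taken("bioccheck", existing)
name_taken("examplePkg", existing)
```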
Checking for bioc-devel mailing list subscription…
This only applies if BiocCheck is run on the Bioconductor build machines, because this step requires special authorization. This can be disabled with the --no-check-bioc-help option.
Checking for support site registration…
Checks that the package maintainer is registered at our support site using the same email address that is in the Maintainer field of their package DESCRIPTION file (ERROR). This can be disabled with the --no-check-bioc-help option.
The main place people ask questions about Bioconductor packages is the support site. Please register and then include your package name in the list of watched tags. When a question is asked and tagged with your package name, you’ll get an email.
Having the package name in the support site watched tags is now a requirement.
BiocCheckGitClone
BiocCheckGitClone provides a few additional Bioconductor package checks that should only be run on a source directory (raw Git clone), NOT a tarball. Results are reported in the same three categories discussed above: ERROR, WARNING, and NOTE.
Using BiocCheckGitClone
BiocCheckGitClone is meant to run within R on a directory containing an R package:
BiocCheckGitClone("packageDir")
Installing BiocCheckGitClone
Please see the previous Installing BiocCheck section.
Interpreting BiocCheckGitClone output
Actual BiocCheckGitClone output is shown below in bold.
Checking valid files
There are a number of files that should not be Git tracked. This check notifies if any of these files are present (ERROR).
The current list of files checked is given by this internal constant:
BiocCheck:::.HIDDEN_FILE_EXTS
## [1] ".renviron" ".rprofile" ".rproj" ".rproj.user"
## [5] ".rhistory" ".rapp.history" ".o" ".sl"
## [9] ".so" ".dylib" ".a" ".dll"
## [13] ".def" ".ds_store" "unsrturl.bst" ".log"
## [17] ".aux" ".backups" ".cproject" ".directory"
## [21] ".dropbox" ".exrc" ".gdb.history" ".gitattributes"
## [25] ".gitmodules" ".hgtags" ".project" ".seed"
## [29] ".settings" ".tm_properties" ".rdata"
These files may be included in your personal directories but should be added to a .gitignore file so they are not Git tracked.
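To get a rough idea of which files in a clone would be flagged, the extensions can be compared against file names. The sketch below uses a small hand-copied subset of the list above and assumes a case-insensitive suffix match; how BiocCheck itself matches the names is not shown here, and the file list is invented.

```r
# Subset of the extensions listed above, copied by hand for illustration
hidden_exts <- c(".rproj", ".rhistory", ".rdata", ".ds_store", ".o", ".so")

# Invented list of files in a Git clone
tracked <- c(
    "DESCRIPTION", "R/utils.R", ".Rhistory",
    "examplePkg.Rproj", "src/init.o"
)

# Assumption: a file is flagged when its lower-cased name ends with one
# of the extensions
is_hidden <- vapply(
    tracked,
    function(f) any(endsWith(tolower(f), hidden_exts)),
    logical(1),
    USE.NAMES = FALSE
)
flagged <- tracked[is_hidden]
flagged
```

Each flagged file is a candidate for a .gitignore entry.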
Checking DESCRIPTION
Default R CMD build behavior will format the DESCRIPTION file. After this occurs, it is hard to determine certain aspects of the original DESCRIPTION file, for example how the Authors and Maintainers are specified. The DESCRIPTION file is therefore checked in its raw original form.
Checking if DESCRIPTION is well formatted
The DESCRIPTION file must be properly formatted and able to be read in with read.dcf() in order to function properly on the Bioconductor build machines. This check attempts read.dcf("DESCRIPTION") and throws an ERROR if the file is malformatted.
Checking for valid maintainer
While in the past using the Author and Maintainer fields was acceptable, R has moved towards using the Authors@R standard for listing package contributors. This checks that Authors@R is utilized and that there are no instances of Author or Maintainer in the DESCRIPTION (ERROR).
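The field test described above can be approximated on a raw DESCRIPTION file with read.dcf(). The file content below is invented, and the two variable names are specific to this sketch.

```r
# Throwaway DESCRIPTION that uses Authors@R only (all values invented)
desc_file <- tempfile("DESCRIPTION")
writeLines(
    c(
        "Package: examplePkg",
        "Version: 0.99.0",
        paste0(
            "Authors@R: person(\"Jane\", \"Doe\", ",
            "email = \"jane.doe@example.org\", role = c(\"aut\", \"cre\"))"
        )
    ),
    desc_file
)

# Field names are the column names of the parsed record
fields <- colnames(read.dcf(desc_file))
uses_authors_at_r <- "Authors@R" %in% fields
has_legacy_fields <- any(c("Author", "Maintainer") %in% fields)

c(authors_at_r = uses_authors_at_r, legacy = has_legacy_fields)
```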
Checking that CITATION file is correctly formatted
BiocCheck tries to read the provided CITATION file (i.e., not the one automatically generated by each package) with readCitationFile(); this is expected to be in the inst folder (NOTE). readCitationFile() needs to work properly without the package being installed. The most common causes of failure occur when trying to use helper functions like packageVersion() or packageDate() instead of using meta$Version or meta$Date. See R documentation for more information.
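As a minimal sketch of the meta$Version and meta$Date pattern, the following writes a throwaway CITATION file and reads it back without any installed package. The package name, author, and DOI are placeholders invented for the example, and the meta list stands in for the package metadata that R normally supplies.

```r
# Throwaway CITATION file that takes its version and year from 'meta'
# rather than from packageVersion() or packageDate()
citation_file <- tempfile("CITATION")
writeLines(
    c(
        "bibentry(",
        "    bibtype = \"Manual\",",
        "    title = \"examplePkg: a hypothetical package\",",
        "    author = person(\"Jane\", \"Doe\"),",
        "    year = format(as.Date(meta$Date), \"%Y\"),",
        "    note = paste(\"R package version\", meta$Version),",
        "    doi = \"10.0000/placeholder\"",
        ")"
    ),
    citation_file
)

# Stand-in for the metadata normally taken from the DESCRIPTION file
meta <- list(
    Package = "examplePkg", Version = "0.99.0",
    Date = "2024-01-01", Encoding = "UTF-8"
)

cit <- utils::readCitationFile(citation_file, meta = meta)
cit$note
```

Because nothing in the file calls packageVersion(), it can be read even though examplePkg is not installed.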
Here is an example of a formatted CITATION file. See the GenomicRanges package CITATION file for details.
library(utils)
readCitationFile(
system.file("CITATION", package = "GenomicRanges")
)
## Lawrence M, Huber W, Pag\`es H, Aboyoun P, Carlson M, et al. (2013)
## Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol
## 9(8): e1003118. doi:10.1371/journal.pcbi.1003118
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Software for Computing and Annotating Genomic Ranges},
## author = {Michael Lawrence and Wolfgang Huber and Herv\'e Pag\`es and Patrick Aboyoun and Marc Carlson and Robert Gentleman and Martin Morgan and Vincent Carey},
## year = {2013},
## journal = {{PLoS} Computational Biology},
## volume = {9},
## issue = {8},
## doi = {10.1371/journal.pcbi.1003118},
## url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118},
## }
CITATION files are expected to contain a doi input within the bibentry() function call. When a doi input is not present, a WARNING is emitted as most modern publications should have an assigned DOI.
Note that citEntry() should be updated to bibentry(), as flagged by R CMD check --as-cran.
Bioconductor packages are not required to have a CITATION file but it is useful both for users and for tracking Bioconductor project-wide metrics. Maintainers should update the CITATION file once a preprint or publication is released. Packages that do not have a CITATION file are flagged with a NOTE.
BiocCheck
We make an effort to reduce package reviewer burden and to increase the quality of Bioconductor submissions via automated checks; therefore, BiocCheck is a continually evolving package. Contributions are most welcome. Consider opening a pull request on GitHub with unit tests and updates to both the NEWS file and vignette. Thank you for being part of the community!
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocCheck_1.41.17 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.3 sass_0.4.9 utf8_1.2.4
## [4] generics_0.1.3 bitops_1.0-8 RSQLite_2.3.7
## [7] digest_0.6.37 magrittr_2.0.3 evaluate_0.24.0
## [10] bookdown_0.40 fastmap_1.2.0 blob_1.2.4
## [13] jsonlite_1.8.8 graph_1.83.0 DBI_1.2.3
## [16] BiocManager_1.30.25 httr_1.4.7 fansi_1.0.6
## [19] stringdist_0.9.12 XML_3.99-0.17 httr2_1.0.3
## [22] codetools_0.2-20 jquerylib_0.1.4 cli_3.6.3
## [25] rlang_1.1.4 dbplyr_2.5.0 Biobase_2.65.1
## [28] bit64_4.0.5 cachem_1.1.0 yaml_2.3.10
## [31] BiocBaseUtils_1.7.3 parallel_4.4.1 tools_4.4.1
## [34] memoise_2.0.1 dplyr_1.1.4 filelock_1.0.3
## [37] BiocGenerics_0.51.1 RUnit_0.4.33 curl_5.2.2
## [40] vctrs_0.6.5 R6_2.5.1 stats4_4.4.1
## [43] biocViews_1.73.1 BiocFileCache_2.13.0 lifecycle_1.0.4
## [46] RBGL_1.81.0 bit_4.0.5 pkgconfig_2.0.3
## [49] bslib_0.8.0 pillar_1.9.0 glue_1.7.0
## [52] xfun_0.47 tibble_3.2.1 tidyselect_1.2.1
## [55] knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28
## [58] compiler_4.4.1 RCurl_1.98-1.16