Introduction

When performing statistical analysis on a set of genomic ranges, it is often important to compare focal sets to null sets that are carefully matched for covariates that may influence the analysis. To address this need, the nullranges package implements matchRanges(), an efficient and convenient tool for selecting a covariate-matched set of null hypothesis ranges from a pool of background ranges within the Bioconductor framework.

In this vignette, we provide an overview of matchRanges() and its associated functions. We start with a simulated example generated with the utility function makeExampleMatchedDataSet(). We also provide an overview of the class structure and a guide for choosing among the supported matching methods. To see matchRanges() used in real biological examples, visit the Case study I: CTCF occupancy and Case study II: CTCF orientation vignettes.

For a description of the method, see Davis et al. (2023).

Terminology

matchRanges() references four sets of data: focal, pool, matched and unmatched. The focal set contains the outcome of interest (Y=1), while the pool set contains all other observations (Y=0). matchRanges() generates the matched set, which is a subset of the pool that is matched for the provided covariates (i.e., covar) but does not contain the outcome of interest (i.e., Y=0). Finally, the unmatched set contains the remaining unselected elements from the pool. The diagram below depicts the relationships between the four sets.
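
As a minimal sketch of how the four sets relate in code, the example below builds a simulated data set and retrieves each set with its accessor. It assumes the simulated ranges carry a logical column feature1 (used here as the outcome Y) and covariate columns feature2 and feature3; the seed and the choice of covariates are illustrative only.

```r
library(nullranges)

# Simulated GRanges; assumed to carry the metadata columns
# feature1 (logical), feature2 (numeric) and feature3 (factor).
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GRanges")

# Focal set: ranges with the outcome of interest (Y = 1).
# Pool set: all remaining ranges (Y = 0).
mgr <- matchRanges(
  focal = x[x$feature1, ],
  pool  = x[!x$feature1, ],
  covar = ~ feature2 + feature3
)

# Each of the four sets has its own accessor.
focal(mgr)      # ranges with Y = 1
pool(mgr)       # background ranges with Y = 0
matched(mgr)    # covariate-matched subset drawn from the pool
unmatched(mgr)  # pool ranges that were not selected
```

The matched set returned here has one range per focal range, so its length equals that of the focal set.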