chipenrich
: Gene Set Enrichment For ChIP-seq Peak DataThis document describes how to use chipenrich
to analyze results from ChIP-Seq experiments and other DNA sequencing experiments that result in a set of genomic regions. chipenrich
includes two methods that adjust for potential confounders of gene set enrichment testing (locus length and mappability of the sequence reads). The first method chipenrich
is designed for use with transcription-factor based ChIP-seq experiments and other DNA sequencing experiments with narrow genomic regions. The second method broadenrich
is designed for use with histone modification based ChIP-seq experiments and other DNA sequencing experiments with broad genomic regions.
After starting R, the package should be loaded using the following:
library(chipenrich)
This will load chipenrich
, the chipenrich.data
package, and necessary dependencies. The main function for conducting all gene set enrichment testing is chipenrich()
, whose defaults are:
chipenrich(peaks, out_name = "chipenrich", out_path = getwd(),
genome = "hg19", genesets = c("GOBP", "GOCC", "GOMF"),
locusdef = "nearest_tss", method = "chipenrich",
fisher_alt = "two.sided", use_mappability = F, mappa_file = NULL,
read_length = 36, qc_plots = T, max_geneset_size = 2000,
num_peak_threshold = 1, n_cores = 1)
The peaks
option should be either a data frame or character vector representing the path to a file containing the peaks. The file (or data frame) should have at least 3 columns: chrom
, start
and end
, denoting the chromosome, start position, and end position of the peaks. Chromosome should be in UCSC format, e.g. chrX, chrY, chr22, etc. If a file, it must be tab-delimited, and the header must exist. The input file may also be a .bed, .broadPeak, or .narrowPeak file. Additional columns can exist, so long as they do not contain tab characters. Two example datasets, peaks_E2F4
and peaks_H3K4me3_GM12878
, are included in the package.
data(peaks_E2F4, package = 'chipenrich.data')
data(peaks_H3K4me3_GM12878, package = 'chipenrich.data')
head(peaks_E2F4)
## chrom start end
## 1 chr1 156186314 156186469
## 2 chr1 10490456 10490550
## 3 chr1 46713352 46713436
## 4 chr1 226496843 226496924
## 5 chr1 200589825 200589928
## 6 chr1 47779789 47779907
The first task of chipenrich()
is to assign the peaks to genes. Currently supported genomes are listed below, with supported_genomes()
. Data from older genome versions may be converted using UCSC’s liftover tool: http://genome.ucsc.edu/cgi-bin/hgLiftOver.
supported_genomes()
## [1] "dm3" "hg19" "mm10" "mm9" "rn4"
Peaks are assigned to genes according to a pre-defined locus definition, i.e. the region where peaks have to occur in order to be assigned to a gene. The following locus definitions are supported in chipenrich
:
supported_locusdefs()
## genome locusdef
## 1 dm3 10kb
## 2 dm3 10kb_and_more_upstream
## 3 dm3 1kb
## 4 dm3 5kb
## 5 dm3 exon
## 6 dm3 intron
## 7 dm3 nearest_gene
## 8 dm3 nearest_tss
## 9 hg19 10kb
## 10 hg19 10kb_and_more_upstream
## 11 hg19 1kb
## 12 hg19 5kb
## 13 hg19 exon
## 14 hg19 intron
## 15 hg19 nearest_gene
## 16 hg19 nearest_tss
## 17 mm10 10kb
## 18 mm10 10kb_and_more_upstream
## 19 mm10 1kb
## 20 mm10 5kb
## 21 mm10 exon
## 22 mm10 intron
## 23 mm10 nearest_gene
## 24 mm10 nearest_tss
## 25 mm9 10kb
## 26 mm9 10kb_and_more_upstream
## 27 mm9 1kb
## 28 mm9 5kb
## 29 mm9 exon
## 30 mm9 intron
## 31 mm9 nearest_gene
## 32 mm9 nearest_tss
## 33 rn4 10kb
## 34 rn4 10kb_and_more_upstream
## 35 rn4 1kb
## 36 rn4 5kb
## 37 rn4 exon
## 38 rn4 intron
## 39 rn4 nearest_gene
## 40 rn4 nearest_tss
Using the options 1kb
, 5kb
, or 10kb
will only assign peaks to genes if the peaks are within 1 kilobases (kb), 5kb, or 10kb of a gene’s transcription start site (TSS), respectively. The option exon
or intron
will assign peaks to genes if the peaks occur within a gene’s exons or introns, respectively. The option 10kb_and_more_upstream
will assign peaks to genes if the peaks occur in a region more than 10kb upstream from a TSS to the midpoint between the adjacent TSS. Using nearest_gene
or nearest_tss
will assign peaks to genes according to the nearest gene or the nearest TSS. Only the nearest_gene
and nearest_tss
locus definitions retain all peaks, others use only a subset of peaks that fall within the defined region. All gene loci are non-overlapping. The command help(chipenrich.data)
may also provide more information on the locus definitions. Users may also create their own custom locus definitions.
The default gene set database is Gene Ontology (GO) terms, comprising of all three GO branches (GOBP, GOCC, and GOMF). Though, many more genesets are supported by chipenrich
:
supported_genesets()
## geneset organism
## 1 GOBP dme
## 5 GOCC dme
## 9 GOMF dme
## 2 GOBP hsa
## 6 GOCC hsa
## 10 GOMF hsa
## 13 biocarta_pathway hsa
## 16 cytoband hsa
## 17 drug_bank hsa
## 20 ehmn_pathway_gene hsa
## 23 gene_expression hsa
## 25 kegg_pathway hsa
## 28 mesh hsa
## 31 metabolite hsa
## 34 mirbase hsa
## 37 panther_pathway hsa
## 40 pfam hsa
## 43 protein_interaction_mimi hsa
## 46 reactome hsa
## 49 transcription_factors hsa
## 3 GOBP mmu
## 7 GOCC mmu
## 11 GOMF mmu
## 14 biocarta_pathway mmu
## 18 drug_bank mmu
## 21 ehmn_pathway_gene mmu
## 24 gene_expression mmu
## 26 kegg_pathway mmu
## 29 mesh mmu
## 32 metabolite mmu
## 35 mirbase mmu
## 38 panther_pathway mmu
## 41 pfam mmu
## 44 protein_interaction_mimi mmu
## 47 reactome mmu
## 50 transcription_factors mmu
## 4 GOBP rno
## 8 GOCC rno
## 12 GOMF rno
## 15 biocarta_pathway rno
## 19 drug_bank rno
## 22 ehmn_pathway_gene rno
## 27 kegg_pathway rno
## 30 mesh rno
## 33 metabolite rno
## 36 mirbase rno
## 39 panther_pathway rno
## 42 pfam rno
## 45 protein_interaction_mimi rno
## 48 reactome rno
## 51 transcription_factors rno
Three methods for gene set enrichment testing are provided: the main ChIP-Enrich method (chipenrich
), the Broad-Enrich method (broadenrich
), and Fisher’s exact test (fet
). The chipenrich
method designed for datasets with narrow genomic regions such as transcription factor ChIP-seq peaks. The broadenrich
method is designed for datasets with broad genomic regions such as histone modification ChIP-seq peaks. Finally, the fet
method is also offered for comparison purposes and/or for use in limited situations when its assumptions are met (see examples).
supported_methods()
## [1] "chipenrich" "fet" "broadenrich"
Accounting for mappability of reads is optional and can only be accomplished using the ChIP-Enrich or Broad-Enrich method. See the section on mappability for more information on how it is calculated. Mappabilities for the following read lengths are available (24bp is only available for hg19):
supported_read_lengths()
## genome locusdef read_length
## 1 hg19 10kb 100
## 2 hg19 10kb 24
## 3 hg19 10kb 36
## 4 hg19 10kb 40
## 5 hg19 10kb 50
## 6 hg19 10kb 75
## 7 hg19 10kb_and_more_upstream 100
## 8 hg19 10kb_and_more_upstream 24
## 9 hg19 10kb_and_more_upstream 36
## 10 hg19 10kb_and_more_upstream 40
## 11 hg19 10kb_and_more_upstream 50
## 12 hg19 10kb_and_more_upstream 75
## 13 hg19 1kb 100
## 14 hg19 1kb 24
## 15 hg19 1kb 36
## 16 hg19 1kb 40
## 17 hg19 1kb 50
## 18 hg19 1kb 75
## 19 hg19 5kb 100
## 20 hg19 5kb 24
## 21 hg19 5kb 36
## 22 hg19 5kb 40
## 23 hg19 5kb 50
## 24 hg19 5kb 75
## 25 hg19 exon 100
## 26 hg19 exon 24
## 27 hg19 exon 36
## 28 hg19 exon 40
## 29 hg19 exon 50
## 30 hg19 exon 75
## 31 hg19 intron 100
## 32 hg19 intron 24
## 33 hg19 intron 36
## 34 hg19 intron 40
## 35 hg19 intron 50
## 36 hg19 intron 75
## 37 hg19 nearest_gene 100
## 38 hg19 nearest_gene 24
## 39 hg19 nearest_gene 36
## 40 hg19 nearest_gene 40
## 41 hg19 nearest_gene 50
## 42 hg19 nearest_gene 75
## 43 hg19 nearest_tss 100
## 44 hg19 nearest_tss 24
## 45 hg19 nearest_tss 36
## 46 hg19 nearest_tss 40
## 47 hg19 nearest_tss 50
## 48 hg19 nearest_tss 75
## 49 mm9 10kb 100
## 50 mm9 10kb 36
## 51 mm9 10kb 40
## 52 mm9 10kb 50
## 53 mm9 10kb 75
## 54 mm9 10kb_and_more_upstream 100
## 55 mm9 10kb_and_more_upstream 36
## 56 mm9 10kb_and_more_upstream 40
## 57 mm9 10kb_and_more_upstream 50
## 58 mm9 10kb_and_more_upstream 75
## 59 mm9 1kb 100
## 60 mm9 1kb 36
## 61 mm9 1kb 40
## 62 mm9 1kb 50
## 63 mm9 1kb 75
## 64 mm9 5kb 100
## 65 mm9 5kb 36
## 66 mm9 5kb 40
## 67 mm9 5kb 50
## 68 mm9 5kb 75
## 69 mm9 exon 100
## 70 mm9 exon 36
## 71 mm9 exon 40
## 72 mm9 exon 50
## 73 mm9 exon 75
## 74 mm9 intron 100
## 75 mm9 intron 36
## 76 mm9 intron 40
## 77 mm9 intron 50
## 78 mm9 intron 75
## 79 mm9 nearest_gene 100
## 80 mm9 nearest_gene 36
## 81 mm9 nearest_gene 40
## 82 mm9 nearest_gene 50
## 83 mm9 nearest_gene 75
## 84 mm9 nearest_tss 100
## 85 mm9 nearest_tss 36
## 86 mm9 nearest_tss 40
## 87 mm9 nearest_tss 50
## 88 mm9 nearest_tss 75
We define a gene locus as the region from which we predict a gene could be regulated. ChIP-seq peaks, or other types of genomic regions, falling within a locus for a gene are assigned to that gene.
We provide a number of different definitions of a gene locus:
nearest_tss
: The locus is the region spanning the midpoints between the TSSs of adjacent genes.nearest_gene
: The locus is the region spanning the midpoints between the boundaries of each gene, where a gene is defined as the region between the furthest upstream TSS and furthest downstream TES for that gene. If two gene loci overlap each other, we take the midpoint of the overlap as the boundary between the two loci. When a gene locus is completely nested within another, we create a disjoint set of 3 intervals, where the outermost gene is separated into 2 intervals broken apart at the endpoints of the nested gene.1kb
: The locus is the region within 1 kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 2 kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.5kb
: The locus is the region within 5 kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 10 kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.10kb
: The locus is the region within 10 kb of any of the TSSs belonging to a gene. If TSSs from two adjacent genes are within 20 kb of each other, we use the midpoint between the two TSSs as the boundary for the locus for each gene.10kb_and_more_upstream
: The locus is the region more than 10kb upstream from a TSS to the midpoint between the adjacent TSS.exon
: Each gene has multiple loci corresponding to its exons.intron
: Each gene has multiple loci corresponding to its introns.We define base pair mappability as the average read mappability of all possible reads of size K that encompass a specific base pair location, \(b\). Mappability files from UCSC Genome Browser mappability track were used to calculate base pair mappability. The mappability track provides values for theoretical read mappability, or the number of places in the genome that could be mapped by a read that begins with the base pair location \(b\). For example, a value of 1 indicates a Kmer read beginning at \(b\) is mappable to one area in the genome. A value of 0.5 indicates a Kmer read beginning at \(b\) is mappable to two areas in the genome. For our purposes, we are only interested in uniquely mappable reads; therefore, all reads with mappability less than 1 were set to 0 to indicate non-unique mappability. Then, base pair mappability is calculated as:
\[ \begin{equation} M_{i} = (\frac{1}{2K-1}) \sum_{j=i-K+1}^{i+(K-1)} M_{j} \end{equation} \]
where \(M_{i}\) is the mappability of base pair \(i\), and \(M_{j}\) is mappability (from UCSC’s mappability track) of read \(j\) where j is the start position of the K length read. We calculated base pair mappability for reads of lengths 24, 36, 40, 50, 75, and 100 base pairs for Homo sapiens (build hg19) and for reads of lengths 36, 40, 50, 75, and 100 base pairs for Mus musculus (build mm9). Mappability is unavailable for Rattus norvegicus (build rn4) and Mus musculus (build mm10).
We define locus mappability as the average of all base pair mappability values for a gene’s locus. Locus mappability is calculated for each available locus definition.
If method = chipenrich
and qc_plots = T
then two pdf files will be output: One with a binomial smoothing spline fitted to the probability of a peak given gene length and one showing the distribution of the distance of peaks to the nearest TSS of a gene. These plots may also be generated using separate functions as illustrated below. The first figure below shows the distribution of peaks to the nearest TSS. In the second figure below, spline is fitted to the data given gene locus length. In the third figure below, we do the same but here we account for the mappable locus length (\(mappability \times locuslength\)).
plot_dist_to_tss(peaks = peaks_E2F4, genome = 'hg19')
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
plot_spline_length(peaks = peaks_E2F4, locusdef = 'nearest_tss', genome = 'hg19')
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
plot_spline_length(peaks = peaks_E2F4, locusdef = 'nearest_tss', genome = 'hg19', use_mappability = T, read_length = 24)
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
If method = broadenrich
and qc_plots = T
then one pdf file is output: A plot showing the relationship between the gene locus length and the proportion of the locus covered by a peak. Figure 4 shows this relationship.
plot_gene_coverage(peaks = peaks_H3K4me3_GM12878, locusdef = 'nearest_tss', genome = 'hg19')
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
There are many combinations of methods, genesets, and mappabiity settings that may be used to do gene set enrichment testing using chipenrich
. In the following, we include some examples of gene set enrichment testing using the peaks_E2F4
and peaks_H3K4me3_GM12878
example datasets. Note: Analysis using multiple cores (n_cores
) is not available on Windows.
# Without mappability
gs_path = system.file('extdata','vignette_genesets.txt', package='chipenrich')
results = chipenrich(peaks = peaks_E2F4, genesets = gs_path,
locusdef = "nearest_tss", qc_plots = F, out_name = NULL, n_cores = 1)
## User-specified geneset(s)...
## Reading user-specified gene set definitions: /Users/rcavalca/Projects/chipenrich/inst/extdata/vignette_genesets.txt
## Done setting up user-specified geneset..
## Reading peaks from data.frame..
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
## Assigning peaks to genes with assign_peaks(...) ..
## Successfully assigned peaks..
## Test: ChIP-Enrich
## Genesets: user-supplied
## Running tests..
results.ce = results$results
print(results.ce[1:5,1:5])
## Geneset.Type Geneset.ID Description P.value FDR
## goterm13 user-supplied GO:0031400 GO:0031400 0.1329270 0.7728310
## goterm3 user-supplied GO:0006631 GO:0006631 0.1557651 0.7728310
## goterm7 user-supplied GO:0009101 GO:0009101 0.2330154 0.7728310
## goterm8 user-supplied GO:0009314 GO:0009314 0.3834412 0.7728310
## goterm4 user-supplied GO:0007346 GO:0007346 0.4371945 0.7807044
# With mappability
results = chipenrich(peaks = peaks_E2F4, genesets = gs_path,
locusdef = "nearest_tss", use_mappability=T, read_length=24, qc_plots = F,
out_name = NULL,n_cores=1)
## User-specified geneset(s)...
## Reading user-specified gene set definitions: /Users/rcavalca/Projects/chipenrich/inst/extdata/vignette_genesets.txt
## Done setting up user-specified geneset..
## Reading peaks from data.frame..
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
## Assigning peaks to genes with assign_peaks(...) ..
## Successfully assigned peaks..
## Mappability adjustment is enabled..
## Test: ChIP-Enrich
## Genesets: user-supplied
## Running tests..
results.cem = results$results
print(results.cem[1:5,1:5])
## Geneset.Type Geneset.ID Description P.value FDR
## goterm13 user-supplied GO:0031400 GO:0031400 0.1244346 0.6353849
## goterm3 user-supplied GO:0006631 GO:0006631 0.1270770 0.6353849
## goterm7 user-supplied GO:0009101 GO:0009101 0.2233518 0.6770617
## goterm8 user-supplied GO:0009314 GO:0009314 0.3600322 0.6770617
## goterm4 user-supplied GO:0007346 GO:0007346 0.3791546 0.6770617
results = chipenrich(peaks = peaks_H3K4me3_GM12878, genesets = gs_path,
method='broadenrich', locusdef = "nearest_tss", qc_plots = F,
out_name = NULL, n_cores=1)
## User-specified geneset(s)...
## Reading user-specified gene set definitions: /Users/rcavalca/Projects/chipenrich/inst/extdata/vignette_genesets.txt
## Done setting up user-specified geneset..
## Reading peaks from data.frame..
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
## Assigning peaks to genes with assigned_peak_segments(...) ..
## Successfully assigned peaks..
## Calculating peak overlaps with gene loci..
## Test: Broad-Enrich
## Genesets: user-supplied
## Running tests..
results.be = results$results
print(results.be[1:5,1:5])
## Geneset.Type Geneset.ID Description P.value FDR
## ratio1 user-supplied GO:0002521 GO:0002521 0.05227898 0.8126158
## ratio22 user-supplied GO:0051129 GO:0051129 0.29165616 0.8126158
## ratio14 user-supplied GO:0034220 GO:0034220 0.31491786 0.8126158
## ratio12 user-supplied GO:0022604 GO:0022604 0.35426035 0.8126158
## ratio15 user-supplied GO:0034660 GO:0034660 0.38877141 0.8126158
Fisher’s Exact test assumes that each gene is equally likely to have a peak. We recommend using Fisher’s exact test with the 1kb
or 5kb
locus definitions only. This will force all genes to have approximately the same locus length and avoid returning bias results.
results = chipenrich(peaks = peaks_E2F4, genesets = gs_path, locusdef = "5kb",
method = "fet", fisher_alt = "two.sided", qc_plots = F, out_name = NULL)
## User-specified geneset(s)...
## Reading user-specified gene set definitions: /Users/rcavalca/Projects/chipenrich/inst/extdata/vignette_genesets.txt
## Done setting up user-specified geneset..
## Reading peaks from data.frame..
## Constructing GRanges
## Sorting GRanges
## Reducing GRanges
## Assigning peaks to genes with assign_peaks(...) ..
## Successfully assigned peaks..
## Test: Fisher's Exact Test
## Genesets: user-supplied
## Running tests..
results.fet = results$results
print(results.fet[1:5,1:5])
## Geneset.Type Geneset.ID Description P.value FDR
## 14 user-supplied GO:0031400 GO:0031400 0.09062939 0.5664337
## 9 user-supplied GO:0009314 GO:0009314 0.16267803 0.5982504
## 5 user-supplied GO:0007346 GO:0007346 0.16751011 0.5982504
## 8 user-supplied GO:0009101 GO:0009101 0.35941216 0.8168458
## 4 user-supplied GO:0006631 GO:0006631 0.43484863 0.8906293
The output of chipenrich()
is an R object containing the results of the test and the peak to gene assignments. Both of these are also written to text files in the working directory (unless specified otherwised) after the test is completed.
Peak assignments are stored in $peaks
. For example,
head(results$peaks)
## peak_id chr peak_start peak_end geneid gene_symbol gene_locus_start
## 1 peak:587 chr1 76190246 76190344 34 ACADM 76185042
## 2 peak:758 chr1 112050490 112050617 140 ADORA3 112041743
## 3 peak:1261 chr1 226595642 226595837 142 PARP1 226590801
## 4 peak:1355 chr1 244614895 244615100 159 ADSS 244610413
## 5 peak:1356 chr1 244615437 244615512 159 ADSS 244610413
## 6 peak:702 chr1 100315629 100315789 178 AGL 100310639
## gene_locus_end nearest_tss dist_to_tss nearest_tss_gene
## 1 76191321 76190042 252 34
## 2 112051743 112046743 -3809 140
## 3 226600801 226595801 61 142
## 4 244616544 244615436 438 159
## 5 244616544 244615436 -37 159
## 6 100315840 100315639 69 178
## nearest_tss_gene_strand
## 1 +
## 2 -
## 3 -
## 4 -
## 5 -
## 6 +
Peak information aggregated over genes is stored in $peaks_per_gene
. For example,
head(results$peaks_per_gene)
## geneid length log10_length num_peaks peak
## 604 864 30135 4.479071 5 1
## 2447 3725 10000 4.000000 5 1
## 6333 9659 70457 4.847924 5 1
## 1753 2672 13075 4.116442 4 1
## 2254 3399 10343 4.014647 4 1
## 4971 7398 10407 4.017326 4 1
Gene set enrichment results are stored in $results
. For example,
head(results$results)
## Geneset.Type Geneset.ID Description P.value FDR Odds.Ratio
## 14 user-supplied GO:0031400 GO:0031400 0.09062939 0.5664337 1.585073
## 9 user-supplied GO:0009314 GO:0009314 0.16267803 0.5982504 1.446375
## 5 user-supplied GO:0007346 GO:0007346 0.16751011 0.5982504 1.475597
## 8 user-supplied GO:0009101 GO:0009101 0.35941216 0.8168458 1.279989
## 4 user-supplied GO:0006631 GO:0006631 0.43484863 0.8906293 1.244098
## 16 user-supplied GO:0034660 GO:0034660 0.87556302 1.0000000 1.048473
## Status N.Geneset.Genes N.Geneset.Peak.Genes
## 14 enriched 297 18
## 9 enriched 284 16
## 5 enriched 299 17
## 8 enriched 295 15
## 4 enriched 282 14
## 16 enriched 281 12
## Geneset.Peak.Genes
## 14 3725, 1647, 2810, 7161, 991, 1855, 2629, 5590, 5664, 5686, 5690, 6497, 8613, 10459, 54206, 64853, 116496, 374969
## 9 3725, 7398, 777, 1647, 7161, 10277, 1263, 4893, 7159, 7391, 8438, 23596, 25896, 51150, 51514, 261734
## 5 1104, 7161, 63967, 64326, 991, 1063, 1263, 4582, 4751, 5586, 5686, 5690, 6886, 7175, 10459, 54998, 79577
## 8 10905, 6487, 142, 1650, 4582, 6646, 6675, 8704, 9653, 22796, 23169, 29929, 55624, 79947, 148789
## 4 9261, 34, 1376, 2475, 5565, 5743, 5825, 6051, 8560, 11000, 11332, 51094, 55268, 64834
## 16 5394, 6202, 7175, 23318, 25896, 25973, 55157, 55699, 79707, 113802, 116461, 126789
R.P. Welch^, C. Lee^, R.A. Smith, P. Imbriano, S. Patil, T. Weymouth, L.J. Scott, M.A. Sartor. “ChIP-Enrich: gene set enrichment testing for ChIP-seq data.” Nucl. Acids Res. (2014) 42(13):e105. doi:10.1093/nar/gku463
R.G. Cavalcante, C. Lee, R.P. Welch, S. Patil, T. Weymouth, L.J. Scott, and M.A. Sartor. “Broad-Enrich: functional interpretation of large sets of broad genomic regions.” Bioinformatics (2014) 30(17):i393-i400 doi:10.1093/bioinformatics/btu444