1 Introduction

Transcription factors (TFs) play a crucial role in regulating transcription by binding to the genome, alone or in combination. TF enrichment analysis is therefore an efficient and important way to identify candidate functional TFs from a set of experimentally defined regulatory regions. Because structurally related TFs may share similar sequence binding preferences (i.e. motifs), and a single TF may have multiple motifs, TF enrichment analysis is considerably more challenging than motif enrichment analysis. Here we present an R package for TF enrichment analysis that combines motif enrichment with the PECA model.

2 Quick Start

2.1 Download and Installation

The enrichTF package has been part of the Bioconductor project since Bioc 3.9, which is built on R 3.6. Before installing the latest version of enrichTF, please check your current Bioconductor and R versions; the latest version of R is recommended. You can then download and install enrichTF and all its dependencies as follows:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("enrichTF")

2.2 Loading

As with other R packages, please load enrichTF each time before using it.

library(enrichTF)

2.3 Running with the default configuration

It is quite convenient to run the default pipeline.

Users only need to provide a region list in BED format with 3 columns (chr, start, end). This can be, for example, the peak-calling result of sequencing data such as ATAC-seq.
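
As an illustration of the expected input, the sketch below writes a minimal three-column BED file with base R and reads it back. The coordinates and the temporary file are invented for the example and are not shipped with the package.

```r
# Hypothetical regions; the coordinates are invented for illustration only.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(10000L, 52000L, 7500L),
  end   = c(10500L, 52800L, 8200L)
)

# BED files are tab-separated, with no header row and no quoting.
bedPath <- tempfile(fileext = ".bed")
write.table(regions, bedPath, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the three-column layout.
checked <- read.table(bedPath, sep = "\t",
                      col.names = c("chr", "start", "end"))
```

A file written this way could then be passed as the region list in place of the bundled test file.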

All required data and software will be installed automatically.

# Provide your region list in BED format with 3 columns.
foregroundBedPath <- system.file(package = "enrichTF", "extdata","testregion.bed")
# Call the whole pipeline.
# "testgenome" is only a test example; change it to one of "hg19", "hg38", "mm9" or "mm10" for real data.
PECA_TF_enrich(inputForegroundBed = foregroundBedPath, genome = "testgenome")
## Configure bsgenome ...
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
##     tapply, union, unique, unsplit, which.max, which.min
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
## Configure bsgenome finished
## Configure motifpwm ...
## Configure motifpwm finished
## Configure motifPWMOBJ ...
## Configure motifPWMOBJ finished
## Configure RE_gene_corr ...
## Configure RE_gene_corr finished
## Configure Enhancer_RE_gene_corr ...
## Configure Enhancer_RE_gene_corr finished
## Configure MotifTFTable ...
## Configure MotifTFTable finished
## Configure MotifWeights ...
## Configure MotifWeights finished
## Configure TFgeneRelMtx ...
## Configure TFgeneRelMtx finished
## Configure SampleName ...
## Configure SampleName finished
## Configure OpenRegion ...
## Configure OpenRegion finished
## Configure ConserveRegion ...
## Configure ConserveRegion finished
## Configure HOMER ...
## Configure HOMER finished
## Configure OrgDb ...
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Configure OrgDb finished
## >>>>>>==========================================================
## Step Name:pipe_UnzipAndMergeBed
## All Parameters for This Step:
## |Input:
## |    bedInput:
## |        "/tmp/RtmpIQNw0S/Rinst167f76ddd29e/enrichTF/extdata/testregion.bed"
## |Output:
## |    bedOutput:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_00_pipe_UnzipAndMergeBed/testregion.bed"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## 2020-10-27 19:59:17
## New step. Start processing data:
## processing file:
## source:/tmp/RtmpIQNw0S/Rinst167f76ddd29e/enrichTF/extdata/testregion.bed
## destination:/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_00_pipe_UnzipAndMergeBed/testregion.bed
## 2020-10-27 19:59:17
## processing finished
## [1] "2020-10-27 19:59:17 EDT"
## [1] "2020-10-27 19:59:17 EDT"
## [1] "2020-10-27 19:59:17 EDT"
## [1] "2020-10-27 19:59:17 EDT"
## [1] "2020-10-27 19:59:17 EDT"
## [1] "2020-10-27 19:59:17 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_GenBackground
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_00_pipe_UnzipAndMergeBed/testregion.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |    outputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Other Parameters:
## |    bsgenome:
## |        An object of BSgenome
## |    regionLen:
## |        1000
## |    sampleNumb:
## |        0
## __________________________________________
## Begin to check if it is finished...
## 2020-10-27 19:59:17
## New step. Start processing data:
## 2020-10-27 19:59:18
## processing finished
## [1] "2020-10-27 19:59:18 EDT"
## [1] "2020-10-27 19:59:18 EDT"
## [1] "2020-10-27 19:59:18 EDT"
## [1] "2020-10-27 19:59:18 EDT"
## [1] "2020-10-27 19:59:18 EDT"
## [1] "2020-10-27 19:59:18 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_RegionConnectTargetGene
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    inputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    ouputForgroundGeneTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.txt"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    regularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/RE_gene_corr.bed"
## |    enhancerRegularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/Enhancer_RE_gene_corr.bed"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## 2020-10-27 19:59:18
## New step. Start processing data:
## 2020-10-27 19:59:19
## processing finished
## [1] "2020-10-27 19:59:19 EDT"
## [1] "2020-10-27 19:59:19 EDT"
## [1] "2020-10-27 19:59:19 EDT"
## [1] "2020-10-27 19:59:19 EDT"
## [1] "2020-10-27 19:59:19 EDT"
## [1] "2020-10-27 19:59:19 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_FindMotifsInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Output:
## |    outputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    outputMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.motif.bed"
## |Other Parameters:
## |    motifRc:
## |        "integrate"
## |    pwmObj:
## |        An object of PWMatrixList
## |    genome:
## |        "hg19"
## |    threads:
## |        2
## __________________________________________
## Begin to check if it is finished...
## 2020-10-27 19:59:19
## New step. Start processing data:
## 2020-10-27 19:59:21
## processing finished
## [1] "2020-10-27 19:59:21 EDT"
## [1] "2020-10-27 19:59:21 EDT"
## [1] "2020-10-27 19:59:21 EDT"
## [1] "2020-10-27 19:59:21 EDT"
## [1] "2020-10-27 19:59:21 EDT"
## [1] "2020-10-27 19:59:21 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_TFsEnrichInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |    inputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    inputForegroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    inputBackgroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    inputMotifWeights:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifWeights.RData"
## |    inputTFgeneRelMtx:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/TFgeneRelMtx.RData"
## |    inputMotifTFTable:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifTFTable.RData"
## |Output:
## |    outputTFsEnrichTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_04_pipe_TFsEnrichInRegions/testregion.PECA_TF_enrich.txt"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## 2020-10-27 19:59:21
## New step. Start processing data:
## 2020-10-27 19:59:39
## processing finished
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 

3 How TF Enrichment Works

3.1 Basic Concept for Relations

In this work, we define four kinds of relations, each introduced in detail below. We then show how to test for enriched TFs rather than enriched motifs.

3.1.1 Region-gene Relation Table

When users input a region list in BED format, the relations between regions and a gene can be obtained in two ways:

  1. The regions overlap the functional region of the gene. This type of relation can be obtained from gene reference data.

  2. The regions are enhancers or other distal regulatory elements of the gene. This type of relation can be derived from 3D genome sequencing experiments such as Hi-C and HiChIP.

Because we have already integrated and organized the potential relations between regions and genes from both gene reference data and experimental profiles, users can easily obtain region-gene relations for their input region list from this resource.

Users can also supply custom relations from their own work.
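
The overlap-based relation (the first of the two ways above) can be sketched in base R as follows. This is only an illustration under stated assumptions: the coordinates, the gene names and the `geneIntervals` table are hypothetical, and the package's own reference tables may be organized differently.

```r
# Hypothetical input regions and gene-associated intervals (all values invented).
regions <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                      start = c(100, 5000, 300),
                      end   = c(600, 5400, 900))
geneIntervals <- data.frame(chr   = c("chr1", "chr2", "chr2"),
                            start = c(500, 850, 2000),
                            end   = c(1500, 1200, 2600),
                            gene  = c("GENE_A", "GENE_B", "GENE_C"))

# Pair every region with every interval on the same chromosome,
# then keep the pairs whose half-open intervals overlap.
pairs <- merge(regions, geneIntervals, by = "chr",
               suffixes = c(".region", ".gene"))
overlaps <- pairs[pairs$start.region < pairs$end.gene &
                  pairs$start.gene < pairs$end.region, ]

# The resulting region-gene relation table.
regionGene <- overlaps[, c("chr", "start.region", "end.region", "gene")]
```

For genome-scale inputs, an interval-tree based overlap (for example from a dedicated genomic ranges package) would be a more practical choice than this all-pairs merge.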

3.1.2 Gene-TF Relation Matrix

The correlation scores between genes and TFs come from our previous approach, PECA. These scores are scaled to the range [-1, 1]. Users can also customize this relation matrix based on their own experimental data.

3.1.3 TF-motif Relation Table

As we know, one TF may relate to multiple motifs. Here, we manually annotated more than 700 motifs with their corresponding TFs.

3.1.4 Motif-region Relation Table

We scan each input region for all annotated motifs to check whether any of them occur in the region. The motif-region relation table is then generated from these hits.
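
To make the idea of motif scanning concrete, here is a minimal base-R sketch that slides a position weight matrix (PWM) along a sequence and reports the windows whose score reaches a cutoff. The toy PWM, the sequence, the cutoff and the helper name `scanMotif` are all hypothetical; the sketch only scores the forward strand and is not the scanner used by the package.

```r
# Toy log-odds PWM for a 3-bp motif; rows are bases, columns are positions.
# The scores are invented and favour the sequence "ACG".
pwm <- matrix(c( 2, -1, -1,   # A
                -1,  2, -1,   # C
                -1, -1,  2,   # G
                -1, -1, -1),  # T
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Score every window of the sequence and return the start positions
# whose summed score reaches the cutoff.
scanMotif <- function(sequence, pwm, minScore) {
  bases <- strsplit(sequence, "")[[1]]
  width <- ncol(pwm)
  nWindows <- length(bases) - width + 1
  if (nWindows < 1) return(integer(0))
  scores <- vapply(seq_len(nWindows), function(i) {
    rows <- match(bases[i:(i + width - 1)], rownames(pwm))
    sum(pwm[cbind(rows, seq_len(width))])
  }, numeric(1))
  which(scores >= minScore)
}

# Start positions (1-based) of windows matching the toy motif.
hits <- scanMotif("TTACGTACGA", pwm, minScore = 6)
```

Recording, for each region, which motifs produced at least one hit gives one row of a motif-region relation table.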

3.2 Enrichment Test

Based on the foreground region list provided by the user, this package can randomly sample the same number of regions from the genome as background.

3.2.1 Enrichment t-test

Foreground and background region lists can both be connected with genes through the Region-gene Relation Table, so each yields a gene list. We call these the foreground gene list and the background gene list, respectively.

For each TF, we perform a t-test comparing the TF-Gene scores of the foreground with those of the background and obtain a p-value. These scores come from the Gene-TF Relation Matrix.
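
A minimal sketch of this comparison for a single TF, using base R's `t.test`. The scores are simulated rather than taken from PECA, and the use of the default two-sided Welch test is an assumption; the text above does not specify the variant.

```r
set.seed(1)

# Hypothetical TF-Gene scores in [-1, 1] for one TF (simulated, not PECA output).
foregroundScores <- pmin(pmax(rnorm(50, mean = 0.3, sd = 0.2), -1), 1)
backgroundScores <- pmin(pmax(rnorm(50, mean = 0.0, sd = 0.2), -1), 1)

# Two-sample t-test of foreground versus background scores
# (R's default is the two-sided Welch test).
tResult <- t.test(foregroundScores, backgroundScores)
pValue <- tResult$p.value
```

Repeating this for every TF column of the relation matrix would give one p-value per TF.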

3.2.2 Enrichment Fisher’s exact test without threshold

Each TF may be associated with multiple motifs. For each motif, the package performs the following Fisher’s exact test to determine whether the motif is enriched in the foreground regions.

                        # with TF’s motif    # without TF’s motif
  # Foreground regions
  # Background regions

Finally, the package selects the motif with the most significant p-value to represent the TF, and repeats this for every TF to obtain a p-value for each.
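
The per-motif test and the selection of the most significant motif can be sketched with base R's `fisher.test`. The counts and motif names below are invented, and the one-sided alternative is an assumption of this sketch rather than something stated above.

```r
# Hypothetical counts of regions containing each motif of one TF.
# All numbers are invented for illustration.
nForeground <- 200
nBackground <- 200
motifCounts <- data.frame(
  motif      = c("motif_1", "motif_2"),
  foreground = c(60, 25),   # foreground regions with the motif
  background = c(20, 22)    # background regions with the motif
)

# Fisher's exact test per motif on the 2x2 table
# (rows: foreground/background; columns: with/without the motif).
motifCounts$pValue <- mapply(function(fg, bg) {
  contingency <- matrix(c(fg, nForeground - fg,
                          bg, nBackground - bg),
                        nrow = 2, byrow = TRUE)
  fisher.test(contingency, alternative = "greater")$p.value
}, motifCounts$foreground, motifCounts$background)

# The TF is represented by its most significant motif.
best <- motifCounts[which.min(motifCounts$pValue), ]
```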

3.2.3 Enrichment Fisher’s exact test with threshold

Similar to the previous section, the package performs Fisher’s exact test using the TF-Gene relation matrix. A range of thresholds on the TF-Gene relation scores (from -1 to 1 in steps of 0.1) is used to filter the genes connected with the foreground and background regions. For each TF, the most significant p-value across thresholds is selected to represent it, and this is repeated for all TFs.
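
One way to read the threshold scan is sketched below for a single TF. It assumes that "filtering" means counting the genes whose score exceeds each threshold, and that the test is one-sided; both are assumptions of this sketch, and the scores are simulated rather than taken from PECA.

```r
set.seed(2)

# Hypothetical TF-Gene scores for the genes linked to each region set
# (simulated values, not PECA output).
foregroundScores <- runif(120, min = -0.2, max = 1)
backgroundScores <- runif(120, min = -1,   max = 0.6)

thresholds <- seq(-1, 1, by = 0.1)

# For each threshold, count the genes scoring above it in each set and test
# whether the foreground holds more of them than the background.
pValues <- vapply(thresholds, function(cutoff) {
  fgAbove <- sum(foregroundScores > cutoff)
  bgAbove <- sum(backgroundScores > cutoff)
  contingency <- matrix(c(fgAbove, length(foregroundScores) - fgAbove,
                          bgAbove, length(backgroundScores) - bgAbove),
                        nrow = 2, byrow = TRUE)
  fisher.test(contingency, alternative = "greater")$p.value
}, numeric(1))

# Keep the most significant threshold to represent this TF.
bestThreshold <- thresholds[which.min(pValues)]
bestPValue <- min(pValues)
```

Because the minimum over many thresholds is selected, the resulting p-values are optimistic, which is one reason a multiple-testing correction matters afterwards.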

3.2.4 FDR

Finally, false discovery rate (FDR) correction is applied to the results of Fisher’s exact test with threshold. All TFs are then ranked by their enrichment significance.
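
The correction and ranking step can be sketched with base R's `p.adjust`. The text does not name the exact FDR procedure, so the Benjamini-Hochberg method used here is an assumption, and the TF names and p-values are invented.

```r
# Hypothetical per-TF p-values (invented for illustration).
tfPValues <- c(TF_A = 0.0004, TF_B = 0.03, TF_C = 0.2, TF_D = 0.0001)

# Benjamini-Hochberg adjustment (assumed; the exact procedure is not stated).
tfFdr <- p.adjust(tfPValues, method = "BH")

# Rank TFs from most to least significant.
ranked <- data.frame(TF = names(tfPValues), P_value = tfPValues, FDR = tfFdr)
ranked <- ranked[order(ranked$FDR, ranked$P_value), ]
```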

4 Customized Workflow

There are 4 steps to customize the workflow. You can follow the order below:

printMap()

4.1 Prepare inputs

The genBackground function generates random background regions based on the input regions and sets the length of the input sequence regions. It tries to select background sequence regions that match the GC-content distribution of the input sequence regions. The default length and number of background regions are 1000 and 10,000, respectively. The length of the sequence regions used for motif finding is important. The default of 1000 means that, for each region, the sequence is taken from -500 to +500 relative to the center. For example,

setGenome("testgenome")
## Configure bsgenome ...
## Configure bsgenome finished
## Configure motifpwm ...
## Configure motifpwm finished
## Configure motifPWMOBJ ...
## Configure motifPWMOBJ finished
## Configure RE_gene_corr ...
## Configure RE_gene_corr finished
## Configure Enhancer_RE_gene_corr ...
## Configure Enhancer_RE_gene_corr finished
## Configure MotifTFTable ...
## Configure MotifTFTable finished
## Configure MotifWeights ...
## Configure MotifWeights finished
## Configure TFgeneRelMtx ...
## Configure TFgeneRelMtx finished
## Configure SampleName ...
## Configure SampleName finished
## Configure OpenRegion ...
## Configure OpenRegion finished
## Configure ConserveRegion ...
## Configure ConserveRegion finished
## Configure HOMER ...
## Configure HOMER finished
## Configure OrgDb ...
## Configure OrgDb finished
foregroundBedPath <- system.file(package = "enrichTF", "extdata","testregion.bed")
gen <- genBackground(inputForegroundBed = foregroundBedPath)
## >>>>>>==========================================================
## Step Name:pipe_GenBackground
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rinst167f76ddd29e/enrichTF/extdata/testregion.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |    outputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Other Parameters:
## |    bsgenome:
## |        An object of BSgenome
## |    regionLen:
## |        1000
## |    sampleNumb:
## |        0
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## 2020-10-27 19:59:39
## New step. Start processing data:
## 2020-10-27 19:59:39
## processing finished
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 
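
The fixed-length windowing described at the start of this section (sequences taken from -500 to +500 around each region's center) can be sketched in base R as follows. The coordinates are invented, and the sketch does not check chromosome boundaries or match GC content, both of which a real implementation would need to handle.

```r
# Hypothetical input regions of varying width (coordinates invented).
regions <- data.frame(chr   = c("chr1", "chr2"),
                      start = c(10000, 40200),
                      end   = c(10300, 42800))

regionLen <- 1000  # window length, matching the default described above

# Re-centre each region and extend regionLen / 2 to both sides.
# Chromosome boundaries are not checked in this sketch.
center <- floor((regions$start + regions$end) / 2)
resized <- data.frame(chr   = regions$chr,
                      start = center - regionLen / 2,
                      end   = center + regionLen / 2)
```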

4.2 Connect regions with target genes

The enrichRegionConnectTargetGene function connects foreground and background regions to their target genes, which are predicted by the PECA model.

For example,

conTG <- enrichRegionConnectTargetGene(gen)
## >>>>>>==========================================================
## Step Name:pipe_RegionConnectTargetGene
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    inputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    ouputForgroundGeneTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.txt"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    regularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/RE_gene_corr.bed"
## |    enhancerRegularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/Enhancer_RE_gene_corr.bed"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## [1] "2020-10-27 19:59:39 EDT"
## 2020-10-27 19:59:39
## New step. Start processing data:
## 2020-10-27 19:59:40
## processing finished
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 

4.3 Motif scan

The enrichFindMotifsInRegions function scans for motif occurrences using the prepared PWMs and obtains the promising candidate motifs in these regions.

For example,

findMotif <- enrichFindMotifsInRegions(gen,motifRc="integrate")
## >>>>>>==========================================================
## Step Name:pipe_FindMotifsInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Output:
## |    outputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    outputMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.motif.bed"
## |Other Parameters:
## |    motifRc:
## |        "integrate"
## |    pwmObj:
## |        An object of PWMatrixList
## |    genome:
## |        "hg19"
## |    threads:
## |        2
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## [1] "2020-10-27 19:59:40 EDT"
## 2020-10-27 19:59:40
## New step. Start processing data:
## 2020-10-27 19:59:41
## processing finished
## [1] "2020-10-27 19:59:41 EDT"
## [1] "2020-10-27 19:59:41 EDT"
## [1] "2020-10-27 19:59:41 EDT"
## [1] "2020-10-27 19:59:41 EDT"
## [1] "2020-10-27 19:59:41 EDT"
## [1] "2020-10-27 19:59:41 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 

4.4 TF enrichment test

The enrichTFsEnrichInRegions function tests whether each TF is enriched in the input regions. For example,

result <- enrichTFsEnrichInRegions(gen)
## >>>>>>==========================================================
## Step Name:pipe_TFsEnrichInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |    inputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    inputForegroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    inputBackgroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    inputMotifWeights:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifWeights.RData"
## |    inputTFgeneRelMtx:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/TFgeneRelMtx.RData"
## |    inputMotifTFTable:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifTFTable.RData"
## |Output:
## |    outputTFsEnrichTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_04_pipe_TFsEnrichInRegions/testregion.PECA_TF_enrich.txt"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## 2020-10-27 19:59:42
## New step. Start processing data:
## 2020-10-27 19:59:42
## processing finished
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## All results have been saved.
## ==========================================================<<<<<<
## 

4.5 Building a pipeline

Users can easily build a pipeline based on pipeFrame (a pipeline framework):

library(magrittr)
setGenome("testgenome")
## Configure bsgenome ...
## Configure bsgenome finished
## Configure motifpwm ...
## Configure motifpwm finished
## Configure motifPWMOBJ ...
## Configure motifPWMOBJ finished
## Configure RE_gene_corr ...
## Configure RE_gene_corr finished
## Configure Enhancer_RE_gene_corr ...
## Configure Enhancer_RE_gene_corr finished
## Configure MotifTFTable ...
## Configure MotifTFTable finished
## Configure MotifWeights ...
## Configure MotifWeights finished
## Configure TFgeneRelMtx ...
## Configure TFgeneRelMtx finished
## Configure SampleName ...
## Configure SampleName finished
## Configure OpenRegion ...
## Configure OpenRegion finished
## Configure ConserveRegion ...
## Configure ConserveRegion finished
## Configure HOMER ...
## Configure HOMER finished
## Configure OrgDb ...
## Configure OrgDb finished
foregroundBedPath <- system.file(package = "enrichTF", "extdata","testregion.bed")
result <- genBackground(inputForegroundBed = foregroundBedPath) %>%
    enrichRegionConnectTargetGene %>%
    enrichFindMotifsInRegions(motifRc = "integrate") %>%
    enrichTFsEnrichInRegions
## >>>>>>==========================================================
## Step Name:pipe_GenBackground
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rinst167f76ddd29e/enrichTF/extdata/testregion.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |    outputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Other Parameters:
## |    bsgenome:
## |        An object of BSgenome
## |    regionLen:
## |        1000
## |    sampleNumb:
## |        0
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## The step:`pipe_GenBackground` was finished. Nothing to do.
## If you need to redo or rerun this step,please call 'clearStepCache(YourStepObject)'or remove file: /tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/pipeFrame.obj.4f47b312.rds
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_RegionConnectTargetGene
## All Parameters for This Step:
## |Input:
## |    inputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.foreground.bed"
## |    inputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.background.bed"
## |Output:
## |    outputForegroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    ouputForgroundGeneTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.txt"
## |    outputBackgroundBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    regularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/RE_gene_corr.bed"
## |    enhancerRegularGeneCorrBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/Enhancer_RE_gene_corr.bed"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## The step:`pipe_RegionConnectTargetGene` was finished. Nothing to do.
## If you need to redo or rerun this step,please call 'clearStepCache(YourStepObject)'or remove file: /tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/pipeFrame.obj.6fc9b6c1.rds
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_FindMotifsInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |Output:
## |    outputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    outputMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.motif.bed"
## |Other Parameters:
## |    motifRc:
## |        "integrate"
## |    pwmObj:
## |        An object of PWMatrixList
## |    genome:
## |        "hg19"
## |    threads:
## |        2
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## The step:`pipe_FindMotifsInRegions` was finished. Nothing to do.
## If you need to redo or rerun this step,please call 'clearStepCache(YourStepObject)'or remove file: /tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/pipeFrame.obj.6f86e46b.rds
## ==========================================================<<<<<<
## 
## >>>>>>==========================================================
## Step Name:pipe_TFsEnrichInRegions
## All Parameters for This Step:
## |Input:
## |    inputRegionBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_01_pipe_GenBackground/testregion.allregion.bed"
## |    inputRegionMotifBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_03_pipe_FindMotifsInRegions/testregion.region.motif.bed"
## |    inputForegroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.foreground.bed"
## |    inputBackgroundGeneBed:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_02_pipe_RegionConnectTargetGene/testregion.gene.background.bed"
## |    inputMotifWeights:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifWeights.RData"
## |    inputTFgeneRelMtx:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/TFgeneRelMtx.RData"
## |    inputMotifTFTable:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/refdir/testgenome/MotifTFTable.RData"
## |Output:
## |    outputTFsEnrichTxt:
## |        "/tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_04_pipe_TFsEnrichInRegions/testregion.PECA_TF_enrich.txt"
## |Other Parameters:
## __________________________________________
## Begin to check if it is finished...
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## [1] "2020-10-27 19:59:42 EDT"
## The step:`pipe_TFsEnrichInRegions` was finished. Nothing to do.
## If you need to redo or rerun this step,please call 'clearStepCache(YourStepObject)'or remove file: /tmp/RtmpIQNw0S/Rbuild167f38ca57f8/enrichTF/vignettes/enrichTF-pipeline/Step_04_pipe_TFsEnrichInRegions/pipeFrame.obj.ced90755.rds
## ==========================================================<<<<<<
## 
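The log above mentions two ways to force a finished step to run again: calling clearStepCache() on the step object, or removing the cached pipeFrame.obj.<hash>.rds file. The second option can be sketched in base R as follows. The helper clear_step_cache_files() is hypothetical (it is not part of enrichTF or pipeFrame), and the directory used here is a throwaway copy that only mimics the layout of a step folder.

```r
# Hypothetical helper, not part of enrichTF or pipeFrame: delete the cached
# step objects stored as pipeFrame.obj.<hash>.rds inside one step directory.
clear_step_cache_files <- function(step_dir) {
  cache_files <- list.files(step_dir,
                            pattern = "^pipeFrame\\.obj\\..*\\.rds$",
                            full.names = TRUE)
  removed <- file.remove(cache_files)
  # Return the paths that were actually deleted.
  invisible(cache_files[removed])
}

# Demonstration on a temporary directory that mimics a step folder.
step_dir <- file.path(tempdir(), "Step_04_pipe_TFsEnrichInRegions")
dir.create(step_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(list(), file.path(step_dir, "pipeFrame.obj.ced90755.rds"))
writeLines("TF\tP_value", file.path(step_dir, "testregion.PECA_TF_enrich.txt"))

removed <- clear_step_cache_files(step_dir)
remaining <- list.files(step_dir)
```

Only the cache object is removed; result files in the same directory are left untouched, so the step should be recomputed the next time the pipeline is called.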

4.6 Result

Here we show an example result obtained by applying this package to our own data.

library(magrittr)  # provides the %>% pipe operator
examplefile <- system.file(package = "enrichTF", "extdata", "result.example.txt")
read.table(examplefile, sep = "\t", header = TRUE) %>%
  knitr::kable()
TF       Motif_enrichment  Targt_gene_enrichment  P_value  FDR
MITF     0.0000000         0.0383390              0e+00    0.00e+00
TFEC     0.0000000         0.4857386              0e+00    0.00e+00
MLX      0.0000000         0.8147766              0e+00    0.00e+00
TFEB     0.0000000         0.5303603              0e+00    0.00e+00
TFE3     0.0000000         0.4067377              0e+00    0.00e+00
MZF1     0.0000000         0.0532782              0e+00    5.00e-07
USF2     0.0000000         0.5766573              0e+00    9.00e-07
USF1     0.0000000         0.2185547              0e+00    9.00e-07
CEBPA    0.0000012         0.0046147              0e+00    1.50e-06
PKNOX2   0.0000001         0.0140376              0e+00    2.30e-06
FOXA3    0.0000071         0.8939957              0e+00    2.30e-06
CEBPB    0.0000012         0.0118270              0e+00    2.30e-06
PKNOX1   0.0000001         0.9561768              0e+00    2.30e-06
MEIS1    0.0000001         0.1229445              1e-07    2.40e-06
TGIF1    0.0000001         0.2912322              1e-07    2.40e-06
TGIF2    0.0000001         0.9898682              1e-07    2.40e-06
FOXD3    0.0000071         0.8125302              1e-07    4.10e-06
MEIS2    0.0000001         0.8463790              1e-07    5.00e-06
DLX3     0.0000477         0.0043755              2e-07    5.90e-06
MYCN     0.0000004         0.4520504              2e-07    7.30e-06
FOXG1    0.0000071         0.8432650              2e-07    7.40e-06
ID1      0.0000017         0.0093116              3e-07    9.60e-06
MAX      0.0000017         0.5516440              3e-07    1.04e-05
FOXC2    0.0000071         0.8538362              4e-07    1.09e-05
DLX5     0.0000477         0.0017341              4e-07    1.09e-05
MYC      0.0000004         0.2194055              4e-07    1.11e-05
MEIS3    0.0000001         0.8591956              4e-07    1.11e-05
RORA     0.0001585         0.0021495              4e-07    1.11e-05
CREB3L2  0.0000017         0.6550453              5e-07    1.15e-05
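
A result table like this is usually narrowed down before interpretation. The sketch below shows one possible way to do that in base R, using five rows copied from the table above. The cutoffs fdr_cutoff and target_cutoff are hypothetical values chosen only for illustration, and the code assumes that smaller values in each column indicate stronger evidence, which is our reading of the table rather than something stated by the package.

```r
# Five rows copied from the example result table; column names are kept
# exactly as they appear in the file header.
result <- data.frame(
  TF = c("MITF", "CEBPA", "FOXA3", "DLX5", "RORA"),
  Motif_enrichment = c(0.0000000, 0.0000012, 0.0000071, 0.0000477, 0.0001585),
  Targt_gene_enrichment = c(0.0383390, 0.0046147, 0.8939957, 0.0017341, 0.0021495),
  P_value = c(0e+00, 0e+00, 0e+00, 4e-07, 4e-07),
  FDR = c(0.00e+00, 1.50e-06, 2.30e-06, 1.09e-05, 1.11e-05),
  stringsAsFactors = FALSE
)

# Hypothetical cutoffs, chosen only for this illustration.
fdr_cutoff <- 1e-5
target_cutoff <- 0.05

# Keep TFs that pass the FDR cutoff, ordered from smallest to largest FDR.
significant <- result[result$FDR < fdr_cutoff, ]
significant <- significant[order(significant$FDR), ]

# Separately, list TFs whose target-gene column is below its cutoff.
target_supported <- result$TF[result$Targt_gene_enrichment < target_cutoff]
```

With the full table, the same two filters can be applied to the data frame returned by read.table() instead of the hand-copied subset.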

5 Session Information

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] magrittr_1.5                      org.Hs.eg.db_3.12.0              
##  [3] AnnotationDbi_1.52.0              Biobase_2.50.0                   
##  [5] BSgenome.Hsapiens.UCSC.hg19_1.4.3 BSgenome_1.58.0                  
##  [7] rtracklayer_1.50.0                Biostrings_2.58.0                
##  [9] XVector_0.30.0                    GenomicRanges_1.42.0             
## [11] GenomeInfoDb_1.26.0               IRanges_2.24.0                   
## [13] S4Vectors_0.28.0                  BiocGenerics_0.36.0              
## [15] enrichTF_1.6.0                    pipeFrame_1.6.0                  
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1                shadowtext_0.0.7           
##   [3] backports_1.1.10            fastmatch_1.1-0            
##   [5] plyr_1.8.6                  igraph_1.2.6               
##   [7] splines_4.0.3               heatmap3_1.1.7             
##   [9] BiocParallel_1.24.0         ggplot2_3.3.2              
##  [11] TFBSTools_1.28.0            digest_0.6.27              
##  [13] htmltools_0.5.0             GOSemSim_2.16.0            
##  [15] viridis_0.5.1               GO.db_3.12.0               
##  [17] memoise_1.1.0               JASPAR2018_1.1.1           
##  [19] openxlsx_4.2.3              fastcluster_1.1.25         
##  [21] readr_1.4.0                 annotate_1.68.0            
##  [23] graphlayouts_0.7.1          matrixStats_0.57.0         
##  [25] R.utils_2.10.1              enrichplot_1.10.0          
##  [27] colorspace_1.4-1            blob_1.2.1                 
##  [29] ggrepel_0.8.2               haven_2.3.1                
##  [31] xfun_0.18                   dplyr_1.0.2                
##  [33] crayon_1.3.4                RCurl_1.98-1.2             
##  [35] jsonlite_1.7.1              scatterpie_0.1.5           
##  [37] TFMPvalue_0.0.8             glue_1.4.2                 
##  [39] polyclip_1.10-0             gtable_0.3.0               
##  [41] zlibbioc_1.36.0             DelayedArray_0.16.0        
##  [43] car_3.0-10                  abind_1.4-5                
##  [45] scales_1.1.1                DOSE_3.16.0                
##  [47] DBI_1.1.0                   rstatix_0.6.0              
##  [49] Rcpp_1.0.5                  viridisLite_0.3.0          
##  [51] xtable_1.8-4                foreign_0.8-80             
##  [53] bit_4.0.4                   htmlwidgets_1.5.2          
##  [55] httr_1.4.2                  fgsea_1.16.0               
##  [57] RColorBrewer_1.1-2          ellipsis_0.3.1             
##  [59] pkgconfig_2.0.3             XML_3.99-0.5               
##  [61] R.methodsS3_1.8.1           farver_2.0.3               
##  [63] tidyselect_1.1.0            rlang_0.4.8                
##  [65] reshape2_1.4.4              cellranger_1.1.0           
##  [67] munsell_0.5.0               tools_4.0.3                
##  [69] visNetwork_2.0.9            downloader_0.4             
##  [71] DirichletMultinomial_1.32.0 generics_0.0.2             
##  [73] RSQLite_2.2.1               broom_0.7.2                
##  [75] evaluate_0.14               stringr_1.4.0              
##  [77] yaml_2.2.1                  knitr_1.30                 
##  [79] bit64_4.0.5                 tidygraph_1.2.0            
##  [81] zip_2.1.1                   caTools_1.18.0             
##  [83] purrr_0.3.4                 KEGGREST_1.30.0            
##  [85] ggraph_2.0.3                R.oo_1.24.0                
##  [87] poweRlaw_0.70.6             pracma_2.2.9               
##  [89] DO.db_2.9                   compiler_4.0.3             
##  [91] curl_4.3                    png_0.1-7                  
##  [93] ggsignif_0.6.0              tibble_3.0.4               
##  [95] tweenr_1.0.1                stringi_1.5.3              
##  [97] highr_0.8                   forcats_0.5.0              
##  [99] lattice_0.20-41             CNEr_1.26.0                
## [101] Matrix_1.2-18               vctrs_0.3.4                
## [103] pillar_1.4.6                lifecycle_0.2.0            
## [105] BiocManager_1.30.10         data.table_1.13.2          
## [107] cowplot_1.1.0               bitops_1.0-6               
## [109] qvalue_2.22.0               R6_2.4.1                   
## [111] gridExtra_2.3               rio_0.5.16                 
## [113] MASS_7.3-53                 gtools_3.8.2               
## [115] seqLogo_1.56.0              SummarizedExperiment_1.20.0
## [117] GenomicAlignments_1.26.0    Rsamtools_2.6.0            
## [119] GenomeInfoDbData_1.2.4      hms_0.5.3                  
## [121] clusterProfiler_3.18.0      motifmatchr_1.12.0         
## [123] grid_4.0.3                  tidyr_1.1.2                
## [125] rmarkdown_2.5               rvcheck_0.1.8              
## [127] carData_3.0-4               MatrixGenerics_1.2.0       
## [129] ggpubr_0.4.0                ggforce_0.3.2