YY1 ChIA-PET motif analysis (single call)

Jennifer Hammelman, Konstantin Krismer

2021-07-12

set.seed(17)
library(spatzie)

This vignette describes how to use spatzie to identify pairs of transcription factors whose sequence motifs (that describe their binding sites) are co-enriched in enhancers and promoters that interact with each other. ChIA-PET (Fullwood and Ruan 2009), HiChIP (Mumbach et al. 2016) or Hi-C (Lieberman-Aiden et al. 2009) are molecular biology assays commonly used to investigate long-range genomic interactions and the data they generate, once properly processed (BEDPE format), serves as input to spatzie co-enrichment analyses.

Load data

Here we use interactions data in BEDPE format based on a ChIA-PET assay. Interactions data in BEDPE format is a tab-separated file, where each line describes one interaction between two anchors, i.e., two regions of the genome that are potentially far away from each other.

yy1_interactions_file is a toy BEDPE example data from a ChIA-PET experiment in murine embryonic stem cells, targeting the transcription factor YY1.

motifs_file is a toy motif database. The HOCOMOCO motif database (Kulakovskiy et al. 2018) is commonly used, but any motif file compatible with TFBSTools::readJASPARMatrix() can be used.

yy1_interactions_file <- system.file("extdata/yy1_interactions.bedpe.gz",
                                     package = "spatzie")
int_data_df <- read.table(gzfile(yy1_interactions_file), header = TRUE,
                          sep = "\t")

motifs_file <- system.file("extdata/motifs_subset.txt.gz",
                           package = "spatzie")

Run spatzie enhancer-promoter motif co-enrichment analysis

spatzie::find_ep_coenrichment() identifies co-enriched pairs of motifs in enhancer-promoter interactions by first annotating interaction anchors in int_data_df and discarding interactions that are not between promoters and enhancers. Second, anchor regions are scanned for motif hits using spatzie::scan_motifs(). Third, motifs present in less than a certain fraction of interactions are discarded (spatzie::filter_motifs()). And fourth, spatzie::anchor_pair_enrich() identifies co-enriched motif pairs, i.e., motif A is consistently present in promoters that interact with enhancers containing motif B.

res <- spatzie::find_ep_coenrichment(int_data_df, motifs_file,
                                     motifs_file_matrix_format = "pfm",
                                     genome_id = "mm9",
                                     cooccurrence_method = "count")

For more information, please see the help page (?spatzie::find_ep_coenrichment) and the spatzie paper (citation("spatzie")).

Results

Interaction type pie chart

The plot shows the types of interactions before filtering.

Motif co-enrichment matrix

The heatmap shows co-enrichment for all motif pairs.

plot of chunk sc_compute_cosignificance_count_save

YY1 binds enhancer and promoter sites providing scaffolding that forms enhancer-promoter interactions in mouse stem cells (Weintraub et al. 2017). As expected, spatzie identified a statistically significant co-occurrence of YY1 motifs indicating this dependency.

When interpreting spatzie results, keep in mind that motif databases such as HOCOMOCO often include groups of transcription factors with highly similar DNA-binding motifs (in this example YY1 and ZF.5), and the putative co-enrichment of one pair of transcription factor binding sites might be explained by another pair with highly similar motifs.

Please note that the motifs and the interactions data used in this vignette are dummy data used for demonstration purposes only.

Additional information

Most of the functionality of the spatzie package is also offered through the website at https://spatzie.mit.edu.

For a more detailed discussion on spatzie, please have a look at the paper:

spatzie: An R package for identifying significant transcription factor motif co-enrichment from enhancer-promoter interactions
Jennifer Hammelman, Konstantin Krismer, and David K. Gifford
Nucleic Acids Research, 2022, gkac036; DOI: https://doi.org/10.1093/nar/gkac036

Session info

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] BSgenome.Mmusculus.UCSC.mm9_1.4.0 BSgenome_1.60.0                   rtracklayer_1.52.0               
##  [4] Biostrings_2.60.1                 XVector_0.32.0                    GenomicRanges_1.44.0             
##  [7] GenomeInfoDb_1.28.1               IRanges_2.26.0                    S4Vectors_0.30.0                 
## [10] BiocGenerics_0.38.0               spatzie_0.99.6                    usethis_2.0.1                    
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-2                        rjson_0.2.20                            ellipsis_0.3.2                         
##   [4] biovizBase_1.40.0                       htmlTable_2.2.1                         base64enc_0.1-3                        
##   [7] fs_1.5.0                                dichromat_2.0-0                         rstudioapi_0.13                        
##  [10] farver_2.1.0                            bit64_4.0.5                             AnnotationDbi_1.54.1                   
##  [13] fansi_0.5.0                             xml2_1.3.2                              splines_4.1.0                          
##  [16] R.methodsS3_1.8.1                       cachem_1.0.5                            knitr_1.33                             
##  [19] Formula_1.2-4                           Rsamtools_2.8.0                         seqLogo_1.58.0                         
##  [22] annotate_1.70.0                         cluster_2.1.2                           GO.db_3.13.0                           
##  [25] dbplyr_2.1.1                            png_0.1-7                               R.oo_1.24.0                            
##  [28] pheatmap_1.0.12                         readr_1.4.0                             compiler_4.1.0                         
##  [31] httr_1.4.2                              backports_1.2.1                         lazyeval_0.2.2                         
##  [34] assertthat_0.2.1                        Matrix_1.3-4                            fastmap_1.1.0                          
##  [37] htmltools_0.5.1.1                       prettyunits_1.1.1                       tools_4.1.0                            
##  [40] igraph_1.2.6                            gtable_0.3.0                            glue_1.4.2                             
##  [43] TFMPvalue_0.0.8                         GenomeInfoDbData_1.2.6                  reshape2_1.4.4                         
##  [46] dplyr_1.0.7                             rappdirs_0.3.3                          Rcpp_1.0.7                             
##  [49] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 Biobase_2.52.0                          vctrs_0.3.8                            
##  [52] xfun_0.24                               CNEr_1.28.0                             stringr_1.4.0                          
##  [55] lifecycle_1.0.0                         ensembldb_2.16.2                        restfulr_0.0.13                        
##  [58] poweRlaw_0.70.6                         gtools_3.9.2                            InteractionSet_1.20.0                  
##  [61] XML_3.99-0.6                            zlibbioc_1.38.0                         scales_1.1.1                           
##  [64] VariantAnnotation_1.38.0                ProtGenerics_1.24.0                     GenomicInteractions_1.26.0             
##  [67] hms_1.1.0                               MatrixGenerics_1.4.0                    SummarizedExperiment_1.22.0            
##  [70] AnnotationFilter_1.16.0                 RColorBrewer_1.1-2                      yaml_2.2.1                             
##  [73] curl_4.3.2                              memoise_2.0.0                           gridExtra_2.3                          
##  [76] ggplot2_3.3.5                           biomaRt_2.48.2                          rpart_4.1-15                           
##  [79] latticeExtra_0.6-29                     stringi_1.6.2                           RSQLite_2.2.7                          
##  [82] highr_0.9                               BiocIO_1.2.0                            checkmate_2.0.0                        
##  [85] GenomicFeatures_1.44.0                  caTools_1.18.2                          filelock_1.0.2                         
##  [88] BiocParallel_1.26.1                     rlang_0.4.11                            pkgconfig_2.0.3                        
##  [91] matrixStats_0.59.0                      bitops_1.0-7                            evaluate_0.14                          
##  [94] pracma_2.3.3                            lattice_0.20-44                         purrr_0.3.4                            
##  [97] labeling_0.4.2                          htmlwidgets_1.5.3                       GenomicAlignments_1.28.0               
## [100] bit_4.0.4                               tidyselect_1.1.1                        plyr_1.8.6                             
## [103] magrittr_2.0.1                          R6_2.5.0                                generics_0.1.0                         
## [106] Hmisc_4.5-0                             DelayedArray_0.18.0                     DBI_1.1.1                              
## [109] pillar_1.6.1                            foreign_0.8-81                          survival_3.2-11                        
## [112] KEGGREST_1.32.0                         RCurl_1.98-1.3                          nnet_7.3-16                            
## [115] tibble_3.1.2                            crayon_1.4.1                            utf8_1.2.1                             
## [118] BiocFileCache_2.0.0                     rmarkdown_2.9                           jpeg_0.1-8.1                           
## [121] progress_1.2.2                          TFBSTools_1.30.0                        grid_4.1.0                             
## [124] data.table_1.14.0                       blob_1.2.1                              digest_0.6.27                          
## [127] xtable_1.8-4                            R.utils_2.10.1                          munsell_0.5.0                          
## [130] DirichletMultinomial_1.34.0             motifmatchr_1.14.0                      Gviz_1.36.2

References

Fullwood, M. J., and Y. Ruan. 2009. “ChIP-based methods for the identification of long-range chromatin interactions.” J. Cell. Biochem. 107 (1): 30–39.

Kulakovskiy, Ivan V, Ilya E Vorontsov, Ivan S Yevshin, Ruslan N Sharipov, Alla D Fedorova, Eugene I Rumynskiy, Yulia A Medvedeva, et al. 2018. “HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.” Nucleic Acids Research 46 (D1): D252–D259. https://doi.org/10.1093/nar/gkx1106.

Lieberman-Aiden, E., N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, et al. 2009. “Comprehensive mapping of long-range interactions reveals folding principles of the human genome.” Science 326 (5950): 289–93.

Mumbach, M. R., A. J. Rubin, R. A. Flynn, C. Dai, P. A. Khavari, W. J. Greenleaf, and H. Y. Chang. 2016. “HiChIP: efficient and sensitive analysis of protein-directed genome architecture.” Nat. Methods 13 (11): 919–22.

Weintraub, Abraham S, Charles H Li, Alicia V Zamudio, Alla A Sigova, Nancy M Hannett, Daniel S Day, Brian J Abraham, Malkiel A Cohen, Behnam Nabet, and Dennis L Buckley. 2017. “YY1 is a structural regulator of enhancer-promoter loops.” Cell 171 (7): 1573–88.