The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogeneously processed, yielding more than 350 sets of processed files.
fourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files.
The fourDNData() function provides a gateway to 4DN-hosted Hi-C files, including contact matrices (in .hic or .mcool) and other Hi-C derived files such as annotated compartments, domains, insulation scores, or .pairs files.
library(fourDNData)
head(fourDNData())
#> experimentSetAccession fileType size organism experimentType details
#> 1 4DNES18BMU79 pairs 10151.53 mouse in situ Hi-C DpnII
#> 3 4DNES18BMU79 hic 5285.82 mouse in situ Hi-C DpnII
#> 4 4DNES18BMU79 mcool 6110.75 mouse in situ Hi-C DpnII
#> 5 4DNES18BMU79 boundaries 0.12 mouse in situ Hi-C DpnII
#> 6 4DNES18BMU79 insulation 7.18 mouse in situ Hi-C DpnII
#> 7 4DNES18BMU79 compartments 0.18 mouse in situ Hi-C DpnII
#> dataset
#> 1 Hi-C on Mouse Olfactory System cells
#> 3 Hi-C on Mouse Olfactory System cells
#> 4 Hi-C on Mouse Olfactory System cells
#> 5 Hi-C on Mouse Olfactory System cells
#> 6 Hi-C on Mouse Olfactory System cells
#> 7 Hi-C on Mouse Olfactory System cells
#> condition
#> 1 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 3 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 4 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 5 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 6 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> 7 Mature olfactory sensory neurons with conditional Ldb1 knockout
#> biosource biosourceType publication
#> 1 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 3 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 4 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 5 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 6 olfactory receptor cell primary cell Monahan K et al. (2019)
#> 7 olfactory receptor cell primary cell Monahan K et al. (2019)
#> URL
#> 1 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/49504f97-904e-48c1-8c20-1033680b66da/4DNFIC5AHBPV.pairs.gz
#> 3 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/6cd4378a-8f51-4e65-99eb-15f5c80abf8d/4DNFIT4I5C6Z.hic
#> 4 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/01fb704f-2fd7-48c6-91af-c5f4584529ed/4DNFIVPAXJO8.mcool
#> 5 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/5c07cdee-53e2-43e0-8853-cfe5f057b3f1/4DNFIR3XCIMA.bed.gz
#> 6 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw
#> 7 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw
cool_file <- fourDNData('4DNESDP9ECMN')
cool_file
#> experimentSetAccession fileType size organism experimentType details
#> 1067 4DNESDP9ECMN pairs 14.77 human in situ Hi-C MboI
#> 1069 4DNESDP9ECMN hic 197.60 human in situ Hi-C MboI
#> 1070 4DNESDP9ECMN mcool 48.27 human in situ Hi-C MboI
#> 1071 4DNESDP9ECMN compartments 0.20 human in situ Hi-C MboI
#> dataset
#> 1067 Hi-C on GM12878 cells - protocol variations
#> 1069 Hi-C on GM12878 cells - protocol variations
#> 1070 Hi-C on GM12878 cells - protocol variations
#> 1071 Hi-C on GM12878 cells - protocol variations
#> condition
#> 1067 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1069 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1070 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> 1071 in situ Hi-C on GM12878 crosslinking titration - 1% FA, 1 min, RT
#> biosource biosourceType publication
#> 1067 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1069 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1070 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> 1071 GM12878 immortalized cell line Sanborn AL et al. (2015)
#> URL
#> 1067 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c2ae7404-501a-4d80-957b-cd677e2bd38a/4DNFIU5XG6TN.pairs.gz
#> 1069 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/70c1472d-cf3a-41d7-8682-cd03b7cc978d/4DNFI2AGEBE5.hic
#> 1070 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/c81d77c0-b57e-4a29-80ac-ec6ab0714f57/4DNFI4988896.mcool
#> 1071 https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/dc07042c-62d5-46ae-905d-8ec99b10cf9a/4DNFIDO8B3C6.bw
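Because the table printed above has one row per file, it can be narrowed down with ordinary data-frame subsetting before anything is downloaded. The sketch below is a minimal illustration, assuming that fourDNData() called without arguments returns a regular data.frame with the columns shown. To stay self-contained it does not query 4DN: the files object is a hand-built stand-in whose values are copied from the two outputs above, with only four of the columns kept.

```r
# Stand-in for the table returned by fourDNData(); values are copied from
# the outputs printed above, and only a subset of the columns is kept.
files <- data.frame(
    experimentSetAccession = c(
        "4DNES18BMU79", "4DNES18BMU79", "4DNES18BMU79",
        "4DNESDP9ECMN", "4DNESDP9ECMN", "4DNESDP9ECMN"
    ),
    fileType = c("pairs", "hic", "mcool", "pairs", "hic", "mcool"),
    size = c(10151.53, 5285.82, 6110.75, 14.77, 197.60, 48.27),
    organism = c("mouse", "mouse", "mouse", "human", "human", "human"),
    stringsAsFactors = FALSE
)

# Keep only the .mcool contact matrices generated from human samples
human_mcool <- files[files$organism == "human" & files$fileType == "mcool", ]

# Cross-tabulate experiment sets against the file types they provide
files_per_set <- table(files$experimentSetAccession, files$fileType)

print(human_mcool)
print(files_per_set)
```

With the real table, the same subsetting expressions would be applied to the object returned by fourDNData() instead of the stand-in.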
The fourDNData package can be installed from Bioconductor using the following command:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("fourDNData")
The HiCExperiment package can be used to import .mcool files provided by fourDNData. Refer to the HiCExperiment package documentation for further information.
library(HiCExperiment)
#> Consider using the `HiContacts` package to perform advanced genomic operations
#> on `HiCExperiment` objects.
#>
#> Read "Orchestrating Hi-C analysis with Bioconductor" online book to learn more:
#> https://js2264.github.io/OHCA/
ID <- '4DNESDP9ECMN'
cf <- CoolFile(
    path = fourDNData(ID, type = 'mcool'),
    metadata = as.list(fourDNData()[fourDNData()$experimentSetAccession == ID,])
)
x <- import(cf, resolution = 250000, focus = 'chr5:10000000-50000000')
x
#> `HiCExperiment` object with 7,466 contacts over 161 regions
#> -------
#> fileName: "/home/biocbuild/.cache/R/fourDNData/2d9d63fd178f6_4DNFI4988896.mcool"
#> focus: "chr5:10,000,000-50,000,000"
#> resolutions(13): 1000 2000 ... 5000000 10000000
#> active resolution: 250000
#> interactions: 2158
#> scores(2): count balanced
#> topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0)
#> pairsFile: N/A
#> metadata(12): experimentSetAccession fileType ... publication URL
interactions(x)
#> GInteractions object with 2158 interactions and 4 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 |
#> <Rle> <IRanges> <Rle> <IRanges> |
#> [1] chr5 10000001-10250000 --- chr5 10000001-10250000 |
#> [2] chr5 10000001-10250000 --- chr5 10250001-10500000 |
#> [3] chr5 10000001-10250000 --- chr5 10500001-10750000 |
#> [4] chr5 10000001-10250000 --- chr5 10750001-11000000 |
#> [5] chr5 10000001-10250000 --- chr5 11250001-11500000 |
#> ... ... ... ... ... ... .
#> [2154] chr5 46000001-46250000 --- chr5 46250001-46500000 |
#> [2155] chr5 46250001-46500000 --- chr5 46250001-46500000 |
#> [2156] chr5 46250001-46500000 --- chr5 47000001-47250000 |
#> [2157] chr5 49500001-49750000 --- chr5 49500001-49750000 |
#> [2158] chr5 49750001-50000000 --- chr5 49750001-50000000 |
#> bin_id1 bin_id2 count balanced
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 3560 3560 30 0.3097516
#> [2] 3560 3561 7 0.0574021
#> [3] 3560 3562 2 0.0187244
#> [4] 3560 3563 6 0.0567218
#> [5] 3560 3565 1 0.0108409
#> ... ... ... ... ...
#> [2154] 3704 3705 2 NaN
#> [2155] 3705 3705 5 NaN
#> [2156] 3705 3708 1 NaN
#> [2157] 3718 3718 11 0.320998
#> [2158] 3719 3719 1 NaN
#> -------
#> regions: 161 ranges and 4 metadata columns
#> seqinfo: 24 sequences from an unspecified genome
as(x, 'ContactMatrix')
#> class: ContactMatrix
#> dim: 161 161
#> type: dgCMatrix
#> rownames: NULL
#> colnames: NULL
#> metadata(0):
#> regions: 161
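Several interactions in the output above carry a NaN balanced score even though their raw count is non-zero. The sketch below shows one way to set such entries aside with base R before working on balanced values. It is only an illustration: ints is a hypothetical plain data.frame whose values are copied from the printed rows, not a GInteractions object, and no assumption is made about why these bins lack a balanced score.

```r
# Hand-built stand-in for the interaction scores; values are copied from
# rows [1]-[3] and [2154]-[2158] of the output printed above.
ints <- data.frame(
    bin_id1 = c(3560, 3560, 3560, 3704, 3705, 3705, 3718, 3719),
    bin_id2 = c(3560, 3561, 3562, 3705, 3705, 3708, 3718, 3719),
    count = c(30, 7, 2, 2, 5, 1, 11, 1),
    balanced = c(0.3097516, 0.0574021, 0.0187244, NaN, NaN, NaN, 0.320998, NaN)
)

# Flag interactions that have a defined balanced score
has_balanced <- !is.nan(ints$balanced)

# Keep the usable interactions and count how many were set aside
usable <- ints[has_balanced, ]
n_dropped <- sum(!has_balanced)

# Distance between the two anchors, expressed in number of bins
usable$bin_distance <- usable$bin_id2 - usable$bin_id1

print(usable)
print(n_dropped)
```

Raw counts remain available for the entries set aside, so analyses that rely on the count column alone do not need this filter.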
Rather than importing multiple files corresponding to a single experimentSet accession ID one by one, one can import all the available files associated with an experimentSet accession ID into a HiCExperiment object by using the fourDNHiCExperiment() function.
library(HiCExperiment)
x <- fourDNHiCExperiment('4DNESDP9ECMN')
#> Fetching local Hi-C contact map from Bioc cache
#> Fetching local compartments bigwig file from Bioc cache
#> Insulation not found for the provided experimentSet accession.
#> Borders not found for the provided experimentSet accession.
#> Importing contacts in memory
sessionInfo()
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] HiCExperiment_1.0.0 fourDNData_1.0.0 BiocStyle_2.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.0 dplyr_1.1.2
#> [3] blob_1.2.4 filelock_1.0.2
#> [5] Biostrings_2.68.0 bitops_1.0-7
#> [7] fastmap_1.1.1 RCurl_1.98-1.12
#> [9] BiocFileCache_2.8.0 GenomicAlignments_1.36.0
#> [11] XML_3.99-0.14 digest_0.6.31
#> [13] lifecycle_1.0.3 RSQLite_2.3.1
#> [15] magrittr_2.0.3 compiler_4.3.0
#> [17] rlang_1.1.0 sass_0.4.5
#> [19] tools_4.3.0 utf8_1.2.3
#> [21] yaml_2.3.7 rtracklayer_1.60.0
#> [23] knitr_1.42 bit_4.0.5
#> [25] curl_5.0.0 DelayedArray_0.26.0
#> [27] BiocParallel_1.34.0 withr_2.5.0
#> [29] purrr_1.0.1 BiocGenerics_0.46.0
#> [31] grid_4.3.0 stats4_4.3.0
#> [33] fansi_1.0.4 Rhdf5lib_1.22.0
#> [35] SummarizedExperiment_1.30.0 cli_3.6.1
#> [37] rmarkdown_2.21 crayon_1.5.2
#> [39] generics_0.1.3 httr_1.4.5
#> [41] tzdb_0.3.0 rjson_0.2.21
#> [43] DBI_1.1.3 cachem_1.0.7
#> [45] rhdf5_2.44.0 zlibbioc_1.46.0
#> [47] parallel_4.3.0 BiocManager_1.30.20
#> [49] XVector_0.40.0 restfulr_0.0.15
#> [51] matrixStats_0.63.0 vctrs_0.6.2
#> [53] Matrix_1.5-4 jsonlite_1.8.4
#> [55] bookdown_0.33 IRanges_2.34.0
#> [57] S4Vectors_0.38.0 bit64_4.0.5
#> [59] strawr_0.0.91 jquerylib_0.1.4
#> [61] glue_1.6.2 codetools_0.2-19
#> [63] GenomeInfoDb_1.36.0 GenomicRanges_1.52.0
#> [65] BiocIO_1.10.0 tibble_3.2.1
#> [67] pillar_1.9.0 htmltools_0.5.5
#> [69] rhdf5filters_1.12.0 GenomeInfoDbData_1.2.10
#> [71] R6_2.5.1 dbplyr_2.3.2
#> [73] vroom_1.6.1 evaluate_0.20
#> [75] lattice_0.21-8 Biobase_2.60.0
#> [77] Rsamtools_2.16.0 memoise_2.0.1
#> [79] bslib_0.4.2 Rcpp_1.0.10
#> [81] InteractionSet_1.28.0 xfun_0.39
#> [83] MatrixGenerics_1.12.0 pkgconfig_2.0.3