The aim of HiCDOC is to detect significant A/B compartment changes, using Hi-C data with replicates.
HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions.
It provides a collection of functions assembled into a pipeline of filtering, normalization, and prediction steps.
HiCDOC can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("HiCDOC")
The package can then be loaded:
library(HiCDOC)
HiCDOC can import Hi-C data sets in several formats:
- Tabular .tsv
- Cooler .cool or .mcool
- Juicer .hic
- HiC-Pro .matrix and .bed
A tabular file is a tab-separated multi-replicate sparse matrix with a header:
chromosome    position 1    position 2    C1.R1    C1.R2    C2.R1    …
Y             1500000       7500000       145      184      72       …
…
The number of interactions between position 1 and position 2 is reported in each condition.replicate column. There is no limit to the number of conditions and replicates.
To load Hi-C data in this format:
hic.experiment <- HiCDOCDataSetFromTabular('path/to/data.tsv')
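The tabular layout described above can be produced with base R alone. The following is a minimal sketch, assuming made-up interaction counts and a temporary file; the variable names `interactions` and `tabularPath` are hypothetical and not part of HiCDOC.

```r
# Hypothetical toy data in the tabular layout; all values are made up.
# Integers are used so that positions are not written in scientific notation.
interactions <- data.frame(
    chromosome = c("Y", "Y", "Y"),
    `position 1` = c(0L, 0L, 500000L),
    `position 2` = c(0L, 500000L, 500000L),
    C1.R1 = c(145L, 72L, 130L),
    C1.R2 = c(184L, 65L, 151L),
    C2.R1 = c(72L, 40L, 98L),
    check.names = FALSE # keep the spaces in "position 1" and "position 2"
)

# Write a tab-separated file with a header, without quotes or row names.
tabularPath <- tempfile(fileext = ".tsv")
write.table(
    interactions,
    file = tabularPath,
    sep = "\t",
    quote = FALSE,
    row.names = FALSE
)

# Read the file back to inspect the header.
header <- names(read.delim(tabularPath, check.names = FALSE))
print(header)
```

A file written this way has the header shape expected by the tabular format, so its path could be passed to HiCDOCDataSetFromTabular. Whether such a tiny matrix survives the later filtering steps is a separate question.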
To load .cool or .mcool files generated by Cooler (Abdennur and Mirny 2019):
# Path to each file
paths <- c(
    # …
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select in .mcool files
binSize <- 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromCool(
    paths,
    replicates = replicates,
    conditions = conditions,
    binSize = binSize # Specified for .mcool files.
)
To load .hic files generated by Juicer (Durand 2016):
# Path to each file
paths <- c(
    # …
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select
binSize <- 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiC(
    paths,
    replicates = replicates,
    conditions = conditions,
    binSize = binSize
)
To load .matrix and .bed files generated by HiC-Pro (Servant 2015):
# Path to each matrix file
matrixPaths <- c(
    # …
)
# Path to each bed file
bedPaths <- c(
    # …
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiCPro(
    matrixPaths = matrixPaths,
    bedPaths = bedPaths,
    replicates = replicates,
    conditions = conditions
)
An example data set can be loaded from the HiCDOC package:
data(exampleHiCDOCDataSet)
Once your data is loaded, you can run all the filtering, normalization, and prediction steps with a single command:
HiCDOC(exampleHiCDOCDataSet)
This one-liner runs all the steps detailed below.
Remove chromosomes shorter than 100 positions (100 is the default value):
hic.experiment <- filterSmallChromosomes(exampleHiCDOCDataSet, threshold = 100)
#> Keeping chromosomes with at least 100 positions.
#> Kept 3 chromosomes: X, Y, Z
#> Removed 1 chromosome: W
Remove sparse replicates in which fewer than 30% of interactions are non-zero (30% is the default value):
hic.experiment <- filterSparseReplicates(hic.experiment, threshold = 0.3)
#> Keeping replicates filled with at least 30% non-zero interactions.
#> Removed interactions matrix of chromosome X, condition 1, replicate R2 filled at 2.347%.
#> Removed 1 replicate in total.
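To make the sparsity criterion concrete, here is a minimal base R sketch of a fill-rate check on a single made-up matrix. It assumes the fill rate is the fraction of non-zero cells over the full matrix; how filterSparseReplicates counts cells internally is not stated here, and the names `fill` and `keep` are hypothetical.

```r
# Hypothetical symmetric 4 x 4 interaction matrix for one replicate.
interactions <- matrix(
    c(5, 0, 0, 0,
      0, 3, 0, 0,
      0, 0, 0, 1,
      0, 0, 1, 2),
    nrow = 4,
    byrow = TRUE
)

# Fraction of cells holding a non-zero interaction (assumed definition).
fill <- mean(interactions != 0)

# Keep the replicate only if it reaches the threshold.
threshold <- 0.3
keep <- fill >= threshold

print(fill)
print(keep)
```

Here 5 of the 16 cells are non-zero, so this toy replicate sits just above the 30% threshold and would be kept under the assumed definition.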
Remove weak positions with fewer than 1 interaction on average (1 is the default value):
hic.experiment <- filterWeakPositions(hic.experiment, threshold = 1)
#> Keeping positions with interactions average greater or equal to 1.
#> Chromosome X: 2 positions removed, 118 positions remaining.
#> Chromosome Y: 3 positions removed, 157 positions remaining.
#> Chromosome Z: 0 positions removed, 200 positions remaining.
#> Removed 5 positions in total.
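The idea behind this filter can be illustrated on a single made-up matrix with base R. This is a sketch under the assumption that a position's average is the mean of its row; how filterWeakPositions combines replicates and conditions is not shown here, and the names `averages`, `weak`, and `filtered` are hypothetical.

```r
# Hypothetical symmetric interaction matrix for 4 positions.
interactions <- matrix(
    c(4, 2, 0, 0,
      2, 6, 1, 0,
      0, 1, 0, 1,
      0, 0, 1, 0),
    nrow = 4,
    byrow = TRUE
)

# Average number of interactions per position (assumed: row mean).
averages <- rowMeans(interactions)

# Positions whose average falls below the threshold are considered weak.
threshold <- 1
weak <- averages < threshold

# Drop weak positions from both rows and columns.
filtered <- interactions[!weak, !weak, drop = FALSE]

print(averages)
print(which(weak))
```

In this toy case positions 3 and 4 have averages of 0.5 and 0.25, so both are dropped and a 2 x 2 matrix remains.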
Normalize technical biases such as sequencing depth (inter-matrix normalization) so that matrices are comparable:
suppressWarnings(hic.experiment <- normalizeTechnicalBiases(hic.experiment))
#> Normalizing technical biases.
This step uses the cyclic loess normalization from the multiHiCcompare package (Stansfield, Cresswell, and Dozmorov 2019).
Note: For large data sets, it is highly recommended to set a value for the … parameter to reduce computing time and memory usage. See …
Normalize biological biases, such as GC content, number of restriction sites, etc. (intra-matrix normalization):
hic.experiment <- normalizeBiologicalBiases(hic.experiment)
#> Chromosome X: normalizing biological biases.
#> Chromosome Y: normalizing biological biases.
#> Chromosome Z: normalizing biological biases.
Normalize the linear distance effect resulting from more interactions between closer genomic regions (20000 is the default value for loessSampleSize):
hic.experiment <-
normalizeDistanceEffect(hic.experiment, loessSampleSize = 20000)
#> Chromosome X: normalizing distance effect.
#> Chromosome Y: normalizing distance effect.
#> Chromosome Z: normalizing distance effect.
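The principle of removing a distance trend with loess can be sketched in base R on simulated data. This is an illustration only, not the implementation of normalizeDistanceEffect: the simulated decay, the log transform, the span, and all variable names are assumptions made for the example.

```r
set.seed(1)

# Hypothetical interactions: counts decay with genomic distance, plus noise.
distance <- runif(300, min = 0, max = 5e6)
expected <- 200 * exp(-distance / 2e6)
observed <- expected * exp(rnorm(length(distance), sd = 0.2))

# Fit a loess curve of log-interactions against distance.
fit <- loess(log(observed) ~ distance, span = 0.75)

# Subtract the fitted trend from the log-interactions.
normalized <- log(observed) - fitted(fit)

# Correlation with distance before and after removing the trend.
before <- cor(distance, log(observed))
after <- cor(distance, normalized)
print(c(before = before, after = after))
```

Before the correction, log-interactions are strongly and negatively correlated with distance; after subtracting the fitted curve, the residual values show almost no dependence on distance, which is what lets positions be compared on their interaction profile rather than their proximity.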
Predict A and B compartments and detect significant differences, here using the default parameter values:
hic.experiment <- detectCompartments(hic.experiment)
#> Clustering genomic positions.
#> Predicting A/B compartments.
#> Detecting significant differences.
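As a rough intuition for the clustering step, the sketch below groups the positions of a made-up matrix into two clusters with plain k-means from base R. It is a stand-in for illustration and not the clustering that detectCompartments performs; the block-structured toy matrix and the names `group` and `clusters` are hypothetical.

```r
set.seed(42)

# Hypothetical normalized matrix: two groups of positions that interact
# preferentially with members of their own group.
n <- 10
group <- rep(c(1, 2), each = n / 2)
same <- outer(group, group, "==")
noise <- matrix(rnorm(n * n, sd = 0.1), nrow = n)
noise <- (noise + t(noise)) / 2 # keep the matrix symmetric
interactions <- ifelse(same, 1, -1) + noise

# Cluster positions by their interaction profiles (rows) into two groups.
clusters <- kmeans(interactions, centers = 2, nstart = 10)$cluster

print(clusters)
```

The two clusters recover the two simulated groups, but the cluster labels are arbitrary: deciding which cluster corresponds to compartment A and which to B requires additional information that this sketch does not address.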
Plot the interaction matrix of each replicate:
p <- plotInteractions(hic.experiment, chromosome = "Y")