Contacts
classHiContacts
package implements the new Contacts
S4 class. It is build
on pre-existing Bioconductor classes, namely InteractionSet
,
GenomicInterations
and ContactMatrix
(Lun, Perry & Ing-Simmons, F1000Research 2016
), and leverages them to
import locally stored .(m)cool
files. It further provides analytical
and visualization tools to investigate contact maps directly in R
.
showClass("Contacts")
#> Class "Contacts" [package "HiContacts"]
#>
#> Slots:
#>
#> Name: fileName focus resolutions
#> Class: character characterOrNULL numeric
#>
#> Name: resolution interactions scores
#> Class: numeric GInteractions SimpleList
#>
#> Name: topologicalFeatures pairsFile metadata
#> Class: SimpleList characterOrNULL list
#>
#> Extends: "Annotated"
contacts <- contacts_yeast()
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
contacts
#> `Contacts` object with 74,360 interactions over 802 regions
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/1244a41ea62ba2_7752"
#> focus: "II"
#> resolutions(5): 1000 2000 4000 8000 16000
#> current resolution: 1000
#> interactions: 74360
#> scores(2): raw balanced
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0)
#> pairsFile: N/A
#> metadata(0):
citation('HiContacts')
#>
#> To cite package 'HiContacts' in publications use:
#>
#> Serizay J (2022). _HiContacts: HiContacts: R interface to cool
#> files_. R package version 1.1.0,
#> <https://github.com/js2264/HiContacts>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {HiContacts: HiContacts: R interface to cool files},
#> author = {Jacques Serizay},
#> year = {2022},
#> note = {R package version 1.1.0},
#> url = {https://github.com/js2264/HiContacts},
#> }
.(m)/cool
files as Contacts
objectsThe HiContactsData
package gives access to a range of toy datasets stored
by Bioconductor in the ExperimentHub
.
library(HiContacts)
library(HiContactsData)
cool_file <- HiContactsData('yeast_wt', format = 'cool')
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
cool_file
#> EH7701
#> "/home/biocbuild/.cache/R/ExperimentHub/c5a68237e7c40_7751"
The Contacts()
function can be used to import a Hi-C matrix locally stored
as a cool
/mcool
file. It creates a Contacts
object.
range <- 'I:20000-80000' # range of interest
contacts <- Contacts(cool_file, focus = range)
summary(contacts)
#> `Contacts` object with 1,653 interactions over 59 regions
contacts
#> `Contacts` object with 1,653 interactions over 59 regions
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/c5a68237e7c40_7751"
#> focus: "I:20,000-80,000"
#> resolutions(1): 1000
#> current resolution: 1000
#> interactions: 1653
#> scores(2): raw balanced
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0)
#> pairsFile: N/A
#> metadata(0):
Contacts()
works with .mcool
files as well: in this case, the user needs to
specify the resolution at which count values are recovered.
mcool_file <- HiContactsData('yeast_wt', format = 'mcool')
#> snapshotDate(): 2022-10-24
#> see ?HiContactsData and browseVignettes('HiContactsData') for documentation
#> loading from cache
range <- 'II:0-800000' # range of interest
lsCoolResolutions(mcool_file, verbose = TRUE)
#> resolutions(5): 1000 2000 4000 8000 16000
#>
# contacts <- Contacts(mcool_file, focus = range) # This would throw an error!
contacts <- Contacts(mcool_file, focus = range, res = 1000)
contacts
#> `Contacts` object with 73,257 interactions over 790 regions
#> -------
#> fileName: "/home/biocbuild/.cache/R/ExperimentHub/1244a41ea62ba2_7752"
#> focus: "II:0-800,000"
#> resolutions(5): 1000 2000 4000 8000 16000
#> current resolution: 1000
#> interactions: 73257
#> scores(2): raw balanced
#> topologicalFeatures: loops(0) borders(0) compartments(0) viewpoints(0)
#> pairsFile: N/A
#> metadata(0):
One can also extract a count matrix from a .(m)cool
file that is not
centered at the diagonal. To do this, specify a couple of coordinates in the
focus
argument using a character string formatted as "... x ..."
:
contacts <- Contacts(mcool_file, focus = 'II:0-500000 x II:100000-600000', res = 1000)
is_symmetrical(contacts)
#> [1] FALSE
Slots for a Contacts
object can be accessed using the following getters
:
fileName(contacts)
#> [1] "/home/biocbuild/.cache/R/ExperimentHub/1244a41ea62ba2_7752"
focus(contacts)
#> [1] "II:0-500000 x II:100000-600000"
resolutions(contacts)
#> [1] 1000 2000 4000 8000 16000
resolution(contacts)
#> [1] 1000
interactions(contacts)
#> GInteractions object with 41468 interactions and 2 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 | bin_id1
#> <Rle> <IRanges> <Rle> <IRanges> | <numeric>
#> [1] II 1-1000 --- II 105001-106000 | 232
#> [2] II 1-1000 --- II 119001-120000 | 232
#> [3] II 1-1000 --- II 126001-127000 | 232
#> [4] II 1-1000 --- II 136001-137000 | 232
#> [5] II 1-1000 --- II 255001-256000 | 232
#> ... ... ... ... ... ... . ...
#> [41464] II 498001-499000 --- II 586001-587000 | 730
#> [41465] II 498001-499000 --- II 587001-588000 | 730
#> [41466] II 498001-499000 --- II 591001-592000 | 730
#> [41467] II 498001-499000 --- II 594001-595000 | 730
#> [41468] II 499001-500000 --- II 499001-500000 | 731
#> bin_id2
#> <numeric>
#> [1] 337
#> [2] 351
#> [3] 358
#> [4] 368
#> [5] 487
#> ... ...
#> [41464] 818
#> [41465] 819
#> [41466] 823
#> [41467] 826
#> [41468] 731
#> -------
#> regions: 588 ranges and 3 metadata columns
#> seqinfo: 16 sequences from an unspecified genome
scores(contacts)
#> List of length 2
#> names(2): raw balanced
tail(scores(contacts, 1))
#> [1] 1 1 2 1 1 31
tail(scores(contacts, 'balanced'))
#> [1] 0.0006993195 0.0007046892 0.0012934069 0.0017022105 0.0008050573
#> [6] 0.0141594924
topologicalFeatures(contacts)
#> List of length 4
#> names(4): loops borders compartments viewpoints
pairsFile(contacts)
#> NULL
metadata(contacts)
#> list()
Several extra functions are available as well:
seqinfo(contacts) ## To recover the `Seqinfo` object from the `.(m)cool` file
#> Seqinfo object with 16 sequences from an unspecified genome:
#> seqnames seqlengths isCircular genome
#> I 230218 <NA> <NA>
#> II 813184 <NA> <NA>
#> III 316620 <NA> <NA>
#> IV 1531933 <NA> <NA>
#> V 576874 <NA> <NA>
#> ... ... ... ...
#> XII 1078177 <NA> <NA>
#> XIII 924431 <NA> <NA>
#> XIV 784333 <NA> <NA>
#> XV 1091291 <NA> <NA>
#> XVI 948066 <NA> <NA>
bins(contacts) ## To bin the genome at the current resolution
#> GRanges object with 12079 ranges and 1 metadata column:
#> seqnames ranges strand | bin_id
#> <Rle> <IRanges> <Rle> | <integer>
#> I_1_1000 I 1-1000 * | 1
#> I_1001_2000 I 1001-2000 * | 2
#> I_2001_3000 I 2001-3000 * | 3
#> I_3001_4000 I 3001-4000 * | 4
#> I_4001_5000 I 4001-5000 * | 5
#> ... ... ... ... . ...
#> XVI_944001_945000 XVI 944001-945000 * | 12075
#> XVI_945001_946000 XVI 945001-946000 * | 12076
#> XVI_946001_947000 XVI 946001-947000 * | 12077
#> XVI_947001_948000 XVI 947001-948000 * | 12078
#> XVI_948001_948066 XVI 948001-948066 * | 12079
#> -------
#> seqinfo: 16 sequences from an unspecified genome
regions(contacts) ## To extract unique regions of the contact matrix
#> GRanges object with 588 ranges and 3 metadata columns:
#> seqnames ranges strand | chr center bin_id
#> <Rle> <IRanges> <Rle> | <Rle> <integer> <integer>
#> II_1_1000 II 1-1000 * | II 500 232
#> II_1001_2000 II 1001-2000 * | II 1500 233
#> II_5001_6000 II 5001-6000 * | II 5500 237
#> II_6001_7000 II 6001-7000 * | II 6500 238
#> II_7001_8000 II 7001-8000 * | II 7500 239
#> ... ... ... ... . ... ... ...
#> II_593001_594000 II 593001-594000 * | II 593500 825
#> II_594001_595000 II 594001-595000 * | II 594500 826
#> II_595001_596000 II 595001-596000 * | II 595500 827
#> II_596001_597000 II 596001-597000 * | II 596500 828
#> II_597001_598000 II 597001-598000 * | II 597500 829
#> -------
#> seqinfo: 16 sequences from an unspecified genome
anchors(contacts) ## To extract "first" and "second" anchors for each interaction
#> $first
#> GRanges object with 41468 ranges and 3 metadata columns:
#> seqnames ranges strand | chr center bin_id
#> <Rle> <IRanges> <Rle> | <Rle> <integer> <integer>
#> [1] II 1-1000 * | II 500 232
#> [2] II 1-1000 * | II 500 232
#> [3] II 1-1000 * | II 500 232
#> [4] II 1-1000 * | II 500 232
#> [5] II 1-1000 * | II 500 232
#> ... ... ... ... . ... ... ...
#> [41464] II 498001-499000 * | II 498500 730
#> [41465] II 498001-499000 * | II 498500 730
#> [41466] II 498001-499000 * | II 498500 730
#> [41467] II 498001-499000 * | II 498500 730
#> [41468] II 499001-500000 * | II 499500 731
#> -------
#> seqinfo: 16 sequences from an unspecified genome
#>
#> $second
#> GRanges object with 41468 ranges and 3 metadata columns:
#> seqnames ranges strand | chr center bin_id
#> <Rle> <IRanges> <Rle> | <Rle> <integer> <integer>
#> [1] II 105001-106000 * | II 105500 337
#> [2] II 119001-120000 * | II 119500 351
#> [3] II 126001-127000 * | II 126500 358
#> [4] II 136001-137000 * | II 136500 368
#> [5] II 255001-256000 * | II 255500 487
#> ... ... ... ... . ... ... ...
#> [41464] II 586001-587000 * | II 586500 818
#> [41465] II 587001-588000 * | II 587500 819
#> [41466] II 591001-592000 * | II 591500 823
#> [41467] II 594001-595000 * | II 594500 826
#> [41468] II 499001-500000 * | II 499500 731
#> -------
#> seqinfo: 16 sequences from an unspecified genome
Add any scores
metric using a numerical vector.
scores(contacts, 'random') <- runif(length(contacts))
scores(contacts)
#> List of length 3
#> names(3): raw balanced random
tail(scores(contacts, 'random'))
#> [1] 0.7962524 0.3564971 0.3176975 0.6596419 0.7242686 0.2961170
Add topologicalFeatures
using GRanges
or Pairs
.
topologicalFeatures(contacts, 'viewpoints') <- GRanges("II:300000-320000")
topologicalFeatures(contacts)
#> List of length 4
#> names(4): loops borders compartments viewpoints
topologicalFeatures(contacts, 'viewpoints')
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] II 300000-320000 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
Contacts
Using the as()
function, Contacts
can be coerced in GInteractions
,
ContactMatrix
and matrix
seamlessly.
as(contacts, "GInteractions")
#> GInteractions object with 41468 interactions and 2 metadata columns:
#> seqnames1 ranges1 seqnames2 ranges2 | bin_id1
#> <Rle> <IRanges> <Rle> <IRanges> | <numeric>
#> [1] II 1-1000 --- II 105001-106000 | 232
#> [2] II 1-1000 --- II 119001-120000 | 232
#> [3] II 1-1000 --- II 126001-127000 | 232
#> [4] II 1-1000 --- II 136001-137000 | 232
#> [5] II 1-1000 --- II 255001-256000 | 232
#> ... ... ... ... ... ... . ...
#> [41464] II 498001-499000 --- II 586001-587000 | 730
#> [41465] II 498001-499000 --- II 587001-588000 | 730
#> [41466] II 498001-499000 --- II 591001-592000 | 730
#> [41467] II 498001-499000 --- II 594001-595000 | 730
#> [41468] II 499001-500000 --- II 499001-500000 | 731
#> bin_id2
#> <numeric>
#> [1] 337
#> [2] 351
#> [3] 358
#> [4] 368
#> [5] 487
#> ... ...
#> [41464] 818
#> [41465] 819
#> [41466] 823
#> [41467] 826
#> [41468] 731
#> -------
#> regions: 588 ranges and 3 metadata columns
#> seqinfo: 16 sequences from an unspecified genome
as(contacts, "ContactMatrix")
#> class: ContactMatrix
#> dim: 588 588
#> type: matrix
#> rownames: NULL
#> colnames: NULL
#> metadata(0):
#> regions: 588
as(contacts, "matrix")[1:10, 1:10]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] NA NA NA NA NA NA NA NA NA NA
#> [2,] NA NA NA NA NA NA NA NA NA NA
#> [3,] NA NA NA NA NA NA NA NA NA NA
#> [4,] NA NA NA NA NA NA NA NA NA NA
#> [5,] NA NA NA NA NA NA NA NA NA NA
#> [6,] NA NA NA NA NA NA NA NA NA NA
#> [7,] NA NA NA NA NA NA NA NA NA NA
#> [8,] NA NA NA NA NA NA NA NA NA NA
#> [9,] NA NA NA NA NA NA NA NA NA NA
#> [10,] NA NA NA NA NA NA NA NA NA NA
The plotMatrix
function takes a Contacts
object and plots it as a heatmap.
Use the use.scores
argument to specify which type of interaction scores to use
in the contact maps (e.g. raw
, balanced
, …). By default, plotMatrix()
looks for balanced scores. If they are not stored in the original .(m)/cool
file,
plotMatrix()
simply takes the first scores available.
plotMatrix(contacts, use.scores = 'balanced', limits = c(-4, -1), dpi = 120)