Fit-Hi-C is a tool for assigning statistical confidence estimates to intra-chromosomal contact maps produced by genome-wide genome architecture assays such as Hi-C. Compared to the original Python implementation, the Fit-Hi-C R port has the following advantages:
To install this package, start R and enter:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("FitHiC")
There are two ways to retrieve development versions. The first is to switch to the Bioconductor devel branch from within R:
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("BiocInstaller")
useDevel()
biocLite("FitHiC")
Alternatively, for a development version x.y.z, open a terminal and enter:
wget http://bioconductor.org/packages/devel/bioc/src/contrib/FitHiC_x.y.z.tar.gz
R CMD INSTALL FitHiC_x.y.z.tar.gz
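Before running Fit-Hi-C, two input files should be prepared: FRAGSFILE, the list of fragments (first table below), and INTERSFILE, the contact counts between fragment pairs (second table below).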
Chromosome.Name | Column.2 | Mid.Point | Hit.Count | Column.5 |
---|---|---|---|---|
1 | 0 | 1305 | 0 | 0 |
1 | 0 | 2635 | 233 | 1 |
1 | 0 | 4756 | 876 | 1 |
1 | 0 | 8568 | 1076 | 1 |
1 | 0 | 10384 | 1210 | 1 |
1 | 0 | 12246 | 639 | 1 |
Chromosome1.Name | Mid.Point.1 | Chromosome2.Name | Mid.Point.2 | Hit.Count |
---|---|---|---|---|
10 | 100894 | 10 | 150593 | 2 |
10 | 100894 | 10 | 162267 | 1 |
10 | 100894 | 10 | 169783 | 2 |
10 | 100894 | 10 | 179515 | 3 |
10 | 100894 | 10 | 182528 | 1 |
10 | 100894 | 10 | 185071 | 1 |
In addition, OUTDIR, the path where the output files will be stored, must be specified.
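As a rough illustration of the INTERSFILE layout shown above, the sketch below writes a few of the example rows to a temporary gzip file and reads them back for a basic sanity check. It assumes the file is tab-separated with no header line; the column names (`chr1`, `mid1`, `chr2`, `mid2`, `hitCount`) are hypothetical labels chosen for this example and are not defined by the package.

```r
# Sketch only: column names are hypothetical labels, and the tab-separated,
# header-less layout is an assumption about the file format.
inters <- data.frame(
  chr1     = c("10", "10", "10"),
  mid1     = c(100894L, 100894L, 100894L),
  chr2     = c("10", "10", "10"),
  mid2     = c(150593L, 162267L, 169783L),
  hitCount = c(2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Write the rows to a gzip-compressed temporary file.
intersfile <- tempfile(fileext = ".gz")
con <- gzfile(intersfile, "w")
write.table(inters, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)

# Read the file back through a gzfile connection and check its shape.
check <- read.table(gzfile(intersfile), sep = "\t", header = FALSE,
                    col.names = names(inters),
                    colClasses = c("character", "integer",
                                   "character", "integer", "integer"))

stopifnot(ncol(check) == 5)                 # five columns per interaction
stopifnot(all(check$chr1 == check$chr2))    # intra-chromosomal pairs only
stopifnot(all(check$hitCount >= 0))         # counts are non-negative
```

A check of this kind can catch a wrong delimiter or a stray header line before a longer run starts.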
Once the input data is prepared, run Fit-Hi-C in R as follows:
library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ...)
If you want to output images as well, explicitly set visual to TRUE:
library("FitHiC")
FitHiC(FRAGSFILE, INTERSFILE, OUTDIR, ..., visual=TRUE)
The pre-processed Hi-C data is from Yeast - EcoRI [1]. FRAGSFILE and INTERSFILE are located at system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz", package = "FitHiC") and system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz", package = "FitHiC"), respectively. With the input data ready, run as follows:
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
FitHiC(fragsfile, intersfile, getwd(), libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000)
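The call above restricts the analysis to a genomic distance window through distLowThres and distUpThres. The sketch below illustrates what such a window means for individual fragment pairs. It is not the package's internal code: the variable names are hypothetical, two of the four midpoints are made up for illustration, and the boundary handling (exclusive lower bound, inclusive upper bound) is an assumption.

```r
# Sketch only: illustrates a distance window, not FitHiC's internal filter.
dist_low <- 10000L     # mirrors distLowThres in the call above
dist_up  <- 250000L    # mirrors distUpThres in the call above

mid1 <- 100894L
# The first and last midpoints are hypothetical; the middle two come from
# the INTERSFILE example rows.
mid2 <- c(105000L, 150593L, 185071L, 400000L)

# Genomic distance between the two fragment midpoints of each pair.
distance <- abs(mid2 - mid1)

# Assumption: lower bound exclusive, upper bound inclusive.
in_window <- distance > dist_low & distance <= dist_up

pairs <- data.frame(mid1, mid2, distance, in_window)
print(pairs)
```

Under these assumptions, only the two pairs taken from the example data fall inside the window; the pair that is too close and the pair that is too far apart are excluded.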
Internally, Fit-Hi-C successively calls the generate_FragPairs, read_ICE_biases, read_All_Interactions, calculating_Probabilities, and fit_Spline methods. Execution has completed successfully once the following log appears:
## Fit-Hi-C is processing ...
## Running generate_FragPairs method ...
## Complete generate_FragPairs method [OK]
## Running read_All_Interactions method ...
## Complete read_All_Interactions method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass1.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass1.significances.txt.gz
## Complete fit_Spline method [OK]
## Running calculating_Probabilities method ...
## Writing Duan_yeast_EcoRI.fithic_pass2.txt
## Complete calculating_Probabilities method [OK]
## Running fit_Spline method ...
## Writing p-values to file Duan_yeast_EcoRI.spline_pass2.significances.txt.gz
## Complete fit_Spline method [OK]
## Execution of Fit-Hi-C completed successfully. [DONE]
## .Primitive("return")
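The output files come from two internal methods called by Fit-Hi-C. calculating_Probabilities writes Duan_yeast_EcoRI.fithic_pass1.txt and Duan_yeast_EcoRI.fithic_pass2.txt, and fit_Spline writes Duan_yeast_EcoRI.spline_pass1.significances.txt.gz and Duan_yeast_EcoRI.spline_pass2.significances.txt.gz. The first rows of each file are shown below, pass 1 before pass 2.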
avgGenomicDist | contactProbability | standardError | noOfLocusPairs | totalOfContactCounts |
---|---|---|---|---|
10105 | 3.12e-05 | 2.7e-06 | 322 | 22212 |
10315 | 3.05e-05 | 2.5e-06 | 330 | 22251 |
10545 | 2.87e-05 | 2.1e-06 | 350 | 22191 |
10779 | 2.97e-05 | 3.0e-06 | 344 | 22583 |
10982 | 3.16e-05 | 2.7e-06 | 323 | 22532 |
11196 | 3.32e-05 | 2.7e-06 | 302 | 22185 |
avgGenomicDist | contactProbability | standardError | noOfLocusPairs | totalOfContactCounts |
---|---|---|---|---|
10107 | 1.15e-05 | 8e-07 | 252 | 6428 |
10317 | 1.31e-05 | 9e-07 | 266 | 7709 |
10546 | 1.43e-05 | 8e-07 | 281 | 8887 |
10779 | 1.27e-05 | 8e-07 | 285 | 7974 |
10982 | 1.32e-05 | 8e-07 | 255 | 7426 |
11196 | 1.40e-05 | 8e-07 | 238 | 7356 |
chr1 | fragmentMid1 | chr2 | fragmentMid2 | contactCount | p_value | q_value |
---|---|---|---|---|---|---|
10 | 100894 | 10 | 150593 | 2 | 0.9988785 | 1 |
10 | 100894 | 10 | 162267 | 1 | 0.9985433 | 1 |
10 | 100894 | 10 | 169783 | 2 | 0.9708609 | 1 |
10 | 100894 | 10 | 179515 | 3 | 0.8072602 | 1 |
10 | 100894 | 10 | 182528 | 1 | 0.9831568 | 1 |
10 | 100894 | 10 | 185071 | 1 | 0.9795001 | 1 |
chr1 | fragmentMid1 | chr2 | fragmentMid2 | contactCount | p_value | q_value |
---|---|---|---|---|---|---|
10 | 100894 | 10 | 150593 | 2 | 0.9813195 | 1 |
10 | 100894 | 10 | 162267 | 1 | 0.9902851 | 1 |
10 | 100894 | 10 | 169783 | 2 | 0.8983241 | 1 |
10 | 100894 | 10 | 179515 | 3 | 0.6547083 | 1 |
10 | 100894 | 10 | 182528 | 1 | 0.9571117 | 1 |
10 | 100894 | 10 | 185071 | 1 | 0.9501637 | 1 |
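A common next step is to rank or filter the fit_Spline output by significance. The sketch below works on the six pass-2 rows shown above, entered inline so that it runs without the package. The cutoff `q_cutoff` and the derived `distance` column are hypothetical choices for this example. Reading a real output file with `read.table(gzfile(path), header = TRUE)` would additionally assume a header line and whitespace-separated columns.

```r
# Sketch only: the six pass-2 rows from the table above, entered inline.
sig <- data.frame(
  chr1         = 10L,
  fragmentMid1 = 100894L,
  chr2         = 10L,
  fragmentMid2 = c(150593L, 162267L, 169783L, 179515L, 182528L, 185071L),
  contactCount = c(2L, 1L, 2L, 3L, 1L, 1L),
  p_value      = c(0.9813195, 0.9902851, 0.8983241,
                   0.6547083, 0.9571117, 0.9501637),
  q_value      = 1
)

# Genomic distance between the two fragment midpoints (derived column).
sig$distance <- abs(sig$fragmentMid2 - sig$fragmentMid1)

# Hypothetical significance cutoff; choose one appropriate for the study.
q_cutoff <- 0.05
significant <- sig[sig$q_value <= q_cutoff, ]

# Rank all pairs from most to least significant by p-value.
ranked <- sig[order(sig$p_value), ]

print(nrow(significant))
print(head(ranked, 3))
```

None of the six example rows pass the cutoff, since every q-value shown is 1. The ranking still shows that the pair with three contacts has the smallest p-value.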
If visual is set to TRUE, the corresponding images are written as well.
For questions about the use of the Fit-Hi-C method, to request pre-processed Hi-C data or additional features and scripts, or to report bugs and provide feedback, please e-mail Ferhat Ay.
Ferhat Ay <ferhatay at uw period edu>
[1] Duan Z, et al. 2010. A three-dimensional model of the yeast genome. Nature 465: 363–367.