library(NanoMethViz)
In order to use this package, your data must be converted from the output of methylation calling software to a tabix indexed bgzipped format. The data needs to be sorted by genomic position to respect the requirements of the samtools tabix indexing tool. On Linux and macOS systems this is done using the bash sort
utility, which is memory efficient, but on Windows this is done by loading the entire table and sorting within R.
We currently support output from
If you would like any further other formats supported please create an issue at https://github.com/Shians/NanoMethViz/issues.
The conversion can be done using the create_tabix_file()
function. We provide example data of nanopolish output within the package, we can look inside to see how the data looks coming out of nanopolish
methy_calls <- system.file(package = "NanoMethViz",
c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))
# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
methy_calls[1], sep = "\t", header = TRUE, nrows = 6)
methy_calls_example
## chromosome strand start end read_name
## 1 chr1 - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2 chr11 - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3 chr11 + 69150806 69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4 chr1 + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5 chrY - 4082060 4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6 chr8 + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
## log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1 -5.91 -100.38 -94.47 1
## 2 -8.07 -115.21 -107.13 1
## 3 -1.65 -183.12 -181.47 1
## 4 2.74 -112.14 -114.88 1
## 5 -1.78 -135.09 -133.32 1
## 6 5.02 -129.31 -134.33 1
## num_motifs sequence
## 1 1 CATTACGTTTC
## 2 1 AACTTCGTTGA
## 3 2 GGTCACGGGAATCCGGTTC
## 4 1 AGAAGCGCTAA
## 5 1 CTCACCGTATA
## 6 1 TCTGACGTTGA
We then create a temporary path to store a converted file, this will be deleted once you exit your R session. Once create_tabix_file()
is run, it will create a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection, do not do this with large datasets as it decompresses all the data and will take very long to run.
To import data from Megalodon’s modification calls, the per-read modified bases file must be generated. This can be done by either adding --write-mods-text
argument to Megalodon run or using the megalodon_extras per_read_text modified_bases
utility.
methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")
# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)
# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)
methy_data
## sample chr pos strand statistic read_name
## 1 sample2 chr1 5141050 - 6.93 3818f2e2-d520-4305-bbab-efad891f67f2
## 2 sample1 chr1 6283067 - 1.05 36e3c55f-c41f-4bd6-b371-54368d013008
## 3 sample1 chr1 7975278 - 1.39 6f6cbc59-af4c-4dfa-8e48-ef4ac4eeb13b
## 4 sample1 chr1 10230292 - 2.19 fbe53b38-e264-4c7a-824e-2651c22f8ea6
## 5 sample1 chr1 13127127 - 2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
## 6 sample1 chr1 13127134 - 2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
Now methy_tabix
will be the path to a tabix object that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!
The methylation data can be exported into formats appropriate for bsseq, DSS, or edgeR.
Both bsseq and DSS make use of the BSSeq object, and these can be obtained from the NanoMethResult objects using the methy_to_bsseq()
function.
nmr <- load_example_nanomethresult()
bss <- methy_to_bsseq(nmr)
bss
## An object of type 'BSseq' with
## 4778 methylation loci
## 6 samples
## has not been smoothed
## All assays are in-memory
edgeR can also be used to perform differential methylation analysis: https://f1000research.com/articles/6-2055. BSSeq objects can be converted into the appropriate format using the bsseq_to_edger()
function. This can be used to count reads on a per-site basis or over regions.
gene_regions <- exons_to_genes(NanoMethViz::exons(nmr))
edger_mat <- bsseq_to_edger(bss, gene_regions)
edger_mat
## B6Cast_Prom_1_bl6_Me B6Cast_Prom_1_bl6_Un B6Cast_Prom_1_cast_Me
## 12189 3259 2033 2628
## 12190 2384 1349 1604
## 16210 2696 1173 1564
## 17263 1752 1672 2156
## 18616 1812 595 1436
## 213742 1264 803 848
## B6Cast_Prom_1_cast_Un B6Cast_Prom_2_bl6_Me B6Cast_Prom_2_bl6_Un
## 12189 1970 3380 1764
## 12190 1627 2564 1853
## 16210 1788 2544 895
## 17263 1831 2027 1698
## 18616 1184 1690 573
## 213742 1465 1012 596
## B6Cast_Prom_2_cast_Me B6Cast_Prom_2_cast_Un B6Cast_Prom_3_bl6_Me
## 12189 2663 2043 3658
## 12190 1860 1573 2110
## 16210 1630 1693 1955
## 17263 1349 1153 1787
## 18616 1442 1520 3011
## 213742 1072 1040 1432
## B6Cast_Prom_3_bl6_Un B6Cast_Prom_3_cast_Me B6Cast_Prom_3_cast_Un
## 12189 2341 2565 1968
## 12190 1502 1884 1663
## 16210 769 1466 1787
## 17263 1880 1883 1694
## 18616 1331 948 931
## 213742 728 653 1189