Importing/Exporting Data

library(NanoMethViz)

In order to use this package, your data must be converted from the output of methylation calling software to a tabix indexed bgzipped format. The data needs to be sorted by genomic position to respect the requirements of the samtools tabix indexing tool. On Linux and macOS systems this is done using the bash sort utility, which is memory efficient, but on Windows this is done by loading the entire table and sorting within R.

We currently support output from

Nanopolish
f5c
Megalodon

If you would like any further other formats supported please create an issue at https://github.com/Shians/NanoMethViz/issues.

Data example

The conversion can be done using the create_tabix_file() function. We provide example data of nanopolish output within the package, we can look inside to see how the data looks coming out of nanopolish

methy_calls <- system.file(package = "NanoMethViz",
    c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example

##   chromosome strand     start       end                            read_name
## 1       chr1      - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2      chr11      - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3      chr11      +  69150806  69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4       chr1      + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5       chrY      -   4082060   4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6       chr8      + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
##   log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1         -5.91            -100.38               -94.47                   1
## 2         -8.07            -115.21              -107.13                   1
## 3         -1.65            -183.12              -181.47                   1
## 4          2.74            -112.14              -114.88                   1
## 5         -1.78            -135.09              -133.32                   1
## 6          5.02            -129.31              -134.33                   1
##   num_motifs            sequence
## 1          1         CATTACGTTTC
## 2          1         AACTTCGTTGA
## 3          2 GGTCACGGGAATCCGGTTC
## 4          1         AGAAGCGCTAA
## 5          1         CTCACCGTATA
## 6          1         TCTGACGTTGA

We then create a temporary path to store a converted file, this will be deleted once you exit your R session. Once create_tabix_file() is run, it will create a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection, do not do this with large datasets as it decompresses all the data and will take very long to run.

Megalodon Data

To import data from Megalodon’s modification calls, the per-read modified bases file must be generated. This can be done by either adding --write-mods-text argument to Megalodon run or using the megalodon_extras per_read_text modified_bases utility.

Importing data

methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

methy_data

##    sample  chr      pos strand statistic                            read_name
## 1 sample2 chr1  5141050      -      6.93 3818f2e2-d520-4305-bbab-efad891f67f2
## 2 sample1 chr1  6283067      -      1.05 36e3c55f-c41f-4bd6-b371-54368d013008
## 3 sample1 chr1  7975278      -      1.39 6f6cbc59-af4c-4dfa-8e48-ef4ac4eeb13b
## 4 sample1 chr1 10230292      -      2.19 fbe53b38-e264-4c7a-824e-2651c22f8ea6
## 5 sample1 chr1 13127127      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
## 6 sample1 chr1 13127134      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b

Now methy_tabix will be the path to a tabix object that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!

Exporting data

The methylation data can be exported into formats appropriate for bsseq, DSS, or edgeR.

bsseq and DSS

Both bsseq and DSS make use of the BSSeq object, and these can be obtained from the NanoMethResult objects using the methy_to_bsseq() function.

nmr <- load_example_nanomethresult()
bss <- methy_to_bsseq(nmr)

bss

## An object of type 'BSseq' with
##   4778 methylation loci
##   6 samples
## has not been smoothed
## All assays are in-memory

edgeR

edgeR can also be used to perform differential methylation analysis: https://f1000research.com/articles/6-2055. BSSeq objects can be converted into the appropriate format using the bsseq_to_edger() function. This can be used to count reads on a per-site basis or over regions.

gene_regions <- exons_to_genes(NanoMethViz::exons(nmr))
edger_mat <- bsseq_to_edger(bss, gene_regions)

edger_mat

##        B6Cast_Prom_1_bl6_Me B6Cast_Prom_1_bl6_Un B6Cast_Prom_1_cast_Me
## 12189                  3259                 2033                  2628
## 12190                  2384                 1349                  1604
## 16210                  2696                 1173                  1564
## 17263                  1752                 1672                  2156
## 18616                  1812                  595                  1436
## 213742                 1264                  803                   848
##        B6Cast_Prom_1_cast_Un B6Cast_Prom_2_bl6_Me B6Cast_Prom_2_bl6_Un
## 12189                   1970                 3380                 1764
## 12190                   1627                 2564                 1853
## 16210                   1788                 2544                  895
## 17263                   1831                 2027                 1698
## 18616                   1184                 1690                  573
## 213742                  1465                 1012                  596
##        B6Cast_Prom_2_cast_Me B6Cast_Prom_2_cast_Un B6Cast_Prom_3_bl6_Me
## 12189                   2663                  2043                 3658
## 12190                   1860                  1573                 2110
## 16210                   1630                  1693                 1955
## 17263                   1349                  1153                 1787
## 18616                   1442                  1520                 3011
## 213742                  1072                  1040                 1432
##        B6Cast_Prom_3_bl6_Un B6Cast_Prom_3_cast_Me B6Cast_Prom_3_cast_Un
## 12189                  2341                  2565                  1968
## 12190                  1502                  1884                  1663
## 16210                   769                  1466                  1787
## 17263                  1880                  1883                  1694
## 18616                  1331                   948                   931
## 213742                  728                   653                  1189