library(NanoMethViz)

To use this package, your data must be converted from the output of methylation calling software to a tabix-indexed, bgzip-compressed format. The data must be sorted by genomic position to meet the requirements of the samtools tabix indexing tool. On Linux and macOS this is done using the system sort utility, which is memory efficient; on Windows it is done by loading the entire table and sorting within R.
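
As a rough illustration of the sorting requirement, the sketch below orders a small made-up table by chromosome and then by position using base R. The data frame and its column names are assumptions for illustration only. This is not the package's internal implementation; the conversion function takes care of sorting for you.

```r
# Toy methylation calls in arbitrary order (values invented for illustration).
calls <- data.frame(
    chr = c("chr2", "chr1", "chr1", "chr2"),
    pos = c(500L, 900L, 100L, 50L),
    statistic = c(1.2, -0.4, 3.1, -2.2)
)

# Order by chromosome name first, then by position within each chromosome,
# so that records for the same chromosome are contiguous and ascending.
sorted_calls <- calls[order(calls$chr, calls$pos), ]
rownames(sorted_calls) <- NULL

sorted_calls
```

Sorting in memory like this is simple but needs the whole table loaded at once, which is why the memory-efficient external sort is preferred where it is available.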

We currently support output from:

• Nanopolish
• f5c
• Megalodon

## Data example

The conversion can be done using the create_tabix_file() function. We provide example nanopolish output within the package; we can look inside to see what the data looks like coming out of nanopolish.

methy_calls <- system.file(package = "NanoMethViz",
c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 6 rows of the first nanopolish file
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example
##   chromosome strand     start       end                            read_name
## 1       chr1      - 127732476 127732476 e648c4e3-ca6a-4671-af17-86dab4c819eb
## 2      chr11      - 115423144 115423144 726dd8b5-1531-4279-9cf0-a7e4d5ea0478
## 3      chr11      +  69150806  69150814 34f9ee3e-4b27-4d2d-a203-4067f0662044
## 4       chr1      + 170484965 170484965 d8309c06-375f-4dfe-b22e-0c47af888cd9
## 5       chrY      -   4082060   4082060 f68940f6-4236-4f0f-9af7-a81b5c2911b6
## 6       chr8      + 120733312 120733312 13ae181f-b88b-4d6c-a815-553ff2e25312
##   log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands
## 1         -5.91            -100.38               -94.47                   1
## 2         -8.07            -115.21              -107.13                   1
## 3         -1.65            -183.12              -181.47                   1
## 4          2.74            -112.14              -114.88                   1
## 5         -1.78            -135.09              -133.32                   1
## 6          5.02            -129.31              -134.33                   1
##   num_motifs            sequence
## 1          1         CATTACGTTTC
## 2          1         AACTTCGTTGA
## 3          2 GGTCACGGGAATCCGGTTC
## 4          1         AGAAGCGCTAA
## 5          1         CTCACCGTATA
## 6          1         TCTGACGTTGA
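
In the rows printed above, log_lik_ratio appears to be the methylated log-likelihood minus the unmethylated one, so positive values favour methylation. The short check below re-enters the six printed rows by hand and compares the two quantities. The tolerance of 0.015 is an assumption to allow for the two-decimal rounding in the printed output.

```r
# Values copied from the six example rows printed above.
example_rows <- data.frame(
    log_lik_ratio        = c(-5.91, -8.07, -1.65, 2.74, -1.78, 5.02),
    log_lik_methylated   = c(-100.38, -115.21, -183.12, -112.14, -135.09, -129.31),
    log_lik_unmethylated = c(-94.47, -107.13, -181.47, -114.88, -133.32, -134.33)
)

# Difference of the two log-likelihoods.
recomputed <- example_rows$log_lik_methylated - example_rows$log_lik_unmethylated

# Allow a small tolerance because the printed values are rounded to 2 decimals.
ratio_matches <- abs(recomputed - example_rows$log_lik_ratio) < 0.015
ratio_matches
```

For these six rows every comparison holds within the tolerance.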

We then create a temporary path to store the converted file; it will be deleted once you exit your R session. Once create_tabix_file() has run, it will have created a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection. Do not do this with large datasets, as it decompresses all the data and will take a very long time to run.

### Megalodon Data

To import data from Megalodon’s modification calls, the per-read modified bases file must be generated. This can be done either by adding the --write-mods-text argument to the Megalodon run or by using the megalodon_extras per_read_text modified_bases utility.

## Importing data

methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

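# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), nrows = 6,
    col.names = c("sample", "chr", "pos", "strand", "statistic", "read_name")
)

methy_data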
##    sample  chr      pos strand statistic                            read_name
## 6 sample1 chr1 13127134      -      2.51 7660ba1f-9b44-4783-b901-ed79b2f0481b
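
The gzfile() connection works here because bgzip output is also valid gzip. As a self-contained sketch of the same reading pattern that does not need the package, the code below writes a tiny gzip-compressed table to a temporary file and reads back only its first rows using nrows. The file name toy_methy.tsv.gz and the table contents are made up for illustration.

```r
# Build a small table with the same six columns as the converted data.
toy <- data.frame(
    sample = "sample1",
    chr = "chr1",
    pos = c(100L, 200L, 300L, 400L),
    strand = c("+", "-", "+", "-"),
    statistic = c(2.5, -1.0, 0.3, 4.1),
    read_name = c("read_a", "read_b", "read_c", "read_d")
)

# Write it as a headerless tab-separated file through a gzip connection.
toy_path <- file.path(tempdir(), "toy_methy.tsv.gz")
con <- gzfile(toy_path, open = "w")
write.table(toy, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)

# Read back only the first two rows, supplying column names ourselves
# because the file has no header line.
toy_head <- read.table(
    gzfile(toy_path), sep = "\t", nrows = 2,
    col.names = c("sample", "chr", "pos", "strand", "statistic", "read_name")
)
toy_head
```

Limiting nrows keeps the inspection cheap, which matters far more for real datasets than for this toy file.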

Now methy_tabix will be the path to a tabix-indexed file that is ready for use with NanoMethViz. Please head over to the “Introduction” vignette to see how to use this data for visualisation!