1 Introduction

This package provides basic functions for analyzing next-generation sequencing data of cell-free DNA (cfDNA). The package focuses on extracting the length of cfDNA sample and visualization of genome-wide enrichment of short-fragmented cfDNA. The aberration of fragmentation profile, denoted modified ctDNA estimation score (CES), allows quantification of circulating tumor DNA (ctDNA). We recommend using this package to analysis shallow whole-genome sequencing data (~0.3X or more). This package was complemented by Bioconductor packages e.g. QDNAseq, Rsamtools and GenomicRanges which could further expand the functionality of this package in the future.

1.1 Installation

1.1.1 Install via the Bioconductor repository

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("cfdnakit")

1.1.2 Install the latest version via github

To install this package is via this github repository, please follow the instruction below.

Install prerequisites packages

if(! "devtools" %in% rownames(installed.packages()))
    install.packages("devtools")
if(! "BiocManager" %in% rownames(installed.packages()))
    install.packages("BiocManager")

Install cfdnakit package

library(devtools)  ### use devtools
install_github("Pitithat-pu/cfdnakit") ### install cfDNAKit 

The installation should work fine without non-zero exit status. Try load cfdnakit package into current R session

library(cfdnakit) ### Load cfdnakit package

2 Prepare input BAM

A coordination-sorted BAM file of cfDNA from any liquid biopsy source (e.g. blood-plasma, CSF, urine) should be applicable. You should examine if the sequencing coverage reach coverage threshold before the analysis. Please make sure that the sequencing reads are mapped onto the following version of reference genome. cfdnakit supports the human reference genome GRCh37(hg19) and GRCh38(hg38). Preliminary fragment-length analysis can be performed using mouse cfDNA when mapping onto the mouse reference genome GRCm38(mm10).

3 Read the BAM file with read_bamfile

Let’s read sequence alignment file (.bam) using function read_bamfile. A BAM index file (.bai) is necessary for efficiently reading the file. If it doesn’t exist in the same directory, this function will automatically create one. This function will split sequencing reads into equal-size non-overlapping windows. Possible size of bin are 100, 500, and 1000 KB. A path to the input file is given to the function read_bamfile. A SampleBam object will be created as the result.

For demonstration, we read an example sequence file ex.plasma.bam.

library(cfdnakit)
sample_bamfile <- system.file("extdata",
                             "ex.plasma.bam",
                package = "cfdnakit")
plasma_SampleBam <- read_bamfile(sample_bamfile,
                                apply_blacklist = FALSE)
#> The BAM index file (.bai) is missing. Creating an index file
#> Bam index file created.
#> Reading bamfile

By default, running read_bamfile will split reads into 1000 KB non-overlapping bins based-on the human reference genome GRCh37.

Reading the file should take a while depending on the size of your BAM. We recommend to save the result as RDS file using saveRDS function.

### Optional
saveRDS(plasma_SampleBam, file = "patientcfDNA_SampleBam.RDS")

4 Analyse the Fragment Length Distribution

Fragment-length distribution of the sample can be visualized with function plot_fragment_dist. In the top-right legend, the modal length (bp) is shown in the parenthesis behind each sample name. The x-axis is the length of cfDNA; y-axis is the proportion of cfDNA fragment having a specific length.

plot_fragment_dist(list("Plasma.Sample"=plasma_SampleBam))

This document will demonstate by using SampleBam object (example_patientcfDNA_SampleBam.RDS). Now we load this file into R session.

example_RDS <- "example_patientcfDNA_SampleBam.RDS"
example_RDS_file <-
    system.file("extdata",example_RDS,
                package = "cfdnakit")
sample_bins <- readRDS(example_RDS_file)

In general, plasma cfDNA should show non-random fragmentation pattern. The modal length of cfDNA is 167 bp and 10-bp periodic peaks. However, tumor-derived cfDNA are observed to be shorter (<150 bp) than cfDNA of non-tumor origin. Here, we compare the fragment length distribution of patient’s cfDNA with healthy individual. We derived a healthy cfDNA from Snyder et al. (2016) and create a RData file “BH01_chr15.RDS”. This file can be loaded in R environment with readRDS function.

control_rds<-"BH01_CHR15.SampleBam.rds"
control_RDS_file <-
    system.file("extdata",control_rds,
                package = "cfdnakit")
control_bins <-
    readRDS(control_RDS_file)

To provide visual comparison of cell-free DNA fragmentation, cfdnakit provide a function that allows plotting multiple distribution plots. We create a list of SampleBam files and plot their distribution together with function plot_fragment_dist. Each element in the list must be given a distinct sample name (e.g. Healthy.cfDNA).

comparing_list <- list("Healthy.cfDNA"=control_bins,
                      "Patient.1"=sample_bins)
plot_fragment_dist(comparing_list)