correctCoverageBias {PureCN}    R Documentation

Correct for GC bias

Description

Takes coverage data and a file mapping each interval to its GC content as input, then normalizes the coverage for GC bias. Optionally plots the GC profiles before and after normalization.
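To illustrate what GC-bias normalization does, the following base R sketch fits coverage against GC content on simulated data and removes the fitted trend. It is a generic illustration, not PureCN's code. The helper `correct_gc()` and all simulated values are hypothetical. Its "LOESS" branch uses `stats::loess()` with an arbitrarily chosen `span = 0.3`, and its "POLYNOMIAL" branch uses an ordinary least-squares cubic fit rather than the EM approach described under the method argument.

```r
# Generic GC-bias correction sketch on simulated data (not PureCN's implementation).
set.seed(1)
n <- 2000
gc.content <- runif(n, 0.3, 0.7)               # simulated GC fraction per interval
# Simulated coverage: multiplicative GC effect plus log-normal noise
coverage <- 100 * exp(3 * (gc.content - 0.5)) * exp(rnorm(n, sd = 0.1))

# Hypothetical helper: remove the GC trend from coverage.
# "LOESS" uses local regression; "POLYNOMIAL" uses a least-squares cubic fit
# (a simple stand-in for, not a copy of, PureCN's EM-based method).
correct_gc <- function(coverage, gc.content, method = c("LOESS", "POLYNOMIAL")) {
  method <- match.arg(method)
  y <- log2(coverage)                          # multiplicative bias becomes additive
  fit <- switch(method,
    LOESS      = loess(y ~ gc.content, span = 0.3),
    POLYNOMIAL = lm(y ~ poly(gc.content, 3))
  )
  expected <- predict(fit, newdata = data.frame(gc.content = gc.content))
  # Subtract the fitted GC trend, then restore the median coverage level
  2^(y - expected + median(y))
}

cov.loess <- correct_gc(coverage, gc.content, "LOESS")
cov.poly  <- correct_gc(coverage, gc.content, "POLYNOMIAL")

# GC/coverage correlation: strong before correction, close to zero after
round(c(before     = cor(gc.content, coverage),
        loess      = cor(gc.content, cov.loess),
        polynomial = cor(gc.content, cov.poly)), 3)
```

Working on the log2 scale turns a multiplicative GC effect into an additive one, so subtracting the fitted trend and adding back the median log2 coverage removes the GC dependence while keeping the overall coverage level.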

Usage

correctCoverageBias(coverage.file, gc.gene.file, output.file = NULL,
  method = c("LOESS", "POLYNOMIAL"), plot.gc.bias = FALSE,
  plot.max.density = 50000)

Arguments

coverage.file

Coverage file or coverage data parsed with the readCoverageFile function.

gc.gene.file

File providing the GC content of each exon in the coverage file. The first column contains the interval in the format CHR:START-END, the second column the GC content (0 to 1), and the optional third column gene symbols, which runAbsoluteCN uses to generate gene-level calls. This file can be generated with the GATK GCContentByInterval tool or with the calculateGCContentByInterval function.

output.file

Optionally, the path of a file to which the GC-corrected coverage is written. This file can be read with the readCoverageFile function.

method

Two normalization methods are available. The default, "LOESS", largely follows the GC correction of the TitanCNA package. "POLYNOMIAL" models the coverage data as a polynomial of degree three and normalizes using an EM (expectation-maximization) approach. It is expected to be more robust for very small targeted panels, but it does not support off-target reads.

plot.gc.bias

Optionally, plot the GC profiles of the coverage before and after normalization, as a quick visual check of coverage bias.

plot.max.density

By default, if the probe set contains more than 50000 intervals, the coverage distribution is plotted as a kernel density estimate, using the stat_density function of the ggplot2 package. This parameter changes the number of intervals above which density estimation is applied. It is ignored if plot.gc.bias is FALSE.
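The rule for plot.max.density can be restated as a small decision function. `gc_plot_style()` is a hypothetical helper, not part of PureCN, and the "scatter" label for the below-threshold case is an assumption, since this page only states when the density estimate is used.

```r
# Hypothetical helper restating the plot.max.density rule (not a PureCN function).
gc_plot_style <- function(n.intervals, plot.gc.bias = FALSE,
                          plot.max.density = 50000) {
  if (!plot.gc.bias) return("none")   # nothing is plotted, so the threshold is ignored
  # Density estimate only when the interval count exceeds the threshold;
  # "scatter" for the other case is an assumption for illustration.
  if (n.intervals > plot.max.density) "density" else "scatter"
}

gc_plot_style(180000, plot.gc.bias = TRUE)                           # "density"
gc_plot_style(180000, plot.gc.bias = TRUE, plot.max.density = 2e5)   # "scatter"
gc_plot_style(50000, plot.gc.bias = TRUE)                            # "scatter"
gc_plot_style(180000)                                                # "none"
```

Note that the comparison is strict: a probe set with exactly 50000 intervals does not trigger density estimation under the default threshold.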

Author(s)

Angad Singh, Markus Riester

See Also

calculateGCContentByInterval

Examples


normal.coverage.file <- system.file("extdata", "example_normal.txt", 
    package="PureCN")
gc.gene.file <- system.file("extdata", "example_gc.gene.file.txt", 
    package="PureCN")
# normalize using default LOESS method
coverage <- correctCoverageBias(normal.coverage.file, gc.gene.file)
# normalize with POLYNOMIAL method for small panels
coverage <- correctCoverageBias(normal.coverage.file, gc.gene.file, 
    method="POLYNOMIAL", plot.gc.bias=TRUE)
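To inspect a file in the gc.gene.file format outside PureCN (interval as CHR:START-END, GC content from 0 to 1, optional gene symbol), a minimal base R parser might look like the following. It is a sketch, not PureCN's reader: the function name `parse_gc_gene_file()`, the assumption that the file is tab-separated (without a header row unless `header = TRUE`), and the intervals and gene names in the usage example are all hypothetical.

```r
# Hypothetical parser for the gc.gene.file layout (not PureCN's own reader).
# Assumes a tab-separated file; columns are used by position, not by name.
parse_gc_gene_file <- function(path, header = FALSE) {
  tab <- read.delim(path, header = header, stringsAsFactors = FALSE)
  # Split "CHR:START-END" into chromosome, start and end
  m <- regmatches(tab[[1]], regexec("^(.+):([0-9]+)-([0-9]+)$", tab[[1]]))
  if (any(lengths(m) != 4)) stop("first column must be in CHR:START-END format")
  parts <- do.call(rbind, m)
  gc <- as.numeric(tab[[2]])
  if (any(is.na(gc) | gc < 0 | gc > 1)) stop("GC content must lie in [0, 1]")
  data.frame(
    chr   = parts[, 2],
    start = as.integer(parts[, 3]),
    end   = as.integer(parts[, 4]),
    gc    = gc,
    # The gene symbol column is optional
    gene  = if (ncol(tab) >= 3) tab[[3]] else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Usage with a small hypothetical file
tmp <- tempfile(fileext = ".txt")
writeLines(c("chr1:1001-1200\t0.55\tGENE1",
             "chr1:5001-5150\t0.62\tGENE1",
             "chr2:20001-20300\t0.48\tGENE2"), tmp)
gc.table <- parse_gc_gene_file(tmp)
gc.table
```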


[Package PureCN version 1.8.1 Index]