The CNVRanger package implements a comprehensive tool suite for the analysis of copy number variation (CNV). This includes functionality for summarizing individual CNV calls across a population, assessing overlap with functional genomic regions, and association analysis with gene expression and quantitative phenotypes.
CNVRanger 1.21.0
Report issues on https://github.com/waldronlab/CNVRanger/issues
Copy number variation (CNV) is a frequently observed deviation from the diploid state due to duplication or deletion of genomic regions. CNVs can be detected experimentally by comparative genomic hybridization, and inferred computationally from SNP arrays or next-generation sequencing data. These technologies have in common that they report, for each sample under study, the genomic regions that are duplicated or deleted with respect to a reference. In the following, such regions are referred to as CNV calls; they are the starting point for analysis with the CNVRanger package.
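To make the notion of a set of CNV calls concrete, here is a minimal sketch in Python. CNVRanger itself works in R, where calls grouped by sample are held in a `GRangesList`. The `CNVCall` record and the `group_by_sample` helper are hypothetical names used only for illustration and are not part of CNVRanger. The sketch assumes 1-based inclusive coordinates and the integer copy number states described below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """Hypothetical record: one CNV call in one sample."""
    sample: str
    chrom: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)
    state: int  # integer copy number state, 2 = normal diploid


def group_by_sample(calls):
    """Group a flat list of calls into a per-sample mapping.

    Calls within a sample are sorted by (chrom, start, end); chromosome
    names are compared as plain strings, which is enough for illustration.
    """
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.sample].append(call)
    return {
        sample: sorted(sample_calls, key=lambda c: (c.chrom, c.start, c.end))
        for sample, sample_calls in grouped.items()
    }


calls = [
    CNVCall("s1", "chr1", 1000, 5000, 1),
    CNVCall("s2", "chr1", 2000, 6000, 3),
    CNVCall("s1", "chr2", 100, 900, 0),
]
per_sample = group_by_sample(calls)
print({s: len(c) for s, c in per_sample.items()})  # {'s1': 2, 's2': 1}
```

Each sample keeps its own calls. Population-level summarization then combines these per-sample sets across samples.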
The following figure provides an overview of the analysis capabilities of CNVRanger.
(A) The CNVRanger package imports CNV calls from a simple file format into R and stores them in dedicated Bioconductor data structures. (B) It implements three frequently used approaches for summarizing CNV calls across a population: (i) the CNVRuler procedure, which trims region margins based on regional density (Kim et al., 2012); (ii) the reciprocal overlap procedure, which requires sufficient mutual overlap between calls (Conrad et al., 2010); and (iii) the GISTIC procedure, which identifies recurrent CNV regions (Beroukhim et al., 2007). (C) CNVRanger builds on regioneR for overlap analysis of CNVs with functional genomic regions, (D) implements RNA-seq expression quantitative trait loci (eQTL) analysis for CNVs by interfacing with edgeR, and (E) performs linear regression for genome-wide association studies (GWAS) that aim to link CNVs and quantitative phenotypes.
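The reciprocal overlap criterion in (B)(ii) can be sketched as follows. This illustrates the general idea only and is not CNVRanger's implementation. It assumes that all intervals lie on the same chromosome and use 1-based inclusive coordinates. The 50% threshold is illustrative. Calls are grouped by single linkage: a call joins a cluster if it reciprocally overlaps any member. This is one of several possible grouping rules. The function names are hypothetical.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b overlap by at least min_frac of EACH length.

    Intervals are (start, end) tuples, 1-based and inclusive (assumption).
    """
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start) + 1
    if overlap <= 0:
        return False
    len_a = a_end - a_start + 1
    len_b = b_end - b_start + 1
    return overlap >= min_frac * len_a and overlap >= min_frac * len_b


def cluster_by_reciprocal_overlap(intervals, min_frac=0.5):
    """Single-linkage clustering of intervals with a union-find structure.

    Returns one (start, end, n_calls) tuple per cluster, where start/end
    span all member intervals, sorted by position.
    """
    parent = list(range(len(intervals)))

    def find(i):
        # Follow parent pointers to the cluster root (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of intervals that passes the reciprocal overlap test.
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if reciprocal_overlap(intervals[i], intervals[j], min_frac):
                parent[find(i)] = find(j)

    clusters = {}
    for i, interval in enumerate(intervals):
        clusters.setdefault(find(i), []).append(interval)

    regions = [
        (min(s for s, _ in members), max(e for _, e in members), len(members))
        for members in clusters.values()
    ]
    return sorted(regions)


calls = [(1000, 2000), (1200, 2100), (1900, 5000), (8000, 9000)]
print(cluster_by_reciprocal_overlap(calls))
# [(1000, 2100, 2), (1900, 5000, 1), (8000, 9000, 1)]
```

The call (1900, 5000) overlaps both of the first two calls, but not by half of each length, so it forms its own region. With single linkage, chains of pairwise overlaps can yield a cluster whose outermost members do not reciprocally overlap each other. Stricter variants require every pair in a cluster to pass the threshold.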
The key parts of the functionality implemented in CNVRanger were developed, described, and applied in several previous studies:
- Genome-wide detection of CNVs and their association with meat tenderness in Nelore cattle (da Silva et al., 2016)
- Widespread modulation of gene expression by copy number variation in skeletal muscle (Geistlinger et al., 2018)
- CNVs are associated with genomic architecture in a songbird (da Silva et al., 2018)
As described in the above publications, CNVRanger has been developed and extensively tested for SNP-based CNV calls as obtained with PennCNV. We also tested CNVRanger for sequencing-based CNV calls as obtained with CNVnator (a read-depth approach) or LUMPY (an approach that combines evidence from split reads and discordant read pairs).
In general, CNVRanger can be applied to CNV calls associated with integer copy number states, where we assume the states to be encoded as:
- 0: homozygous deletion (2-copy loss)
- 1: heterozygous deletion (1-copy loss)
- 2: normal diploid state
- 3: 1-copy gain
- 4: amplification (>= 2-copy gain)

Note that for CNV calling software that uses a different encoding or does not provide integer copy number states, it is often possible to transform the output, at least approximately, into a format that is compatible with the input format of CNVRanger. See Section 4.1, Input data format, for details.
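As an example of such a transformation, the sketch below rounds an estimated total copy number to the 0 to 4 encoding above. It then applies this to a read-depth ratio in which 1.0 corresponds to the diploid state, which is assumed here to match the meaning of CNVnator's normalized read depth. The helper names are hypothetical and not part of CNVRanger. Rounding to the nearest integer is a simplifying assumption, and real data may call for calibrated thresholds.

```python
# Integer copy number states as assumed by CNVRanger (see the list above).
STATE_LABELS = {
    0: "homozygous deletion (2-copy loss)",
    1: "heterozygous deletion (1-copy loss)",
    2: "normal diploid state",
    3: "1-copy gain",
    4: "amplification (>= 2-copy gain)",
}


def to_integer_state(copy_number):
    """Round an estimated total copy number (possibly fractional) to a state.

    Uses round-half-up for non-negative values; all estimates at or above
    4 copies are collapsed into state 4 (amplification).
    """
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    return min(int(copy_number + 0.5), 4)


def state_from_normalized_depth(normalized_depth):
    """Hypothetical helper: assumes a depth ratio where 1.0 means 2 copies."""
    return to_integer_state(2 * normalized_depth)


for depth in (0.02, 0.48, 1.03, 1.6, 3.0):
    state = state_from_normalized_depth(depth)
    print(depth, state, STATE_LABELS[state])
```

Collapsing everything above four copies into state 4 mirrors the encoding, which does not distinguish among levels of amplification.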
CNVRanger is designed to work with CNV calls from one tool at a time. See EnsembleCNV and FusorSV for combining CNV calls from multiple SNP-based callers or multiple sequencing-based callers, respectively.
CNVRanger assumes that the CNV calls provided as input have already been filtered by quality, either with the software used for CNV calling or with dedicated tools for that purpose. CNVRanger provides downstream summarization and association analysis for CNV calls; it does not implement functions for CNV calling or quality control. CNVRanger is applicable to diploid species only.
Analysis step | Function |
---|---|
(A) Input | GenomicRanges::makeGRangesListFromDataFrame |
(B) Summarization | populationRanges |
(C) Overlap analysis | regioneR::overlapPermTest |
(D) Expression analysis | cnvEQTL |
(E) Phenotype analysis | cnvGWAS |
Note: we use the package::function notation for functions from other packages. For functions from this package and base R functions, we use the function name without a preceding package name.