1 Introduction

Copy number variation (CNV) is a frequently observed deviation from the diploid state due to duplication or deletion of genomic regions. CNVs can be experimentally detected based on comparative genomic hybridization, and computationally inferred from SNP-arrays or next-generation sequencing data. These technologies for CNV detection have in common that they report, for each sample under study, genomic regions that are duplicated or deleted with respect to a reference. Such regions are denoted as CNV calls in the following and will be considered the starting point for analysis with the CNVRanger package.

The following figure provides an overview of the analysis capabilities of CNVRanger.

(A) The CNVRanger package imports CNV calls from a simple file format into R and stores them in dedicated Bioconductor data structures. (B) It implements three frequently used approaches for summarizing CNV calls across a population: (i) the CNVRuler procedure that trims region margins based on regional density (Kim et al., 2012), (ii) the reciprocal overlap procedure that requires sufficient mutual overlap between calls (Conrad et al., 2010), and (iii) the GISTIC procedure that identifies recurrent CNV regions (Beroukhim et al., 2007). (C) CNVRanger builds on regioneR for overlap analysis of CNVs with functional genomic regions, (D) implements RNA-seq expression Quantitative Trait Loci (eQTL) analysis for CNVs by interfacing with edgeR, and (E) interfaces with PLINK for traditional genome-wide association studies (GWAS) between CNVs and quantitative phenotypes.
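
To make the reciprocal overlap criterion in (ii) more concrete, the following base R sketch computes the mutual overlap of two calls on the same chromosome. The helper name reciprocalOverlap and the 50% threshold are assumptions made for illustration; this is not the implementation used by populationRanges.

```r
# Illustrative helper (hypothetical, not part of CNVRanger):
# computes the fraction of each call that is covered by the other call;
# the reciprocal overlap is the smaller of the two fractions.
reciprocalOverlap <- function(start1, end1, start2, end2)
{
    # number of bases shared by the two closed intervals (0 if disjoint)
    shared <- max(0, min(end1, end2) - max(start1, start2) + 1)
    len1 <- end1 - start1 + 1
    len2 <- end2 - start2 + 1
    min(shared / len1, shared / len2)
}

# Two calls of length 100 that share 60 bases
ro <- reciprocalOverlap(101, 200, 141, 240)

# Would these calls be grouped under an (assumed) 50% threshold?
ro >= 0.5
```

Taking the minimum of the two fractions ensures that a small call nested in a much larger one is not treated as a match, which is the point of requiring the overlap to be mutual.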

The key parts of the functionality implemented in CNVRanger were developed, described, and applied in several previous studies:

2 Applicability and Scope

As described in the above publications, CNVRanger has been developed and extensively tested for SNP-based CNV calls as obtained with PennCNV. We also tested CNVRanger for sequencing-based CNV calls as obtained with CNVnator (a read-depth approach) or LUMPY (an approach that combines evidence from split-reads and discordant read-pairs).

In general, CNVRanger can be applied to CNV calls associated with integer copy number states, where we assume the states to be encoded as:

Note that for CNV calling software that uses a different encoding or that does not provide integer copy number states, it is often possible to (at least approximately) transform the output to a format that is compatible with the input format of CNVRanger. See Section 4.1 Input data format for details.
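
As a minimal sketch of such a transformation, consider a hypothetical caller that reports copy number as a deviation from the diploid state (-2, -1, 0, 1, 2). The sketch assumes that the target integer states count absolute copies, with 2 denoting the diploid state. The toy table and its column names are made up for illustration.

```r
# Hypothetical caller output: 'cn.change' is the deviation from diploid
calls <- data.frame(
    chr       = c("chr1", "chr1", "chr2"),
    start     = c(1000, 5000, 2000),
    end       = c(3000, 9000, 4000),
    sample    = c("S1", "S2", "S1"),
    cn.change = c(-1, 1, -2)
)

# Assumption: integer states count absolute copies (2 = diploid),
# so shifting the deviation by 2 yields the state
calls$state <- as.integer(calls$cn.change + 2)

# Keep only rows that deviate from the diploid state
calls <- calls[calls$state != 2, ]
calls$state
```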

CNVRanger is designed to work with CNV calls from one tool at a time. See EnsembleCNV and FusorSV for combining CNV calls from multiple SNP-based callers or multiple sequencing-based callers, respectively.

CNVRanger assumes that the CNV calls provided as input have already been filtered by quality, either with the software used for CNV calling or with dedicated tools for that purpose. CNVRanger provides downstream summarization and association analysis for CNV calls; it does not implement functions for CNV calling or quality control. CNVRanger is applicable to diploid species only.

3 Key functions

Analysis step             Function
(A) Input                 GenomicRanges::makeGRangesListFromDataFrame
(B) Summarization         populationRanges
(C) Overlap analysis      regioneR::overlapPermTest
(D) Expression analysis   cnvEQTL
(E) Phenotype analysis    cnvGWAS

Note: we use the package::function notation for functions from other packages. For functions from this package and base R functions, we use the function name without preceding package name.
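
For step (A), the following minimal sketch shows how a table of CNV calls could be turned into a GRangesList with one element per sample. The toy data and the column names (chr, start, end, sample, state) are assumptions for illustration; Section 4.1 describes the input format that CNVRanger expects.

```r
library(GenomicRanges)

# Toy CNV calls (made-up values): one row per call
calls <- data.frame(
    chr    = c("chr1", "chr1", "chr2"),
    start  = c(1000, 5000, 2000),
    end    = c(3000, 9000, 4000),
    sample = c("S1", "S2", "S1"),
    state  = c(1L, 3L, 0L)
)

# Group the calls by sample; extra columns such as 'state'
# are kept as metadata columns of the resulting ranges
grl <- GenomicRanges::makeGRangesListFromDataFrame(calls,
    split.field = "sample", keep.extra.columns = TRUE)

length(grl)   # number of samples
names(grl)    # sample identifiers
```

Grouping by sample keeps the calls of each individual together, which is the natural unit for summarizing calls across a population.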

4 Reading and accessing CNV data