CopyNumberPlots 1.16.0
Data visualisation is a powerful tool used for data analysis and exploration in many fields. Genomics data analysis is one of these fields where good visualisation tools can be of great help. The aim of CopyNumberPlots is to offer the user an easy way to create copy-number related plots using the infrastructure provided by the R package karyoploteR.
In addition to a set of specialized plotting functions for copy-number analysis
data and results (plotBAF
, plotCopyNumberCalls
, …),
CopyNumberPlots contains a number of data loading
functions to help parsing and loading the results of widely used
copy-number calling software such as DNAcopy,
DECoN or
CNVkit.
Finally, since CopyNumberPlots extends the functionality of karyoploteR, it is possible to combine the plotting functions of both packages to get the perfect figure for your data.
CopyNumberPlots is a Bioconductor package and to install it we have to use BiocManager.
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("CopyNumberPlots")
We can also install the package from github to get the latest devel version, but beware that it might be incompatible with the release version of Bioconductor!
BiocManager::install("bernatgel/CopyNumberPlots")
To start working with CopyNumberPlots we will need
to use the plotKaryoptype
function from karyoploteR.
If you want more information on how to customize it, use for other organisms
or genome version, etc… you can take a look at the
karyoploteR tutorial and
specifically at the section on
how to plot ideograms.
For this quick start example we’ll plot SNP-array data simulating a cancer genome. The data is in a file included with the package. You can use almost any table-like file format, including the Final Report file you would get from Illumina’s Genome Studio. In this case, to keep the example small, we have data only for chomosome 1.
To load the data we’ll use loadSNPData
which will detect the right columns,
read the data and build a GRanges object for us.
If data uses Ensembl-style chromosome names (1,2,3,…,X,Y) instead of
default karyoploteR UCSC chromosome names (chr1,chr2,chr3,…,chrX,chrY)
we could change the chromosome style to UCSC with the function UCSCStyle
.
library(CopyNumberPlots)
## Loading required package: karyoploteR
## Loading required package: regioneR
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
## as.data.frame, basename, cbind, colnames, dirname, do.call,
## duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
## lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
## tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
##
## findMatches
## The following objects are masked from 'package:base':
##
## I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
s1.file <- system.file("extdata", "S1.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
s1 <- loadSNPData(s1.file)
## Reading data from /tmp/RtmpN7iJXU/Rinstb16ec4bdf2eb2/CopyNumberPlots/extdata/S1.rawdata.txt
## The column identified as Chromosome is: chr
## The column identified as Start is: start
## The column identified as End is: end
## The column identified as B-Allele Frequency is: baf
## The column identified as Log Ratio is: lrr
s1
## GRanges object with 965 ranges and 2 metadata columns:
## seqnames ranges strand | lrr baf
## <Rle> <IRanges> <Rle> | <numeric> <numeric>
## 253 chr1 480818 * | -0.949246 1
## 678 chr1 595283 * | -0.882367 0
## 643 chr1 632319 * | -0.769292 1
## 41 chr1 1036550 * | -1.128100 1
## 88 chr1 1115414 * | -0.842099 0
## ... ... ... ... . ... ...
## 575 chr1 248120086 * | 0.714653 0.751899
## 510 chr1 248245181 * | 0.446138 0.312570
## 654 chr1 248488745 * | 0.794984 0.000000
## 171 chr1 248630472 * | 0.758302 1.000000
## 938 chr1 248704671 * | 0.994605 0.227549
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
Once we have our data loaded we can start plotting. We’ll start by creating
a karyoplot using plotKaryotype
. If we were plotting more than one
chromosome, we could use plot.type=4
to get all chromosomes in a single line
one next to the other. You can get more information on the available plot types
at
the karyoploteR tutorial.
kp <- plotKaryotype(chromosomes="chr1")
And once we have a karyoplot we can start adding out data. We can plot the
B-allele frequency using plotBAF
kp <- plotKaryotype(chromosomes="chr1")
plotBAF(kp, s1)
We can plot LRR using plotLRR
kp <- plotKaryotype(chromosomes="chr1")
plotLRR(kp, s1)
And we can see in this plot that points with a LRR below -4 (and above 2) are plotted in red at -4 (and at 2) so we don’t lose them.
We can also use the
data positioning parameters r0
and r1
to add more than one data type
on the same plot.
kp <- plotKaryotype(chromosomes="chr1")
plotBAF(kp, s1, r0=0.55, r1=1)
plotLRR(kp, s1, r0=0, r1=0.45)
Finally, we can load a copy number calling made on this data and plot it.
To load the copy number calls in this file we can use the function
loadCopyNumberCalls
that will read the data, identify the correct columns and
create a GRanges object for us.
s1.calls.file <- system.file("extdata", "S1.segments.txt", package = "CopyNumberPlots", mustWork = TRUE)
s1.calls <- loadCopyNumberCalls(s1.calls.file)
## Reading data from /tmp/RtmpN7iJXU/Rinstb16ec4bdf2eb2/CopyNumberPlots/extdata/S1.segments.txt
## The column identified as Copy Number is: cn
## The column identified as LOH is: loh
s1.calls
## GRanges object with 13 ranges and 2 metadata columns:
## seqnames ranges strand | cn loh
## <Rle> <IRanges> <Rle> | <integer> <integer>
## 1 chr1 1-60000000 * | 1 1
## 2 chr1 60000001-60000999 * | 2 0
## 3 chr1 60001000-62990000 * | 0 1
## 4 chr1 62990001-62999999 * | 2 0
## 5 chr1 63000000-121500000 * | 1 1
## .. ... ... ... . ... ...
## 9 chr1 189600352-220352872 * | 3 0
## 10 chr1 220352873-220352971 * | 2 0
## 11 chr1 220352972-234920000 * | 5 0
## 12 chr1 234920001-234999999 * | 2 0
## 13 chr1 235000000-249250621 * | 3 0
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
And then use plotCopyNumberCalls
to add them to the previous plot.
kp <- plotKaryotype(chromosomes="chr1")
plotBAF(kp, s1, r0=0.6, r1=1)
plotLRR(kp, s1, r0=0.15, r1=0.55)
#plotCopyNumberCalls(kp, s1.calls, r0=0, r1=0.10)
With that the main functionality of CopyNumberPlots is covered. It is important to take into account that since we are extending the functionality of karyoploteR, we can use all karyoploteR functions to add more data and other data types into these plots.
In the following pages you will find more information on how to load data to use with CopyNumberPlots, how to create other plot types and how to customize them.
The plotting functions in CopyNumberPlots expect
data to be in a GRanges
with a few columns with specific names:
You can create these structures yourself, but CopyNumberPlots has functions to help in loading both raw data (mainly SNP-array and aCGH data) and copy-number calls.
The main function to load raw data is loadSNPData
. It will take either
a file or an R object (data.frame
or similar) and will load it, detect
the columns with the needed information (chromosome, position, log-ratio,
B-allele frequency) based on the column names and build a GRanges
object
ready to use by the plotting functions.
raw.data.file <- system.file("extdata", "snp.data_test.csv", package = "CopyNumberPlots", mustWork=TRUE)
snps <- loadSNPData(raw.data.file)
## Reading data from /tmp/RtmpN7iJXU/Rinstb16ec4bdf2eb2/CopyNumberPlots/extdata/snp.data_test.csv
## The column identified as Chromosome is: Chr
## The column identified as Position is: Position
## The column identified as B-Allele Frequency is: B.Allele.Freq
## The column identified as Log Ratio is: Log.R.Ratio
## The column identified as Identifier is: SNP.Name
snps
## GRanges object with 6 ranges and 11 metadata columns:
## seqnames ranges strand | Sample.ID id SNP.Index SNP
## <Rle> <IRanges> <Rle> | <character> <character> <integer> <character>
## 1 X 68757767 * | S001 rs7060463 1 [A/G]
## 2 9 86682315 * | S001 rs1898321 2 [T/C]
## 3 11 92711948 * | S001 kgp12808645 3 [A/G]
## 4 12 55233823 * | S001 rs7299872 4 [A/G]
## 5 2 147722211 * | S001 rs2176056 5 [A/G]
## 6 19 32605173 * | S001 rs17597441 6 [T/C]
## Plus.Minus.Strand Allele1...Plus Allele2...Plus GC.Score GT.Score
## <character> <character> <character> <numeric> <numeric>
## 1 - C C 0.9244 0.8872
## 2 + T C 0.9643 0.9367
## 3 - T T 0.8770 0.8885
## 4 + A G 0.8852 0.8508
## 5 + G G 0.9499 0.9167
## 6 - G G 0.8025 0.8332
## baf lrr
## <numeric> <numeric>
## 1 1.0000 -0.3530
## 2 0.5004 0.0740
## 3 0.0054 -0.0537
## 4 0.5088 -0.2337
## 5 1.0000 0.0886
## 6 0.9986 0.0779
## -------
## seqinfo: 6 sequences from an unspecified genome; no seqlengths
When run, the function will tell us the columns it identified and will proceed
load the data. To identify the columns it will internally use a set of
regular expressions that work in most cases including on the ‘Final Report’
files created by Illumina’s Genome Studio. If for any reason the automatic
identification of the columns failed, it is possible to specify the exact column
names using the appropiate parameters (chr.col
, start.col
, end.col
…).
Another set of functions included in the package are functions to load
the results of copy-number calling algorithms, the copy number calls per se.
In this case we also have a generic function, loadCopyNumberCalls
, and a
few functions specialized in specific copy-number calling packages.
Again, the generic function can work with a file or an R object with a
table-like structure and will try to discover the right columns itself. It will
return a GRanges with the copy-number called segments and the optional columns
cn
for integer copy-number values, loh
for loss-of-heterozigosity regions and
segment.value
for values computed for the segments (for example, mean value
of the probes in the segment).
As an example we will generate a “random” calling
cn.data <- toGRanges(c("chr14:66459785-86459774", "chr17:68663111-88866308",
"chr10:43426998-83426994", "chr3:88892741-120892733",
"chr2:12464318-52464316", "chrX:7665575-27665562"))
cn.data$CopyNumberInteger <- sample(c(0,1,3,4), size = 6, replace = TRUE)
cn.data$LossHetero <- cn.data$CopyNumberInteger<2
cn.data
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | CopyNumberInteger LossHetero
## <Rle> <IRanges> <Rle> | <numeric> <logical>
## 1 chr14 66459785-86459774 * | 3 FALSE
## 2 chr17 68663111-88866308 * | 1 TRUE
## 3 chr10 43426998-83426994 * | 4 FALSE
## 4 chr3 88892741-120892733 * | 1 TRUE
## 5 chr2 12464318-52464316 * | 1 TRUE
## 6 chrX 7665575-27665562 * | 4 FALSE
## -------
## seqinfo: 6 sequences from an unspecified genome; no seqlengths
and load it
cn.calls <- loadCopyNumberCalls(cn.data)
## The column identified as Copy Number is: CopyNumberInteger
## The column identified as LOH is: LossHetero
cn.calls
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | cn loh
## <Rle> <IRanges> <Rle> | <numeric> <logical>
## 1 chr14 66459785-86459774 * | 3 FALSE
## 2 chr17 68663111-88866308 * | 1 TRUE
## 3 chr10 43426998-83426994 * | 4 FALSE
## 4 chr3 88892741-120892733 * | 1 TRUE
## 5 chr2 12464318-52464316 * | 1 TRUE
## 6 chrX 7665575-27665562 * | 4 FALSE
## -------
## seqinfo: 6 sequences from an unspecified genome; no seqlengths
we can see how the columns for cn and loh were correctly identified.
To plot this objet we can call, for example plotCopyNumberCalls
.
kp <- plotKaryotype(plot.type = 1)
#plotCopyNumberCalls(kp, cn.calls = cn.calls)
There are other specialized functions that will load either the R object produced by copy-number calling R packages or the files produced by either R or external copy-number calling software.
Currently there are specilized functions to load the data produced by:
Once we have data loaded (or directly created by us) we can plot it.
There are two functions to plot raw data (plotBAF
and plotLRR
) and three
functions to plot the copy-number calls (plotCopyNumberCalls
,
plotCopyNumberCallsAsLines
and plotCopyNumberSummary
).
To demonstrate the raw-data plotting functions we’ll use two example files included with the package
s1.file <- system.file("extdata", "S1.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
s1 <- loadSNPData(s1.file)
## Reading data from /tmp/RtmpN7iJXU/Rinstb16ec4bdf2eb2/CopyNumberPlots/extdata/S1.rawdata.txt
## The column identified as Chromosome is: chr
## The column identified as Start is: start
## The column identified as End is: end
## The column identified as B-Allele Frequency is: baf
## The column identified as Log Ratio is: lrr
head(s1)
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | lrr baf
## <Rle> <IRanges> <Rle> | <numeric> <numeric>
## 253 chr1 480818 * | -0.949246 1.0000000
## 678 chr1 595283 * | -0.882367 0.0000000
## 643 chr1 632319 * | -0.769292 1.0000000
## 41 chr1 1036550 * | -1.128100 1.0000000
## 88 chr1 1115414 * | -0.842099 0.0000000
## 116 chr1 1559575 * | -1.346852 0.0141703
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
s2.file <- system.file("extdata", "S2.rawdata.txt", package = "CopyNumberPlots", mustWork = TRUE)
s2 <- loadSNPData(s2.file)
## Reading data from /tmp/RtmpN7iJXU/Rinstb16ec4bdf2eb2/CopyNumberPlots/extdata/S2.rawdata.txt
## The column identified as Chromosome is: chr
## The column identified as Start is: start
## The column identified as End is: end
## The column identified as B-Allele Frequency is: baf
## The column identified as Log Ratio is: lrr
head(s2)
## GRanges object with 6 ranges and 2 metadata columns:
## seqnames ranges strand | lrr baf
## <Rle> <IRanges> <Rle> | <numeric> <numeric>
## 458 chr1 326751 * | 0.1076864 0.0000000
## 382 chr1 466084 * | 0.0898970 0.0562677
## 177 chr1 523654 * | -0.1805354 0.4927062
## 282 chr1 785305 * | 0.0373102 0.4035268
## 799 chr1 787101 * | 0.1487428 1.0000000
## 315 chr1 899495 * | -0.1578661 1.0000000
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
To plot the B-Allele frequency (BAF) we’ll use plotBAF
. We’ll start creating
a karyoplot using
karyoploteR’s plotKaryotype and then add the BAF values into it.
kp <- plotKaryotype(chromosomes="chr1")
plotBAF(kp, snps=s1)