1 Introduction

This vignette illustrates existing and Bioconductor infrastructure for the visualisation of mass spectrometry and proteomics data. The code details the visualisations presented in

Gatto L, Breckels LM, Naake T, Gibb S. Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi: 10.1002/pmic.201400392. PubMed PMID: 25690415.

1.1 References

CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization: http://cran.r-project.org/web/views/Graphics.html
CRAN Task View: Web Technologies and Services: http://cran.r-project.org/web/views/WebTechnologies.html
ggplot2 book (syntax is slightly outdated) (code), web page and on-line docs
lattice book and web page
R Graphics book
R Cookbook and R Graphics Cookbook

1.2 Relevant packages

There are currently 87 Proteomics and 60 MassSpectrometry packages in Bioconductor version 3.4. Other non-Bioconductor packages are described in the RforProteomics vignette.

	Package	Title	Version
ASEB	ASEB	Predict Acetylated Lysine Sites	1.18.0
bioassayR	bioassayR	Cross-target analysis of small molecule bioactivity	1.12.0
biobroom	biobroom	Turn Bioconductor objects into tidy data frames	1.6.0
biosigner	biosigner	Signature discovery from omics data	1.2.0
BRAIN	BRAIN	Baffling Recursive Algorithm for Isotope distributioN calculations	1.20.0
Cardinal	Cardinal	A mass spectrometry imaging toolbox for statistical analysis	1.6.0
CellNOptR	CellNOptR	Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data.	1.20.0
ChemmineOB	ChemmineOB	R interface to a subset of OpenBabel functionalities	1.12.0
ChemmineR	ChemmineR	Cheminformatics Toolkit for R	2.26.0
cisPath	cisPath	Visualization and management of the protein-protein interaction networks.	1.14.0
cleaver	cleaver	Cleavage of Polypeptide Sequences	1.12.0
clippda	clippda	A package for the clinical proteomic profiling data analysis	1.24.0
CNORdt	CNORdt	Add-on to CellNOptR: Discretized time treatments	1.16.0
CNORfeeder	CNORfeeder	Integration of CellNOptR to add missing links	1.14.0
CNORode	CNORode	ODE add-on to CellNOptR	1.16.0
customProDB	customProDB	Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search.	1.14.0
DAPAR	DAPAR	Tools for the Differential Analysis of Proteins Abundance with R	1.6.0
deltaGseg	deltaGseg	deltaGseg	1.14.0
EGSEA	EGSEA	Ensemble of Gene Set Enrichment Analyses	1.2.0
eiR	eiR	Accelerated similarity searching of small molecules	1.14.0
EWCE	EWCE	Expression Weighted Celltype Enrichment	1.2.0
fCI	fCI	f-divergence Cutoff Index for Differential Expression Analysis in Transcriptomics and Proteomics	1.4.0
fmcsR	fmcsR	Mismatch Tolerant Maximum Common Substructure Searching	1.16.0
GraphPAC	GraphPAC	Identification of Mutational Clusters in Proteins via a Graph Theoretical Approach.	1.16.0
hpar	hpar	Human Protein Atlas in R	1.16.0
iPAC	iPAC	Identification of Protein Amino acid Clustering	1.18.0
IPPD	IPPD	Isotopic peak pattern deconvolution for Protein Mass Spectrometry by template matching	1.22.0
isobar	isobar	Analysis and quantitation of isobarically tagged MSMS proteomics data	1.20.0
kimod	kimod	A k-tables approach to integrate multiple Omics-Data	1.2.0
LPEadj	LPEadj	A correction of the local pooled error (LPE) method to replace the asymptotic variance adjustment with an unbiased adjustment based on sample size.	1.34.0
MassSpecWavelet	MassSpecWavelet	Mass spectrum processing by wavelet-based algorithms	1.40.0
MSGFgui	MSGFgui	A shiny GUI for MSGFplus	1.8.0
MSGFplus	MSGFplus	An interface between R and MS-GF+	1.8.0
msmsEDA	msmsEDA	Exploratory Data Analysis of LC-MS/MS data by spectral counts	1.12.0
msmsTests	msmsTests	LC-MS/MS Differential Expression Tests	1.12.0
MSnbase	MSnbase	Base Functions and Classes for MS-based Proteomics	2.0.0
MSnID	MSnID	Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications	1.8.0
mzID	mzID	An mzIdentML parser for R	1.12.0
mzR	mzR	parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data)	2.8.0
PAA	PAA	PAA (Protein Array Analyzer)	1.8.0
PAnnBuilder	PAnnBuilder	Protein annotation data package builder	1.38.0
Path2PPI	Path2PPI	Prediction of pathway-related protein-protein interaction networks	1.4.0
pathview	pathview	a tool set for pathway based data integration and visualization	1.14.0
Pbase	Pbase	Manipulating and exploring protein and proteomics data	0.14.0
PCpheno	PCpheno	Phenotypes and cellular organizational units	1.36.0
PECA	PECA	Probe-level Expression Change Averaging	1.10.0
pepXMLTab	pepXMLTab	Parsing pepXML files and filter based on peptide FDR.	1.8.0
PGA	PGA	An package for identification of novel peptides by customized database derived from RNA-Seq	1.4.0
plgem	plgem	Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM)	1.46.0
PLPE	PLPE	Local Pooled Error Test for Differential Expression with Paired High-throughput Data	1.34.0
ppiStats	ppiStats	Protein-Protein Interaction Statistical Package	1.40.0
proBAMr	proBAMr	Generating SAM file for PSMs in shotgun proteomics data	1.8.0
PROcess	PROcess	Ciphergen SELDI-TOF Processing	1.50.0
procoil	procoil	Prediction of Oligomerization of Coiled Coil Proteins	2.2.0
ProCoNA	ProCoNA	Protein co-expression network analysis (ProCoNA).	1.12.0
pRoloc	pRoloc	A unifying bioinformatics framework for spatial proteomics	1.14.0
pRolocGUI	pRolocGUI	Interactive visualisation of spatial proteomics data	1.8.0
Prostar	Prostar	Provides a GUI for DAPAR	1.6.1
prot2D	prot2D	Statistical Tools for volume data from 2D Gel Electrophoresis	1.12.0
ProteomicsAnnotationHubData	ProteomicsAnnotationHubData	Transform public proteomics data resources into Bioconductor Data Structures	1.4.0
proteoQC	proteoQC	An R package for proteomics data quality control	1.10.0
ProtGenerics	ProtGenerics	S4 generic functions for Bioconductor proteomics infrastructure	1.6.0
Pviz	Pviz	Peptide Annotation and Data Visualization using Gviz	1.8.0
qcmetrics	qcmetrics	A Framework for Quality Control	1.12.0
QuartPAC	QuartPAC	Identification of mutational clusters in protein quaternary structures.	1.6.0
rain	rain	Rhythmicity Analysis Incorporating Non-parametric Methods	1.8.0
RankProd	RankProd	Rank Product method for identifying differentially expressed genes with application in meta-analysis	3.0.0
RCASPAR	RCASPAR	A package for survival time prediction based on a piecewise baseline hazard Cox regression model.	1.20.0
Rchemcpp	Rchemcpp	Similarity measures for chemical compounds	2.12.0
Rcpi	Rcpi	Toolkit for Compound-Protein Interaction in Drug Discovery	1.10.0
readat	readat	Functionality to Read and Manipulate SomaLogic ADAT files	1.0.0
ropls	ropls	PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data	1.6.0
ROTS	ROTS	Reproducibility-Optimized Test Statistic	1.2.0
RpsiXML	RpsiXML	R interface to PSI-MI 2.5 files	2.16.0
rpx	rpx	R Interface to the ProteomeXchange Repository	1.10.0
rTANDEM	rTANDEM	Interfaces the tandem protein identification algorithm in R	1.14.0
sapFinder	sapFinder	A package for variant peptides detection and visualization in shotgun proteomics.	1.12.0
ScISI	ScISI	In Silico Interactome	1.46.0
shinyTANDEM	shinyTANDEM	Provides a GUI for rTANDEM	1.12.0
SLGI	SLGI	Synthetic Lethal Genetic Interaction	1.34.0
SpacePAC	SpacePAC	Identification of Mutational Clusters in 3D Protein Space via Simulation.	1.12.0
specL	specL	specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics	1.8.0
spliceSites	spliceSites	A bioconductor package for exploration of alignment gap positions from RNA-seq data	1.22.0
SWATH2stats	SWATH2stats	Transform and Filter SWATH Data for Statistical Packages	1.4.1
synapter	synapter	Label-free data analysis pipeline for optimal identification and quantitation	1.16.0
tofsims	tofsims	Import, process and analysis of Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) imaging data	1.2.0
TPP	TPP	Analyze thermal proteome profiling (TPP) experiments	3.0.0

	Package	Title	Version
apComplex	apComplex	Estimate protein complex membership using AP-MS protein data	2.40.0
BRAIN	BRAIN	Baffling Recursive Algorithm for Isotope distributioN calculations	1.20.0
CAMERA	CAMERA	Collection of annotation related methods for mass spectrometry data	1.30.0
Cardinal	Cardinal	A mass spectrometry imaging toolbox for statistical analysis	1.6.0
cosmiq	cosmiq	cosmiq - COmbining Single Masses Into Quantities	1.8.0
customProDB	customProDB	Generate customized protein database from NGS data, with a focus on RNA-Seq data, for proteomics search.	1.14.0
DAPAR	DAPAR	Tools for the Differential Analysis of Proteins Abundance with R	1.6.0
flagme	flagme	Analysis of Metabolomics GC/MS Data	1.30.0
gaga	gaga	GaGa hierarchical model for high-throughput data analysis	2.20.0
iontree	iontree	Data management and analysis of ion trees from ion-trap mass spectrometry	1.20.0
IPO	IPO	Automated Optimization of XCMS Data Processing parameters	1.0.0
isobar	isobar	Analysis and quantitation of isobarically tagged MSMS proteomics data	1.20.0
LOBSTAHS	LOBSTAHS	Lipid and Oxylipin Biomarker Screening through Adduct Hierarchy Sequences	1.0.0
MAIT	MAIT	Statistical Analysis of Metabolomic Data	1.8.0
MassArray	MassArray	Analytical Tools for MassArray Data	1.26.0
MassSpecWavelet	MassSpecWavelet	Mass spectrum processing by wavelet-based algorithms	1.40.0
Metab	Metab	Metab: An R Package for a High-Throughput Analysis of Metabolomics Data Generated by GC-MS.	1.8.0
metabomxtr	metabomxtr	A package to run mixture models for truncated metabolomics data with normal or lognormal distributions	1.8.0
metaMS	metaMS	MS-based metabolomics annotation pipeline	1.10.0
MetCirc	MetCirc	MetCirc - a workflow to analyse and visualise metabolomics data	1.0.0
MSGFgui	MSGFgui	A shiny GUI for MSGFplus	1.8.0
MSGFplus	MSGFplus	An interface between R and MS-GF+	1.8.0
msmsEDA	msmsEDA	Exploratory Data Analysis of LC-MS/MS data by spectral counts	1.12.0
msmsTests	msmsTests	LC-MS/MS Differential Expression Tests	1.12.0
MSnbase	MSnbase	Base Functions and Classes for MS-based Proteomics	2.0.0
MSnID	MSnID	Utilities for Exploration and Assessment of Confidence of LC-MSn Proteomics Identifications	1.8.0
msPurity	msPurity	Automated Evaluation of Precursor Ion Purity for Mass Spectrometry Based Fragmentation in Metabolomics	1.0.0
mzID	mzID	An mzIdentML parser for R	1.12.0
mzR	mzR	parser for netCDF, mzXML, mzData and mzML and mzIdentML files (mass spectrometry data)	2.8.0
PAPi	PAPi	Predict metabolic pathway activity based on metabolomics data	1.14.0
Pbase	Pbase	Manipulating and exploring protein and proteomics data	0.14.0
pepXMLTab	pepXMLTab	Parsing pepXML files and filter based on peptide FDR.	1.8.0
PGA	PGA	An package for identification of novel peptides by customized database derived from RNA-Seq	1.4.0
plgem	plgem	Detect differential expression in microarray and proteomics datasets with the Power Law Global Error Model (PLGEM)	1.46.0
proBAMr	proBAMr	Generating SAM file for PSMs in shotgun proteomics data	1.8.0
PROcess	PROcess	Ciphergen SELDI-TOF Processing	1.50.0
pRoloc	pRoloc	A unifying bioinformatics framework for spatial proteomics	1.14.0
Prostar	Prostar	Provides a GUI for DAPAR	1.6.1
proteoQC	proteoQC	An R package for proteomics data quality control	1.10.0
ProtGenerics	ProtGenerics	S4 generic functions for Bioconductor proteomics infrastructure	1.6.0
qcmetrics	qcmetrics	A Framework for Quality Control	1.12.0
Rdisop	Rdisop	Decomposition of Isotopic Patterns	1.34.0
Risa	Risa	Converting experimental metadata from ISA-tab into Bioconductor data structures	1.16.0
RMassBank	RMassBank	Workflow to process tandem MS files and build MassBank records	2.2.0
rols	rols	An R interface to the Ontology Lookup Service	2.2.0
ropls	ropls	PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data	1.6.0
rpx	rpx	R Interface to the ProteomeXchange Repository	1.10.0
rTANDEM	rTANDEM	Interfaces the tandem protein identification algorithm in R	1.14.0
sapFinder	sapFinder	A package for variant peptides detection and visualization in shotgun proteomics.	1.12.0
shinyTANDEM	shinyTANDEM	Provides a GUI for rTANDEM	1.12.0
SIMAT	SIMAT	GC-SIM-MS data processing and alaysis tool	1.6.0
specL	specL	specL - Prepare Peptide Spectrum Matches for Use in Targeted Proteomics	1.8.0
statTarget	statTarget	Statistical Analysis of Metabolite Profile	1.4.0
SWATH2stats	SWATH2stats	Transform and Filter SWATH Data for Statistical Packages	1.4.1
synapter	synapter	Label-free data analysis pipeline for optimal identification and quantitation	1.16.0
TargetSearch	TargetSearch	A package for the analysis of GC-MS metabolite profiling data	1.30.0
tofsims	tofsims	Import, process and analysis of Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) imaging data	1.2.0
TPP	TPP	Analyze thermal proteome profiling (TPP) experiments	3.0.0
xcms	xcms	LC/MS and GC/MS Data Analysis	1.50.0
yamss	yamss	Tools for high-throughput metabolomics	1.0.0

2 Ascombe’s quartet

x1	x2	x3	x4	y1	y2	y3	y4
10	10	10	8	8.04	9.14	7.46	6.58
8	8	8	8	6.95	8.14	6.77	5.76
13	13	13	8	7.58	8.74	12.74	7.71
9	9	9	8	8.81	8.77	7.11	8.84
11	11	11	8	8.33	9.26	7.81	8.47
14	14	14	8	9.96	8.10	8.84	7.04
6	6	6	8	7.24	6.13	6.08	5.25
4	4	4	19	4.26	3.10	5.39	12.50
12	12	12	8	10.84	9.13	8.15	5.56
7	7	7	8	4.82	7.26	6.42	7.91
5	5	5	8	5.68	4.74	5.73	6.89

tab <- matrix(NA, 5, 4)
colnames(tab) <- 1:4
rownames(tab) <- c("var(x)", "mean(x)",
                   "var(y)", "mean(y)",
                   "cor(x,y)")

for (i in 1:4)
    tab[, i] <- c(var(anscombe[, i]),
                  mean(anscombe[, i]),
                  var(anscombe[, i+4]),
                  mean(anscombe[, i+4]),
                  cor(anscombe[, i], anscombe[, i+4]))

	1	2	3	4
var(x)	11.0000000	11.0000000	11.0000000	11.0000000
mean(x)	9.0000000	9.0000000	9.0000000	9.0000000
var(y)	4.1272691	4.1276291	4.1226200	4.1232491
mean(y)	7.5009091	7.5009091	7.5000000	7.5009091
cor(x,y)	0.8164205	0.8162365	0.8162867	0.8165214

While the residuals of the linear regression clearly indicate fundamental differences in these data, the most simple and straightforward approach is visualisation to highlight the fundamental differences in the datasets.

ff <- y ~ x

mods <- setNames(as.list(1:4), paste0("lm", 1:4))

par(mfrow = c(2, 2), mar = c(4, 4, 1, 1))
for (i in 1:4) {
    ff[2:3] <- lapply(paste0(c("y","x"), i), as.name)
    plot(ff, data = anscombe, pch = 19, xlim = c(3, 19), ylim = c(3, 13))
    mods[[i]] <- lm(ff, data = anscombe)
    abline(mods[[i]])
}

lm1	lm2	lm3	lm4
0.0390000	1.1390909	-0.5397273	-0.421
-0.0508182	1.1390909	-0.2302727	-1.241
-1.9212727	-0.7609091	3.2410909	0.709
1.3090909	1.2690909	-0.3900000	1.839
-0.1710909	0.7590909	-0.6894545	1.469
-0.0413636	-1.9009091	-1.1586364	0.039
1.2393636	0.1290909	0.0791818	-1.751
-0.7404545	-1.9009091	0.3886364	0.000
1.8388182	0.1290909	-0.8491818	-1.441
-1.6807273	0.7590909	-0.0805455	0.909
0.1794545	-0.7609091	0.2289091	-0.111

3 The MA plot example

The following code chunk connects to the PXD000001 data set on the ProteomeXchange repository and fetches the mzTab file. After missing values filtering, we extract relevant data (log2 fold-changes and log10 mean expression intensities) into data.frames.

library("rpx")
px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")

## Downloading 1 file

library("MSnbase")
## here, we need to specify the (old) mzTab version 0.9
qnt <- readMzTabData(mztab, what = "PEP", version = "0.9")

## Detected a metadata section

## Detected a peptide section

sampleNames(qnt) <- reporterNames(TMT6)
qnt <- filterNA(qnt)
## may be combineFeatuers

spikes <- c("P02769", "P00924", "P62894", "P00489")
protclasses <- as.character(fData(qnt)$accession)
protclasses[!protclasses %in% spikes] <- "Background"


madata42 <- data.frame(A = rowMeans(log(exprs(qnt[, c(4, 2)]), 10)),
                       M = log(exprs(qnt)[, 4], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("4vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)

madata62 <- data.frame(A = rowMeans(log(exprs(qnt[, c(6, 2)]), 10)),
                       M = log(exprs(qnt)[, 6], 2) - log(exprs(qnt)[, 2], 2),
                       data = rep("6vs2", nrow(qnt)),
                       protein = fData(qnt)$accession,
                       class = protclasses)


madata <- rbind(madata42, madata62)

3.1 The traditional plotting system

par(mfrow = c(1, 2))
plot(M ~ A, data = madata42, main = "4vs2",
     xlab = "A", ylab = "M", col = madata62$class)
plot(M ~ A, data = madata62, main = "6vs2",
     xlab = "A", ylab = "M", col = madata62$class)

3.2 lattice

library("lattice")
latma <- xyplot(M ~ A | data, data = madata,
                groups = madata$class,
                auto.key = TRUE)
print(latma)

3.3 ggplot2

library("ggplot2")
ggma <- ggplot(aes(x = A, y = M, colour = class), data = madata,
               colour = class) +
                   geom_point() +
                       facet_grid(. ~ data)
print(ggma)

3.4 Customization

library("RColorBrewer")
bcols <- brewer.pal(4, "Set1")
cls <- c("Background" = "#12121230",
         "P02769" = bcols[1],
         "P00924" = bcols[2],
         "P62894" = bcols[3],
         "P00489" = bcols[4])

ggma2 <- ggplot(aes(x = A, y = M, colour = class),
                data = madata) + geom_point(shape = 19) +
                    facet_grid(. ~ data) + scale_colour_manual(values = cls) +
                        guides(colour = guide_legend(override.aes = list(alpha = 1)))
print(ggma2)

3.5 The `MAplot` method for `MSnSet` instances

MAplot(qnt, cex = .8)

3.6 An interactive shiny app for MA plots

This app is based on Mike Love’s shinyMA application, adapted for a proteomics data. A screen shot is displayed below. To start the application:

shinyMA()

shinyMA screeshot

The application is also available online at https://lgatto.shinyapps.io/shinyMA/.

See the excellent shiny web page for tutorials.

3.7 Volcano plots

Below, using the msmsTest package, we load a example MSnSet data with spectral couting data (from the r Biocpkg("msmsEDA") package) and run a statistical test to obtain (adjusted) p-values and fold-changes.

library("msmsEDA")
library("msmsTests")
data(msms.dataset)
## Pre-process expression matrix
e <- pp.msms.data(msms.dataset)
## Models and normalizing condition
null.f <- "y~batch"
alt.f <- "y~treat+batch"
div <- apply(exprs(e),2,sum)
## Test
res <- msms.glm.qlll(e,alt.f,null.f,div=div)
lst <- test.results(res,e,pData(e)$treat,"U600","U200",div,
                    alpha=0.05,minSpC=2,minLFC=log2(1.8),
                    method="BH")

Here, we produce the volcano plot by hand, with the plot function. In the second plot, we limit the x axis limits and add grid lines.

plot(lst$tres$LogFC, -log10(lst$tres$p.value))

plot(lst$tres$LogFC, -log10(lst$tres$p.value),
     xlim = c(-3, 3))
grid()

Below, we use the res.volcanoplot function from the r Biocpkg("msmsTests") package. This functions uses the sample annotation stored with the quantitative data in the MSnSet object to colour the samples according to their phenotypes.

## Plot
res.volcanoplot(lst$tres,
                max.pval=0.05,
                min.LFC=1,
                maxx=3,
                maxy=NULL,
                ylbls=4)

3.8 A PCA plot

Using the counts.pca function from the msmsEDA package:

library("msmsEDA")
data(msms.dataset)
msnset <- pp.msms.data(msms.dataset)
lst <- counts.pca(msnset, wait=FALSE)

It is also possible to generate the PCA data using the prcomp. Below, we extract the coordinates of PC1 and PC2 from the counts.pca result and plot them using the plot function.

pcadata <- lst$pca$x[, 1:2]
head(pcadata)

##                  PC1       PC2
## U2.2502.1 -120.26080 -53.55270
## U2.2502.2  -99.90618 -53.89979
## U2.2502.3 -127.35928 -49.29906
## U2.2502.4 -166.04611 -39.27557
## U6.2502.1 -127.18423  37.11614
## U6.2502.2 -117.97016  47.03702

plot(pcadata[, 1], pcadata[, 2],
     xlab = "PCA1", ylab = "PCA2")
grid()

4 Plotting with R

kable(plotfuns)

plot type	traditional	lattice	ggplot2
scatterplots	plot	xyplot	geom_point
histograms	hist	histgram	geom_histogram
density plots	plot(density())	densityplot	geom_density
boxplots	boxplot	bwplot	geom_boxplot
violin plots	vioplot::vioplot	bwplot(…, panel = panel.violin)	geom_violin
line plots	plot, matplot	xyploy, parallelplot	geom_line
bar plots	barplot	barchart	geom_bar
pie charts	pie		geom_bar with polar coordinates
dot plots	dotchart	dotplot	geom_point
stip plots	stripchart	stripplot	goem_point
dendrogramms	plot(hclust())	latticeExtra package	ggdendro package
heatmaps	image, heatmap	levelplot	geom_tile

Below, we are going to use a data from the r Biocexptpkg("pRolocdata") to illustrate the plotting functions.

library("pRolocdata")
data(tan2009r1)

4.1 Scatter plots

See the MA and volcano plot examples.

The default plot type is p, for points. Other important types are l for lines and h for histogram (see below).

4.2 Historams and density plots

We extract the (normalised) intensities of the first sample

x <- exprs(tan2009r1)[, 1]

and plot the distribution with a histogram and a density plot next to each other on the same figure (using the mfrow par plotting paramter)

par(mfrow = c(1, 2))
hist(x)
plot(density(x))

4.3 box plots and violin plots

we first extract the 888 proteins by r ncol(tan2009r1) samples data matrix and plot the sample distributions next to each other using boxplot and beanplot (from the beanplot package).

library("beanplot")
x <- exprs(tan2009r1)
par(mfrow = c(2, 1))
boxplot(x)
beanplot(x[, 1], x[, 2], x[, 3], x[, 4], log = "")

4.4 line plots

below, we produce line plots that describe the protein quantitative profiles for two sets of proteins, namely er and mitochondrial proteins using matplot.

we need to transpose the matrix (with t) and set the type to both (b), to display points and lines, the colours to red and steel blue, the point characters to 1 (an empty point) and the line type to 1 (a solid line).

er <- fData(tan2009r1)$markers == "ER"
mt <- fData(tan2009r1)$markers == "mitochondrion"

par(mfrow = c(2, 1))
matplot(t(x[er, ]), type = "b", col = "red", pch = 1, lty = 1)
matplot(t(x[mt, ]), type = "b", col = "steelblue", pch = 1, lty = 1)

In the last section, about spatial proteomics, we use the specialised plotDist function from the pRoloc package to generate such figures.

4.5 Bar and dot charts

To illustrate bar and dot charts, we cound the number of proteins in the respective class using table.

x <- table(fData(tan2009r1)$markers)
x

## 
##  Cytoskeleton            ER         Golgi      Lysosome       Nucleus 
##             7            28            13             8            21 
##            PM    Peroxisome    Proteasome  Ribosome 40S  Ribosome 60S 
##            34             4            15            20            32 
## mitochondrion       unknown 
##            29           677

par(mfrow = c(1, 2))
barplot(x)
dotchart(x)

## Warning in dotchart(x): 'x' is neither a vector nor a matrix: using
## as.numeric(x)

4.6 Heatmaps

The easiest to produce a complete heatmap is with the heatmap function:

heatmap(exprs(tan2009r1))

To produce the a heatmap without the dendrograms, one can use the image function on a matrix or the specialised version for MSnSet objects from the MSnbase package.

par(mfrow = c(1, 2))
x <- matrix(1:9, ncol = 3)
image(x)
image(tan2009r1)

See also gplots’s heatmap.2 function and the Heatplus Bioconductor package for more advanced heatmaps and the corrplot package for correlation matrices.

4.7 Dendrograms

The easiest way to produce and plot a dendrogram is:

d <- dist(t(exprs(tan2009r1))) ## distance between samples
hc <- hclust(d) ## hierarchical clustering
plot(hc) ## visualisation

See also dendextend and this post to illustrate latticeExtra and ggdendro.

4.8 Venn diagrams

The limma package.
The VennDiagram package.

5 Visualising mass spectrometry data

5.1 Direct access to the raw data

library("mzR")
mzf <- pxget(px1, 6)

## Downloading 1 file

ms <- openMSfile(mzf)

hd <- header(ms)
ms1 <- which(hd$msLevel == 1)

rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
library("MSnbase")
(M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd))

## 1

## Object of class "MSmap"
##  Map [75, 401]
##   [1]  Retention time: 30:1 - 34:58 
##   [2]  M/Z: 521 - 523 (res 0.005)

library("lattice")
ff <- colorRampPalette(c("yellow", "steelblue"))
trellis.par.set(regions=list(col=ff(100)))
plot(M, aspect = 1, allTicks = FALSE)

M@map[msMap(M) == 0] <- NA
plot3D(M, rgl = FALSE)

To produce a version that can be reoriented interactively on the screen discplay, use the rgl

library("rgl")
plot3D(M, rgl = TRUE)

lout <- matrix(NA, ncol = 10, nrow = 8)
lout[1:2, ] <- 1
for (ii in 3:4)
    lout[ii, ] <- c(2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
lout[5, ] <- rep(4:8, each = 2)
lout[6, ] <- rep(4:8, each = 2)
lout[7, ] <- rep(9:13, each = 2)
lout[8, ] <- rep(9:13, each = 2)

i <- ms1[which(rtsel)][1]
j <- ms1[which(rtsel)][2]
ms2 <- (i+1):(j-1)

layout(lout)

par(mar=c(4,2,1,1))
chromatogram(ms)
abline(v = hd[i, "retentionTime"], col = "red")


par(mar = c(3, 2, 1, 0))
plot(peaks(ms, i), type = "l", xlim = c(400, 1000))
legend("topright", bty = "n",
       legend = paste0(
           "Acquisition ", hd[i, "acquisitionNum"],  "\n",
           "Retention time ", formatRt(hd[i, "retentionTime"])))
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"],
       col = c("#FF000080",
           rep("#12121280", 9)))

par(mar = c(3, 0.5, 1, 1))
plot(peaks(ms, i), type = "l", xlim = c(521, 522.5),
     yaxt = "n")
abline(h = 0)
abline(v = hd[ms2, "precursorMZ"], col = "#FF000080")

##par(mar = omar)
par(mar = c(2, 2, 0, 1))
for (ii in ms2) {
    p <- peaks(ms, ii)
    plot(p, xlab = "", ylab = "", type = "h", cex.axis = .6)
    legend("topright", legend = paste0("Prec M/Z\n",
                           round(hd[ii, "precursorMZ"], 2)),
           bty = "n", cex = .8)
}

M2 <- MSmap(ms, i:j, 100, 1000, 1, hd)

## 1

plot3D(M2)

5.1.1 MS barcoding

par(mar=c(4,1,1,1))
image(t(matrix(hd$msLevel, 1, nrow(hd))),
      xlab="Retention time",
      xaxt="n", yaxt="n", col=c("black","steelblue"))
k <- round(range(hd$retentionTime) / 60)
nk <- 5
axis(side=1, at=seq(0,1,1/nk), labels=seq(k[1],k[2],k[2]/nk))

5.1.2 Animation

The following animation scrolls over 5 minutes of retention time for a MZ range between 521 and 523.

library("animation")
an1 <- function() {
    for (i in seq(0, 5, 0.2)) {
        rtsel <- hd$retentionTime[ms1] / 60 > (30 + i) &
            hd$retentionTime[ms1] / 60 < (35 + i)
        M <- MSmap(ms, ms1[rtsel], 521, 523, .005, hd)
        M@map[msMap(M) == 0] <- NA
        print(plot3D(M, rgl = FALSE))
    }
}

saveGIF(an1(), movie.name = "msanim1.gif")

MS animation 1

The code chunk below scrolls of a slice of retention times while keeping the retention time constant between 30 and 35 minutes.

an2 <- function() {
    for (i in seq(0, 2.5, 0.1)) {
        rtsel <- hd$retentionTime[ms1] / 60 > 30 & hd$retentionTime[ms1] / 60 < 35
        mz1 <- 520 + i
        mz2 <- 522 + i
        M <- MSmap(ms, ms1[rtsel], mz1, mz2, .005, hd)
        M@map[msMap(M) == 0] <- NA
        print(plot3D(M, rgl = FALSE))
    }
}

saveGIF(an2(), movie.name = "msanim2.gif")

MS animation 2

5.2 The MSnbase infrastructure

library("MSnbase")
data(itraqdata)
itraqdata2 <- pickPeaks(itraqdata, verbose = FALSE)
plot(itraqdata[[25]], full=TRUE, reporters = iTRAQ4)

par(oma = c(0, 0, 0, 0))
par(mar = c(4, 4, 1, 1))
plot(itraqdata2[[25]], itraqdata2[[28]], sequences = rep("IMIDLDGTENK", 2))

## Modifications used: C=57.02146
## Modifications used: C=57.02146

5.3 The protViz package

library("protViz")
data(msms)

fi <- fragmentIon("TAFDEAIAELDTLNEESYK")
fi.cyz <- as.data.frame(cbind(c=fi[[1]]$c, y=fi[[1]]$y, z=fi[[1]]$z))
     
p <- peakplot("TAFDEAIAELDTLNEESYK",
              spec = msms[[1]],
              fi = fi.cyz,
              itol = 0.6,
              ion.axes = FALSE)

The peakplot function return the annotation of the MSMS spectrum that is plotted:

str(p)

## List of 7
##  $ mZ.Da.error : num [1:57] 215.3 144.27 -2.8 -17.06 2.03 ...
##  $ mZ.ppm.error: num [1:57] 1808046 758830 -8306 -37724 3501 ...
##  $ idx         : num [1:57] 1 1 1 3 16 24 41 52 67 88 ...
##  $ label       : chr [1:57] "c1" "c2" "c3" "c4" ...
##  $ score       : num -1
##  $ sequence    : chr "TAFDEAIAELDTLNEESYK"
##  $ fragmentIon :'data.frame':    19 obs. of  3 variables:
##   ..$ c: num [1:19] 119 190 337 452 581 ...
##   ..$ y: num [1:19] 147 310 397 526 655 ...
##   ..$ z: num [1:19] 130 293 380 509 638 ...

5.4 Preprocessing of MALDI-MS spectra

The following code chunks demonstrate the usage of the mass spectrometry preprocessing and plotting routines in the r CRANpkg("MALDIquant") package. MALDIquant uses the traditional graphics system. Therefore MALDIquant overloads the traditional functions plot, lines and points for its own data types. These data types represents spectrum and peak lists as S4 classes. Please see the MALDIquant vignette and the corresponding website for more details.

After loading some example data a simple plot draws the raw spectrum.

library("MALDIquant")

data("fiedler2009subset", package="MALDIquant")

plot(fiedler2009subset[[14]])

After some preprocessing, namely variance stabilization and smoothing, we use lines to draw our baseline estimate in our processed spectrum.

transformedSpectra <- transformIntensity(fiedler2009subset, method = "sqrt")
smoothedSpectra <- smoothIntensity(transformedSpectra, method = "SavitzkyGolay")

plot(smoothedSpectra[[14]])
lines(estimateBaseline(smoothedSpectra[[14]]), lwd = 2, col = "red")

After removing the background removal we could use plot again to draw our baseline corrected spectrum.

rbSpectra <- removeBaseline(smoothedSpectra)
plot(rbSpectra[[14]])

detectPeaks returns a MassPeaks object that offers the same traditional graphics functions. The next code chunk demonstrates how to mark the detected peaks in a spectrum.

cbSpectra <- calibrateIntensity(rbSpectra, method = "TIC")
peaks <- detectPeaks(cbSpectra, SNR = 5)

plot(cbSpectra[[14]])
points(peaks[[14]], col = "red", pch = 4, lwd = 2)

Additional there is a special function labelPeaks that allows to draw the M/Z values above the corresponding peaks. Next we mark the 5 top peaks in the spectrum.

top5 <- intensity(peaks[[14]]) %in% sort(intensity(peaks[[14]]),
                                         decreasing = TRUE)[1:5]
labelPeaks(peaks[[14]], index = top5, avoidOverlap = TRUE)

Often multiple spectra have to be recalibrated to be comparable. Therefore MALDIquant warps the spectra according to so called reference or landmark peaks. For debugging the determineWarpingFunctions function offers some warping plots. Here we show only the last 4 plots:

par(mfrow = c(2, 2))
warpingFunctions <-
    determineWarpingFunctions(peaks,
                              tolerance = 0.001,
                              plot = TRUE,
                              plotInteractive = TRUE)

par(mfrow = c(1, 1))
warpedSpectra <- warpMassSpectra(cbSpectra, warpingFunctions)
warpedPeaks <- warpMassPeaks(peaks, warpingFunctions)

In the next code chunk we visualise the need and the effect of the recalibration.

sel <- c(2, 10, 14, 16)
xlim <- c(4180, 4240)
ylim <- c(0, 1.9e-3)
lty <- c(1, 4, 2, 6)

par(mfrow = c(1, 2))
plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")

for (i in seq(along = sel)) {
  lines(peaks[[sel[i]]], lty = lty[i], col = i)
  lines(cbSpectra[[sel[i]]], lty = lty[i], col = i)
}

plot(cbSpectra[[1]], xlim = xlim, ylim = ylim, type = "n")

for (i in seq(along = sel)) {
  lines(warpedPeaks[[sel[i]]], lty = lty[i], col = i)
  lines(warpedSpectra[[sel[i]]], lty = lty[i], col = i)
}

par(mfrow = c(1, 1))

The code chunks above generate plots that are very similar to the figure 7 in the corresponding paper “Visualisation of proteomics data using R”. Please find the code to exactly reproduce the figure at: https://github.com/sgibb/MALDIquantExamples/blob/master/R/createFigure1_color.R

6 Genomic and protein sequences

These visualisations originate from the Pbase Pbase-data and mapping vignettes.

7 Mass spectrometry imaging

The following code chunk downloads a MALDI imaging dataset from a mouse kidney shared by Adrien Nyakas and Stefan Schurch and generates a plot with the mean spectrum and three slices of interesting M/Z regions.

library("MALDIquant")
library("MALDIquantForeign")

spectra <- importBrukerFlex("http://files.figshare.com/1106682/MouseKidney_IMS_testdata.zip", verbose = FALSE)

spectra <- smoothIntensity(spectra, "SavitzkyGolay",  halfWindowSize = 8)
spectra <- removeBaseline(spectra, method = "TopHat", halfWindowSize = 16)
spectra <- calibrateIntensity(spectra, method = "TIC")
avgSpectrum <- averageMassSpectra(spectra)
avgPeaks <- detectPeaks(avgSpectrum, SNR = 5)

avgPeaks <- avgPeaks[intensity(avgPeaks) > 0.0015]

oldPar <- par(no.readonly = TRUE)
layout(matrix(c(1,1,1,2,3,4), nrow = 2, byrow = TRUE))
plot(avgSpectrum, main = "mean spectrum",
     xlim = c(3000, 6000), ylim = c(0, 0.007))
lines(avgPeaks, col = "red")
labelPeaks(avgPeaks, cex = 1)

par(mar = c(0.5, 0.5, 1.5, 0.5))
plotMsiSlice(spectra,
             center = mass(avgPeaks),
             tolerance = 1,
             plotInteractive = TRUE)
par(oldPar)

mqmsi )]

7.1 An interactive shiny app for Imaging mass spectrometry

There is also an interactive MALDIquant IMS shiny app for demonstration purposes. A screen shot is displayed below. To start the application:

library("shiny")
runGitHub("sgibb/ims-shiny")

ims-shiny screeshot

8 Spatial proteomics

library("pRoloc")
library("pRolocdata")

data(tan2009r1)

## these params use class weights
fn <- dir(system.file("extdata", package = "pRoloc"),
          full.names = TRUE, pattern = "params2.rda")
load(fn)

setStockcol(NULL)
setStockcol(paste0(getStockcol(), 90))

w <- table(fData(tan2009r1)[, "pd.markers"])
(w <- 1/w[names(w) != "unknown"])

## 
##  Cytoskeleton            ER         Golgi      Lysosome       Nucleus 
##    0.14285714    0.05000000    0.16666667    0.12500000    0.05000000 
##            PM    Peroxisome    Proteasome  Ribosome 40S  Ribosome 60S 
##    0.06666667    0.25000000    0.09090909    0.07142857    0.04000000 
## mitochondrion 
##    0.07142857

tan2009r1 <- svmClassification(tan2009r1, params2,
                               class.weights = w,
                               fcol = "pd.markers")
ptsze <- exp(fData(tan2009r1)$svm.scores) - 1

lout <- matrix(c(1:4, rep(5, 4)), ncol = 4, nrow = 2)
layout(lout)
cls <- getStockcol()
par(mar = c(4, 4, 1, 1))
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "mitochondrion"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "mitochondrion")],
         mcol = cls[5])
legend("topright", legend = "mitochondrion", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "ER")],
         mcol = cls[2])
legend("topright", legend = "ER", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "ER/Golgi"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "Golgi")],
         mcol = cls[3])
legend("topright", legend = "Golgi", bty = "n")
plotDist(tan2009r1[which(fData(tan2009r1)$PLSDA == "PM"), ],
         markers = featureNames(tan2009r1)[which(fData(tan2009r1)$markers.orig == "PM")],
         mcol = cls[8])
legend("topright", legend = "PM", bty = "n")
plot2D(tan2009r1, fcol = "svm", cex = ptsze, method = "kpca")
addLegend(tan2009r1, where = "bottomleft", fcol = "svm", bty = "n")

See the pRoloc-tutorial vignette (pdf) from the pRoloc package for details about spatial proteomics data analysis and visualisation.

9 Session information

print(sessionInfo(), locale = FALSE)

## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] protViz_0.2.15         beanplot_1.2           ggplot2_2.1.0         
##  [4] lattice_0.20-34        msmsTests_1.12.0       msmsEDA_1.12.0        
##  [7] pRolocdata_1.12.0      pRoloc_1.14.0          MLInterfaces_1.54.0   
## [10] cluster_2.0.5          annotate_1.52.0        XML_3.98-1.4          
## [13] AnnotationDbi_1.36.0   IRanges_2.8.0          S4Vectors_0.12.0      
## [16] MALDIquantForeign_0.10 MALDIquant_1.15        RColorBrewer_1.1-2    
## [19] xtable_1.8-2           rpx_1.10.0             knitr_1.14            
## [22] BiocInstaller_1.24.0   RforProteomics_1.12.0  MSnbase_2.0.0         
## [25] ProtGenerics_1.6.0     BiocParallel_1.8.0     mzR_2.8.0             
## [28] Rcpp_0.12.7            Biobase_2.34.0         BiocGenerics_0.20.0   
## [31] BiocStyle_2.2.0       
## 
## loaded via a namespace (and not attached):
##   [1] plyr_1.8.4                    GSEABase_1.36.0              
##   [3] splines_3.3.1                 ggvis_0.4.3                  
##   [5] digest_0.6.10                 foreach_1.4.3                
##   [7] htmltools_0.3.5               gdata_2.17.0                 
##   [9] magrittr_1.5                  doParallel_1.0.10            
##  [11] sfsmisc_1.1-0                 limma_3.30.0                 
##  [13] rda_1.0.2-2                   R.utils_2.4.0                
##  [15] lpSolve_5.6.13                colorspace_1.2-7             
##  [17] dplyr_0.5.0                   RCurl_1.95-4.8               
##  [19] jsonlite_1.1                  readBrukerFlexData_1.8.2     
##  [21] graph_1.52.0                  genefilter_1.56.0            
##  [23] lme4_1.1-12                   impute_1.48.0                
##  [25] survival_2.39-5               iterators_1.0.8              
##  [27] gtable_0.2.0                  zlibbioc_1.20.0              
##  [29] readMzXmlData_2.8.1           MatrixModels_0.4-1           
##  [31] car_2.1-3                     kernlab_0.9-25               
##  [33] prabclus_2.2-6                DEoptimR_1.0-6               
##  [35] SparseM_1.72                  scales_0.4.0                 
##  [37] vsn_3.42.2                    mvtnorm_1.0-5                
##  [39] edgeR_3.16.0                  DBI_0.5-1                    
##  [41] proxy_0.4-16                  mclust_5.2                   
##  [43] preprocessCore_1.36.0         htmlwidgets_0.7              
##  [45] sampling_2.7                  threejs_0.2.2                
##  [47] gplots_3.0.1                  FNN_1.1                      
##  [49] fpc_2.1-10                    modeltools_0.2-21            
##  [51] R.methodsS3_1.7.1             flexmix_2.3-13               
##  [53] nnet_7.3-12                   locfit_1.5-9.1               
##  [55] RJSONIO_1.3-0                 caret_6.0-71                 
##  [57] labeling_0.3                  reshape2_1.4.1               
##  [59] munsell_0.4.3                 mlbench_2.1-1                
##  [61] biocViews_1.42.0              tools_3.3.1                  
##  [63] RSQLite_1.0.0                 pls_2.5-0                    
##  [65] evaluate_0.10                 stringr_1.1.0                
##  [67] mzID_1.12.0                   yaml_2.1.13                  
##  [69] robustbase_0.92-6             caTools_1.17.1               
##  [71] randomForest_4.6-12           dendextend_1.3.0             
##  [73] RBGL_1.50.0                   nlme_3.1-128                 
##  [75] whisker_0.3-2                 mime_0.5                     
##  [77] quantreg_5.29                 formatR_1.4                  
##  [79] R.oo_1.20.0                   biomaRt_2.30.0               
##  [81] pbkrtest_0.4-6                interactiveDisplayBase_1.12.0
##  [83] e1071_1.6-7                   affyio_1.44.0                
##  [85] tibble_1.2                    stringi_1.1.2                
##  [87] highr_0.6                     trimcluster_0.1-2            
##  [89] Matrix_1.2-7.1                nloptr_1.0.4                 
##  [91] gbm_2.1.1                     RUnit_0.4.31                 
##  [93] bitops_1.0-6                  qvalue_2.6.0                 
##  [95] httpuv_1.3.3                  R6_2.2.0                     
##  [97] pcaMethods_1.66.0             affy_1.52.0                  
##  [99] hwriter_1.3.2                 KernSmooth_2.23-15           
## [101] gridSVG_1.5-0                 codetools_0.2-15             
## [103] MASS_7.3-45                   gtools_3.5.0                 
## [105] assertthat_0.1                interactiveDisplay_1.12.0    
## [107] Category_2.40.0               diptest_0.75-7               
## [109] mgcv_1.8-15                   grid_3.3.1                   
## [111] rpart_4.1-10                  class_7.3-14                 
## [113] minqa_1.2.4                   rmarkdown_1.1                
## [115] shiny_0.14.1                  base64enc_0.1-3

Visualisation of proteomics data using R and Bioconductor

Contents