1 Introduction

Mass spectrometers measure data in so-called profile mode, where the signal corresponding to a specific ion is distributed around the ion’s actual m/z value (Smith et al. 2014). The accuracy of that signal depends on the resolution and settings of the instrument. Profile mode data can be processed into centroid data by retaining only a single, representative value, typically the local maximum of the distribution of data points. This centroiding substantially reduces the amount of data without much loss of information. Certain algorithms require the data to be in centroid mode, for example the centWave method in the xcms package for chromatographic peak detection in LC-MS experiments, or proteomics search engines that match MS2 spectra to peptides. In this vignette, we will focus on metabolomics data.
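The idea of retaining only the local maximum can be illustrated with a minimal base R sketch. It does not use MSnbase: the peak is simulated as a Gaussian, and `true_mz`, the peak width and the intensities are made-up values chosen only for illustration.

```r
## Simulate one profile-mode peak: a Gaussian-shaped signal around a
## hypothetical true m/z (all values are made up for illustration).
true_mz <- 106.0499
mz <- seq(true_mz - 0.01, true_mz + 0.01, by = 0.001)
intensity <- 1000 * exp(-(mz - true_mz)^2 / (2 * 0.003^2))

## Naive centroiding: keep only the data point with the highest intensity.
apex <- which.max(intensity)
centroid <- c(mz = mz[apex], intensity = intensity[apex])

length(mz)   # number of profile data points describing the peak
centroid     # the single m/z-intensity pair that remains
```

Real centroiding algorithms usually also have to separate neighbouring peaks and handle noise, which this sketch ignores.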

Many manufacturers centroid the profile data either directly during acquisition or immediately thereafter, so that the user receives processed data straight away. Alternatively, third-party software such as msconvert from the proteowizard suite (Chambers et al. 2012) allows users to apply various centroiding algorithms, including vendor methods. In some cases, however, the software provided by vendors generates centroided data of poor quality. MSnbase also provides some functionality to perform centroiding of profile MS data. The processed data can then be further quantified or analysed within R, or serialised to mzML files and used as input for other software.

2 Centroiding of profile-mode MS data

In this vignette we use a subset of metabolomics profile-mode LC-MS data from pooled human serum samples measured on an AB Sciex TripleTOF 5600+ mass spectrometer; the chromatography employed was hydrophilic interaction high-performance liquid chromatography (HILIC HPLC). The mzML file contains profile mode data for an m/z range from 105 to 130 and a retention time from 0 to 240 seconds. For more details on the sample see ?msdata::sciexdata. Below we load the required packages and read the MS data.

library("MSnbase")
library("msdata")
library("magrittr")

fl <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)[2]
basename(fl)
## [1] "20171016_POOL_POS_3_105-134.mzML"
data_prof <- readMSData(fl, mode = "onDisk", centroided = FALSE)

We next extract the profile MS data for the [M+H]+ adduct of serine with the expected m/z of 106.049871. We thus filter the data_prof object using an m/z range containing the signal for the metabolite and a retention time window from 175 to 187 seconds corresponding to the time when the analyte elutes from the LC.

## Define the mz and retention time ranges
serine_mz <- 106.049871
mzr <- c(serine_mz - 0.01, serine_mz + 0.01)
rtr <- c(175, 187)

## Filtering the object
serine <- data_prof %>%
    filterRt(rtr) %>%
    filterMz(mzr)
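
The m/z window above uses a fixed absolute tolerance of 0.01. As a rough point of reference, the base R sketch below expresses that tolerance in parts per million (ppm), which comes out at about 94 ppm for this m/z. The helper `ppm_window()` is hypothetical and not part of MSnbase; it only illustrates the arithmetic.

```r
## Hypothetical helper (not part of MSnbase): m/z window for a relative
## tolerance given in parts per million (ppm).
ppm_window <- function(mz, ppm) {
    delta <- mz * ppm / 1e6
    c(mz - delta, mz + delta)
}

serine_mz <- 106.049871

## The absolute tolerance of 0.01 expressed in ppm at this m/z.
tol_ppm <- 0.01 / serine_mz * 1e6
tol_ppm

## Reproduces the window c(serine_mz - 0.01, serine_mz + 0.01).
ppm_window(serine_mz, tol_ppm)
```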

We can now plot the profile MS data for serine.

plot(serine, type = "XIC")
abline(h = serine_mz, col = "red", lty = 2)

Figure 1: MS profile data for serine
Upper panel shows the base peak chromatogram (BPC), lower panel the individual signals in the retention time - m/z space. The horizontal dashed red line indicates the theoretical m/z of the [M+H]+ adduct of serine.

The lower panel in the plot above shows all the individual signal intensities measured by the mass spectrometer over the retention time and m/z ranges of interest. The upper panel displays the base peak chromatogram (BPC), which represents the maximum signal (across the range of m/z values) for each discrete retention time. The rows of points in the lower panel indicate the resolution of the mass spectrometer, while each column of data points (i.e. the data collected at one discrete retention time) represents the signal for the ion in one spectrum.
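
The definition of the BPC can be made concrete with a small base R sketch. The data frame `pts` holds made-up values standing in for the points of the lower panel, with one row per measured m/z-intensity pair; it is not derived from the serine data.

```r
## Toy data (made-up values): three spectra with three m/z points each.
pts <- data.frame(
    rt        = rep(c(180.5, 180.8, 181.1), each = 3),
    mz        = rep(c(106.048, 106.050, 106.052), times = 3),
    intensity = c(10, 40, 15, 30, 120, 35, 20, 80, 25)
)

## Base peak chromatogram: the maximum intensity per retention time.
bpc <- tapply(pts$intensity, pts$rt, max)
bpc
```

Each column of points in the lower panel thus collapses to a single value in the upper panel. On real data, the chromatogram() function in MSnbase, called with aggregationFun = "max", should give a comparable result.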

Below we plot the signal for one of the 43 spectra containing signal for serine, the one at a retention time of 181.07 seconds:

plot(serine[[22]])
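
The index 22 is hard-coded above. A spectrum could also be looked up by its retention time, as in the sketch below. The vector `rt` contains made-up retention times; with the objects from this vignette one would presumably use rtime(serine) instead and then plot serine[[idx]].

```r
## Made-up retention times (in seconds) standing in for rtime(serine).
rt <- c(180.51, 180.79, 181.07, 181.35, 181.63)

## Index of the spectrum whose retention time is closest to the target.
idx <- which.min(abs(rt - 181.07))
idx
```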