xcms 4.0.2
Package: xcms
Authors: Johannes Rainer, Michael Witting
Modified: 2023-10-24 14:06:21.204332
Compiled: Tue Jan 16 19:49:24 2024
Metabolite identification is an important step in non-targeted metabolomics and requires different steps. One involves the use of tandem mass spectrometry to generate fragmentation spectra of detected metabolites (LC-MS/MS), which are then compared to fragmentation spectra of known metabolites. Different approaches exist for the generation of these fragmentation spectra, whereas the most used is data dependent acquisition (DDA) also known as the top-n method. In this method the top N most intense ions (m/z values) from a MS1 scan are selected for fragmentation in the next N scans before the cycle starts again. This method allows to generate clean MS2 fragmentation spectra on the fly during acquisition without the need for further experiments, but suffers from poor coverage of the detected metabolites (since only a limited number of ions are fragmented) and less accurate quantification of the compounds (since fewer MS1 scans are generated).
Data independent approaches (DIA) like Bruker bbCID, Agilent AllIons or Waters MSe don’t use such a preselection, but rather fragment all detected molecules at once. They are using alternating schemes with scan of low and high collision energy to collect MS1 and MS2 data. Using this approach, there is no problem in coverage, but the relation between the precursor and fragment masses is lost leading to chimeric spectra. Sequential Window Acquisition of all Theoretical Mass Spectra (or SWATH [1]) combines both approaches through a middle-way approach. There is no precursor selection and acquisition is independent of acquired data, but rather than isolating all precusors at once, defined windows (i.e. ranges of m/z values) are used and scanned. This reduces the overlap of fragment spectra while still keeping a high coverage.
This document showcases the analysis of two small LC-MS/MS data sets using xcms. The data files used are reversed-phase LC-MS/MS runs from the Agilent Pesticide mix obtained from a Sciex 6600 Triple ToF operated in SWATH acquisition mode. For comparison a DDA file from the same sample is included.
Below we load the example DDA data set and create a total ion chromatogram of its MS1 data.
library(xcms)
library(MsExperiment)
dda_file <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
package = "msdata")
dda_data <- readMsExperiment(dda_file)
chr <- chromatogram(dda_data, aggregationFun = "sum", msLevel = 1L)
According to the TIC most of the signal is measured between ~ 200 and 600 seconds (see plot below). We thus filter the DDA data to this retention time range.
plot(chr)
abline(v = c(230, 610))
## filter the data
dda_data <- filterRt(dda_data, rt = c(230, 610))
## Filter spectra
The variable dda_data
contains now all MS1 and MS2 spectra from the specified
mzML file. The number of spectra for each MS level is listed below. Note that we
subset the experiment to the first data file (using [1]
) and then access
directly the spectra within this sample with the spectra
function (which
returns a Spectra
object from the Spectra package). Note that
we use the pipe operator |>
for better readability.
dda_data[1] |>
spectra() |>
msLevel() |>
table()
##
## 1 2
## 1389 2214
For the MS2 spectra we can get the m/z of the precursor ion with the
precursorMz
function. Below we first subset the data set again to a single
sample and filter to spectra from MS level 2 extracting then their precursor m/z
values.
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorMz() |>
head()
## [1] 182.18777 182.18893 55.00579 182.19032 237.12296 237.11987
With the precursorIntensity
function it is also possible to extract the
intensity of the precursor ion.
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] 0 0 0 0 0 0
Some manufacturers (like Sciex) don’t define/export the precursor intensity and
thus either NA
or 0
is reported. We can however use the
estimatePrecursorIntensity
function from the Spectra package to
determine the precursor intensity for a MS 2 spectrum based on the intensity of
the respective ion in the previous MS1 scan (note that with method = "interpolation"
the precursor intensity would be defined based on interpolation
between the intensity in the previous and subsequent MS1 scan). Below we
estimate the precursor intensities, on the full data (for MS1 spectra a NA
value is reported).
prec_int <- estimatePrecursorIntensity(spectra(dda_data))
We next set the precursor intensity in the spectrum metadata of dda_data
. So
that it can be extracted later with the precursorIntensity
function.
spectra(dda_data)$precursorIntensity <- prec_int
dda_data[1] |>
spectra() |>
filterMsLevel(2) |>
precursorIntensity() |>
head()
## [1] NA 9.198155 2.773988 27.590797 3.443145 7.621923
Next we perform the chromatographic peak detection on the MS level 1 data with
the findChromPeaks
method. Below we define the settings for a centWave-based
peak detection and perform the analysis.
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
dda_data <- findChromPeaks(dda_data, param = cwp, msLevel = 1L)
In total 114 peaks were identified in the present data set.
The advantage of LC-MS/MS data is that (MS1) ions are fragmented and the
corresponding MS2 spectra measured. Thus, for some of the ions (identified as
MS1 chromatographic peaks) MS2 spectra are available. These can facilitate the
annotation of the respective MS1 chromatographic peaks (or MS1 features after a
correspondence analysis). Spectra for identified chromatographic peaks can be
extracted with the chromPeakSpectra
method. MS2 spectra with their precursor
m/z and retention time within the rt and m/z range of the chromatographic peak
are returned.
library(Spectra)
dda_spectra <- chromPeakSpectra(dda_data, msLevel = 2L)
dda_spectra
## MSn data (Spectra) with 142 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 237.869 1812
## 2 2 241.299 1846
## 3 2 325.763 2439
## 4 2 326.583 2446
## 5 2 327.113 2450
## ... ... ... ...
## 138 2 574.725 5110
## 139 2 575.255 5115
## 140 2 596.584 5272
## 141 2 592.424 5236
## 142 2 596.054 5266
## ... 34 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Jan 16 19:49:26 2024]
## Filter: select MS level(s) 2 [Tue Jan 16 19:49:36 2024]
## Merge 1 Spectra into one [Tue Jan 16 19:49:36 2024]
By default chromPeakSpectra
returns all spectra associated with a MS1
chromatographic peak, but parameter method
allows to choose and return only
one spectrum per peak (have a look at the ?chromPeakSpectra
help page for more
details). Also, it would be possible to extract MS1 spectra for each peak by
specifying msLevel = 1L
in the call above (e.g. to evaluate the full MS1
signal at the peak’s apex position).
The returned Spectra
contains also the reference to the respective
chromatographic peak as additional spectra variable "peak_id"
that contains
the identifier for the chromatographic peak (i.e. its row name in the
chromPeaks
matrix).
dda_spectra$peak_id
## [1] "CP004" "CP004" "CP005" "CP005" "CP006" "CP006" "CP008" "CP008" "CP011"
## [10] "CP011" "CP012" "CP012" "CP013" "CP013" "CP013" "CP013" "CP014" "CP014"
## [19] "CP014" "CP014" "CP018" "CP022" "CP022" "CP022" "CP025" "CP025" "CP025"
## [28] "CP025" "CP026" "CP026" "CP026" "CP026" "CP033" "CP033" "CP034" "CP034"
## [37] "CP034" "CP034" "CP034" "CP035" "CP035" "CP035" "CP041" "CP041" "CP041"
## [46] "CP042" "CP042" "CP042" "CP043" "CP047" "CP047" "CP049" "CP049" "CP049"
## [55] "CP049" "CP050" "CP050" "CP050" "CP051" "CP051" "CP051" "CP054" "CP055"
## [64] "CP055" "CP055" "CP056" "CP056" "CP056" "CP056" "CP056" "CP060" "CP060"
## [73] "CP060" "CP060" "CP064" "CP064" "CP065" "CP065" "CP066" "CP066" "CP069"
## [82] "CP069" "CP069" "CP070" "CP070" "CP070" "CP072" "CP072" "CP072" "CP073"
## [91] "CP074" "CP074" "CP074" "CP074" "CP075" "CP075" "CP075" "CP077" "CP077"
## [100] "CP077" "CP079" "CP079" "CP079" "CP079" "CP080" "CP080" "CP080" "CP081"
## [109] "CP086" "CP086" "CP086" "CP086" "CP086" "CP088" "CP088" "CP088" "CP089"
## [118] "CP089" "CP091" "CP091" "CP093" "CP093" "CP094" "CP094" "CP094" "CP095"
## [127] "CP095" "CP095" "CP096" "CP096" "CP096" "CP098" "CP098" "CP098" "CP098"
## [136] "CP098" "CP099" "CP099" "CP099" "CP100" "CP101" "CP101"
Note also that with return.type = "List"
a list parallel to the chromPeaks
matrix would be returned, i.e. each element in that list would contain the
spectra for the chromatographic peak with the same index. Such data
representation might eventually simplify further processing.
We next use the MS2 information to aid in the annotation of a chromatographic peak. As an example we use a chromatographic peak of an ion with an m/z of 304.1131 which we extract in the code block below.
ex_mz <- 304.1131
chromPeaks(dda_data, mz = ex_mz, ppm = 20)
## mz mzmin mzmax rt rtmin rtmax into intb
## CP056 304.1133 304.1126 304.1143 425.024 417.985 441.773 13040.8 13007.79
## maxo sn sample
## CP056 3978.987 232 1
A search of potential ions with a similar m/z in a reference database (e.g. Metlin) returned a large list of potential hits, most with a very small ppm. For two of the hits, Flumazenil (Metlin ID 2724) and Fenamiphos (Metlin ID 72445) experimental MS2 spectra are available. Thus, we could match the MS2 spectrum for the identified chromatographic peak against these to annotate our ion. Below we extract all MS2 spectra that were associated with the candidate chromatographic peak using the ID of the peak in the present data set.
ex_id <- rownames(chromPeaks(dda_data, mz = ex_mz, ppm = 20))
ex_spectra <- dda_spectra[dda_spectra$peak_id == ex_id]
ex_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 418.926 3505
## 2 2 419.306 3510
## 3 2 423.036 3582
## 4 2 423.966 3603
## 5 2 424.296 3609
## ... 34 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Jan 16 19:49:26 2024]
## Filter: select MS level(s) 2 [Tue Jan 16 19:49:36 2024]
## Merge 1 Spectra into one [Tue Jan 16 19:49:36 2024]
There are 5 MS2 spectra representing fragmentation of the ion(s) measured
in our candidate chromatographic peak. We next reduce this to a single MS2
spectrum using the combineSpectra
method employing the combinePeaks
function to determine which peaks to keep in the resulting spectrum (have a look
at the ?combinePeaks
help page for details). Parameter f
allows to specify
which spectra in the input object should be combined into one. Note that this
combination of multiple fragment spectra into a single spectrum might not be
generally the best approach or suggested for all types of data.
ex_spectrum <- combineSpectra(ex_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8,
intensityFun = median, mzFun = median,
f = ex_spectra$peak_id)
## Warning: 'combinePeaks' for lists of peak matrices is deprecated; please use
## 'combinePeaksData' instead.
ex_spectrum
## MSn data (Spectra) with 1 spectra in a MsBackendMemory backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 418.926 3505
## ... 34 more variables/columns.
## Processing:
## Filter: select retention time [230..610] on MS level(s) 1 2 [Tue Jan 16 19:49:26 2024]
## Filter: select MS level(s) 2 [Tue Jan 16 19:49:36 2024]
## Merge 1 Spectra into one [Tue Jan 16 19:49:36 2024]
## ...1 more processings. Use 'processingLog' to list all.
Mass peaks from all input spectra with a difference in m/z smaller 20 ppm
(parameter ppm
) were combined into one peak and the median m/z and intensity
is reported for these. Due to parameter minProp = 0.8
, the resulting MS2
spectrum contains only peaks that were present in 80% of the input spectra.
A plot of this consensus spectrum is shown below.
plotSpectra(ex_spectrum)
We could now match the consensus spectrum against a database of MS2 spectra. In
our example we simply load MS2 spectra for the two compounds with matching m/z
exported from Metlin. For each of the compounds MS2 spectra created with
collision energies of 0V, 10V, 20V and 40V are available. Below we import the
respective data and plot our candidate spectrum against the MS2 spectra of
Flumanezil and Fenamiphos (from a collision energy of 20V). To import files in
MGF format we have to load the MsBackendMgf Bioconductor package
which adds MGF file support to the Spectra
package.
Prior plotting we scale our experimental spectra to replace all peak intensities with values relative to the maximum peak intensity (which is set to a value of 100).
scale_fun <- function(z, ...) {
z[, "intensity"] <- z[, "intensity"] /
max(z[, "intensity"], na.rm = TRUE) * 100
z
}
ex_spectrum <- addProcessing(ex_spectrum, FUN = scale_fun)
library(MsBackendMgf)
flumanezil <- Spectra(
system.file("mgf", "metlin-2724.mgf", package = "xcms"),
source = MsBackendMgf())
## Start data import from 1 files ... done
fenamiphos <- Spectra(
system.file("mgf", "metlin-72445.mgf", package = "xcms"),
source = MsBackendMgf())
## Start data import from 1 files ... done
par(mfrow = c(1, 2))
plotSpectraMirror(ex_spectrum, flumanezil[3], main = "against Flumanezil",
ppm = 40)
plotSpectraMirror(ex_spectrum, fenamiphos[3], main = "against Fenamiphos",
ppm = 40)