xcms 3.18.0
Package: xcms
Authors: Johannes Rainer
Modified: 2022-04-26 13:56:36
Compiled: Tue Apr 26 18:26:04 2022
This document describes data import, exploration, preprocessing and analysis of
LC/MS experiments with xcms version >= 3. The examples and basic workflow were
adapted from the original LC/MS Preprocessing and Analysis with xcms vignette
from Colin A. Smith.
The new user interface and methods use the XCMSnExp object (instead of the old
xcmsSet object) as a container for the pre-processing results. To support
packages and pipelines relying on the xcmsSet object, it is however possible to
convert an XCMSnExp into an xcmsSet object using the as method
(i.e. xset <- as(x, "xcmsSet"), with x being an XCMSnExp object).
xcms supports analysis of LC/MS data from files in (AIA/ANDI) NetCDF,
mzXML and mzML format. For the actual data import Bioconductor’s
mzR is used. For demonstration purposes we will analyze a
subset of the data from [1] in which the metabolic consequences of
knocking out the fatty acid amide hydrolase (FAAH) gene in mice were
investigated. The raw data files (in NetCDF format) are provided with the
faahKO data package. The data set consists of samples from the spinal cords of
6 knock-out and 6 wild-type mice. Each file contains data in centroid mode
acquired in positive ion mode from 200-600 m/z and 2500-4500 seconds. To speed
up processing of this vignette we will restrict the analysis to only 8 files and
to the retention time range from 2500 to 3500 seconds.
Below we load all required packages, locate the raw CDF files within the
package and build a phenodata data frame describing the experimental
setup. Note that for real experiments it is suggested to define a file (table)
that contains the file names of the raw data files along with descriptions of
the samples for each file as additional columns. Such a file could then be
imported with e.g. read.table as variable pd (instead of being defined within R
as in the example below) and the file names could be passed along to the
readMSData function below with e.g.
files = paste0(MZML_PATH, "/", pd$mzML_file), where MZML_PATH would be the path
to the directory in which the files are located and "mzML_file" the name of the
column in the phenodata file that contains the file names.
## Load the packages used in this section
library(xcms)
library(faahKO)
library(RColorBrewer)

## Get the full path to the CDF files
cdfs <- dir(system.file("cdf", package = "faahKO"), full.names = TRUE,
            recursive = TRUE)[c(1, 2, 5, 6, 7, 8, 11, 12)]
## Create a phenodata data.frame
pd <- data.frame(sample_name = sub(basename(cdfs), pattern = ".CDF",
                                   replacement = "", fixed = TRUE),
                 sample_group = c(rep("KO", 4), rep("WT", 4)),
                 stringsAsFactors = FALSE)
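For a real experiment, the sample table described above would typically be kept in a file rather than defined in R. The following is a minimal sketch of that approach using only base R. The file contents, the column names and the directory stored in MZML_PATH are hypothetical, and the table is written to a temporary file purely so that the example is self-contained.

```r
## Hypothetical sample table; in practice this file would be prepared by hand.
pd_file <- tempfile(fileext = ".txt")
writeLines(c("mzML_file\tsample_name\tsample_group",
             "ko15.mzML\tko15\tKO",
             "wt15.mzML\twt15\tWT"),
           pd_file)

## Import the table as a phenodata data.frame
pd_example <- read.table(pd_file, header = TRUE, sep = "\t",
                         stringsAsFactors = FALSE)

## Hypothetical directory containing the raw data files
MZML_PATH <- "/path/to/mzML"

## Build the full file names that would be passed to readMSData
files <- paste0(MZML_PATH, "/", pd_example$mzML_file)
files
```

Keeping the file names and the sample descriptions in the same table ensures that each raw file stays paired with its annotation.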
Subsequently we load the raw data as an OnDiskMSnExp object using the
readMSData method from the MSnbase package. MSnbase provides base structures
and infrastructure for the processing of mass spectrometry data. Also, MSnbase
can be used to centroid profile-mode MS data (see the corresponding vignette in
the MSnbase package).
raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd),
                       mode = "onDisk")
We next restrict the data set to the retention time range from 2500 to 3500 seconds. This is merely to reduce the processing time of this vignette.
raw_data <- filterRt(raw_data, c(2500, 3500))
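Conceptually, restricting to a retention time range amounts to keeping only those spectra whose retention time falls inside the window. The sketch below illustrates this with base R on mock retention times; it is not the filterRt implementation, and treating both bounds as inclusive is an assumption of this example.

```r
## Mock retention times (in seconds) for six spectra
rt <- c(2400.5, 2499.9, 2500.0, 3000.2, 3500.0, 3600.7)
rt_range <- c(2500, 3500)

## Keep spectra whose retention time lies within the range
## (inclusive bounds are an assumption of this sketch)
keep <- rt >= rt_range[1] & rt <= rt_range[2]
rt_filtered <- rt[keep]
rt_filtered
```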
The resulting OnDiskMSnExp object contains general information about the
number of spectra, retention times, the measured total ion current etc., but
does not contain the full raw data (i.e. the m/z and intensity values from each
measured spectrum). Its memory footprint is thus rather small, making it an
ideal object to represent large metabolomics experiments while still supporting
simple quality controls, data inspection and exploration as well as data
sub-setting operations. The m/z and intensity values are imported from the raw
data files on demand, hence the location of the raw data files should not be
changed after initial data import.
The OnDiskMSnExp organizes the MS data by spectrum and provides the methods
intensity, mz and rtime to access the raw data from the files (the measured
intensity values, the corresponding m/z and retention time values). In addition,
the spectra method could be used to return all data encapsulated in Spectrum
objects. Below we extract the retention time values from the object.
head(rtime(raw_data))
## F1.S0001 F1.S0002 F1.S0003 F1.S0004 F1.S0005 F1.S0006
## 2501.378 2502.943 2504.508 2506.073 2507.638 2509.203
All data is returned as one-dimensional vectors (a numeric vector for rtime
and a list of numeric vectors for mz and intensity, each containing the values
from one spectrum), even if the experiment consists of multiple files/samples.
The fromFile function returns an integer vector providing the mapping of the
values to the originating file. Below we use the fromFile indices to organize
the mz values by file.
mzs <- mz(raw_data)
## Split the list by file
mzs_by_file <- split(mzs, f = fromFile(raw_data))
length(mzs_by_file)
## [1] 8
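To make the layout of this data more concrete, the following base R sketch mimics it with mock values: a list with one numeric vector of m/z values per spectrum, and an integer vector that plays the role of the fromFile indices. The object names and values are made up for illustration only.

```r
## Mock m/z values: one numeric vector per spectrum
mzs_mock <- list(c(200.1, 300.2), c(200.1, 450.3, 512.4),
                 c(210.0), c(220.5, 330.6))

## Mock file index, analogous to the integer vector returned by fromFile:
## spectra 1-2 come from file 1, spectra 3-4 from file 2
from_file <- c(1L, 1L, 2L, 2L)

## Split the list of spectra by file
by_file <- split(mzs_mock, f = from_file)
length(by_file)

## Total number of m/z values per file
n_mz <- vapply(by_file, function(z) length(unlist(z)), integer(1))
n_mz
```

Each element of the resulting list is itself a list of per-spectrum vectors, so per-file summaries can be computed with vapply or lapply.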
As a first evaluation of the data we plot below the base peak chromatogram (BPC)
for each file in our experiment. We use the chromatogram method and set the
aggregationFun to "max" to return for each spectrum the maximal intensity and
hence create the BPC from the raw data. To create a total ion chromatogram we
could set aggregationFun to "sum".
## Get the base peak chromatograms. This reads data from the files.
bpis <- chromatogram(raw_data, aggregationFun = "max")
## Define colors for the two groups
group_colors <- paste0(brewer.pal(3, "Set1")[1:2], "60")
names(group_colors) <- c("KO", "WT")
## Plot all chromatograms.
plot(bpis, col = group_colors[raw_data$sample_group])
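The difference between a base peak chromatogram and a total ion chromatogram comes down to how the intensities of each spectrum are aggregated. The sketch below shows both aggregations with base R on mock data; it only illustrates the idea and does not reproduce what chromatogram does internally.

```r
## Mock intensities: one numeric vector per spectrum
ints_mock <- list(c(10, 250, 40), c(5, 80), c(300, 20, 15, 60))
## Mock retention times of the three spectra
rt_mock <- c(2501.4, 2502.9, 2504.5)

## Base peak chromatogram: maximal intensity per spectrum
bpc <- vapply(ints_mock, max, numeric(1))
## Total ion chromatogram: summed intensity per spectrum
tic <- vapply(ints_mock, sum, numeric(1))

data.frame(rtime = rt_mock, bpc = bpc, tic = tic)
```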