`MetNet`

, a novel `R`

package, that is compatible with the output of the `xcms`

/`CAMERA`

suite and that uses the data-rich output of mass spectrometry metabolomics to putatively link features on their relation to other features in the data set. `MetNet`

uses both structural and quantitative information of metabolomics data for network inference that will guide metabolite annotation.
MetNet 1.16.0

Among the main challenges in mass spectrometric metabolomic analysis is the
high-throughput analysis of metabolic features, their fast detection and
annotation.
By contrast to the screening of known, previously characterized,
metabolic features in these data, the putative annotation of unknown
features is often cumbersome and requires a lot of manual work, hindering
the biological information retrieval of these data.
High-resolution mass spectrometric data is often very rich in information
content and metabolic conversions, and reactions can be derived from structural
properties of features (Breitling et al. 2006).
In addition to that, statistical associations between
features (based on their intensity values) can be a valuable resource to find
co-synthesized or co-regulated metabolites, which are synthesized in the same
biosynthetic pathways. Given that an analysis tool within the `R`

framework
is still lacking that is
integrating the two features of mass spectrometric information commonly
acquired with mass spectrometers (m/z and intensity values), I developed
`MetNet`

to close this gap.
The `MetNet`

package comprises functionalities to infer network
topologies from high-resolution mass spectrometry data. `MetNet`

combines information from both structural data (differences in m/z values
of features) and statistical associations (intensity values of features per
sample) to propose putative metabolic networks that can be used for further
exploration.

The idea of using high-resolution mass spectrometry data for network construction was first proposed in Breitling et al. (2006) and followed soon afterwards by a Cytoscape plugin, MetaNetter (Jourdan et al. 2007), that is based on the inference of metabolic networks on molecular weight differences and correlation (Pearson correlation and partial correlation).

Inspired by the paper of Marbach et al. (2012) different algorithms for network
were implemented in `MetNet`

to account for
biases that are inherent in these statistical methods, followed by the
calculation of a consensus adjacency matrix using the differently computed
individual adjacency matrices.

The two main functionalities of the package include the creation of adjacency matrices from structural properties, based on losses/addition of functional groups defined by the user, and statistical associations. Currently, the following statistical models are implemented to infer a statistical adjacency matrix: Least absolute shrinkage and selection operator (LASSO, L1-norm regression, (Tibshirani 1994)), Random Forest (Breiman 2001), Pearson and Spearman correlation (including partial and semipartial correlation, see Steuer (2006) for a discussion on correlation-based metabolic networks), correlation based on Gaussian Graphical Models (GGM, see Krumsiek et al. (2011);Benedetti et al. (2020) for the advantages of using GGM instead of Pearson and partial pearson correlation), context likelihood of relatedness (CLR, (Faith et al. 2007)), the algorithm for the reconstruction of accurate cellular networks (ARACNE, (Margolin et al. 2006)) and constraint-based structure learning (Bayes, (Scutari 2010)). Since all of these methods have advantages and disadvantages, the user has the possibility to select several of these methods, compute adjacency matrices from these models and create a consensus matrix from the different statistical frameworks.

After creating the statistical and structural adjacency matrices these two matrices can be combined to form a consensus matrix that has information from both structural and statistical properties of the data. This can be followed by network analyses (e.g.Â calculation of topological parameters), integration with other data sources (e.g.Â genomic information or transcriptomic data) and/or visualization.

Central to `MetNet`

is the `AdjacencyMatrix`

class, derived from the
`SummarizedExperiment`

S4 class. The `AdjacencyMatrix`

host the adjacency
matrices creates during the different steps within the `assays`

slot. They
will furthermore store information on the `type`

of the `AdjacencyMatrix`

,
i.e.Â if it was derived from `structural`

or `statistical`

properties or if
it used the combined information from these layers (`combine`

). It also
stores information if the information was `thresholded`

, e.g.Â by
applying the `rtCorrection`

or `threshold`

function. Furthermore, the
`AdjacencyMatrix`

object stores information on if the graphs are directed
or undirected (within the `directed`

slot).

`MetNet`

is currently under active development. If you
discover any bugs, typos or develop ideas of improving
`MetNet`

feel free to raise an issue via
Github or
send a mail to the developer.

To install `MetNet`

enter the following to the `R`

console

```
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("MetNet")
```

Before starting with the analysis, load the `MetNet`

package. This
will also load the required packages `glmnet`

, `stabs`

, `GENIE3`

, `mpmi`

,
`parmigene`

, `Hmisc`

, `ppcor`

and `bnlearn`

that are needed
for functions in the statistical adjacency matrix inference.

`library(MetNet)`

The data format that is compatible with the `MetNet`

framework is
a `xcms`

/`CAMERA`

output-like \(m~\times~n\) matrix, where
columns denote the different samples \(n\) and where \(m\) features are present.
In such a matrix, information about the masses of the features and quantitative
information of the features (intensity or concentration values) are needed.
The information about the m/z values has to be stored in a vector of
length \(\vert m \vert\) in the column `"mz"`

.

`MetNet`

does not impose any requirements for
data normalization, filtering, etc. However, the user has to make sure that
the data is properly preprocessed. These include division by internal standard,
`log2`

or `vsn`

transformation, noise filtering, removal of features that do not
represent mass features/metabolites, removal of isotopes, etc.

We will load here the object `x_test`

that contains m/z values
(in the column `"mz"`

), together with the corresponding retention time
(in the column `"rt"`

) and intensity values. We will use here the object
`x_test`

for guidance through the workflow of `MetNet`

.

```
data("x_test", package = "MetNet")
x_test <- as.matrix(x_test)
```

The function `structural`

will create an `AdjacencyMatrix`

object of
`type`

`structural`

containing the adjacency
matrices based on structural properties (m/z values) of the features.

The function expects a matrix with a column `"mz"`

that contains the
mass information of a feature (typically the m/z value). Furthermore,
`structural`

takes a `data.frame`

object as argument `transformation`

with the `colnames`

`"mass"`

and additional columns (e.g.Â `"group"`

, `"formula"`

or `"rt"`

).
`structural`

looks for transformations (in the
sense of additions/losses of functional groups mediated by biochemical,
enzymatic reactions) in the data using the mass information.

Following the work of Breitling et al. (2006) and Jourdan et al. (2007),
molecular weight difference w_{X} is defined by
\(w_X = \vert w_A - w_B \vert\)

where w_{A} is the molecular weight
of substrate A, and w_{B} is the molecular weight of product B
(typically, m/z values will be used as a proxy for the molecular weight since
the molecular weight is not directly derivable from mass spectrometric data).
As exemplified in Jourdan et al. (2007), specific enzymatic reactions refer to
specific changes in the molecular weight, e.g.Â carboxylation reactions
will result in a mass difference of 43.98983 (molecular weight of CO_{2})
between metabolic features.

The search space for these transformation is adjustable by the
`transformation`

argument in
`structural`

allowing to look for specific
enzymatic transformations. Hereby,
`structural`

will take into account the
`ppm`

value, to adjust for inaccuracies in m/z values due to technical
reasons according to the formula

\[ppm = \frac{m_{exp} - m_{calc}}{m_{exp}} \cdot 10^{-6}\]

with m_{exp} the experimentally determined m/z value and m_{calc} the
calculated accurate mass of a molecule. Within the function, a lower and upper
range is calculated depending on the supplied `ppm`

value, differences
between the m/z feature values are calculated and matched against the
`"mass"`

es of the `transformation`

argument. If any
of the additions/losses defined in `transformation`

is found in the
data, it will be reported as an (unweighted) connection in the assay
`"binary"`

of the returned `AdjacencyMatrix`

object.

Together with this assay, additional `character`

adjacency matrices can be
written to the assay slot of the `AdjacencyMatri`

object.
E.g. we can write the type of
connection/transformation (derived e.g.Â from the column `"group"`

in the
`transformation`

object) as a character matrix to the
assay `"group"`

by setting `var = "group"`

.

Before creating the `structural`

`AdjacencyMatrix`

, one must define the
search space, i.e.Â the transformation that will be looked for in the mass spectrometric
data, by creating here the `transformations`

object.

```
## define the search space for biochemical transformation
transformations <- rbind(
c("Hydroxylation (-H)", "O", 15.9949146221, "-"),
c("Malonyl group (-H2O)", "C3H2O3", 86.0003939305, "+"),
c("D-ribose (-H2O) (ribosylation)", "C5H8O4", 132.0422587452, "-"),
c("C6H10O6", "C6H10O6", 178.0477380536, "-"),
c("Rhamnose (-H20)", "C6H10O4", 146.057910, "-"),
c("Monosaccharide (-H2O)", "C6H10O5", 162.0528234315, "-"),
c("Disaccharide (-H2O) #1", "C12H20O10", 324.105649, "-"),
c("Disaccharide (-H2O) #2", "C12H20O11", 340.1005614851, "-"),
c("Trisaccharide (-H2O)", "C18H30O15", 486.1584702945, "-"),
c("Glucuronic acid (-H2O)", "C6H8O6", 176.0320879894, "?"),
c("coumaroyl (-H2O)", "C9H6O2", 146.0367794368, "?"),
c("feruloyl (-H2O)", "C9H6O2OCH2", 176.0473441231, "?"),
c("sinapoyl (-H2O)", "C9H6O2OCH2OCH2", 206.0579088094, "?"),
c("putrescine to spermidine (+C3H7N)", "C3H7N", 57.0578492299, "?"))
## convert to data frame
transformations <- data.frame(
group = transformations[, 1],
formula = transformations[, 2],
mass = as.numeric(transformations[, 3]),
rt = transformations[, 4])
```

The function `structural`

will then check for those
m/z differences that are stored in the column `"mass"`

in the
object `transformations`

. To create the `AdjacencyMatrix`

object derived
from these structural information we enter

```
struct_adj <- structural(x = x_test, transformation = transformations,
var = c("group", "formula", "mass"), ppm = 10)
```

in the `R`

console.

As we set `var = c("group", "formula", "mass")`

, the `AdjacencyMatrix`

object
will contain the assays `"group"`

, `"formula"`

, and `"mass"`

that store the
`character`

adjacency matrices with the information defined in
the columns of `transformations`

.

By default, the `structural`

`AdjacencyMatrix`

object and the contained
adjacency matrices are undirected (the
argument in `structural`

is set to `directed = FALSE`

by default; i.e.Â the
matrices are symmetric). `MetNet`

,
however, also allows to include the information on the directionality of
the transformation (e.g.Â to distinguish between additions and losses).
This behaviour can be specified by setting `directed = TRUE`

:

```
struct_adj_dir <- structural(x = x_test, transformation = transformations,
var = c("group", "formula", "mass"), ppm = 10, directed = TRUE)
```

In the following we will visualize the results from the undirected and directed structural network.

We will set the mode of the `igraph`

object
to `"directed"`

in both cases to make the distinction between the returned
outputs of `structural`

for setting `directed = FALSE`

and `directed = TRUE`

.
Alternatively, we could also set the `mode`

for the first `igraph`

object
(using the undirected output of `structural`

) to `"undirected"`

which results
in an `igraph`

object where the directionality of the edges is not retained.

```
g_undirected <- igraph::graph_from_adjacency_matrix(
assay(struct_adj, "binary"), mode = "directed", weighted = NULL)
plot(g_undirected, edge.width = 1, edge.arrow.size = 0.5,
vertex.label.cex = 0.5, edge.color = "grey")
```

```
g_directed <- igraph::graph_from_adjacency_matrix(
assay(struct_adj_dir, "binary"), mode = "directed", weighted = NULL)
plot(g_directed, edge.width = 1, edge.arrow.size = 0.5,
vertex.label.cex = 0.5, edge.color = "grey")
```