Compiled date: 2023-04-25

Last edited: 2022-08-01

License: GPL-3

1 Installation

Run the following code to install the Bioconductor version of the package.

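# install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("POMA")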

2 Load POMA

library(POMA)

You can also load some additional packages that will be very useful in this vignette.

library(ggplot2)
library(ggraph)
library(plotly)

3 The POMA Workflow

POMA functions can be divided into three sequential, well-separated blocks: Data Preparation, Pre-processing and Statistical Analysis.

3.1 Data Preparation

The SummarizedExperiment Bioconductor package provides well-defined computational data structures to represent omics experiment data types (Morgan et al. 2020). Since a suitable data structure can markedly improve data analysis, POMA functions use SummarizedExperiment objects from the SummarizedExperiment package. This allows existing methods for this class to be reused and contributes to more robust and reproducible workflows.

The first step of the workflow is to load or create a SummarizedExperiment object. Often, your data will be stored in separate matrices and/or data frames, and you will need to combine them into a SummarizedExperiment object. The PomaSummarizedExperiment function makes this step fast and easy by building the SummarizedExperiment object for you.

# create a SummarizedExperiment object from two separate data frames
target <- readr::read_csv("your_target.csv")
features <- readr::read_csv("your_features.csv")

data <- PomaSummarizedExperiment(target = target, features = features)
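As a rough illustration of the two inputs, the sketch below builds small data frames in base R. The layout is an assumption based on the POMA documentation (sample IDs in the first column of target, the study group in the second, and one row per sample and one column per feature in features). All sample names, column names and values are hypothetical.

```r
# Hypothetical toy inputs; names and values are invented for illustration.
set.seed(1)

# Sample metadata: ID first, study group second, then any covariates
target <- data.frame(
  ID = paste0("sample_", 1:6),
  group = rep(c("Control", "Case"), each = 3),
  covariate = c(0, 1, 0, 1, 1, 0)
)

# Feature table: one row per sample, one column per feature
features <- data.frame(
  metabolite_a = rlnorm(6, meanlog = 2),
  metabolite_b = rlnorm(6, meanlog = 4),
  metabolite_c = rlnorm(6, meanlog = 1)
)

# The two tables must describe the same samples in the same order
stopifnot(nrow(target) == nrow(features))

# With POMA loaded, the object would then be created as in the call above:
# data <- PomaSummarizedExperiment(target = target, features = features)
```

Checking that both tables have the same number of rows before building the object is a cheap way to catch misaligned files early.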

Alternatively, if your data is already stored in a SummarizedExperiment object, you can skip this step and go directly to the pre-processing step. In this vignette we will use the sample data provided in POMA.

# load example data
data("st000336")
st000336
> class: SummarizedExperiment 
> dim: 31 57 
> metadata(0):
> assays(1): ''
> rownames(31): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): DMD004.1.U02 DMD005.1.U02 ... DMD167.5.U02 DMD173.1.U02
> colData names(2): group steroids

3.1.1 Brief Description of Example Data

This example data is composed of 57 samples, 31 metabolites, 1 covariate and 2 experimental groups (Controls and DMD) from a targeted LC/MS study.

Duchenne Muscular Dystrophy (DMD) is an X-linked recessive form of muscular dystrophy that affects males via a mutation in the gene for the muscle protein dystrophin. Progression of the disease results in severe muscle loss, ultimately leading to paralysis and death. Steroid therapy has been a commonly employed method for reducing the severity of symptoms. This study aims to quantify the urine levels of amino acids and organic acids in patients with DMD, both with and without steroid treatment, and to track the progression of DMD in patients who have provided multiple urine samples.

This data was collected from here.

3.2 Pre-processing

This is a critical point in the workflow because all final statistical results will depend on the decisions made here. Again, this block can be divided into three steps: Missing Value Imputation, Normalization and Outlier Detection.

3.2.1 Missing Value Imputation

Often, for biological and technical reasons, some features cannot be identified or quantified in some samples in MS (Armitage et al. 2015). POMA offers 7 different imputation methods to deal with this situation. Just run the following line of code to impute your missing values.

imputed <- PomaImpute(st000336, ZerosAsNA = TRUE, RemoveNA = TRUE, cutoff = 20, method = "knn")
imputed
> class: SummarizedExperiment 
> dim: 30 57 
> metadata(0):
> assays(1): ''
> rownames(30): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): DMD004.1.U02 DMD005.1.U02 ... DMD167.5.U02 DMD173.1.U02
> colData names(2): group steroids
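To make the roles of ZerosAsNA, RemoveNA and cutoff more concrete, here is a simplified base R sketch of the same idea. It is not POMA's implementation: it ignores the grouping factor and applies the threshold across all samples (an assumption), and it uses a per-feature median as a stand-in for the "knn" method. The feature names and values are hypothetical.

```r
# Simplified base R sketch of the filtering idea; not POMA's implementation.
# Rows are samples, columns are features (hypothetical toy values).
x <- data.frame(
  f1 = c(1.2, 0.0, 3.1, 2.2, 1.9),  # one zero: 20% missing
  f2 = c(0.0, 0.0, 0.0, 4.0, 5.0),  # three zeros: 60% missing
  f3 = c(2.0, 2.5, 3.0, 3.5, 4.0)   # complete
)

# ZerosAsNA-style step: treat zeros as missing values
x[x == 0] <- NA

# RemoveNA-style step: drop features with more than `cutoff` percent missing
cutoff <- 20
pct_missing <- 100 * colSums(is.na(x)) / nrow(x)
x <- x[, pct_missing <= cutoff, drop = FALSE]

# Stand-in imputation: replace each remaining NA with the feature median
# (a placeholder for method = "knn", used here only to keep the sketch short)
x[] <- lapply(x, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

x
```

In this toy example the second feature is removed because it exceeds the threshold, which mirrors how the example data above went from 31 to 30 features after imputation.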

3.2.2 Normalization

The next step of this block is data normalization. In some types of MS data, technical and biological factors introduce variability that can critically influence the final statistical results, which makes normalization a key step in the workflow (Berg et al. 2006). Again, POMA offers several methods to normalize the data with a single line of code:

normalized <- PomaNorm(imputed, method = "log_pareto")
normalized
> class: SummarizedExperiment 
> dim: 30 57 
> metadata(0):
> assays(1): ''
> rownames(30): x1_methylhistidine x3_methylhistidine ... pyruvate
>   succinate
> rowData names(0):
> colnames(57): DMD004.1.U02 DMD005.1.U02 ... DMD167.5.U02 DMD173.1.U02
> colData names(2): group steroids
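For intuition about what a method such as "log_pareto" does, the following base R sketch applies a log transformation followed by Pareto scaling (mean-centering each feature and dividing by the square root of its standard deviation). The log(x + 1) offset and the helper name log_pareto are assumptions made for this illustration, so POMA's exact computation may differ. The data are simulated.

```r
# Sketch of a log Pareto transformation in base R (assumed form, see above).
log_pareto <- function(x) {
  x <- log(x + 1)                       # log transform; the +1 offset is an assumption
  centered <- x - mean(x, na.rm = TRUE) # center the feature at zero
  centered / sqrt(sd(x, na.rm = TRUE))  # Pareto: divide by the square root of the SD
}

# Hypothetical toy data: samples in rows, two features on very different scales
set.seed(42)
raw <- data.frame(
  low_abundance  = rlnorm(20, meanlog = 1, sdlog = 0.5),
  high_abundance = rlnorm(20, meanlog = 8, sdlog = 0.5)
)

# Apply the transformation feature by feature
scaled <- as.data.frame(lapply(raw, log_pareto))

# Compare the spread of each feature before and after
sapply(raw, sd)
sapply(scaled, sd)
```

Pareto scaling reduces the dominance of high-abundance features while shrinking differences in spread less aggressively than unit-variance scaling does.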

3.2.2.1 Normalization effect

Sometimes, you will want to know how the normalization process affects your data.

To answer this question, POMA offers two exploratory functions, PomaBoxplots and PomaDensity, that help you understand the effect of normalization.

PomaBoxplots generates boxplots for all samples or features (depending on the group factor) of a SummarizedExperiment object. Here, we compare the objects before and after the normalization step.

PomaBoxplots(imputed, 
             group = "samples", 
             jitter = FALSE,
             legend_position = "none") +
  ggplot2::ggtitle("Not Normalized") # data before normalization

PomaBoxplots(normalized, 
             group = "samples", 
             jitter = FALSE,
             legend_position = "none") +
  ggplot2::ggtitle("Normalized") # data after normalization

On the other hand, PomaDensity shows the distribution of all features before and after the normalization process.

PomaDensity(imputed, 
            group = "features",
            legend_position = "none") +
  ggplot2::ggtitle("Not Normalized") # data before normalization

PomaDensity(normalized, 
            group = "features") +
  ggplot2::ggtitle("Normalized") # data after normalization