lipidr implements a series of functions to facilitate inspection, analysis and visualization of targeted lipidomics datasets. lipidr takes exported Skyline CSV files as input, allowing multiple methods to be analyzed together.
lipidr represents Skyline files as SummarizedExperiment objects, which can easily be integrated with a wide variety of Bioconductor packages. Sample annotations, such as sample group or other clinical information, can be loaded.
lipidr generates various plots, such as PCA score plots and box plots, for quality control of samples and measured lipids. Normalization methods with and without internal standards are also supported.
Differential analysis can be performed using any of the loaded clinical variables, and the results can be readily visualized as volcano plots. A novel lipid set enrichment analysis (LSEA) is implemented to detect preferential enrichment of certain lipid classes, total chain lengths or unsaturation patterns. Plots for the visualization of enrichment results are also implemented.
This vignette provides a step-by-step guide for downstream analysis of targeted lipidomics data exported from Skyline.
To install lipidr from Bioconductor, type the following in the R console:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("lipidr")
To install lipidr from GitHub using devtools, type the following in the R console:
library(devtools)
install_github("ahmohamed/lipidr")
In this workflow, we will use serum lipidomics data from mice fed a normal or high-fat diet (Diet column) that had access to either normal drinking water or drinking water containing the bile acid deoxycholic acid (BileAcid column). Lipid peaks were integrated using Skyline and exported as CSV files.
Integrated peaks should be exported from each Skyline file through File => Export => Report. Selecting Transition Results ensures that the necessary information is exported from Skyline. Otherwise, you should ensure that peak Area, Height or a similar measure is exported. Regardless of the measure you choose for intensity, you can use the lipidr workflow.
Replicates should either be exported, or the Pivot Replicate Name option must be used.
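Before importing, it can help to confirm that an export actually carries an intensity measure and replicate names. The following is a minimal sketch using base R only. The inline CSV is a made-up stand-in for a Skyline report, and the column names (`Molecule`, `Replicate`, `Area` and so on) are assumptions for illustration; check them against your own export.

```r
# Hypothetical stand-in for an exported Skyline report.
# Column names and values are assumptions for illustration only.
csv_text <- "Molecule,Precursor Mz,Product Mz,Replicate,Retention Time,Area,Background
PE 32:0,692.5,551.5,S1A,8.1,120000,300
PE 32:0,692.5,551.5,S2A,8.2,115000,280
PE 32:1,690.5,549.5,S1A,7.9,54000,150
PE 32:1,690.5,549.5,S2A,7.9,56000,170"

# check.names = FALSE keeps headers such as "Retention Time" unchanged.
export <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# An intensity measure (Area, Height or similar) and replicate names are needed.
intensity_cols <- intersect(c("Area", "Height"), colnames(export))
has_replicates <- "Replicate" %in% colnames(export)

if (length(intensity_cols) == 0) stop("No intensity column (Area or Height) found")
if (!has_replicates) stop("No Replicate column found")

print(intensity_cols)
print(unique(export$Replicate))
```

For a real export, replace the `text = csv_text` argument with the path to the CSV file.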
lipidr can read multiple CSV files from different analysis methods together. Using our example dataset, three Skyline CSV files are used as input to read_skyline.
datadir = system.file("extdata", package="lipidr")
filelist = list.files(datadir, "data.csv", full.names = TRUE) # all files whose names match "data.csv"
d = read_skyline(filelist)
print(d)
## class: LipidomicsExperiment
## dim: 279 58
## metadata(2): summarized dimnames
## assays(3): Retention Time Area Background
## rownames(279): 1 2 ... 278 279
## rowData names(26): filename Molecule ... total_cs Class
## colnames(58): S1A S2A ... TQC_11 TQC_12
## colData names(0):
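The second argument of `list.files` in the call above is a regular expression, not a literal file name. The sketch below, which uses a throwaway folder and hypothetical file names, shows how a loose pattern can pick up unintended files and how escaping and anchoring narrows the match. It is an optional precaution and is not needed for the bundled example data.

```r
# Create a throwaway folder with hypothetical file names.
demo_dir <- file.path(tempdir(), "skyline_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- c("A1_data.csv", "F1_data.csv", "clin.csv", "A1_data.csv.bak")
invisible(file.create(file.path(demo_dir, demo_files)))

# The pattern is a regular expression: "." matches any character
# and the match is not anchored to the end of the name.
loose <- list.files(demo_dir, "data.csv")

# Escaping the dot and anchoring with "$" keeps only names ending in "data.csv".
strict <- list.files(demo_dir, "data\\.csv$")

print(loose)
print(strict)
```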
Datasets are represented in R as SummarizedExperiment objects to facilitate integration with other Bioconductor packages.
Sample annotation can be prepared in Excel and saved as CSV. The table should have at least two columns: the first indicating sample names and the others indicating clinical variables.
clinical_file = system.file("extdata", "clin.csv", package="lipidr")
d = add_sample_annotation(d, clinical_file)
colData(d)
## DataFrame with 58 rows and 3 columns
## group Diet BileAcid
## <character> <character> <character>
## S1A NormalDiet_water Normal water
## S2A NormalDiet_water Normal water
## S3A NormalDiet_water Normal water
## S4A NormalDiet_water Normal water
## S5A NormalDiet_water Normal water
## ... ... ... ...
## TQC_8 QC QC QC
## TQC_9 QC QC QC
## TQC_10 QC QC QC
## TQC_11 QC QC QC
## TQC_12 QC QC QC
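As a rough illustration of the annotation table layout, the sketch below builds a small table in base R, writes it to a temporary CSV file and checks that it has the expected shape. The header `Sample` and the three rows are assumptions for illustration; only the idea of sample names in the first column, followed by clinical variables, comes from the text.

```r
# Hypothetical annotation table: first column holds sample names,
# remaining columns hold clinical variables.
annot <- data.frame(
  Sample   = c("S1A", "S2A", "TQC_1"),
  group    = c("NormalDiet_water", "NormalDiet_water", "QC"),
  Diet     = c("Normal", "Normal", "QC"),
  BileAcid = c("water", "water", "QC"),
  stringsAsFactors = FALSE
)

annot_file <- tempfile(fileext = ".csv")
write.csv(annot, annot_file, row.names = FALSE)

# Read it back and check the layout before passing the path on:
# at least two columns, and no sample name listed twice.
check <- read.csv(annot_file, stringsAsFactors = FALSE)
stopifnot(ncol(check) >= 2, !anyDuplicated(check[[1]]))
print(check)
```

A file prepared this way could then be supplied as the second argument of `add_sample_annotation`, as in the call shown earlier.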
It is helpful to imagine a LipidomicsExperiment object as a table with lipid molecules as rows and samples as columns. We can subset this table by selecting specific rows and columns. The general syntax is d[rows, cols].
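The table analogy can be tried out on a plain base R matrix, which follows the same `[rows, cols]` convention. This is only a sketch with invented names and values; it does not use a LipidomicsExperiment object.

```r
# Toy intensity table: lipid molecules as rows, samples as columns.
# Values and names are made up for illustration.
tab <- matrix(
  c(10, 12, 11,
    20, 19, 22,
     5,  6,  4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PE 32:0", "PE 32:1", "PC 30:0"), c("S1A", "S2A", "TQC_1"))
)

# Rows and columns selected by position.
first_two <- tab[1:2, 1:2]

# Empty row index keeps all rows; a logical mask picks the columns.
# drop = FALSE keeps the result a matrix even with a single column.
qc_only <- tab[, colnames(tab) == "TQC_1", drop = FALSE]

# A logical mask on the rows; empty column index keeps all samples.
pe_only <- tab[grepl("^PE", rownames(tab)), ]

print(dim(first_two))
print(dim(qc_only))
print(rownames(pe_only))
```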
In the example below, we select the first 10 transitions and 10 samples. We can check the rowData and colData.
d_subset = d[1:10, 1:10]
rowData(d_subset)
## DataFrame with 10 rows and 26 columns
## filename Molecule Precursor.Mz Precursor.Charge Product.Mz
## <character> <character> <numeric> <integer> <numeric>
## 1 A1_data.csv PE 32:0 692.5 1 551.5
## 2 A1_data.csv PE 32:1 690.5 1 549.5
## 3 A1_data.csv PE 32:2 688.5 1 547.5
## 4 A1_data.csv PE 34:1 718.5 1 577.5
## 5 A1_data.csv PE 34:1 NEG 716.5 1 196.0
## 6 A1_data.csv PE 34:2 716.5 1 575.5
## 7 A1_data.csv PE 34:3 714.5 1 573.5
## 8 A1_data.csv PE 36:0 748.6 1 607.6
## 9 A1_data.csv PE 36:1 746.6 1 605.6
## 10 A1_data.csv PE 36:1 NEG 744.6 1 196.0
## Product.Charge clean_name ambig not_matched istd class_stub
## <integer> <factor> <logical> <logical> <logical> <character>
## 1 1 PE 32:0 FALSE FALSE FALSE PE
## 2 1 PE 32:1 FALSE FALSE FALSE PE
## 3 1 PE 32:2 FALSE FALSE FALSE PE
## 4 1 PE 34:1 FALSE FALSE FALSE PE
## 5 1 PE 34:1 FALSE FALSE FALSE PE
## 6 1 PE 34:2 FALSE FALSE FALSE PE
## 7 1 PE 34:3 FALSE FALSE FALSE PE
## 8 1 PE 36:0 FALSE FALSE FALSE PE
## 9 1 PE 36:1 FALSE FALSE FALSE PE
## 10 1 PE 36:1 FALSE FALSE FALSE PE
## chain1 l_1 s_1 chain2 l_2 s_2 chain3
## <character> <integer> <integer> <character> <integer> <integer> <character>
## 1 32:0 32 0 NA NA
## 2 32:1 32 1 NA NA
## 3 32:2 32 2 NA NA
## 4 34:1 34 1 NA NA
## 5 34:1 34 1 NA NA
## 6 34:2 34 2 NA NA
## 7 34:3 34 3 NA NA
## 8 36:0 36 0 NA NA
## 9 36:1 36 1 NA NA
## 10 36:1 36 1 NA NA
## l_3 s_3 chain4 l_4 s_4 total_cl total_cs
## <logical> <logical> <character> <logical> <logical> <integer> <integer>
## 1 NA NA NA NA 32 0
## 2 NA NA NA NA 32 1
## 3 NA NA NA NA 32 2
## 4 NA NA NA NA 34 1
## 5 NA NA NA NA 34 1
## 6 NA NA NA NA 34 2
## 7 NA NA NA NA 34 3
## 8 NA NA NA NA 36 0
## 9 NA NA NA NA 36 1
## 10 NA NA NA NA 36 1
## Class
## <character>
## 1 PE
## 2 PE
## 3 PE
## 4 PE
## 5 PE
## 6 PE
## 7 PE
## 8 PE
## 9 PE
## 10 PE
colData(d)
## DataFrame with 58 rows and 3 columns
## group Diet BileAcid
## <character> <character> <character>
## S1A NormalDiet_water Normal water
## S2A NormalDiet_water Normal water
## S3A NormalDiet_water Normal water
## S4A NormalDiet_water Normal water
## S5A NormalDiet_water Normal water
## ... ... ... ...
## TQC_8 QC QC QC
## TQC_9 QC QC QC
## TQC_10 QC QC QC
## TQC_11 QC QC QC
## TQC_12 QC QC QC
We can also apply conditional selections (indexing). For example, we can select all quality control samples.
d_qc = d[, d$group == "QC"]
rowData(d_qc)
## DataFrame with 279 rows and 26 columns
## filename Molecule Precursor.Mz Precursor.Charge Product.Mz
## <character> <character> <numeric> <integer> <numeric>
## 1 A1_data.csv PE 32:0 692.5 1 551.5
## 2 A1_data.csv PE 32:1 690.5 1 549.5
## 3 A1_data.csv PE 32:2 688.5 1 547.5
## 4 A1_data.csv PE 34:1 718.5 1 577.5
## 5 A1_data.csv PE 34:1 NEG 716.5 1 196.0
## ... ... ... ... ... ...
## 275 F2_data.csv PC(P-40:3) 824.600 1 184.10
## 276 F2_data.csv PC(P-40:4) 822.600 1 184.10
## 277 F2_data.csv PC(P-40:5) 820.600 1 184.10
## 278 F2_data.csv PC(P-40:6) 818.600 1 184.10
## 279 F2_data.csv 15:0-18:1(d7) PC 753.615 1 184.07
## Product.Charge clean_name ambig not_matched istd class_stub
## <integer> <factor> <logical> <logical> <logical> <character>
## 1 1 PE 32:0 FALSE FALSE FALSE PE
## 2 1 PE 32:1 FALSE FALSE FALSE PE
## 3 1 PE 32:2 FALSE FALSE FALSE PE
## 4 1 PE 34:1 FALSE FALSE FALSE PE
## 5 1 PE 34:1 FALSE FALSE FALSE PE
## ... ... ... ... ... ... ...
## 275 1 PCP-40:3 FALSE FALSE FALSE PCP
## 276 1 PCP-40:4 FALSE FALSE FALSE PCP
## 277 1 PCP-40:5 FALSE FALSE FALSE PCP
## 278 1 PCP-40:6 FALSE FALSE FALSE PCP
## 279 1 PC 15:0-18:1(d7) FALSE FALSE TRUE PC
## chain1 l_1 s_1 chain2 l_2 s_2 chain3
## <character> <integer> <integer> <character> <integer> <integer> <character>
## 1 32:0 32 0 NA NA
## 2 32:1 32 1 NA NA
## 3 32:2 32 2 NA NA
## 4 34:1 34 1 NA NA
## 5 34:1 34 1 NA NA
## ... ... ... ... ... ... ... ...
## 275 40:3 40 3 NA NA
## 276 40:4 40 4 NA NA
## 277 40:5 40 5 NA NA
## 278 40:6 40 6 NA NA
## 279 15:0 15 0 18:1 18 1
## l_3 s_3 chain4 l_4 s_4 total_cl total_cs
## <logical> <logical> <character> <logical> <logical> <integer> <integer>
## 1 NA NA NA NA 32 0
## 2 NA NA NA NA 32 1
## 3 NA NA NA NA 32 2
## 4 NA NA NA NA 34 1
## 5 NA NA NA NA 34 1
## ... ... ... ... ... ... ... ...
## 275 NA NA NA NA 40 3
## 276 NA NA NA NA 40 4
## 277 NA NA NA NA 40 5
## 278 NA NA NA NA 40 6
## 279 NA NA NA NA 33 1
## Class
## <character>
## 1 PE
## 2 PE
## 3 PE
## 4 PE
## 5 PE
## ... ...
## 275 PC
## 276 PC
## 277 PC
## 278 PC
## 279 PC
colData(d_qc)
## DataFrame with 12 rows and 3 columns
## group Diet BileAcid
## <character> <character> <character>
## TQC_1 QC QC QC
## TQC_2 QC QC QC
## TQC_3 QC QC QC
## TQC_4 QC QC QC
## TQC_5 QC QC QC
## ... ... ... ...
## TQC_8 QC QC QC
## TQC_9 QC QC QC
## TQC_10 QC QC QC
## TQC_11 QC QC QC
## TQC_12 QC QC QC
Note that we leave the rows index empty (d[,cols]) to select all lipids. We can also subset based on lipid annotations, selecting a specific class for example.
pc_lipids = rowData(d)$Class %in% c("PC", "PCO", "PCP")
d_pc = d[pc_lipids,]
rowData(d_pc)
## DataFrame with 82 rows and 26 columns
## filename Molecule Precursor.Mz Precursor.Charge Product.Mz
## <character> <character> <numeric> <integer> <numeric>
## 160 F1_data.csv PC 30:0 706.5 1 184.1
## 161 F1_data.csv PC 30:1 704.5 1 184.1
## 162 F1_data.csv PC 30:2 702.5 1 184.1
## 163 F1_data.csv PC 32:0 734.6 1 184.1
## 164 F1_data.csv PC 32:1 732.6 1 184.1
## ... ... ... ... ... ...
## 275 F2_data.csv PC(P-40:3) 824.600 1 184.10
## 276 F2_data.csv PC(P-40:4) 822.600 1 184.10
## 277 F2_data.csv PC(P-40:5) 820.600 1 184.10
## 278 F2_data.csv PC(P-40:6) 818.600 1 184.10
## 279 F2_data.csv 15:0-18:1(d7) PC 753.615 1 184.07
## Product.Charge clean_name ambig not_matched istd class_stub
## <integer> <factor> <logical> <logical> <logical> <character>
## 160 1 PC 30:0 FALSE FALSE FALSE PC
## 161 1 PC 30:1 FALSE FALSE FALSE PC
## 162 1 PC 30:2 FALSE FALSE FALSE PC
## 163 1 PC 32:0 FALSE FALSE FALSE PC
## 164 1 PC 32:1 FALSE FALSE FALSE PC
## ... ... ... ... ... ... ...
## 275 1 PCP-40:3 FALSE FALSE FALSE PCP
## 276 1 PCP-40:4 FALSE FALSE FALSE PCP
## 277 1 PCP-40:5 FALSE FALSE FALSE PCP
## 278 1 PCP-40:6 FALSE FALSE FALSE PCP
## 279 1 PC 15:0-18:1(d7) FALSE FALSE TRUE PC
## chain1 l_1 s_1 chain2 l_2 s_2 chain3
## <character> <integer> <integer> <character> <integer> <integer> <character>
## 160 30:0 30 0 NA NA
## 161 30:1 30 1 NA NA
## 162 30:2 30 2 NA NA
## 163 32:0 32 0 NA NA
## 164 32:1 32 1 NA NA
## ... ... ... ... ... ... ... ...
## 275 40:3 40 3 NA NA
## 276 40:4 40 4 NA NA
## 277 40:5 40 5 NA NA
## 278 40:6 40 6 NA NA
## 279 15:0 15 0 18:1 18 1
## l_3 s_3 chain4 l_4 s_4 total_cl total_cs
## <logical> <logical> <character> <logical> <logical> <integer> <integer>
## 160 NA NA NA NA 30 0
## 161 NA NA NA NA 30 1
## 162 NA NA NA NA 30 2
## 163 NA NA NA NA 32 0
## 164 NA NA NA NA 32 1
## ... ... ... ... ... ... ... ...
## 275 NA NA NA NA 40 3
## 276 NA NA NA NA 40 4
## 277 NA NA NA NA 40 5
## 278 NA NA NA NA 40 6
## 279 NA NA NA NA 33 1
## Class
## <character>
## 160 PC
## 161 PC
## 162 PC
## 163 PC
## 164 PC
## ... ...
## 275 PC
## 276 PC
## 277 PC
## 278 PC
## 279 PC
colData(d_pc)
## DataFrame with 58 rows and 3 columns
## group Diet BileAcid
## <character> <character> <character>
## S1A NormalDiet_water Normal water
## S2A NormalDiet_water Normal water
## S3A NormalDiet_water Normal water
## S4A NormalDiet_water Normal water
## S5A NormalDiet_water Normal water
## ... ... ... ...
## TQC_8 QC QC QC
## TQC_9 QC QC QC
## TQC_10 QC QC QC
## TQC_11 QC QC QC
## TQC_12 QC QC QC
For demonstration purposes, we select only 3 lipid classes: Ceramides (Cer), PhosphatidylCholines (PC) and LysoPhosphatidylCholines (LPC). We also remove the BileAcid treated group (DCA) from our dataset.
lipid_classes = rowData(d)$Class %in% c("Cer", "PC", "LPC")
groups = d$BileAcid != "DCA"
d = d[lipid_classes, groups]
# QC sample subset
d_qc = d[, d$group == "QC"]
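The way the two logical masks combine can be sketched on a toy matrix in base R. All names and values below are invented for illustration. One detail worth noticing is that a `!= "DCA"` filter keeps any sample whose annotation is something other than `DCA`, including samples labelled `QC`, which is why a QC subset can still be taken afterwards.

```r
# Toy annotations; names and values are invented for illustration.
lipid_class <- c("Cer", "PC", "PE", "LPC", "TG")
bile_acid   <- c(S1A = "water", S2A = "DCA", S3A = "water", TQC_1 = "QC")
intensity   <- matrix(seq_len(20), nrow = 5,
                      dimnames = list(paste0("lipid", 1:5), names(bile_acid)))

keep_rows <- lipid_class %in% c("Cer", "PC", "LPC")  # TRUE for the three classes
keep_cols <- bile_acid != "DCA"                      # FALSE only for DCA samples

subset_tab <- intensity[keep_rows, keep_cols]
print(dim(subset_tab))
print(colnames(subset_tab))
```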
To ensure data quality, we can look at the total lipid intensity of each sample as a bar chart, or at the distribution of intensities in each sample as a boxplot.
plot_samples(d, type = "tic", log = TRUE)
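To make the quantity behind such a bar chart concrete, the sketch below sums a toy peak-area matrix per sample and log-transforms the totals. The numbers are invented, and the choice of `log2` is an assumption; how lipidr computes and scales the values internally is not shown in this vignette.

```r
# Toy peak-area matrix (molecules x samples); numbers are invented.
area <- matrix(
  c(1000, 1200,  900,
    4000, 3800, 4100,
     500,  450,   20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Cer d18:1/16:0", "PC 32:0", "LPC 16:0"), c("S1A", "S2A", "S3A"))
)

# Total intensity per sample (sum over all molecules).
total_intensity <- colSums(area, na.rm = TRUE)

# Log transform, assumed here to be log2, to compare samples on a log scale.
log_total <- log2(total_intensity)

print(total_intensity)
print(round(log_total, 2))
```

A sample whose total sits well below the others may point to an injection or extraction problem and is worth a closer look.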
We can also look at intensity and retention time distributions for each lipid molecule using plot_molecules(type = "boxplot"). It is recommended to assess the variation across quality control samples.
plot_molecules(d_qc, "sd", measure = "Retention Time", log = FALSE)
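The variation that such a plot summarizes can be approximated by hand as the per-molecule standard deviation across QC injections. The sketch below does this in base R on invented retention times; the 0.1 minute tolerance is a hypothetical threshold chosen for illustration and is not a lipidr default.

```r
# Toy retention-time matrix (molecules x QC samples), in minutes; values invented.
rt <- matrix(
  c(8.10, 8.12, 8.09, 8.11,
    7.90, 7.91, 7.89, 7.90,
    5.20, 5.60, 4.90, 5.40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PC 32:0", "PC 32:1", "LPC 16:0"), paste0("TQC_", 1:4))
)

# Standard deviation of each molecule across the QC injections.
rt_sd <- apply(rt, 1, sd)
print(round(rt_sd, 3))

# Flag molecules above a hypothetical tolerance of 0.1 minutes.
unstable <- names(rt_sd)[rt_sd > 0.1]
print(unstable)
```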